Dynamo provides deployment and orchestration for inference workloads. This tutorial shows you how to deploy Dynamo on CKS for cluster-wide inference, install its custom resources and platform components, and run an inference model. In this tutorial, you will:Documentation Index
Fetch the complete documentation index at: https://docs.coreweave.com/llms.txt
Use this file to discover all available pages before exploring further.
- Install Dynamo CRDs.
- Install the Dynamo platform with the Kubernetes AI (Kai) scheduler and Grove enabled.
- Deploy an inference model using an example from the Dynamo repository and a Hugging Face token.
- List and delete deployments when you are done.
What you'll need
Before you start, you must have:
- A CKS cluster with GPU nodes
kubectlinstalled and configured to access your clusterhelminstalled- A Hugging Face access token for model access
What you'll use
You’ll use these tools and components:
- Dynamo: Cluster-wide inference orchestration from the Dynamo repository
- Helm: To install Dynamo CRDs and platform charts from NVIDIA NGC
- Kubernetes AI (Kai) scheduler and Grove: Enabled in the platform install for scheduling and routing
1. Set environment
Set the namespace where Dynamo will be installed and the Dynamo release version.echo $NAMESPACE and echo $RELEASE_VERSION if needed.
2. Install CRDs
Skip this step if Dynamo CRDs are already installed on the cluster.-
Fetch and install the Dynamo CRDs chart:
-
When the install completes without error, the CRDs are installed. Optionally verify with:
3. Install platform
Install the Dynamo platform into the chosen namespace with the Kubernetes AI (Kai) scheduler and Grove enabled.-
Fetch the Dynamo platform chart:
-
Install the Dynamo platform:
You should see output similar to the following:
-
When the install completes, Dynamo platform components run in
${NAMESPACE}. Verify with:Ensure the expected pods are running before you deploy a model.
4. Deploy an inference model
This tutorial uses theQwen3-0.6B model, which you deploy using the agg.yaml from the Dynamo repo.
-
Download the
agg.yamland modify the image value by changingmy-tagto0.9.1on lines 16 and 30: -
Create a Kubernetes secret with your Hugging Face token:
Replace
INSERT-TOKEN-HEREwith your Hugging Face token. -
Apply the
agg.yamlto your cluster:More example models
More example models and manifests are available in the Dynamo repository. If you use one of these example models, be sure to create a Kubernetes secret for your Hugging Face token (same as step 2 above). If you’re deploying an example model instead of theagg.yaml, you might need to configure storage:-
Configure storage for the example. In your chosen example, edit
model-cache/cache.yamland replace every instance ofyour-storage-classwithshared-vastso the model cache uses CoreWeave storage. -
Apply the model download manifest to download the model data:
-
Configure storage for the example. In your chosen example, edit
5. Run inference
In another terminal, forward the vllm-agg-frontend service:choices array containing the model reply.
6. Clean up
To list Dynamo graph deployments in your namespace, run:vllm-agg-router: