- Install Dynamo CRDs.
- Install the Dynamo platform with the Kubernetes AI (Kai) scheduler and Grove enabled.
- Deploy an inference model using an example from the Dynamo repository and a Hugging Face token.
- List and delete deployments when you’re done.
What you'll need
Before you start, you must have:
- A CKS cluster with GPU Nodes.
kubectlinstalled and configured to access your cluster.helminstalled.- A Hugging Face access token for model access.
What you'll use
You use these tools and components:
- Dynamo: Cluster-wide inference orchestration from the Dynamo repository.
- Helm: To install Dynamo CRDs and platform charts from NVIDIA NGC.
- Kubernetes AI (Kai) scheduler and Grove: Enabled in the platform install for scheduling and routing.
Set environment
Set the namespace where Dynamo is installed and the Dynamo release version. The remaining steps reference these variables, so setting them once keeps the commands consistent.echo $NAMESPACE and echo $RELEASE_VERSION if needed.
Install CRDs
Dynamo requires custom resource definitions (CRDs) on the cluster before the platform components can run. Skip this step if Dynamo CRDs are already installed on the cluster.-
Fetch and install the Dynamo CRDs chart:
-
When the install completes without error, the CRDs are installed. Optionally verify with:
Install the platform
Install the Dynamo platform into the chosen namespace with the Kubernetes AI (Kai) scheduler and Grove enabled.-
Fetch the Dynamo platform chart:
-
Install the Dynamo platform:
The output is similar to the following:
-
When the install completes, Dynamo platform components run in
${NAMESPACE}. Verify with:Ensure the expected Pods are running before you deploy a model.
Deploy an inference model
With the platform running, you can now deploy a model for inference. This tutorial uses theQwen3-0.6B model, which you deploy using the agg.yaml from the Dynamo repository.
-
Download the
agg.yamland modify the image value by changingmy-tagto0.9.1on lines 16 and 30: -
Create a Kubernetes Secret with your Hugging Face token. Replace
[HF-TOKEN]with your Hugging Face token: -
Apply the
agg.yamlto your cluster:More example models More example models and manifests are available in the Dynamo repository. If you use one of these example models, create a Kubernetes Secret for your Hugging Face token (same as the preceding step). If you’re deploying an example model instead of theagg.yaml, you might need to configure storage:-
Configure storage for the example. In your chosen example, edit
model-cache/cache.yamland replace every instance ofyour-storage-classwithshared-vastso the model cache uses CoreWeave storage. -
Apply the model download manifest to download the model data:
-
Configure storage for the example. In your chosen example, edit
Run inference
With the model deployed, you can send inference requests to it from your local machine. In another terminal, forward thevllm-agg-frontend service:
choices array containing the model reply.
Clean up
When you no longer need the deployment, remove it to free cluster resources. To list Dynamo graph deployments in your namespace, run:vllm-agg-router: