Deploy NVIDIA Dynamo on CKS

Dynamo provides deployment and orchestration for inference workloads. This tutorial shows you how to deploy Dynamo on CKS for cluster-wide inference, install its custom resources and platform components, and run an inference model. In this tutorial, you will:

Install Dynamo CRDs.
Install the Dynamo platform with the Kubernetes AI (Kai) scheduler and Grove enabled.
Deploy an inference model using an example from the Dynamo repository and a Hugging Face token.
List and delete deployments when you are done.

What you'll need

Before you start, you must have:

A CKS cluster with GPU nodes
kubectl installed and configured to access your cluster
helm installed
A Hugging Face access token for model access

What you'll use

You’ll use these tools and components:

Dynamo: Cluster-wide inference orchestration from the Dynamo repository
Helm: To install Dynamo CRDs and platform charts from NVIDIA NGC
Kubernetes AI (Kai) scheduler and Grove: Enabled in the platform install for scheduling and routing

1. Set environment

Set the namespace where Dynamo will be installed and the Dynamo release version.

export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.9.0

You will use these variables in the following steps. Verify with echo $NAMESPACE and echo $RELEASE_VERSION if needed.
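As a quick sanity check, the sketch below sets the two variables, stops if either is empty, and prints the chart URLs that steps 2 and 3 fetch. The names CRDS_CHART_URL and PLATFORM_CHART_URL are hypothetical helpers introduced here for illustration; the later steps do not depend on them.

```shell
export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.9.0

# Stop with an error message if either variable is unset or empty.
: "${NAMESPACE:?NAMESPACE is not set}"
: "${RELEASE_VERSION:?RELEASE_VERSION is not set}"

# Hypothetical helper variables: the chart URLs from steps 2 and 3
# with the release version substituted.
CRDS_CHART_URL="https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz"
PLATFORM_CHART_URL="https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz"

echo "$CRDS_CHART_URL"
echo "$PLATFORM_CHART_URL"
```

If a URL prints with an empty version segment, re-run the export commands in the same shell session before continuing.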

2. Install CRDs

Skip this step if Dynamo CRDs are already installed on the cluster.

Fetch and install the Dynamo CRDs chart:

helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz

helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

If the install completes without error, the CRDs are available on the cluster. Optionally verify with:
```
kubectl get crd | grep dynamo
```

3. Install platform

Install the Dynamo platform into the chosen namespace with the Kubernetes AI (Kai) scheduler and Grove enabled.

Fetch the Dynamo platform chart:

helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

Install the Dynamo platform:

helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace \
--set "grove.enabled=true" \
--set "kai-scheduler.enabled=true" \
--set "dynamo-operator.controllerManager.manager.image.tag=${RELEASE_VERSION}" \
--set "dynamo-operator.controllerManager.kubeRbacProxy.image.repository=registry.k8s.io/kubebuilder/kube-rbac-proxy" \
--set "dynamo-operator.controllerManager.kubeRbacProxy.image.tag=v0.15.0"

You should see output similar to the following:

 I0312 13:13:13.800378    5735 warnings.go:110] "Warning: tls: failed to find any PEM data in certificate input"
 NAME: dynamo-platform
 LAST DEPLOYED: Thu Mar 12 13:12:16 2026
 NAMESPACE: dynamo-system
 STATUS: deployed
 REVISION: 1
 TEST SUITE: None
 NOTES:
 SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 SPDX-License-Identifier: Apache-2.0

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.

When the install completes, Dynamo platform components run in ${NAMESPACE}. Verify with:
```
kubectl get pods -n ${NAMESPACE}
```
Ensure the expected pods are running before you deploy a model.
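Instead of polling by hand, you can block until the pods report Ready. The function below is a minimal sketch; wait_for_platform is a hypothetical helper name, and the 300s default timeout is an assumption, not a value from the Dynamo documentation. Note that kubectl wait returns an error if no pods exist yet, and pods that belong to completed jobs never report Ready, so treat a timeout as a prompt to inspect the pod list rather than as a definite failure.

```shell
# Hypothetical helper: wait until every pod in a namespace is Ready.
# Usage: wait_for_platform <namespace> [timeout]
wait_for_platform() {
  ns="${1:?usage: wait_for_platform <namespace> [timeout]}"
  wait_timeout="${2:-300s}"   # assumed default; adjust for your cluster
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$wait_timeout"
}

# Example call (requires access to the cluster):
#   wait_for_platform "${NAMESPACE}"
```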

4. Deploy an inference model

This tutorial uses the Qwen3-0.6B model, which you deploy using the agg.yaml from the Dynamo repo.

Download the agg.yaml and modify the image value by changing my-tag to 0.9.1 on lines 16 and 30:
```
image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.1
```
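If you prefer not to edit the file by hand, the tag change can be scripted. This is a sketch that assumes the placeholder appears in agg.yaml literally as :my-tag, as the instruction above implies; set_image_tag is a hypothetical helper name.

```shell
# Hypothetical helper: replace the ":my-tag" placeholder with a real tag.
# Usage: set_image_tag <file> <tag>
set_image_tag() {
  file="${1:?usage: set_image_tag <file> <tag>}"
  tag="${2:?usage: set_image_tag <file> <tag>}"
  tmp="$(mktemp)"
  # Write to a temporary file first, then copy back, so the edit works
  # with both GNU and BSD sed and keeps the original file permissions.
  sed "s/:my-tag/:${tag}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example call:
#   set_image_tag agg.yaml 0.9.1
```

Afterward, grep -n 'image:' agg.yaml shows the image lines so you can confirm that each one carries the new tag.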

Create a Kubernetes secret with your Hugging Face token:

kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN="INSERT-TOKEN-HERE" -n ${NAMESPACE}

Replace INSERT-TOKEN-HERE with your Hugging Face token.
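As an alternative to pasting the token into the command, you can read it from an environment variable. The sketch below assumes that you have already set HF_TOKEN in your shell by some other means; create_hf_secret is a hypothetical helper name, while the secret name and key match the command above.

```shell
# Hypothetical helper: create the Hugging Face token secret from $HF_TOKEN.
# Usage: create_hf_secret <namespace>
create_hf_secret() {
  ns="${1:?usage: create_hf_secret <namespace>}"
  # Refuse to create a secret with an empty token.
  : "${HF_TOKEN:?set HF_TOKEN before calling create_hf_secret}"
  kubectl create secret generic hf-token-secret \
    --from-literal=HF_TOKEN="${HF_TOKEN}" \
    -n "$ns"
}

# Example call (requires access to the cluster):
#   create_hf_secret "${NAMESPACE}"
```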

Apply the agg.yaml to your cluster:
```
kubectl apply -f agg.yaml -n ${NAMESPACE}
```
More example models

More example models and manifests are available in the Dynamo repository. If you deploy one of them instead of agg.yaml, create the Kubernetes secret for your Hugging Face token as shown earlier in this step. You might also need to configure storage:
- Configure storage for the example. In your chosen example, edit model-cache/cache.yaml and replace every instance of your-storage-class with shared-vast so the model cache uses CoreWeave storage.
- Apply the model download manifest to download the model data:
  kubectl apply -f model-cache/model-download.yaml
When the download job completes, the model is cached and ready for inference. Follow any additional instructions in the example for running or exposing the model.

5. Run inference

In another terminal, set NAMESPACE again as in step 1 (a new shell session does not inherit it), then forward the vllm-agg-frontend service:

kubectl port-forward service/vllm-agg-frontend 8000:8000 -n $NAMESPACE

To test the deployment, run:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "stream": false,
    "max_tokens": 100
  }'

The response includes a JSON body with a choices array containing the model reply.
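To print only the reply text, you can pipe the response through jq. This sketch assumes that jq is installed and that the response follows the OpenAI-style chat completions layout, with the reply at choices[0].message.content; extract_reply is a hypothetical helper name.

```shell
# Hypothetical helper: read a chat completions response on stdin and
# print only the text of the first choice.
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Example call (requires the port-forward from this step to be running):
#   curl -s -X POST http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello, how are you?"}], "stream": false, "max_tokens": 100}' \
#     | extract_reply
```

If the output is null, inspect the full JSON body; the response may be an error object rather than a completion.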

6. Clean up

To list Dynamo graph deployments in your namespace, run:

kubectl get dynamographdeployment -n $NAMESPACE

To delete a deployment, use the resource kind and the deployment name. For example, to delete a deployment named vllm-agg-router (substitute a name from the list output):

kubectl delete dynamographdeployment vllm-agg-router -n $NAMESPACE

The deployment is removed when the command succeeds.
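If you also want to remove the platform itself, the Helm releases from steps 2 and 3 can be uninstalled once no graph deployments remain. The function below is a sketch: uninstall_dynamo is a hypothetical helper name, and the release names, secret name, and namespaces are the ones used earlier in this tutorial. Whether the CRDs themselves are removed depends on how the chart packages them, so check with kubectl get crd afterward.

```shell
# Hypothetical helper: remove the token secret and both Helm releases.
# Usage: uninstall_dynamo <namespace>
uninstall_dynamo() {
  ns="${1:?usage: uninstall_dynamo <namespace>}"
  # Remove the Hugging Face token secret created in step 4, if present.
  kubectl delete secret hf-token-secret -n "$ns" --ignore-not-found
  # Remove the platform release installed in step 3.
  helm uninstall dynamo-platform -n "$ns"
  # Remove the CRDs release, which step 2 installed into "default".
  helm uninstall dynamo-crds -n default
}

# Example call (requires access to the cluster):
#   uninstall_dynamo "${NAMESPACE}"
```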
