Dynamo provides deployment and orchestration for inference workloads. This tutorial shows you how to deploy Dynamo on CoreWeave Kubernetes Service (CKS) for cluster-wide inference, install its custom resources and platform components, and run an inference model.

By the end, you have a working Dynamo deployment on your CKS cluster that serves a model through a local endpoint, which you can adapt to run other inference workloads at scale.

This tutorial is for cluster administrators and ML engineers who want to run distributed inference on CKS with GPU Nodes.

In this tutorial, you:
  1. Install Dynamo CRDs.
  2. Install the Dynamo platform with the Kubernetes AI (Kai) scheduler and Grove enabled.
  3. Deploy an inference model using an example from the Dynamo repository and a Hugging Face token.
  4. List and delete deployments when you’re done.

What you'll need

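Before you start, you must have:
  • A CKS cluster with GPU Nodes.
  • kubectl and Helm installed and configured to access the cluster.
  • A Hugging Face token.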

What you'll use

You use these tools and components:
  • Dynamo: Cluster-wide inference orchestration from the Dynamo repository.
  • Helm: To install Dynamo CRDs and platform charts from NVIDIA NGC.
  • Kubernetes AI (Kai) scheduler and Grove: Enabled in the platform install for scheduling and routing.

Set environment

Set the namespace where Dynamo is installed and the Dynamo release version. The remaining steps reference these variables, so setting them once keeps the commands consistent.
export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.9.0
To confirm the values, run echo $NAMESPACE and echo $RELEASE_VERSION.
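Because every later command depends on these two variables, a small guard can catch an empty value before Helm or kubectl runs. The following is a minimal sketch; the function name require_vars is hypothetical and is not part of Dynamo or CKS.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_vars NAMESPACE RELEASE_VERSION || exit 1
```

Running the example line at the top of a script stops it early instead of letting Helm fetch a URL with an empty version in it.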

Install CRDs

Dynamo requires custom resource definitions (CRDs) on the cluster before the platform components can run. Skip this step if Dynamo CRDs are already installed on the cluster.
  1. Fetch and install the Dynamo CRDs chart:
    helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
    
    helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
    
  2. If the install completes without errors, the CRDs are registered on the cluster. Optionally verify with:
    kubectl get crd | grep dynamo
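The grep check above only shows that the CRDs exist. As a stricter sketch, the function below waits until each matching CRD reports the Established condition. The function name wait_for_dynamo_crds and the 60-second timeout are assumptions for illustration, and the match on the string "dynamo" simply mirrors the grep in the step above.

```shell
# Hypothetical helper: wait until every CRD whose name contains "dynamo"
# reports the Established condition.
wait_for_dynamo_crds() {
  # -o name prints one "customresourcedefinition.<group>/<name>" per line.
  crds=$(kubectl get crd -o name | grep dynamo) || {
    echo "no Dynamo CRDs found" >&2
    return 1
  }
  for crd in $crds; do
    kubectl wait --for=condition=Established --timeout=60s "$crd" || return 1
  done
}

# Example: wait_for_dynamo_crds
```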
    

Install the platform

Install the Dynamo platform into the chosen namespace with the Kubernetes AI (Kai) scheduler and Grove enabled.
  1. Fetch the Dynamo platform chart:
    helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
    
  2. Install the Dynamo platform:
    helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace \
    --set "grove.enabled=true" \
    --set "kai-scheduler.enabled=true" \
    --set "dynamo-operator.controllerManager.manager.image.tag=${RELEASE_VERSION}" \
    --set "dynamo-operator.controllerManager.kubeRbacProxy.image.repository=registry.k8s.io/kubebuilder/kube-rbac-proxy" \
    --set "dynamo-operator.controllerManager.kubeRbacProxy.image.tag=v0.15.0"
    
    The output is similar to the following:
     I0312 13:13:13.800378    5735 warnings.go:110] "Warning: tls: failed to find any PEM data in certificate input"
     NAME: dynamo-platform
     LAST DEPLOYED: Thu Mar 12 13:12:16 2026
     NAMESPACE: dynamo-system
     STATUS: deployed
     REVISION: 1
     TEST SUITE: None
     NOTES:
     SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
     SPDX-License-Identifier: Apache-2.0
    
     Licensed under the Apache License, Version 2.0 (the "License");
     you may not use this file except in compliance with the License.
     You may obtain a copy of the License at
    
     http://www.apache.org/licenses/LICENSE-2.0
    
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS,
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     See the License for the specific language governing permissions and
     limitations under the License.
    
  3. When the install completes, Dynamo platform components run in ${NAMESPACE}. Verify with:
    kubectl get pods -n ${NAMESPACE}
    
    Ensure the expected Pods are running before you deploy a model.
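Instead of re-running kubectl get pods until everything settles, you can block on readiness. The sketch below assumes the platform components you care about run as Deployments; anything deployed as a StatefulSet or Job is not covered by it. The function name wait_for_platform and the 300-second default timeout are illustrative choices.

```shell
# Hypothetical helper: block until every Deployment in the namespace
# reports the Available condition, or until the timeout expires.
wait_for_platform() {
  ns=${1:-$NAMESPACE}
  timeout=${2:-300s}
  kubectl wait --for=condition=Available deployment --all -n "$ns" --timeout="$timeout"
}

# Example: wait_for_platform            (uses $NAMESPACE and 300s)
# Example: wait_for_platform dynamo-system 600s
```

If the command times out, kubectl get pods -n ${NAMESPACE} shows which Pods are still pending or failing.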

Deploy an inference model

With the platform running, you can now deploy a model for inference. This tutorial uses the Qwen3-0.6B model, which you deploy using the agg.yaml from the Dynamo repository.
  1. Download the agg.yaml and modify the image value by changing my-tag to 0.9.1 on lines 16 and 30:
    image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.1
    
  2. Create a Kubernetes Secret with your Hugging Face token. Replace [HF-TOKEN] with your Hugging Face token:
    kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN="[HF-TOKEN]" -n ${NAMESPACE}
    
  3. Apply the agg.yaml to your cluster:
    kubectl apply -f agg.yaml -n ${NAMESPACE}
    
    More example models
    More example models and manifests are available in the Dynamo repository. If you use one of these example models, create a Kubernetes Secret for your Hugging Face token, as in step 2. If you’re deploying an example model instead of the agg.yaml, you might need to configure storage:
    • Configure storage for the example. In your chosen example, edit model-cache/cache.yaml and replace every instance of your-storage-class with shared-vast so the model cache uses CoreWeave storage.
    • Apply the model download manifest to download the model data:
      kubectl apply -f model-cache/model-download.yaml
      
    When the download job completes, the model is cached and ready for inference. Follow any additional instructions in the example for running or exposing the model.
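Returning to step 1 of this section, the image tag edit can be scripted rather than made by hand. This is a minimal sketch that assumes the placeholder appears in the manifest as vllm-runtime:my-tag, as the step describes; the function name set_image_tag is hypothetical. It uses sed -i.bak so that the same command works with both GNU and BSD sed and leaves a backup next to the file.

```shell
# Hypothetical helper: replace the placeholder image tag in a manifest,
# keeping the original as <manifest>.bak.
# Usage: set_image_tag <manifest> <new-tag>
set_image_tag() {
  manifest=$1
  tag=$2
  sed -i.bak "s|\(vllm-runtime:\)my-tag|\1${tag}|g" "$manifest"
  # Print how many lines now reference the new tag; exits non-zero if none do.
  grep -c "vllm-runtime:${tag}" "$manifest"
}

# Example: set_image_tag agg.yaml 0.9.1
```

For the agg.yaml in this tutorial, the printed count should match the two image lines mentioned in step 1. Review the file before you apply it.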

Run inference

With the model deployed, you can send inference requests to it from your local machine. In another terminal, forward the vllm-agg-frontend service:
kubectl port-forward service/vllm-agg-frontend 8000:8000 -n $NAMESPACE
To test the deployment, run:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "stream": false,
    "max_tokens": 100
  }'
The response includes a JSON body with a choices array containing the model reply.
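A request sent right after the deployment is applied may fail while the model is still loading. The sketch below polls the forwarded port until it answers. It assumes the frontend also serves the OpenAI-style /v1/models route on the same port, which this tutorial does not confirm; pass a different URL if it does not. The function name wait_for_endpoint and its defaults are illustrative.

```shell
# Hypothetical helper: poll a URL until it returns success or attempts run out.
# Usage: wait_for_endpoint [url] [attempts] [delay-seconds]
wait_for_endpoint() {
  url=${1:-http://localhost:8000/v1/models}   # assumed route, see note above
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output; -f makes HTTP errors return non-zero.
    if curl -sf "$url" >/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "endpoint $url not ready after $attempts attempts" >&2
  return 1
}

# Example: wait_for_endpoint && echo "ready for requests"
```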

Clean up

When you no longer need the deployment, remove it to free cluster resources. To list Dynamo graph deployments in your namespace, run:
kubectl get dynamographdeployment -n $NAMESPACE
To delete a deployment, pass the resource kind and the deployment name exactly as it appears in the list output. For example, to delete a deployment named vllm-agg-router:
kubectl delete dynamographdeployment vllm-agg-router -n $NAMESPACE
The deployment is removed when the command succeeds.
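To confirm the removal in a script, you can follow the delete with a lookup that tolerates a missing resource. This is a sketch only; the function name delete_graph_deployment is hypothetical, and it relies on $NAMESPACE being set as described earlier.

```shell
# Hypothetical helper: delete one DynamoGraphDeployment, then confirm
# that it no longer exists in the namespace.
delete_graph_deployment() {
  name=$1
  kubectl delete dynamographdeployment "$name" -n "$NAMESPACE" || return 1
  # --ignore-not-found prints nothing (and succeeds) when the resource is gone.
  remaining=$(kubectl get dynamographdeployment "$name" -n "$NAMESPACE" --ignore-not-found -o name)
  [ -z "$remaining" ]
}

# Example: delete_graph_deployment vllm-agg-router
```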
Last modified on June 10, 2026