GPU driver management in CKS has the following limitations:
  • No Cloud Console support: Configuration must be done through Kubernetes manifests.
  • Limited version updates: You can select only a major driver version. When a minor version update becomes available, CKS generates a new pending configuration on the Node Pool, which appears in the Node Pool’s status.pendingNodeConfiguration field. See Manage Node Pool Configuration for more details.
  • Release channels are not supported: The latest and stable release channels are not supported in Node Pools.
Please contact CoreWeave support if you have questions about GPU driver management.
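
The pending configuration noted in the limitations can be read with kubectl's JSONPath output. This is a minimal sketch: the helper name pending_config is hypothetical, the nodepool resource name is taken from the kubectl describe nodepool command used elsewhere on this page, and the structure of the returned value is not described here, so it is printed as-is.

```shell
# Hypothetical helper: print the pending configuration of a Node Pool, if any.
# Assumes kubectl is already configured for the CKS cluster.
pending_config() {
  # $1: Node Pool name
  kubectl get nodepool "$1" -o jsonpath='{.status.pendingNodeConfiguration}'
}

# Example (placeholder name): pending_config example-nodepool
```

Empty output would suggest that no configuration is currently pending.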

Create a new Node Pool with a specific driver version

Driver versions are configured in the Node Pool manifest. To select a driver version, add the gpu section to your Node Pool manifest’s spec section, specifying the desired major version without dots. For example, for an H100 Node Pool, you would specify the driver version as 570:
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-nodepool
spec:
  computeClass: default
  instanceType: gd-8xh100ib-i128
  targetNodes: 1
  gpu:
    version: "570"  # Specify driver version by major version
If no driver is specified, the Node Pool automatically uses the latest available driver.
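
Because the version must be a major version without dots, it can help to validate the value before applying a manifest. The sketch below is illustrative only: is_major_version and render_nodepool are hypothetical helper names, and the manifest fields simply mirror the example above.

```shell
# Hypothetical helper: succeed only if the value is a major version without dots.
is_major_version() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit such as "."
    *) return 0 ;;
  esac
}

# Hypothetical helper: print a Node Pool manifest for the given name,
# instance type, and driver major version.
render_nodepool() {
  is_major_version "$3" || { echo "driver version must be a major version, e.g. 570" >&2; return 1; }
  cat <<EOF
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: $1
spec:
  computeClass: default
  instanceType: $2
  targetNodes: 1
  gpu:
    version: "$3"
EOF
}

# Example: render_nodepool example-nodepool gd-8xh100ib-i128 570 | kubectl apply -f -
```

Passing a full version such as 570.172.08 makes render_nodepool fail before anything is sent to the cluster.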

Update the driver version on an existing Node Pool

If a driver is currently specified on an existing Node Pool, you can update it to a new major version by modifying the existing Node Pool manifest.
# Original Node Pool
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: test-nodepool
spec:
  computeClass: default
  instanceType: gd-8xh100-i128
  nodeConfigurationUpdateStrategy:
    type: OnSpecUpdate
  targetNodes: 1
  gpu:
    version: "570"
The Node Pool manifest would be updated to:
# Updated Node Pool with new driver version
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: test-nodepool
spec:
  computeClass: default
  instanceType: gd-8xh100-i128
  nodeConfigurationUpdateStrategy:
    type: OnSpecUpdate
  targetNodes: 1
  gpu:
    version: "580"
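
As an alternative to editing and re-applying the manifest, the same field could be changed in place with kubectl patch. This is a sketch under assumptions: set_driver_version is a hypothetical helper, and it assumes the Node Pool resource accepts a JSON merge patch on spec.gpu.version.

```shell
# Hypothetical helper: set spec.gpu.version on an existing Node Pool.
set_driver_version() {
  # $1: Node Pool name, $2: driver major version (for example 580)
  kubectl patch nodepool "$1" --type merge \
    -p "{\"spec\":{\"gpu\":{\"version\":\"$2\"}}}"
}

# Example: set_driver_version test-nodepool 580
```

If the manifest is kept in version control, updating the file and applying it is usually the better choice, because a patch leaves the stored manifest out of date.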

Apply GPU driver updates

With the default node configuration update strategy, OnSpecUpdate, updating the driver version automatically stages the new configuration on the Node Pool. Existing Nodes can then be reconfigured or rebooted for the new driver to take effect. For more information on configuration management, see Manage Node Pool Configuration.
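
To see which Nodes have not yet picked up a new driver, the driver-version Node label (described in the next section) can be compared against the target major version. The helper nodes_not_on_major is hypothetical, and the example pipeline assumes the label is the last column of the kubectl get nodes -L output.

```shell
# Hypothetical helper: read "<node> <driver-version-label>" lines on stdin and
# print the Nodes whose label does not start with the target major version.
nodes_not_on_major() {
  # $1: target major version, for example 580
  awk -v major="$1" '{ split($2, parts, "."); if (parts[1] != major) print $1 }'
}

# Example, assuming the label is the last column of the -L output:
# kubectl get nodes --no-headers -L gpu.coreweave.cloud/driver-version \
#   | awk '{ print $1, $NF }' | nodes_not_on_major 580
```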

Target driver versions using Node labels and selectors

Driver version information is exposed on Nodes through Kubernetes labels. You can use these labels to get information on current driver versions and to target specific driver versions in your workloads.
# Check the current driver version label on nodes
kubectl get nodes --show-labels | grep driver-version
Node labels are in the format gpu.coreweave.cloud/driver-version=<major>.<minor>.<patch>, where the value represents the full driver version. For example, a Node with the label gpu.coreweave.cloud/driver-version=570.172.08-0ubuntu1 is running driver version 570.172.08.
The gpu.coreweave.cloud/driver-version label is always applied to Nodes, even if no driver version is specified in the Node Pool manifest.
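
Since every Node carries this label, a quick tally of label values shows whether more than one driver version is present. The sketch below uses a hypothetical helper, count_by_driver_version, and the example pipeline again assumes the label is the last column printed by kubectl get nodes -L.

```shell
# Hypothetical helper: read one driver-version label value per line on stdin
# and print how many Nodes run each version.
count_by_driver_version() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example:
# kubectl get nodes --no-headers -L gpu.coreweave.cloud/driver-version \
#   | awk '{ print $NF }' | count_by_driver_version
```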

Target specific driver versions in workloads

The gpu.coreweave.cloud/driver-version label allows you to target Nodes with exact driver version matches.
For detailed information about scheduling workloads on Nodes with specific driver versions, see Scheduling Workloads. It is strongly recommended to avoid scheduling across multiple driver versions in a single Node Pool.

Scheduling workloads on Nodes with specific driver versions

For workloads that require a specific driver version, use an exact match with the nodeSelector field:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    gpu.coreweave.cloud/driver-version: "570.172.08-0ubuntu1"
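
Before applying a Pod with an exact-match nodeSelector, it may be worth confirming that at least one Node carries that label value. This is a minimal sketch; nodes_with_driver is a hypothetical helper, and the version string is the one from the example above.

```shell
# Hypothetical helper: list Nodes whose driver-version label exactly matches.
nodes_with_driver() {
  # $1: exact label value, for example 570.172.08-0ubuntu1
  kubectl get nodes -l "gpu.coreweave.cloud/driver-version=$1" -o name
}

# Example: nodes_with_driver "570.172.08-0ubuntu1"
```

If nothing is listed, a Pod using that nodeSelector value would be expected to stay Pending.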

Troubleshooting scheduling issues

If Pods fail to schedule due to driver version constraints, check the available driver versions in your cluster:
# Check available driver versions
kubectl get nodes --show-labels | grep driver-version

# Check Pod events for scheduling failures
kubectl describe pod [POD-NAME] | grep -A 10 Events:
Common scheduling issues may include:
  • No Nodes available with the exact driver version specified
  • Nodes with the required driver version are unavailable due to resource constraints
  • Driver version constraints conflict with other scheduling requirements
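
To narrow the output to scheduler failures only, events can be filtered by Pod name and reason. The helper scheduling_failures below is hypothetical, and it assumes the Pod is in the default namespace unless a second argument is given.

```shell
# Hypothetical helper: show FailedScheduling events for a single Pod.
scheduling_failures() {
  # $1: Pod name, $2: namespace (defaults to "default")
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}

# Example: scheduling_failures gpu-workload
```

The event message typically states how many Nodes were rejected and why, which helps distinguish a missing driver version from a resource shortage.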

Troubleshooting

Common error conditions

If you encounter issues with driver configuration, check the Node Pool status for error conditions:
Status:
  Conditions:
    Last Transition Time: 2025-06-30T19:25:16Z
    Message: unable to create configuration for NodePool
    Reason: InternalError
    Status: False
    Type: Validated
Node Pool errors: For more information about Node Pool events and possible error conditions, see Node Pool events.
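
The Validated condition can also be read directly instead of scanning the full description. This sketch assumes the condition is stored under status.conditions with the conventional lower-camel-case fields (type, status, message), which is inferred from the describe output above rather than confirmed; validated_condition is a hypothetical helper name.

```shell
# Hypothetical helper: print the status and message of the Validated condition.
validated_condition() {
  # $1: Node Pool name
  kubectl get nodepool "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Validated")].status}{"\t"}{.status.conditions[?(@.type=="Validated")].message}{"\n"}'
}

# Example: validated_condition example-nodepool
```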

Verify the driver version

To verify your Node Pool configuration and driver status, you can describe the Node Pool:
# Check Node Pool status
kubectl describe nodepool your-nodepool-name
Check the Node labels for driver version:
# Check node labels for driver version
kubectl get nodes --show-labels | grep driver-version
Or, check the GPU driver information by running nvidia-smi in a Pod scheduled on the Node:
# Check GPU driver information on nodes
kubectl exec -it pod-name -- nvidia-smi
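
The version reported by nvidia-smi can then be compared with the Node label. The sketch below assumes the label value is the driver version optionally followed by a package suffix, as in the 570.172.08-0ubuntu1 example on this page; driver_matches_label is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the Node label equals the version reported
# by nvidia-smi, or equals it followed by a "-" suffix.
driver_matches_label() {
  # $1: version from nvidia-smi, $2: Node label value
  case "$2" in
    "$1"|"$1"-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (placeholder Pod name):
# reported="$(kubectl exec pod-name -- nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
# driver_matches_label "$reported" "570.172.08-0ubuntu1" && echo "driver matches label"
```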

Last modified on April 23, 2026