Create a new Node Pool with a specific driver version
Driver versions are configured in the Node Pool manifest. To select a driver version, add the gpu section to your Node Pool manifest’s spec section, specifying the desired major version without dots.
For example, for an H100 Node Pool, you would specify the driver version as 570:
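As a rough illustration, the Python sketch below adds a gpu section to a Node Pool manifest held as a dictionary and prints the result as JSON, which kubectl accepts in place of YAML. The key name version inside gpu, the helper name set_driver_version, and the pool name are assumptions made for this sketch and are not confirmed by this page; check the Node Pool reference for the exact schema.

```python
import copy
import json


def set_driver_version(manifest: dict, major: str) -> dict:
    """Return a copy of a Node Pool manifest with spec.gpu set to a major driver version.

    Assumption: the key inside the gpu section is named "version".
    """
    # The major version is written without dots, for example "570".
    if not (major.isascii() and major.isdigit()):
        raise ValueError(f"expected a major version without dots, got {major!r}")
    updated = copy.deepcopy(manifest)
    updated.setdefault("spec", {})["gpu"] = {"version": major}
    return updated


# Hypothetical, partial manifest: other required Node Pool fields are omitted.
node_pool = {"metadata": {"name": "example-h100-pool"}, "spec": {}}
print(json.dumps(set_driver_version(node_pool, "570"), indent=2))
```

The helper copies the manifest rather than editing it in place, so the same function can be reused to produce an updated manifest when changing the major version on an existing Node Pool.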
Update the driver version on an existing Node Pool
If a driver is currently specified on an existing Node Pool, you can update it to a new major version by modifying the existing Node Pool manifest.
Apply GPU driver updates
With the default node configuration update strategy, OnSpecUpdate, updating the driver version automatically stages the new configuration onto the Node Pool. The new driver takes effect on an existing Node after a reconfigure reboot. For more information on configuration management, see Manage Node Pool Configuration.
Target driver versions using Node labels and selectors
Driver version information is exposed on Nodes through Kubernetes labels. You can use these labels to get information on current driver versions and to target specific driver versions in your workloads.
Each Node carries a label of the form gpu.coreweave.cloud/driver-version=<major>.<minor>.<patch>, where the value represents the full driver version. For example, a Node whose label value starts with 570 is running a driver from the 570 major version.
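To make the label format concrete, here is a minimal Python sketch that splits a label value into its major, minor, and patch components. The helper name parse_driver_label and the sample value are illustrative assumptions; real label values may carry additional suffixes that this sketch does not handle.

```python
from typing import NamedTuple

DRIVER_LABEL = "gpu.coreweave.cloud/driver-version"


class DriverVersion(NamedTuple):
    major: str
    minor: str
    patch: str


def parse_driver_label(labels: dict) -> DriverVersion:
    """Split the driver-version label value into its three components."""
    value = labels[DRIVER_LABEL]
    parts = value.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected <major>.<minor>.<patch>, got {value!r}")
    # Components stay as strings so leading zeros are preserved.
    return DriverVersion(*parts)


# Illustrative label value, not taken from the documentation.
print(parse_driver_label({DRIVER_LABEL: "570.124.06"}).major)  # prints 570
```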
The gpu.coreweave.cloud/driver-version label is always applied to Nodes, even if no driver version is specified in the Node Pool manifest.
Target specific driver versions in workloads
The gpu.coreweave.cloud/driver-version label allows you to target Nodes with exact driver version matches.
For detailed information about scheduling workloads on Nodes with specific driver versions, see Scheduling Workloads. It is strongly recommended to avoid scheduling across multiple driver versions in a single Node Pool.
Scheduling workloads on Nodes with specific driver versions
For workloads that require a specific driver version, use an exact match with the nodeSelector field:
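One way to express such a selector is sketched below in Python: the function builds a minimal Pod manifest whose nodeSelector pins it to a single driver version and prints it as JSON. The Pod name, image, and version value are hypothetical placeholders, and GPU resource requests are left out for brevity.

```python
import json

DRIVER_LABEL = "gpu.coreweave.cloud/driver-version"


def pod_pinned_to_driver(name: str, image: str, driver_version: str) -> dict:
    """Build a minimal Pod manifest that schedules only on Nodes with this exact driver version."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # nodeSelector requires an exact match on the label value.
            "nodeSelector": {DRIVER_LABEL: driver_version},
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical name, image, and version; replace them with your own values.
pod = pod_pinned_to_driver("driver-pinned", "example.com/cuda-app:latest", "570.124.06")
print(json.dumps(pod, indent=2))
```

Because kubectl reads JSON as well as YAML, the printed manifest can be piped to kubectl apply -f - if that suits your workflow.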
Troubleshooting scheduling issues
If Pods fail to schedule due to driver version constraints, check the available driver versions in your cluster. Common causes include:
- No Nodes available with the exact driver version specified
- Nodes with the required driver version are unavailable due to resource constraints
- Driver version constraints conflict with other scheduling requirements
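To see which driver versions are actually present, a short Python sketch like the following can tally Nodes by label value from the output of kubectl get nodes -o json. The function name driver_versions and the sample data are assumptions for illustration; in practice you would feed it the real command output.

```python
import json
from collections import Counter

DRIVER_LABEL = "gpu.coreweave.cloud/driver-version"


def driver_versions(node_list_json: str) -> Counter:
    """Count Nodes per driver-version label value in `kubectl get nodes -o json` output."""
    nodes = json.loads(node_list_json).get("items", [])
    counts = Counter()
    for node in nodes:
        labels = node.get("metadata", {}).get("labels") or {}
        counts[labels.get(DRIVER_LABEL, "<no label>")] += 1
    return counts


# Made-up sample standing in for real `kubectl get nodes -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-a", "labels": {DRIVER_LABEL: "570.124.06"}}},
    {"metadata": {"name": "node-b", "labels": {DRIVER_LABEL: "570.124.06"}}},
    {"metadata": {"name": "node-c", "labels": {}}},
]})
for version, count in driver_versions(sample).items():
    print(version, count)
```

If the version requested in a nodeSelector does not appear in the tally, the first cause in the list above applies. For a quick look without any script, kubectl get nodes -L gpu.coreweave.cloud/driver-version shows the label as an extra column.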
Troubleshooting
Common error conditions
If you encounter issues with driver configuration, check the Node Pool status for error conditions.
Node Pool errors
For more information about Node Pool events and possible error conditions, see Node Pool events.
Verify the driver version
To verify your Node Pool configuration and driver status, you can:
- Describe the Node Pool.
- Run nvidia-smi on a Pod running on the Node.
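As a hedged sketch of the second check, the Python snippet below compares a Node's driver-version label value with what nvidia-smi reports inside a Pod when invoked as nvidia-smi --query-gpu=driver_version --format=csv,noheader, which prints one version per GPU. It assumes the label value is identical to the string nvidia-smi prints; the helper name driver_matches and the sample output are made up for illustration.

```python
def driver_matches(label_value: str, smi_output: str) -> bool:
    """Check that every GPU line reported by nvidia-smi equals the Node label value."""
    reported = [line.strip() for line in smi_output.splitlines() if line.strip()]
    # An empty report counts as a mismatch rather than a silent pass.
    return bool(reported) and all(version == label_value for version in reported)


# Made-up output of: nvidia-smi --query-gpu=driver_version --format=csv,noheader
sample_output = "570.124.06\n570.124.06\n"
print(driver_matches("570.124.06", sample_output))  # prints True
```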
Next steps
- Apply the new driver version to the Node Pool by queuing a reconfigure reboot for the Node Pool.