Prerequisites
CoreWeave supports GPUDirect RDMA over InfiniBand for some GPU instance types. To use this feature, you must:
- Select a Node Pool with InfiniBand support.
- Install NCCL and the OpenFabrics Enterprise Distribution (OFED) driver in the Pod image.
- Configure the Pods to use GPUDirect RDMA.
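To check whether a cluster already has Nodes that meet the first prerequisite, one option is to look for Nodes that advertise the rdma/ib extended resource, which is the resource Pods request in the configuration steps. The Python sketch below assumes its input has the shape of `kubectl get nodes -o json` output. It also assumes that InfiniBand Nodes report rdma/ib under `status.allocatable`. The node names and quantities in the example are hypothetical.

```python
from typing import Dict, List


def infiniband_nodes(node_list: Dict) -> List[str]:
    """Return names of Nodes whose allocatable resources include a non-zero rdma/ib."""
    names = []
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Quantities are strings in the Kubernetes API (for example "1");
        # a missing key or "0" means the Node can't satisfy an rdma/ib request.
        if allocatable.get("rdma/ib", "0") != "0":
            names.append(node["metadata"]["name"])
    return names


# Hand-written input in the shape of `kubectl get nodes -o json` output;
# node names and quantities are hypothetical.
example = {
    "items": [
        {"metadata": {"name": "gpu-node-a"},
         "status": {"allocatable": {"nvidia.com/gpu": "8", "rdma/ib": "1"}}},
        {"metadata": {"name": "cpu-node-b"},
         "status": {"allocatable": {"cpu": "64"}}},
    ]
}
print(infiniband_nodes(example))  # ['gpu-node-a']
```

To check a live cluster, pass the parsed (`json.loads`) output of `kubectl get nodes -o json` instead of the hand-written dictionary.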
Select a Node Pool with InfiniBand support
To use GPUDirect RDMA, make sure the Node Pool has Nodes with InfiniBand, as shown in our list of GPU instance types. All Nodes with InfiniBand have the required kernel drivers pre-installed. CoreWeave Kubernetes Service (CKS) manages all the required driver and operator dependencies. To avoid Node instability, don't install other driver management tools. Once you have a Node Pool that supports InfiniBand, the next section covers the Pod-level configuration that opts workloads into GPUDirect RDMA.
Configure the Pods
Configure your Pods to use GPUDirect RDMA over InfiniBand by following these steps:
- Set the value of spec.containers.resources.requests.rdma/ib to 1. This value doesn't indicate the number of InfiniBand devices requested. It works as a boolean to schedule Pods onto servers with InfiniBand support. Kubernetes schedules resources through requests and limits. When you specify only limits, Kubernetes sets requests to the same amount as the limit. For more information, see Resource management for Pods and containers in the Kubernetes documentation. For a full YAML example showing how to set the rdma/ib value in the Pod spec for both requests and limits, see Kubernetes example.
- Configure the Pods to use GPUDirect RDMA by setting these environment variables:
  - NCCL_SOCKET_IFNAME: The network interface name to use for NCCL communication. Set this to the InfiniBand interface name.
  - NCCL_IB_HCA: The InfiniBand host channel adapter (HCA) to use for NCCL communication.
  - UCX_NET_DEVICES: The network devices to use for Unified Communication X (UCX) communication. Set this to the InfiniBand interface name.
- Optional: Enable extended logging with the NCCL_DEBUG environment variable. To increase the verbosity of NCCL's logging, set NCCL_DEBUG to INFO for extra debug information. This helps diagnose issues with RDMA support, but it increases the log file size, so disable it when testing is complete. See NCCL_DEBUG in the NCCL documentation for more logging options.
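The steps above can be checked mechanically before a Pod is submitted. The Python sketch below reviews one `spec.containers` entry, given as a dictionary. It mimics the Kubernetes rule that a resource listed only under limits gets an equal request. It also applies the Kubernetes rule that extended resources can't be overcommitted, so when both requests and limits list rdma/ib the two values must match. The function names and the interface and HCA values are hypothetical placeholders, not CoreWeave-provided settings.

```python
from typing import Dict, List

# Variables the configuration steps ask you to set.
REQUIRED_ENV = ("NCCL_SOCKET_IFNAME", "NCCL_IB_HCA", "UCX_NET_DEVICES")


def effective_requests(resources: Dict) -> Dict:
    """Apply Kubernetes defaulting: a resource listed only under limits
    gets a request equal to its limit."""
    requests = dict(resources.get("requests", {}))
    for name, amount in resources.get("limits", {}).items():
        requests.setdefault(name, amount)
    return requests


def review_container(container: Dict) -> List[str]:
    """Return findings for one spec.containers entry; [] means no findings."""
    findings = []
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    # Extended resources can't be overcommitted, so both values must match.
    if ("rdma/ib" in requests and "rdma/ib" in limits
            and str(requests["rdma/ib"]) != str(limits["rdma/ib"])):
        findings.append("rdma/ib requests and limits must be equal")
    if str(effective_requests(resources).get("rdma/ib")) != "1":
        findings.append("rdma/ib request is not 1")
    env = {e["name"]: e.get("value") for e in container.get("env", [])}
    for name in REQUIRED_ENV:
        if not env.get(name):
            findings.append(f"{name} is not set")
    if env.get("NCCL_DEBUG") == "INFO":
        findings.append("NCCL_DEBUG=INFO is set; remove it when testing is complete")
    return findings


# Hypothetical container: rdma/ib appears only under limits,
# so its effective request defaults to 1.
container = {
    "name": "trainer",
    "resources": {"limits": {"nvidia.com/gpu": 8, "rdma/ib": 1}},
    "env": [
        {"name": "NCCL_SOCKET_IFNAME", "value": "ib0"},    # hypothetical interface
        {"name": "NCCL_IB_HCA", "value": "mlx5"},          # hypothetical HCA prefix
        {"name": "UCX_NET_DEVICES", "value": "mlx5_0:1"},  # hypothetical device
    ],
}
print(review_container(container))  # []
```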
Kubernetes example
When you deploy a Kubernetes Pod in the cluster, use the highlighted lines in this example to set the rdma/ib value in the Pod spec for both requests and limits, and to set the required environment variables.
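If you generate manifests programmatically, the same settings can be produced from code. The Python sketch below builds a Pod manifest that sets rdma/ib to 1 under both requests and limits, adds the NCCL and UCX environment variables, and prints the result as JSON, which `kubectl apply -f` accepts. The Pod name, image, GPU count, the `nvidia.com/gpu` request, and the interface and HCA values are all hypothetical assumptions for illustration.

```python
import json
from typing import Dict


def build_pod(name: str, image: str, gpus: int, env: Dict[str, str]) -> Dict:
    """Return a minimal Pod manifest requesting rdma/ib and setting env vars."""
    resources = {"nvidia.com/gpu": gpus, "rdma/ib": 1}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Identical values under requests and limits.
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                    "env": [{"name": k, "value": v} for k, v in env.items()],
                }
            ]
        },
    }


pod = build_pod(
    name="nccl-example",                       # hypothetical
    image="registry.example.com/nccl:latest",  # hypothetical
    gpus=8,                                    # hypothetical
    env={
        "NCCL_SOCKET_IFNAME": "ib0",    # hypothetical interface name
        "NCCL_IB_HCA": "mlx5",          # hypothetical HCA prefix
        "UCX_NET_DEVICES": "mlx5_0:1",  # hypothetical device list
    },
)
print(json.dumps(pod, indent=2))
```

Keeping requests and limits identical matches the guidance above and satisfies the Kubernetes requirement that an extended resource's request and limit be equal when both are present.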
Kubernetes example with debug logging
Setting NCCL_DEBUG to INFO enables extended logging. Remove it if you don't need extended logging.
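With NCCL_DEBUG set to INFO, a quick way to see whether NCCL chose InfiniBand rather than falling back to sockets is to scan the log for the network it reports. The Python sketch below assumes the log contains lines such as "NCCL INFO Using network IB". Exact wording can vary across NCCL versions and network plugins, so treat the pattern as a heuristic. The sample log is hand-written for illustration and is not captured output.

```python
import re
from typing import Set

# Assumed log format: "... NCCL INFO Using network <name>".
NETWORK_RE = re.compile(r"NCCL INFO Using network (\S+)")


def networks_used(log_text: str) -> Set[str]:
    """Return the set of network names NCCL reported selecting."""
    return set(NETWORK_RE.findall(log_text))


# Hand-written sample, not real NCCL output.
sample_log = """\
node-a:123:123 [0] NCCL INFO Bootstrap : Using eth0:10.0.0.1<0>
node-a:123:123 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB eth0:10.0.0.1<0>
node-a:123:123 [0] NCCL INFO Using network IB
"""
print(networks_used(sample_log))  # {'IB'}
```

If the result contains "Socket" instead of "IB", NCCL is likely not using the InfiniBand transport, which points back to the rdma/ib and environment variable settings.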
Slurm example
When you deploy a Slurm job, use the highlighted lines in this example to set the required environment variables. Remove NCCL_DEBUG unless you need extended logging.
Example Slurm sbatch script
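For teams that template their batch jobs, the following Python sketch renders an sbatch script that exports the same variables before launching a job step with `srun`. NCCL_DEBUG is added only when extended logging is requested. The node count, GPUs per node, launched command, and interface and HCA values are hypothetical. The `#SBATCH` options shown (`--nodes`, `--ntasks-per-node`, `--gpus-per-node`) are standard sbatch options, but your cluster may need others.

```python
import shlex
from typing import Dict


def render_sbatch(nodes: int, gpus_per_node: int, env: Dict[str, str],
                  command: str, debug: bool = False) -> str:
    """Return sbatch script text that exports env vars and runs `command` via srun."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={gpus_per_node}",
        f"#SBATCH --gpus-per-node={gpus_per_node}",
        "",
    ]
    exports = dict(env)
    if debug:
        # Extended logging; leave it off unless you are diagnosing RDMA issues.
        exports["NCCL_DEBUG"] = "INFO"
    # shlex.quote protects values that contain shell metacharacters.
    lines += [f"export {k}={shlex.quote(v)}" for k, v in exports.items()]
    lines += ["", f"srun {command}", ""]
    return "\n".join(lines)


script = render_sbatch(
    nodes=2,
    gpus_per_node=8,
    env={
        "NCCL_SOCKET_IFNAME": "ib0",    # hypothetical interface name
        "NCCL_IB_HCA": "mlx5",          # hypothetical HCA prefix
        "UCX_NET_DEVICES": "mlx5_0:1",  # hypothetical device list
    },
    command="./all_reduce_perf -b 8 -e 1G -f 2 -g 1",  # hypothetical path
)
print(script)
```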
Test with NCCL
After you configure your Pods or Slurm jobs, verify that GPUDirect RDMA works as expected by running NCCL tests across multiple Nodes. CoreWeave provides several sample NCCL test jobs designed for use with the Message Passing Interface (MPI) Operator or Slurm. To test GPUDirect RDMA support with InfiniBand, see the nccl-tests repository for the test jobs and instructions for running them.
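When you run NCCL tests repeatedly, it can help to extract the summary figure from each run and compare it against a baseline for your hardware. The Python sketch below assumes the test output includes a summary line of the form "# Avg bus bandwidth : <value>", as the nccl-tests binaries print. The sample text and its value are placeholders, not a real measurement. What counts as a healthy bandwidth depends on the instance type and the number of Nodes.

```python
import re
from typing import Optional

# Assumed summary line format printed at the end of an nccl-tests run.
AVG_BUSBW_RE = re.compile(r"^#\s*Avg bus bandwidth\s*:\s*([0-9.]+)", re.MULTILINE)


def avg_bus_bandwidth(output: str) -> Optional[float]:
    """Return the reported average bus bandwidth, or None if no summary is found."""
    match = AVG_BUSBW_RE.search(output)
    return float(match.group(1)) if match else None


# Placeholder output; the number is not a measured result.
sample = """\
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 123.456
"""
print(avg_bus_bandwidth(sample))  # 123.456
```

A missing summary line (a `None` result) usually means the run did not finish, so check the job logs before comparing numbers.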