In this guide, learn how to use GPUDirect RDMA with InfiniBand at CoreWeave, and how to test it with NCCL.
Prerequisites
CoreWeave supports GPUDirect RDMA over InfiniBand for some GPU instance types. To use this feature, you must:
- select a Node Pool with InfiniBand support,
- install NCCL and the OFED driver in the Pod image, then,
- configure the Pods to use GPUDirect RDMA.
Select a Node Pool with InfiniBand support
To use GPUDirect RDMA, make sure the Node Pool has Nodes with InfiniBand, as shown in our list of GPU instance types. All Nodes with InfiniBand have the required kernel drivers pre-installed. CKS manages all the required driver and operator dependencies. To avoid Node instability, you should not install other driver management tools.
Configure the Pods
The Pods must be configured to use GPUDirect RDMA over InfiniBand. Follow these steps:
- Set the value of spec.containers.resources.requests.rdma/ib to 1. This value does not indicate the number of InfiniBand devices requested; it’s used as a boolean to schedule Pods onto servers with InfiniBand support. Kubernetes schedules resources through requests and limits. When only limits are specified, the requests are set to the same amount as the limit. To learn more about container resource management on Kubernetes, see the official Kubernetes documentation. See the full YAML example below for reference showing how to set the rdma/ib value in the Pod spec for both requests and limits.
- Configure the Pods to use GPUDirect RDMA by setting these environment variables:
  - NCCL_SOCKET_IFNAME: The network interface name to use for NCCL communication. This should be set to the InfiniBand interface name.
  - NCCL_IB_HCA: The InfiniBand host channel adapter (HCA) to use for NCCL communication.
  - UCX_NET_DEVICES: The network devices to use for UCX communication. This should be set to the InfiniBand interface name.
- (Optional) Enable extended logging with the NCCL_DEBUG environment variable. To increase the verbosity of NCCL’s logging, set the NCCL_DEBUG environment variable to INFO for extra debug information. This can help diagnose issues with RDMA support, but it increases the log file size, so it should be disabled when testing is complete. See NCCL_DEBUG in the NCCL documentation for more logging options.
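Because a missing variable can be easy to overlook, a small pre-flight check at the start of a container entrypoint or job script may help. The following is only a sketch: the function name check_rdma_env is hypothetical, and it verifies only that the three variables listed above are non-empty, not that their values match the hardware.

```shell
#!/bin/sh
# Hypothetical helper (not part of any CoreWeave or NCCL tooling):
# reports each RDMA-related variable that is unset or empty.
check_rdma_env() {
  missing=0
  for name in NCCL_SOCKET_IFNAME NCCL_IB_HCA UCX_NET_DEVICES; do
    # Read the variable whose name is stored in $name.
    # The names come from the fixed list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when all variables are set, 1 otherwise.
  return "$missing"
}
```

A caller could run `check_rdma_env || exit 1` before launching the workload, so a misconfigured Pod fails early with a readable message.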
Kubernetes example
When deploying a Kubernetes Pod in the cluster, use the highlighted lines below to set the rdma/ib value in the Pod spec for both requests and limits, and set the required environment variables.
Kubernetes example with debug logging
Setting NCCL_DEBUG to INFO enables extended logging; it can be removed if extended logging is not required.
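As a minimal sketch of such a Pod spec, the shell snippet below writes a manifest to a local file. The file name, Pod name, container name, and image are hypothetical, and every value in angle brackets is a placeholder to replace with the interface and HCA names for your Node type. Only the RDMA-related fields described in this guide are shown; a real workload would also need its GPU resources and command.

```shell
#!/bin/sh
# Write an illustrative Pod manifest. Names and <...> values are placeholders,
# not values taken from CoreWeave documentation.
cat > rdma-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: rdma-example              # hypothetical Pod name
spec:
  containers:
    - name: workload              # hypothetical container name
      image: "<your-image>"       # placeholder: image with NCCL and the OFED driver
      resources:
        requests:
          rdma/ib: 1              # boolean-style flag, not a device count
        limits:
          rdma/ib: 1
      env:
        - name: NCCL_SOCKET_IFNAME
          value: "<interface-name>"
        - name: NCCL_IB_HCA
          value: "<hca-name>"
        - name: UCX_NET_DEVICES
          value: "<device-list>"
        - name: NCCL_DEBUG        # optional: remove when testing is complete
          value: "INFO"
EOF

# To deploy after replacing the placeholders:
# kubectl apply -f rdma-pod.yaml
```

Setting rdma/ib in both requests and limits keeps the two explicit, although Kubernetes would copy the limit into the request if only limits were given.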
Slurm example
When deploying a Slurm job, use the highlighted lines below to set the required environment variables. Remove NCCL_DEBUG unless extended logging is needed.
Example Slurm sbatch script
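A minimal sketch of an sbatch script that sets these variables might look like the following. The job name and node count are hypothetical, the values in angle brackets are placeholders for the interface and HCA names on your Nodes, and the launch command is left as a comment because it depends on the workload.

```shell
#!/bin/bash
#SBATCH --job-name=rdma-example   # hypothetical job name
#SBATCH --nodes=2                 # hypothetical node count

# Required for GPUDirect RDMA over InfiniBand.
# Replace each <...> placeholder with the value for your Nodes.
export NCCL_SOCKET_IFNAME="<interface-name>"
export NCCL_IB_HCA="<hca-name>"
export UCX_NET_DEVICES="<device-list>"

# Optional: extended NCCL logging. Remove when testing is complete.
export NCCL_DEBUG=INFO

# Launch the workload here, for example:
# srun <your-command>
```

Exporting the variables before the launch line means the job steps started by srun inherit them by default.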
Testing with NCCL
CoreWeave has several sample NCCL test jobs designed for use with MPI Operator or Slurm. These are in the nccl-tests repository, which you can use to test GPUDirect RDMA support with InfiniBand. For more information, refer to instructions for testing in the repository.