Request InfiniBand resources in your Pod’s container spec with rdma/ib: 1 (the same resource name is used on both InfiniBand and RoCE Nodes). Set the NCCL and UCX environment variables that CoreWeave documents: NCCL_SOCKET_IFNAME=eth0, NCCL_IB_HCA=ibp, UCX_TLS=tcp, and UCX_NET_DEVICES=eth0. With these set, NCCL uses the InfiniBand HCA instead of falling back to TCP. Schedule the Pods onto Nodes with InfiniBand support, then run ibstat inside the Pod and confirm that each interface reports State: Active and Physical state: LinkUp. For full details and a complete Pod YAML, see Use GPUDirect RDMA with InfiniBand. If performance is still poor after configuration, see "Why is my multi-node NCCL training slow?"
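The steps above can be sketched in Python as a rough illustration, not as CoreWeave's reference configuration. The helper names build_pod_manifest and ports_not_ready, the Pod name, and the image are hypothetical placeholders. GPU requests and the scheduling constraints that place the Pod on an InfiniBand Node are deliberately left out, so treat the complete Pod YAML in Use GPUDirect RDMA with InfiniBand as authoritative. The parser assumes the usual ibstat layout of a CA '<name>' header followed by Port N: sections.

```python
import json

# Environment variables exactly as listed in the text above.
NCCL_UCX_ENV = {
    "NCCL_SOCKET_IFNAME": "eth0",
    "NCCL_IB_HCA": "ibp",
    "UCX_TLS": "tcp",
    "UCX_NET_DEVICES": "eth0",
}


def build_pod_manifest(name, image, command):
    """Return a Pod manifest (as a dict) that requests one rdma/ib resource.

    Hypothetical helper: GPU resources and node scheduling are not included.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": command,
                    "env": [{"name": k, "value": v} for k, v in NCCL_UCX_ENV.items()],
                    # Extended resources such as rdma/ib are set under limits.
                    "resources": {"limits": {"rdma/ib": "1"}},
                }
            ],
        },
    }


def ports_not_ready(ibstat_output):
    """Return (adapter, port) pairs that are not both Active and LinkUp."""
    ports = {}
    adapter = port = None
    for raw in ibstat_output.splitlines():
        line = raw.strip()
        if line.startswith("CA '"):
            adapter, port = line.split("'")[1], None
        elif line.startswith("Port ") and line.endswith(":"):
            # Matches "Port 1:" but not "Port GUID: 0x...".
            port = line[len("Port "):-1]
            ports[(adapter, port)] = {}
        elif port is not None and line.startswith(("State:", "Physical state:")):
            key, _, value = line.partition(":")
            ports[(adapter, port)][key] = value.strip()
    return [
        key
        for key, info in ports.items()
        if info.get("State") != "Active" or info.get("Physical state") != "LinkUp"
    ]


if __name__ == "__main__":
    # Placeholder name, image, and command; replace with your own workload.
    manifest = build_pod_manifest(
        "nccl-ib-example", "example.com/nccl-image:latest", ["sleep", "infinity"]
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output can be piped to kubectl apply -f - once the placeholders are replaced. Inside the running Pod, passing the text printed by ibstat to ports_not_ready returns an empty list when every port is Active and LinkUp.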
Administrator
Last modified on June 18, 2026