With the release of ncore-image v.2.10.1, you can use NVSHMEM and GDRCopy in your container image for high-performance GPU-to-GPU communication.
Overview
NVSHMEM (NVIDIA SHMEM) and GDRCopy (GPU Direct RDMA Copy) enable direct memory access between GPUs without involving the CPU, significantly reducing latency and increasing throughput for certain workloads.
Accessing the image
To gain access to ncore-image v.2.10.1, contact CoreWeave Support.
Image modifications
ncore-image v.2.10.1 contains the following modifications to support NVSHMEM usage with ibgda:
NVIDIA driver options
nvidia.NVreg_EnableStreamMemOPs=1
nvidia.NVreg_RegistryDwords="PeerMappingOverride=1;"
GDRCopy driver
gdrdrv-dkms_2.5-1
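To confirm that the driver options listed above are active, one option is to read the NVIDIA driver's parameter file. The following is a minimal sketch, not part of the image: it assumes the driver exposes its settings as `Key: value` lines in `/proc/driver/nvidia/params` and that this file is visible from where the script runs. The helper names `parse_params` and `missing_settings` are hypothetical.

```python
from pathlib import Path

# Assumption: the NVIDIA driver lists its registry keys in this procfs file,
# one "Key: value" pair per line.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_params(text):
    """Turn 'Key: value' lines into a dict, dropping surrounding quotes."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip().strip('"')
    return params


def missing_settings(params):
    """Return a list of problems; an empty list means both options are set."""
    problems = []
    if params.get("EnableStreamMemOPs") != "1":
        problems.append("EnableStreamMemOPs is not 1")
    if "PeerMappingOverride=1" not in params.get("RegistryDwords", ""):
        problems.append("RegistryDwords lacks PeerMappingOverride=1")
    return problems


if __name__ == "__main__":
    if not PARAMS_PATH.exists():
        print(f"{PARAMS_PATH} not found; is the NVIDIA driver loaded?")
    else:
        issues = missing_settings(parse_params(PARAMS_PATH.read_text()))
        print("driver options OK" if not issues else "\n".join(issues))
```

Keeping the parsing separate from the file read makes it possible to check the logic against sample text on a machine without a GPU.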
Using the image
When using this image:
1. Enable GDRCopy environment variable
Make sure the environment variable that enables GDRCopy is set in the container. This allows you to access gdrdrv:
If you’re using SLURM, this environment variable is already set.
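A quick way to check the result of this step from inside the container is to look for the gdrdrv device node. The sketch below assumes the GDRCopy kernel module exposes its device at `/dev/gdrdrv`; the function name `gdrdrv_status` and its return strings are hypothetical and only illustrate the check.

```python
import os

# Device node created by the gdrdrv kernel module.
GDRDRV_DEVICE = "/dev/gdrdrv"


def gdrdrv_status(path=GDRDRV_DEVICE):
    """Report whether the device node exists and is readable and writable."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(f"{GDRDRV_DEVICE}: {gdrdrv_status()}")
```

A result of `missing` suggests that the device was not passed into the container, which is worth ruling out before debugging NVSHMEM itself.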
2. Patch NVSHMEM ibgda
Download NVSHMEM version 3.2.5 and patch ibgda in your container(s).
In src/modules/transport/ibgda/ibgda.cpp, change line 3659 from mlx5 to ibp so that it works in SUNK and CKS.
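If you build the image repeatedly, this one-line change can be scripted. The following is a hedged sketch rather than an official patch: `patch_text` is a hypothetical helper that replaces the first occurrence of a token on a given line and refuses to proceed if that line does not contain the token, which guards against applying the edit to a different NVSHMEM version where the line numbers have shifted.

```python
import sys
from pathlib import Path


def patch_text(text, line_no, old, new):
    """Replace the first `old` with `new` on 1-indexed line `line_no`."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"file has no line {line_no}")
    if old not in lines[line_no - 1]:
        raise ValueError(f"line {line_no} does not contain {old!r}")
    lines[line_no - 1] = lines[line_no - 1].replace(old, new, 1)
    return "".join(lines)


if __name__ == "__main__":
    # Usage: python patch_ibgda.py <path to ibgda.cpp>
    if len(sys.argv) == 2:
        source = Path(sys.argv[1])
        source.write_text(patch_text(source.read_text(), 3659, "mlx5", "ibp"))
        print(f"patched {source}")
    else:
        print("usage: python patch_ibgda.py <path to ibgda.cpp>")
```

Compare the result against the original and patched lines before building.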
Original code: