June 10, 2025
Support for NVSHMEM and GDRCopy now available
With the release of ncore-image v.2.10.1, you can use NVSHMEM and GDRCopy in your container image.
ncore-image v.2.10.1 contains the following modifications to support NVSHMEM usage with IBGDA:
- NVIDIA driver options:
  - nvidia.NVreg_EnableStreamMemOPs=1
  - nvidia.NVreg_RegistryDwords="PeerMappingOverride=1;"
- gdrdrv-dkms_2.5-1
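To confirm that the driver options listed above are in effect on a node, one option is to inspect the parameters the loaded NVIDIA driver reports. The sketch below assumes these are exposed as `Key: value` lines (for example in `/proc/driver/nvidia/params`, which is an assumption and not stated in this note). The helper name `param_has_value` is hypothetical. It only parses text that you pass in.

```cpp
#include <sstream>
#include <string>

// Returns true when `params_text` contains a line that starts with
// "<key>:" and whose remainder contains `expected`.
// Assumption: parameters are reported one per line as "Key: value".
bool param_has_value(const std::string& params_text,
                     const std::string& key,
                     const std::string& expected) {
    std::istringstream lines(params_text);
    std::string line;
    const std::string prefix = key + ":";
    while (std::getline(lines, line)) {
        // Match only lines whose first characters are exactly "<key>:".
        if (line.compare(0, prefix.size(), prefix) == 0) {
            return line.find(expected, prefix.size()) != std::string::npos;
        }
    }
    return false;  // key not present in the text
}
```

For example, after reading the parameter file into a string, `param_has_value(text, "EnableStreamMemOPs", "1")` and `param_has_value(text, "RegistryDwords", "PeerMappingOverride=1;")` would both be expected to return true on this image, if the assumed format holds.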
When using this image:
- Make sure the environment variable in the container is set to enable GDRCopy. This allows you to access gdrdrv.
  Example:
      env:
        - name: NVIDIA_GDRCOPY
          value: enabled
  If you're using SLURM, this environment variable is already set.
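An application can check at startup whether the variable is set before it relies on GDRCopy. This is a minimal sketch. The function names are hypothetical, and it only tests for the `enabled` value shown in the example above; it does not verify that gdrdrv is actually reachable.

```cpp
#include <cstdlib>
#include <string>

// Returns true when the given NVIDIA_GDRCOPY value requests GDRCopy.
// A null pointer means the variable is not set at all.
bool gdrcopy_requested(const char* value) {
    return value != nullptr && std::string(value) == "enabled";
}

// Convenience wrapper that reads the variable from the process environment.
bool gdrcopy_requested_in_env() {
    return gdrcopy_requested(std::getenv("NVIDIA_GDRCOPY"));
}
```

Calling `gdrcopy_requested_in_env()` early in `main` and logging a warning when it returns false is one way to catch a missing setting before NVSHMEM initialization.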
- In NVSHMEM version 3.2.5, patch ibgda in your container(s). Download NVSHMEM version 3.2.5. In src/modules/transport/ibgda/ibgda.cpp, line 3659 needs to be changed. Here's the original:
  Example:
      if (!strstr(name, "mlx5")) {
          ftable.close_device(device->context);
          device->context = NULL;
          NVSHMEMI_WARN_PRINT("device %s is not enumerated as an mlx5 device. Skipping...\n",
                              name);
          continue;
      }
  Here's what to change it to:
  Example:
      if (!strstr(name, "ibp")) {
          ftable.close_device(device->context);
          device->context = NULL;
          NVSHMEMI_WARN_PRINT("device %s is not enumerated as an mlx5 device. Skipping...\n",
                              name);
          continue;
      }
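The patched line decides whether a device is skipped by doing a substring match on its name. The sketch below isolates that matching behaviour so the effect of the two substrings can be compared. The helper `device_passes_filter` and the device names `mlx5_0` and `ibp0` are hypothetical illustrations, not taken from NVSHMEM.

```cpp
#include <cstring>

// Mirrors the shape of the strstr() check in ibgda.cpp: a device is kept
// only when its name contains `required_substring`; otherwise it is skipped.
bool device_passes_filter(const char* name, const char* required_substring) {
    return std::strstr(name, required_substring) != nullptr;
}
```

With these hypothetical names, `device_passes_filter("ibp0", "mlx5")` is false, so a device named `ibp0` would be skipped by a check for `mlx5`, while `device_passes_filter("ibp0", "ibp")` is true and the device would be kept.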