Internode Memory Exchange (IMEX)

Internode Memory Exchange (IMEX) is NVIDIA’s service for mapping GPU memory across compute Nodes that belong to the same NVLink domain. IMEX works at the driver level, so CUDA applications and communication libraries such as NCCL can transparently import and export memory over NVLink rather than treating each Node as an isolated GPU pool. On CoreWeave, IMEX matters most for rack-scale, NVLink-connected systems where a single distributed job must treat GPUs across Nodes as one large memory fabric. If your workload does not rely on cross-Node NVLink memory access, you do not need to think about IMEX.

NVLink domains and partitions

An NVLink domain (sometimes called an NVLink cluster) is the set of Nodes that can reach each other over NVLink. Your Pods must land on Nodes that are part of the same domain, or cross-Node NVLink memory access will not work as expected. In practice, that means you must align scheduling with physical connectivity.

On NVL72-powered instances, all Nodes in a single rack share the same NVLink domain. The nvidia.com/gpu.clique label identifies the NVLink partition within that domain, and workloads must use this label as a Pod affinity topologyKey so that all related Pods land on Nodes in the same partition. On CKS full-rack deployments, the default partition spans the entire domain, so partition and domain boundaries align. For the mechanics of setting this up, see IMEX with Dynamic Resource Allocation.
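For example, a minimal Pod affinity sketch that keeps every Pod of one job inside the same NVLink partition might look like the following; the app: my-distributed-job label, Pod name, and image are illustrative, not CoreWeave conventions:

apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  labels:
    app: my-distributed-job            # hypothetical label shared by all Pods of the job
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-distributed-job
          # Co-schedule all matching Pods onto Nodes that share the same
          # nvidia.com/gpu.clique value, i.e. the same NVLink partition.
          topologyKey: nvidia.com/gpu.clique
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.07-py3   # illustrative image

The Kubernetes scheduler lets the first Pod of a self-selecting required affinity rule land on any eligible Node; every subsequent Pod matching the selector is then pinned to that Pod’s clique.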

IMEX on CoreWeave Kubernetes Service

CoreWeave delivers IMEX channel access to workloads through Kubernetes Dynamic Resource Allocation (DRA) and the NVIDIA DRA driver. You declare a ComputeDomain and attach ResourceClaims to the Pods that need IMEX; the platform provisions the supporting IMEX components on the Node, so you never configure low-level IMEX services inside your containers. IMEX with DRA is in Limited Availability on supported instance types and continues to evolve. For prerequisites, enablement, and full YAML examples, read IMEX with Dynamic Resource Allocation.
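As a rough sketch of that pattern, assuming the apiVersion and field names of NVIDIA’s DRA driver at the time of writing (all names below are illustrative; the linked page has the authoritative manifests):

apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: my-compute-domain              # illustrative name
spec:
  numNodes: 2                          # Nodes the distributed job spans
  channel:
    resourceClaimTemplate:
      name: my-imex-channel            # claim template the Pod references below
---
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.07-py3   # illustrative image
      resources:
        claims:
          - name: imex-channel         # attach the IMEX channel to this container
  resourceClaims:
    - name: imex-channel
      resourceClaimTemplateName: my-imex-channel

When the Pod schedules, its claim is satisfied from the ComputeDomain’s channel template, and the platform handles the Node-side IMEX components.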
Earlier clusters used a transparent nvidia-imex DaemonSet model for IMEX channels. New work should use DRA and ComputeDomain resources. See Cluster Components for how these pieces fit together.

Where to go next

The following resources provide more context and configuration guidance:
  • IMEX with Dynamic Resource Allocation: create ComputeDomain objects, claim IMEX channels, and verify ResourceClaim state (a quick kubectl sketch follows this list).
  • NVIDIA IMEX guide: background on NVLink multi-Node architecture, IMEX behavior, and terminology such as Fabric Manager roles (for administrators who need NVIDIA’s full reference).
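
As a quick check after applying manifests like the sketch above, assuming the NVIDIA DRA driver’s CRDs are installed (the claim name is a placeholder):

kubectl get computedomains
kubectl get resourceclaims
kubectl describe resourceclaim <claim-name>   # inspect allocation and reservation status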