RDMA and InfiniBand

Remote Direct Memory Access (RDMA) allows data to be transferred directly between the memory of two systems without involving either system's CPU in the data path. GPUDirect RDMA applies the same approach to GPU memory. This significantly reduces latency and increases throughput for communication-heavy workloads.

InfiniBand is a high-speed network technology that provides low-latency, high-bandwidth data transfers. InfiniBand is commonly used in high-performance computing (HPC) and machine learning workloads.

GPUDirect RDMA

Remote Direct Memory Access (RDMA) enables direct access between the main memory of two computers without involving the operating system, cache, or storage. GPUDirect RDMA extends this so that the network adapter can read and write GPU memory directly. This allows for high-throughput, low-latency data transfers with minimal CPU utilization.

In a typical IP data transfer, when an application sends data to a remote machine, the kernel on the receiving machine accepts the data and determines which application should receive it. The kernel then wakes the application, waits for it to make a system call, and finally copies the data from the kernel's internal memory space into the application's buffer. As a result, the data is copied across the system's main memory at least twice, and the context switches between the kernel and the application add CPU load.
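The kernel-mediated path can be seen in a minimal Python sketch. It assumes a local socket pair standing in for two machines; the names `payload` and `app_buffer` are illustrative only. Each `sendall` and `recv_into` call is a system call in which the kernel copies data between its own socket buffer and application memory.

```python
import socket

# A connected pair of local sockets stands in for two machines (assumption
# made for illustration; a real transfer would cross a network).
sender, receiver = socket.socketpair()

payload = b"x" * 1024

# System call: the kernel copies the payload from application memory
# into its own socket buffer.
sender.sendall(payload)

app_buffer = bytearray(len(payload))
view = memoryview(app_buffer)
received = 0
while received < len(payload):
    # System call: the kernel copies from its socket buffer into app_buffer.
    n = receiver.recv_into(view[received:])
    if n == 0:
        break  # peer closed the connection early
    received += n

sender.close()
receiver.close()

print(received, bytes(app_buffer) == payload)
```

The data passes through a kernel buffer on its way between the two application buffers, and the application has to enter the kernel for every send and receive.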

In contrast, RDMA bypasses the kernel to reduce CPU overhead: the RDMA protocol allows the host adapter to place incoming data directly in the application's memory space.
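The idea of direct placement can be modeled with a toy sketch in pure Python. This is an analogy only, not real RDMA: `ToyAdapter`, `register`, and `deliver` are hypothetical names invented here and do not correspond to any RDMA library API. The sketch assumes the application first registers a buffer with the adapter, which then writes incoming data into that buffer without an intermediate kernel copy.

```python
class ToyAdapter:
    """Hypothetical stand-in for an RDMA-capable host adapter (not a real API)."""

    def __init__(self):
        self._regions = {}   # key -> writable view of application memory
        self._next_key = 1

    def register(self, buffer):
        """Register application memory so the adapter may write into it."""
        key = self._next_key
        self._next_key += 1
        self._regions[key] = memoryview(buffer)
        return key

    def deliver(self, key, offset, data):
        """Place incoming data straight into the registered region."""
        region = self._regions[key]
        if offset < 0 or offset + len(data) > len(region):
            raise ValueError("write falls outside the registered region")
        # The write lands in the application's own buffer; no staging copy.
        region[offset:offset + len(data)] = data


app_buffer = bytearray(16)
adapter = ToyAdapter()
key = adapter.register(app_buffer)

# The "adapter" writes into application memory; the application makes no
# receive call and simply finds the data in its buffer.
adapter.deliver(key, 0, b"hello")
print(bytes(app_buffer[:5]))  # prints b'hello'
```

In this model the application never calls a receive function: the data simply appears in memory it already owns, which is the property that removes the per-transfer kernel copy and context switch.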
