HPC over RDMA with InfiniBand (Quantum-X)
GPUDirect RDMA over InfiniBand provides low-latency, high-reliability communication for GPUs to exchange information in traditional HPC environments. CoreWeave has partnered with NVIDIA to design CoreWeave’s InfiniBand interconnected clusters using NVIDIA Quantum-2 and Quantum-X800 technology. These clusters use GPUDirect to let GPUs communicate directly with each other across the InfiniBand fabric, which bypasses the Node’s CPU and kernel resources. InfiniBand is the industry standard for large-scale LLM training and scientific discovery models that require maximum memory fabric performance.
HPC over RDMA with RoCE (Spectrum-X)
RDMA over Converged Ethernet (RoCE) through the NVIDIA Spectrum-X platform provides an Ethernet-native fabric designed for AI. Available on GB300-powered instances, Spectrum-X delivers high-performance RDMA using ConnectX-8 SuperNICs and BlueField-3 DPUs. It provides InfiniBand-like performance, including performance isolation and congestion control, within a standard Ethernet ecosystem. The following sections describe the underlying technologies that both fabrics rely on.
About GPUDirect RDMA
GPUDirect Remote Direct Memory Access (RDMA) provides direct access between the main memory of two computers without involving the operating system, cache, or storage. This direct access supports high-throughput, low-latency data transfers with minimal CPU use. In a typical IP data transfer, the kernel must handle data movement, which copies data across system memory multiple times and increases CPU load. RDMA bypasses the kernel and lets the host adapter store data directly in the application’s memory space. To learn more, see:
About NVIDIA Mellanox SHARP
Traditionally, communication requirements scale proportionally with the number of Nodes in an HPC cluster. NVIDIA® Mellanox® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) moves collective operations from the individual Nodes into the network. This shift produces a flat scaling curve and improves the effective interconnect bandwidth. NVIDIA Quantum switches process data as it traverses the network, which reduces the number of times that data must travel between server endpoints. The switches also aggregate large data vectors at wire speed, which is important for machine learning applications. CoreWeave’s InfiniBand topology is fully SHARP compliant, and the components that SHARP relies on, such as Adaptive Routing and Aggregation Managers, are implemented in the network control plane.
SHARP can double the performance of a compliant InfiniBand network compared to similar networks without SHARP. For more information, see Mellanox in-network computing for AI.
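To make the scaling argument concrete, the following Python sketch is a toy model, not the SHARP protocol itself. It compares the number of communication steps in a host-based ring allreduce, which grows linearly with the Node count, against the number of levels in a tree of aggregating switches, which grows much more slowly. The function names and the `radix` parameter (inputs summed per switch) are hypothetical and chosen only for illustration.

```python
def ring_allreduce_steps(num_nodes: int) -> int:
    """Steps in a host-based ring allreduce: (N - 1) for reduce-scatter
    plus (N - 1) for all-gather."""
    return 2 * (num_nodes - 1)


def tree_aggregate(vectors, radix):
    """Sum vectors level by level, as a tree of aggregating switches might.

    Each hypothetical switch adds up to `radix` input vectors element-wise
    and forwards a single partial result upward.
    Returns (reduced_vector, number_of_tree_levels).
    """
    level = [list(v) for v in vectors]
    levels = 0
    while len(level) > 1:
        level = [
            [sum(column) for column in zip(*level[i:i + radix])]
            for i in range(0, len(level), radix)
        ]
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    for num_nodes in (8, 64, 512):
        # Every Node contributes a small two-element vector.
        contributions = [[i, 2 * i] for i in range(num_nodes)]
        reduced, levels = tree_aggregate(contributions, radix=8)
        print(
            f"nodes={num_nodes} ring_steps={ring_allreduce_steps(num_nodes)} "
            f"tree_levels={levels} reduced={reduced}"
        )
```

In this model each Node sends its vector into the network once and receives the reduced result once, while the tree depth increases only when the Node count grows by a factor of `radix`. Real deployments depend on switch radix, topology, and message size, so treat the numbers as illustrative only.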