
CoreWeave Kubernetes Service (CKS) is designed for High Performance Computing (HPC) workloads. Our GPUs and optimized networking fabrics provide the best environment for parallel jobs, with thousands to tens of thousands of GPUs working together in areas such as Neural Net Training, Rendering, and Simulation. For highly parallel workloads, such as training large language models, the network fabric is a critical component. CoreWeave offers two distinct high-performance RDMA-enabled fabrics: NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X (RoCE). Both are optimized for high throughput and low latency, utilizing a non-blocking, Fat Tree architecture to support the most demanding frontier AI models.

HPC over RDMA with InfiniBand (Quantum-X)

GPUDirect RDMA over InfiniBand is the fastest, lowest-latency, and most reliable way for GPUs to exchange information in traditional HPC environments. CoreWeave has partnered with NVIDIA to design our InfiniBand interconnected clusters using NVIDIA Quantum-2 and Quantum-X800 technology. These clusters use GPUDirect to allow GPUs to communicate directly with each other across the InfiniBand fabric, bypassing the Node’s CPU and kernel resources. InfiniBand is the industry standard for large-scale LLM training and scientific discovery models requiring maximum memory fabric performance.
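
In practice, a distributed training job reaches this fabric through its collective library rather than by programming InfiniBand directly. The sketch below is a minimal illustration of an NCCL all-reduce in PyTorch that would ride the InfiniBand fabric with GPUDirect RDMA when launched across GPU Nodes; the `NCCL_IB_HCA` prefix and the launcher environment variables are assumptions that depend on the instance type and on how the job is started.

```python
# Minimal sketch: an NCCL all-reduce that uses the InfiniBand fabric with
# GPUDirect RDMA when run across GPU Nodes (e.g. launched via torchrun).
# The HCA prefix below is an illustrative assumption; actual device names
# depend on the instance type.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_DEBUG", "INFO")   # log which transport NCCL selects
os.environ.setdefault("NCCL_IB_HCA", "mlx5")  # restrict NCCL to the IB HCAs (assumed prefix)

def main() -> None:
    # RANK / WORLD_SIZE / MASTER_ADDR / LOCAL_RANK are expected from the launcher.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Each rank contributes a GPU-resident tensor; with GPUDirect RDMA active,
    # NCCL moves it across the fabric without staging through host memory.
    x = torch.ones(1 << 20, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: sum element = {x[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched across Nodes (for example with `torchrun --nnodes=2 --nproc-per-node=8`), the `NCCL_DEBUG=INFO` output shows which transport and HCAs NCCL selected for the collective.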

HPC over RDMA with RoCE (Spectrum-X)

RDMA over Converged Ethernet (RoCE) via the NVIDIA Spectrum-X platform provides an Ethernet-native fabric specifically designed for AI. Available on GB300-powered instances, Spectrum-X delivers high-performance RDMA using ConnectX-8 SuperNICs and BlueField-3 DPUs. It is designed to provide InfiniBand-like performance, including performance isolation and congestion control, within a standard Ethernet ecosystem.
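
Whether a given RDMA adapter is running RoCE or native InfiniBand can be read from the standard Linux sysfs entries exposed by rdma-core. The following is a minimal sketch, assuming access to the Node's (or pod's) `/sys` filesystem; which devices appear, and their names, depend on the instance type.

```python
# Minimal sketch: distinguish RoCE (Ethernet link layer) from native InfiniBand
# on a Node by reading the rdma-core sysfs entries. The sysfs paths are standard
# Linux locations; the devices present depend on the instance.
from pathlib import Path

def rdma_link_layers() -> dict[str, str]:
    """Map each RDMA device (e.g. mlx5_0) to its port-1 link layer."""
    layers = {}
    for dev in sorted(Path("/sys/class/infiniband").glob("*")):
        link_layer = (dev / "ports" / "1" / "link_layer").read_text().strip()
        layers[dev.name] = link_layer  # "Ethernet" => RoCE, "InfiniBand" => IB
    return layers

if __name__ == "__main__":
    for name, layer in rdma_link_layers().items():
        print(f"{name}: {layer}")
```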

About GPUDirect RDMA

Remote Direct Memory Access (RDMA) lets the network adapter of one computer read and write the memory of another without involving either operating system, cache, or storage; GPUDirect RDMA extends this so the adapter can target GPU memory directly. This allows for high-throughput, low-latency data transfers with minimal CPU utilization. In a typical IP data transfer, the kernel must handle data movement, copying data across system memory multiple times and increasing CPU load. In contrast, RDMA bypasses the kernel, allowing the host adapter to place data directly in the application’s memory space.
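
As a rough sanity check, the pieces GPUDirect RDMA depends on can be inspected from a Node or a privileged pod: the RDMA adapters registered with the kernel, and the NVIDIA peer-memory kernel module that lets the adapter address GPU memory. The sketch below is illustrative only; the module names are assumptions based on current NVIDIA driver packaging.

```python
# Minimal sketch: check that RDMA devices are visible and that an NVIDIA
# peer-memory module is loaded (required for the NIC to access GPU memory).
# Module names (nvidia_peermem / nv_peer_mem) are assumptions that vary with
# driver packaging.
from pathlib import Path

def rdma_devices() -> list[str]:
    """RDMA-capable adapters registered with the kernel."""
    root = Path("/sys/class/infiniband")
    return sorted(p.name for p in root.glob("*")) if root.exists() else []

def peermem_loaded() -> bool:
    """True if an NVIDIA peer-memory module appears in /proc/modules."""
    modules = Path("/proc/modules").read_text()
    return any(name in modules for name in ("nvidia_peermem", "nv_peer_mem"))

if __name__ == "__main__":
    print("RDMA devices:", rdma_devices() or "none found")
    print("GPUDirect peer-memory module loaded:", peermem_loaded())
```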

About NVIDIA Mellanox SHARP

Traditionally, communication requirements scale proportionally with the number of Nodes in an HPC cluster. NVIDIA® Mellanox® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) moves collective operations from the individual Nodes into the network. This flattens the scaling curve and improves the effective interconnect bandwidth. By processing data as it traverses the network, NVIDIA Quantum switches reduce the number of times data must cross the fabric between server endpoints. The switches also aggregate large data vectors at wire speed, which is crucial for machine learning applications. CoreWeave’s InfiniBand topology is fully SHARP compliant, and all components required to leverage SHARP, such as Adaptive Routing and Aggregation Managers, are implemented in the network control plane.
SHARP effectively doubles the performance of a compliant InfiniBand network compared to a similar network without SHARP.
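
From an application’s point of view, SHARP is engaged through the collective library rather than called directly. The sketch below assumes an NCCL build with NVIDIA’s collective-offload (CollNet/SHARP) plugin available on the image; `NCCL_COLLNET_ENABLE` is NCCL’s documented switch for that path, and the `NCCL_DEBUG=INFO` log shows whether the offload was actually selected.

```python
# Minimal sketch: opt an NCCL job into SHARP-based in-network reductions.
# Assumes the image ships NVIDIA's nccl-rdma-sharp plugin and that the job is
# launched with torchrun or a similar launcher that sets RANK/LOCAL_RANK.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")  # request the collective-offload (SHARP) path
os.environ.setdefault("NCCL_DEBUG", "INFO")        # log whether CollNet/SHARP was selected

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Large all-reduces are where in-network aggregation pays off: the switch
# reduces the data once instead of each endpoint exchanging full vectors.
grad = torch.randn(64 << 20, device="cuda")  # ~256 MB of float32 gradients
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

dist.destroy_process_group()
```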