HPC Interconnect
About Ethernet and InfiniBand High Performance Computing Interconnects
CoreWeave Kubernetes Service (CKS) is designed for High Performance Computing (HPC) workloads. Our GPUs and optimized networking fabric provide the best environment for parallel jobs, with thousands to tens of thousands of GPUs working together in areas such as neural network training, rendering, and simulation.
For highly parallel workloads, such as training large language models, the network fabric is a critical component. CoreWeave's IP-over-Ethernet and InfiniBand fabrics are optimized for high throughput and low latency. The InfiniBand fabric is designed with a non-blocking Fat Tree architecture and SHARP optimizations, which supports the most demanding workloads on clusters with thousands of GPUs.
HPC over IP with Ethernet
Ethernet fabrics use a cut-through design with sub-microsecond switching, which allows many HPC workloads to reach peak performance without further optimization or configuration. NVIDIA NCCL over IP transport is supported across all CoreWeave GPU clusters.
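As a minimal sketch of this path, the snippet below pins an NCCL-backed PyTorch job to NCCL's IP/socket transport. The interface name, rendezvous address, port, and single-process setup are illustrative assumptions to keep the example self-contained, not CoreWeave defaults.

```python
import os

import torch
import torch.distributed as dist

# Standard NCCL environment variables; the interface name is an illustrative
# assumption -- substitute the Ethernet interface present on your Nodes.
os.environ.setdefault("NCCL_IB_DISABLE", "1")        # force the IP/socket transport
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # Ethernet interface NCCL should use

# Rank, world size, and the rendezvous address are normally injected by a
# launcher such as torchrun; they are hard-coded here only to keep the
# sketch self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)

# A minimal collective: all-reduce a tensor across every GPU in the job.
t = torch.ones(1024, device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)

dist.destroy_process_group()
```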
HPC over RDMA with InfiniBand
GPUDirect RDMA over InfiniBand is the fastest, lowest-latency, and most reliable way for GPUs to exchange information.
CoreWeave has partnered with NVIDIA to design our InfiniBand-interconnected clusters. These clusters use GPUDirect RDMA to allow GPUs to communicate directly with each other across the InfiniBand fabric. Latency is reduced because GPUDirect bypasses the Node's CPU and consumes no kernel resources, freeing the Node to perform other tasks.
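From NCCL's point of view, steering a job onto this RDMA path comes down to a few standard environment variables. The sketch below is a counterpart to the Ethernet example above; the mlx5 HCA name prefix is an illustrative assumption, and on CoreWeave clusters these values may already be set for you.

```python
import os

# Use the InfiniBand/RDMA transport instead of IP sockets.
os.environ["NCCL_IB_DISABLE"] = "0"
# Select the HCAs by name prefix; "mlx5" is an assumption -- match it to the
# adapters actually present on your Nodes.
os.environ["NCCL_IB_HCA"] = "mlx5"
# Permit GPUDirect RDMA regardless of the GPU/NIC PCIe distance.
os.environ["NCCL_NET_GDR_LEVEL"] = "SYS"

# The rest of the job is unchanged from the Ethernet sketch: initialize
# torch.distributed with the "nccl" backend and issue collectives as usual.
# Buffers then move NIC-to-GPU without staging through host memory.
```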
Fabric Topology
The InfiniBand fabric is assembled with NVIDIA Quantum HDR and NDR InfiniBand switches. The network topology is laid out in a non-blocking Fat Tree architecture with no oversubscribed links. This topology is rail-optimized to allow further latency optimizations in all-reduce-style operations.
InfiniBand fabrics provide hundreds of terabits per second of aggregate bandwidth. CoreWeave monitors and optimizes each link with NVIDIA's best-of-breed tooling.
NVIDIA Mellanox SHARP
Traditionally, communication requirements scale proportionally with the number of Nodes in an HPC cluster. NVIDIA® Mellanox® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) moves collective operations from the individual Nodes into the network. This allows for a flat scaling curve and improves the effective interconnect bandwidth.
By processing data as it traverses the network, NVIDIA Quantum switches reduce the number of times that data traverses between server endpoints. The switches also aggregate large data vectors at wire speed, which is crucial for machine learning applications. CoreWeave's InfiniBand topology is fully SHARP compliant, and all components required to leverage SHARP, such as Adaptive Routing and Aggregation Managers, are implemented in the network control plane.
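The effect on per-endpoint traffic can be seen with a back-of-the-envelope model (a sketch for intuition, not a CoreWeave benchmark): in a host-based ring all-reduce every endpoint moves roughly twice the message size across its NIC, while with in-network aggregation each endpoint sends its vector up to the switch once and receives the result once.

```python
def ring_allreduce_bytes_per_endpoint(message_bytes: int, nodes: int) -> float:
    """Host-based ring all-reduce: a reduce-scatter followed by an
    all-gather, each moving (nodes - 1) / nodes of the message through
    every endpoint's NIC."""
    return 2 * (nodes - 1) / nodes * message_bytes


def in_network_allreduce_bytes_per_endpoint(message_bytes: int) -> float:
    """SHARP-style in-network aggregation: each endpoint sends its vector
    up to the switch once and receives the reduced result once,
    independent of cluster size."""
    return float(message_bytes)


if __name__ == "__main__":
    gib = 1024 ** 3  # a 1 GiB gradient buffer
    for nodes in (8, 128, 1024):
        ring = ring_allreduce_bytes_per_endpoint(gib, nodes)
        sharp = in_network_allreduce_bytes_per_endpoint(gib)
        print(f"{nodes:>5} Nodes: ring {ring / gib:.2f} GiB/endpoint, "
              f"in-network {sharp / gib:.2f} GiB/endpoint "
              f"(~{ring / sharp:.2f}x less endpoint traffic)")
```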
SHARP effectively doubles the performance of a compliant InfiniBand network when compared to similar networks without SHARP. Read more about SHARP.