GB200 instances
CoreWeave’s GB200 instances are powered by 4x NVIDIA GB200 GPUs and connected with 400 Gb/s NDR InfiniBand, built on the NVIDIA Quantum-2 InfiniBand fabric.
GB300 instances
CoreWeave’s GB300 instances are offered in two specialized networking configurations that deliver 800 Gb/s of bandwidth:
- GB300 instances with Quantum-X InfiniBand are optimized for low latency in traditional HPC and large-scale AI training.
- GB300 instances with Spectrum-X RoCE (RDMA over Converged Ethernet) use BlueField-3 and ConnectX-8 SuperNICs for large-scale AI in Ethernet-based cloud environments.
Deploy NVL72-powered instances as full racks
NVL72-powered instances must be deployed as full racks of 18 Nodes to ensure optimal performance. CKS enforces full-rack deployment and won’t allow requesting partial racks. When deploying Node Pools for rack-based instances, use targetRacks to request Nodes at the rack level. You can still use targetNodes, but the value must be a multiple of 18, such as 36 or 54.
Use targetRacks to specify the number of racks directly, where each rack contains 18 Nodes:
NodePool using targetRacks
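
A minimal sketch of such a manifest is shown here. Only the targetRacks field is taken from the text above; the apiVersion, the instanceType value gb200-4x, and the Node Pool name are assumptions for illustration and should be checked against the NodePool reference for your cluster.

```yaml
# Sketch only: apiVersion, instanceType value, and name are assumptions.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-gb200-racks   # hypothetical name
spec:
  instanceType: gb200-4x      # assumed instance type identifier
  targetRacks: 2              # 2 racks x 18 Nodes per rack = 36 Nodes
```

Requesting capacity in racks avoids having to work out a Node count that is a multiple of 18.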
Alternatively, set targetNodes to a multiple of 18, such as 18, 36, or 54:
NodePool using targetNodes
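
A minimal sketch of the Node-count form is shown here. Only the targetNodes field and the multiple-of-18 rule come from the text above; the apiVersion, the instanceType value gb200-4x, and the Node Pool name are assumptions for illustration.

```yaml
# Sketch only: apiVersion, instanceType value, and name are assumptions.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-gb200-nodes   # hypothetical name
spec:
  instanceType: gb200-4x      # assumed instance type identifier
  targetNodes: 36             # must be a multiple of 18 (36 = 2 full racks)
```

A value that is not a multiple of 18, such as 20, would describe a partial rack, which CKS does not allow.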
Autoscaling is not supported for rack-based instance types. Setting autoscaling: true on a Node Pool with a GB200 or GB300 instance type is rejected by CKS.
Manage Pod affinity
To take full advantage of the NVL72 architecture’s shared NVLink fabric, schedule all Pods from the same job onto Nodes in the same rack, which share a single NVLink domain. This is especially important for large-scale distributed workloads, where efficient communication between GPUs reduces processing time.
Slurm users should use the Topology/Block Plugin for Slurm to control job placement.
Control placement with NVLink domain
Kubernetes controls Pod placement with affinity rules that steer Pods toward Nodes with specific labels. In CKS, all Nodes are labeled with their NVLink domain, allowing precise control over Pod placement. To ensure multiple Pods are scheduled onto the same NVL72 rack, set their affinity toward Nodes within the same NVLink domain. In the NVL72 architecture, all Nodes within the same rack share the same unique ds.coreweave.com/nvlink.domain label. If a Node Pool spans multiple racks, Pods can reference multiple NVLink domains in matchExpressions.values.
For example, this Pod affinity rule targets a single NVLink domain:
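
A minimal sketch of such a rule, written as standard Kubernetes node affinity, is shown here. The label key ds.coreweave.com/nvlink.domain comes from the text above; the Pod name, container image, and domain value are placeholders, not real identifiers.

```yaml
# Sketch only: the Pod name, image, and domain value are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-nvl72-worker              # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the Pod stays Pending if no matching Node is available.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: ds.coreweave.com/nvlink.domain
                operator: In
                values:
                  - example-nvlink-domain  # placeholder; use the label value of your rack
  containers:
    - name: worker
      image: example.com/training:latest   # placeholder image
```

To allow placement on any of several racks, list additional domain values under values. The label values present in a cluster can be listed with kubectl get nodes -L ds.coreweave.com/nvlink.domain.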