Network security
CoreWeave’s network architecture delivers a secure, high-performance framework for deploying bare-metal Kubernetes clusters. It uses overlays and NVIDIA BlueField-3 DPUs to support strong tenant isolation, advanced observability, and minimal overhead, without relying on hypervisors.

The core fabric is based on a Clos topology, with leaf and spine switches interconnected via BGP unnumbered EVPN. This design enables scalable Layer 3 segmentation using VXLAN encapsulation. EVPN Type 5 routes distribute IP prefixes, allowing each Kubernetes tenant or namespace to operate within an isolated VRF and VXLAN VNI.

Each bare-metal CoreWeave Kubernetes Service (CKS) Node is equipped with a BlueField-3 DPU. These DPUs run independently from the host OS in their own Linux environments with DOCA-based applications. They handle PXE-based network bootstrapping, enforce security policies, and offload CNI functions such as routing, firewalling, and VXLAN termination. This architecture enables secure multi-tenancy and policy enforcement without a hypervisor.

Network security is organized into three zones:

- Zone 0: DPU management and Kubernetes Control Plane
- Zone 1: Application Data Plane
- Zone 2: External ingress and egress
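
The per-tenant VRF and VXLAN VNI isolation described above can be illustrated with a minimal Python sketch. It is not CoreWeave's implementation: the tenant names, the `SegmentAllocator` class, the `vrf-<tenant>` naming, and the starting VNI of 10000 are all hypothetical. The only fixed facts it relies on are that a VNI is a 24-bit value and that the VXLAN header is 8 bytes with the I flag (0x08) set, as defined in RFC 7348.

```python
import struct
from dataclasses import dataclass

VNI_MAX = 2**24 - 1  # the VNI is a 24-bit field in the VXLAN header (RFC 7348)


@dataclass(frozen=True)
class TenantSegment:
    """One isolated Layer 3 segment: a VRF paired with a VXLAN VNI."""
    tenant: str
    vrf: str
    vni: int


class SegmentAllocator:
    """Hypothetical allocator that gives each tenant its own VRF and VNI."""

    def __init__(self, base_vni: int = 10000):  # base value is an assumption
        self._next_vni = base_vni
        self._by_tenant: dict[str, TenantSegment] = {}

    def allocate(self, tenant: str) -> TenantSegment:
        # Return the existing segment so a tenant always maps to one VNI.
        if tenant in self._by_tenant:
            return self._by_tenant[tenant]
        if self._next_vni > VNI_MAX:
            raise ValueError("VNI space exhausted")
        segment = TenantSegment(tenant, f"vrf-{tenant}", self._next_vni)
        self._next_vni += 1
        self._by_tenant[tenant] = segment
        return segment


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that carries the tenant's VNI."""
    if not 0 <= vni <= VNI_MAX:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte 0x08 (valid-VNI bit) followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", 0x08 << 24, vni << 8)


def vni_from_header(header: bytes) -> int:
    """Recover the VNI from a VXLAN header, as a terminating endpoint would."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


if __name__ == "__main__":
    allocator = SegmentAllocator()
    for name in ("team-a", "team-b"):
        seg = allocator.allocate(name)
        print(seg.tenant, seg.vrf, seg.vni, vxlan_header(seg.vni).hex())
```

Because every encapsulated packet carries the VNI of its tenant, an endpoint that terminates VXLAN can place traffic into the matching VRF and keep tenants separated even though they share the same physical fabric.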
Scalability and performance
This architecture scales reliably for demanding model training and inference workloads because it offloads infrastructure operations from the main compute resources. BlueField-3 DPUs decouple networking, storage, and security from the host CPU, so host resources can be fully dedicated to training and inference tasks. This reduces latency and jitter, and allows performance to scale predictably across many Nodes.

EVPN Type 5 overlays enable efficient Layer 3 multi-tenancy without complex NAT or overlay stitching. VXLAN encapsulation supports cluster expansion across racks and data centers, while BGP-based routing optimizes data flows. The architecture supports consistent, low-latency packet handling and bandwidth prioritization, both of which are critical for real-time inference and distributed training.
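
To give a rough sense of what VXLAN encapsulation costs per packet, the following Python sketch computes the standard header overhead and the resulting inner MTU. The header sizes are the standard ones (IPv4 or IPv6 outer header without options or extensions, UDP, VXLAN, and an untagged inner Ethernet header). The underlay MTU values in the usage example are illustrative assumptions, not CoreWeave's configuration, and the function names are hypothetical.

```python
# Standard header sizes in bytes (no IP options, no inner VLAN tag).
OUTER_IPV4 = 20
OUTER_IPV6 = 40
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def vxlan_overhead(ipv6_underlay: bool = False) -> int:
    """Bytes of the underlay IP MTU consumed by VXLAN encapsulation."""
    outer_ip = OUTER_IPV6 if ipv6_underlay else OUTER_IPV4
    return outer_ip + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def inner_mtu(underlay_mtu: int, ipv6_underlay: bool = False) -> int:
    """Largest inner IP packet that fits without fragmenting the underlay."""
    mtu = underlay_mtu - vxlan_overhead(ipv6_underlay)
    if mtu <= 0:
        raise ValueError("underlay MTU is too small for VXLAN encapsulation")
    return mtu


def payload_efficiency(underlay_mtu: int, ipv6_underlay: bool = False) -> float:
    """Fraction of each full-size underlay packet available to the tenant."""
    return inner_mtu(underlay_mtu, ipv6_underlay) / underlay_mtu


if __name__ == "__main__":
    # Illustrative underlay MTUs; real values depend on the fabric.
    for mtu in (1500, 9000):
        print(mtu, inner_mtu(mtu), round(payload_efficiency(mtu), 4))
```

Under these assumptions the fixed 50-byte cost on an IPv4 underlay is a smaller share of each packet as the underlay MTU grows, which is one reason overlay fabrics are commonly run with jumbo frames.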