Security Architecture

Learn how CoreWeave's security architecture works

This page provides an overview of CoreWeave's security architecture, including network design, scalability, identity management, and data protection.

Network security

CoreWeave's network architecture delivers a secure, high-performance framework for deploying bare-metal Kubernetes clusters. It uses EVPN Type 5 overlays and NVIDIA BlueField-3 DPUs to support strong tenant isolation, advanced observability, and minimal overhead, all without relying on hypervisors.

The core fabric is based on a Clos topology, with leaf and spine switches interconnected via BGP unnumbered EVPN. This design enables scalable Layer 3 segmentation using VXLAN encapsulation. EVPN Type 5 routes distribute IP prefixes, allowing each Kubernetes tenant or namespace to operate within an isolated VRF and VXLAN VNI.
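As a rough illustration of the per-tenant segmentation described above, the following Python sketch assigns each tenant its own VRF name and VXLAN VNI. The `vrf-` naming scheme, the `VNI_BASE` offset, and the tenant names are assumptions made for this example and do not reflect CoreWeave's actual allocation logic.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Dict, List, Union

# VXLAN VNIs are 24-bit identifiers. The starting offset is an arbitrary
# assumption for this sketch.
VNI_BASE = 10_000
VNI_MAX = 2**24 - 1


@dataclass(frozen=True)
class TenantSegment:
    tenant: str
    vrf: str
    vni: int
    prefix: Union[IPv4Network, IPv6Network]


def allocate_segments(tenants: Dict[str, str]) -> List[TenantSegment]:
    """Give every tenant a dedicated VRF and a unique VNI.

    `tenants` maps a tenant name to the IP prefix it advertises. Prefixes may
    overlap between tenants because each tenant routes inside its own VRF.
    """
    segments: List[TenantSegment] = []
    for index, (tenant, cidr) in enumerate(sorted(tenants.items())):
        vni = VNI_BASE + index
        if vni > VNI_MAX:
            raise ValueError("VNI space exhausted")
        segments.append(
            TenantSegment(tenant, f"vrf-{tenant}", vni, ip_network(cidr))
        )
    return segments
```

Calling `allocate_segments({"team-a": "10.0.0.0/24", "team-b": "10.0.0.0/24"})` yields two segments with distinct VNIs even though the prefixes are identical, which is the property that Layer 3 isolation through VRFs is meant to provide.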

Each bare-metal CoreWeave Kubernetes Service (CKS) Node is equipped with a BlueField-3 DPU. These DPUs run independently from the host OS in their own Linux environments with DOCA-based applications. They handle PXE-based network bootstrapping, enforce security policies, and offload CNI functions such as routing, firewalling, and VXLAN termination. This architecture enables secure multi-tenancy and policy enforcement without a hypervisor.

Network security is organized into three zones:

  • Zone 0: DPU management and Kubernetes Control Plane
  • Zone 1: Application Data Plane
  • Zone 2: External ingress and egress

North-south traffic is filtered at the DPU using Layer 4 and optional Layer 7 policies. East-west traffic is managed by the CNI (such as Cilium with eBPF), with policies enforced at the DPU to isolate workloads by namespace or identity.
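The zone model and Layer 4 filtering can be pictured as an allow-list lookup. The sketch below is a simplified model, not DPU or DOCA code: the rule set, the default-deny behavior, and the `is_allowed` helper are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Zone(IntEnum):
    MANAGEMENT = 0  # Zone 0: DPU management and Kubernetes Control Plane
    DATA_PLANE = 1  # Zone 1: Application Data Plane
    EXTERNAL = 2    # Zone 2: External ingress and egress


@dataclass(frozen=True)
class L4Rule:
    src: Zone
    dst: Zone
    protocol: str
    port: int


# Hypothetical allow-list: only HTTPS may cross between the external zone
# and the application data plane. Anything not listed is denied (assumed).
ALLOWED: FrozenSet[L4Rule] = frozenset({
    L4Rule(Zone.EXTERNAL, Zone.DATA_PLANE, "tcp", 443),
    L4Rule(Zone.DATA_PLANE, Zone.EXTERNAL, "tcp", 443),
})


def is_allowed(src: Zone, dst: Zone, protocol: str, port: int,
               rules: FrozenSet[L4Rule] = ALLOWED) -> bool:
    """Return True only if an explicit rule permits this flow."""
    return L4Rule(src, dst, protocol.lower(), port) in rules
```

With these example rules, `is_allowed(Zone.EXTERNAL, Zone.DATA_PLANE, "tcp", 443)` is permitted, while an SSH attempt from the external zone into Zone 0 is rejected because no rule names it.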

DPUs interface with the CNI plugin to map Pod interfaces to their correct VXLAN segments. Integration with SPIFFE/SPIRE and cert-manager can enable secure workload identities and mTLS lifecycle management within CKS clusters.

Observability is natively supported through the DPU, which exports logs and metrics. CoreWeave provides telemetry through VictoriaMetrics (PromQL-compatible) and Loki. Customers can also deploy security observability tools such as Falco or the eBPF-based Cilium Tetragon for runtime security monitoring and enforcement.

This architecture supports robust isolation and scalability for containerized workloads on bare metal, with networking and security operations offloaded to SmartNICs.

Scalability and performance

Because infrastructure operations are offloaded from the main compute resources, this architecture scales well and remains reliable under demanding model training and inference workloads.

BlueField-3 DPUs decouple networking, storage, and security from the host CPU, allowing full resource dedication to training and inference tasks. This reduces latency and jitter, and allows predictable performance scaling across many Nodes. EVPN Type 5 overlays enable efficient Layer 3 multi-tenancy without complex NAT or overlay stitching. VXLAN encapsulation supports cluster expansion across racks and data centers, while BGP-based routing optimizes data flows. The architecture supports consistent, low-latency packet handling and bandwidth prioritization, which is critical for real-time inference and distributed training.

Identity and Access Management

CoreWeave's Identity and Access Management (IAM) framework enforces granular, role-based access controls across the entire stack, spanning administrative interfaces, CKS workloads, and CoreWeave AI Object Storage. At the management plane (Console, API, Terraform provider, Grafana, Logs and Metrics APIs), IAM uses Role-Based Access Control (RBAC) with permissions based on roles like admin, write, read, or metrics. Access is authenticated through identity providers with support for Single Sign-On (SSO) and Multi-Factor Authentication (MFA), which also makes access auditable.
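A role-based check of this kind reduces to looking up which actions a role grants. The sketch below uses the four role names mentioned above, but the action names and the mapping between roles and actions are assumptions for illustration only; the real permission sets are defined by CoreWeave.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical mapping from role to permitted actions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "admin": frozenset({"read", "write", "manage_users", "read_metrics"}),
    "write": frozenset({"read", "write"}),
    "read": frozenset({"read"}),
    "metrics": frozenset({"read_metrics"}),
}


def can(roles: Iterable[str], action: str) -> bool:
    """Return True if any of the caller's roles grants the action.

    Unknown roles grant nothing, so the check fails closed.
    """
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

For example, `can(["metrics"], "read_metrics")` succeeds, while `can(["read"], "write")` does not.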

Within CKS, IAM integrates with OpenID Connect (OIDC) for federated identity from providers like Okta. SAML federation is supported for Console (browser-based) login. Users and service accounts authenticate with OIDC tokens that are mapped to Kubernetes RBAC policies, enabling fine-grained, namespace-level access. Kubernetes service accounts can be mapped to external identity tokens using tools like kube-oidc-proxy or SPIFFE/SPIRE, supporting mTLS-based workload identities and zero-trust access.
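To make the OIDC-to-RBAC mapping concrete, the sketch below builds a standard Kubernetes `RoleBinding` manifest that grants an OIDC group one of the built-in ClusterRoles in a single namespace, then works out which namespaces a token's groups can reach. The group and namespace names, the `groups` claim name, and both helper functions are hypothetical; the token is assumed to have been verified already.

```python
from typing import Any, Dict, Iterable, List

RBAC_API_GROUP = "rbac.authorization.k8s.io"
# Built-in, user-facing Kubernetes ClusterRoles suitable for namespace grants.
BUILTIN_ROLES = {"view", "edit", "admin"}


def role_binding_for_group(namespace: str, oidc_group: str,
                           cluster_role: str) -> Dict[str, Any]:
    """Build a RoleBinding granting an OIDC group a role in one namespace."""
    if cluster_role not in BUILTIN_ROLES:
        raise ValueError(f"unsupported role: {cluster_role}")
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{cluster_role}-{oidc_group}", "namespace": namespace},
        "subjects": [
            {"kind": "Group", "name": oidc_group, "apiGroup": RBAC_API_GROUP}
        ],
        "roleRef": {
            "kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API_GROUP
        },
    }


def namespaces_for_claims(claims: Dict[str, Any],
                          bindings: Iterable[Dict[str, Any]]) -> List[str]:
    """List namespaces where a binding names one of the token's groups."""
    groups = set(claims.get("groups", []))  # claim name is an assumption
    reachable = {
        binding["metadata"]["namespace"]
        for binding in bindings
        if any(s["kind"] == "Group" and s["name"] in groups
               for s in binding["subjects"])
    }
    return sorted(reachable)
```

Binding at the namespace level with a `RoleBinding`, rather than cluster-wide with a `ClusterRoleBinding`, is what keeps the grant scoped to a single namespace.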

For S3-compatible storage, IAM policies are defined in JSON, specifying principals, actions, and conditions for granular access control. These policies are evaluated at request time, so every request is authorized against the current policy, and policy management can be centralized or delegated.
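The following sketch shows what such a JSON policy and its request-time evaluation might look like. It assumes the AWS-style policy grammar (`Effect`, `Principal`, `Action`, `Resource`, `Condition`) and AWS-style semantics where an explicit deny overrides any allow and the default is deny. The bucket name, the principal format, and the simplified evaluator are hypothetical, and only the `IpAddress` condition is modeled.

```python
import json
from fnmatch import fnmatchcase
from ipaddress import ip_address, ip_network

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["user/alice"]},  # hypothetical principal format
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::training-data",
                         "arn:aws:s3:::training-data/*"],
            "Condition": {"IpAddress": {"aws:SourceIp": "10.0.0.0/8"}},
        },
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::training-data/*",
        },
    ],
}
policy_json = json.dumps(POLICY, indent=2)  # the form a policy is stored in


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _principal_matches(spec, principal):
    if spec == "*":
        return True
    return principal in _as_list(spec.get("AWS", []))


def _condition_matches(condition, source_ip):
    # Unknown operators or keys fail closed in this sketch.
    for operator, keys in condition.items():
        if operator != "IpAddress":
            return False
        for key, cidrs in keys.items():
            if key != "aws:SourceIp":
                return False
            if not any(ip_address(source_ip) in ip_network(c)
                       for c in _as_list(cidrs)):
                return False
    return True


def is_authorized(policy, principal, action, resource, source_ip):
    """Evaluate one request: explicit deny wins, otherwise need an allow."""
    allowed = False
    for stmt in policy["Statement"]:
        if not _principal_matches(stmt["Principal"], principal):
            continue
        if not any(fnmatchcase(action, a) for a in _as_list(stmt["Action"])):
            continue
        if not any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"])):
            continue
        if not _condition_matches(stmt.get("Condition", {}), source_ip):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed
```

Under this example policy, `alice` can read objects from inside `10.0.0.0/8` but not from a public address, and nobody can delete objects because the explicit deny applies to every principal.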

Data security

Data security is essential for privacy, compliance, and trust. In AI and cloud-native environments, security must cover the entire data lifecycle, from ingestion and storage to model training and inference. Effective data security combines encryption, access controls, workload isolation, and observability.

CoreWeave uses a KMS-backed setup to provide encryption at rest and to securely deliver secrets to workloads running in CKS clusters. Encryption in transit (TLS or mTLS) protects data between services in Kubernetes clusters. You can manage policy enforcement with tools such as OPA/Gatekeeper, and network segmentation with Cilium, to maintain least-privilege access inside CKS clusters. Data classification identifies sensitive data and tokenization de-identifies it, reducing exposure risk.
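For encryption in transit, the difference between TLS and mTLS is that the server also requires and verifies a client certificate. The sketch below shows this with Python's standard `ssl` module; the function name and its optional file-path parameters are hypothetical, and in a CKS cluster the certificates would typically come from a tool such as cert-manager or SPIRE rather than from static files.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Return a server-side TLS context that requires a client certificate."""
    # Purpose.CLIENT_AUTH configures the context for the server side.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the handshake mutual: clients that do not
    # present a certificate trusted by the loaded CA are rejected.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA bundle used to validate client certificates.
        context.load_verify_locations(cafile=ca_file)
    return context
```

A server would pass this context to `context.wrap_socket(sock, server_side=True)` after supplying real certificate paths. Without a certificate chain loaded, the context cannot complete a handshake.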

Immutable logging pipelines, such as those built on Kafka and Loki, provide a traceable record of data lineage and access. Container isolation (for example, with Kata Containers) and image vulnerability scanning help prevent lateral movement and enforce security from build to deployment. CrowdStrike is provided by default in CKS clusters for endpoint protection.

This defense-in-depth approach supports compliance with regulatory requirements (GDPR, HIPAA, CCPA) and is backed by enforceable technical controls that enable secure AI adoption at scale.