Kubernetes on bare metal
The following sections describe what running Kubernetes on bare metal means and the benefits it provides. CKS runs Kubernetes directly on bare metal GPU Nodes, with no intermediary software such as a hypervisor or virtual machine. This direct hardware access is what sets CKS apart. Because Nodes in your cluster access their GPUs without any virtualization overhead, you get the following benefits:
- Better performance: Virtualization layers aren’t competing for hardware resources.
- Increased efficiency: Quicker data access and processing.
- Reduced latency: Applications have direct access to the underlying hardware.
- Better observability: More fine-grained access to kernel logs and Node behavior.
CKS clusters
CKS clusters grant you privileged access to Kubernetes resources. You run workloads on your own dedicated hardware. These servers operate independently of other operations and connect through isolated Virtual Private Clouds (VPCs). Host permissions to Nodes are also securely managed within VPCs, so you don’t need to manage the underlying infrastructure. CKS uses clusters to provide efficient, scalable, secure Kubernetes workload orchestration. Containers access server hardware directly, which suits workloads such as model training and VFX rendering. The architecture and components are designed for efficiency, low latency, and customization.
Cluster architecture
The following sections describe the two planes that make up a CKS cluster and the role each plays in cluster operation. CKS clusters use two planes:
- The Managed Data Plane
- The Managed Control Plane
Learn more about the specific components within the Managed Control Plane and Managed Data Plane, and how they support CKS cluster performance.
Region-level image proxy
CoreWeave operates a region-level registry proxy that accelerates container image pulls and reduces exposure to public registry rate limits for your cluster’s Nodes. The proxy has the following advantages:
- Caches some image layers to improve startup time and lower bandwidth consumption.
- Caches image metadata (manifests) to mitigate external rate limiting.
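As a rough mental model only, and not a description of how CoreWeave's proxy is implemented, the following sketch shows how caching manifests can cut the number of requests that reach a public registry. The class names, image reference, and digest values are all hypothetical, and the toy cache never expires entries, which a real proxy would not necessarily do.

```python
# Toy model of a pull-through proxy that caches manifests (reference -> digest).
# Illustration only: every name and value here is a hypothetical placeholder.


class ToyUpstreamRegistry:
    """Stands in for a public registry and counts manifest requests."""

    def __init__(self):
        self.tags = {}      # image reference -> digest currently behind it
        self.requests = 0   # number of lookups that reached the registry

    def resolve(self, reference):
        self.requests += 1
        return self.tags[reference]


class ToyManifestProxy:
    """Remembers the first answer for each reference (no expiry, for simplicity)."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def resolve(self, reference):
        if reference not in self.cache:
            self.cache[reference] = self.upstream.resolve(reference)
        return self.cache[reference]


upstream = ToyUpstreamRegistry()
upstream.tags["example/app:latest"] = "sha256:aaa"
proxy = ToyManifestProxy(upstream)

# Three Nodes pull the same tag; only the first lookup reaches the registry.
first_pulls = [proxy.resolve("example/app:latest") for _ in range(3)]

# The tag is re-pushed upstream, but the toy proxy keeps serving its cached answer.
upstream.tags["example/app:latest"] = "sha256:bbb"
later_pull = proxy.resolve("example/app:latest")
```

In this toy model the registry sees a single request for three pulls, and a tag that has moved upstream still resolves to the cached digest until the cache entry is refreshed.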
Avoid mutable tags
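To make sure production workloads load the correct image, use immutable tags or pin images by digest. Because the proxy caches image metadata, a mutable tag can give “sticky” results: the tag may keep resolving to a previously cached manifest after it has been updated upstream.
Digest pinning example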
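A digest-pinned image reference ends in `@sha256:` followed by 64 lowercase hexadecimal characters, rather than relying on a tag alone. The following minimal sketch checks whether a reference is pinned this way, for example as a pre-deployment lint. The function name `is_digest_pinned`, the registry host, the repository path, and the all-zero digest are hypothetical placeholders, not values from CKS.

```python
import re

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the image reference is pinned to a sha256 digest."""
    return _DIGEST_RE.search(image_ref) is not None


# Placeholder digest (64 zeros); a real digest comes from your registry.
PLACEHOLDER_DIGEST = "sha256:" + "0" * 64

# Hypothetical references used only to exercise the check.
pinned = f"registry.example.com/team/app@{PLACEHOLDER_DIGEST}"
tagged = "registry.example.com/team/app:latest"

for ref in (pinned, tagged):
    status = "pinned by digest" if is_digest_pinned(ref) else "tag only"
    print(f"{ref}: {status}")
```

A check like this can run in CI against the `image` fields of your manifests, so tag-only references are flagged before they reach a production cluster.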
Pinning by digest ensures every Node pulls the exact same artifact, independent of tag changes.
Advantages
The following list summarizes the key benefits CKS clusters offer for compute-intensive workloads:
- Strong security: CKS clusters exist in their own VPCs, created by on-site networking. Isolation is built in from start to finish. Each server is equipped with a Data Processing Unit (DPU) that aids in VPC orchestration, further supporting security and performance.
- Suited to compute-intensive workloads: CoreWeave clusters provide Reserved Node Pools for running large jobs with static Node requirements, such as model training. Nodes are available when and where they’re needed.
- Visibility into systems: CKS lets you run your own metrics exporters, gather detailed logs, and use low-level access for insight into system health.
- Observable infrastructure: CKS oversees your compute infrastructure, taking care of the heavy lifting while still allowing you to manage and observe resources at the Control Plane level.