CoreWeave Kubernetes Service (CKS) provides high-performance computing and a flexible Kubernetes experience by running Kubernetes on bare metal.
Kubernetes on bare metal
CKS runs Kubernetes directly on bare metal GPU Nodes without any intermediary software, such as a hypervisor or virtual machine. This direct hardware access is the defining difference of CKS: Nodes in your cluster access their GPUs without any virtualization overhead. Running without a virtualization layer provides:
- Better performance: Virtualization layers aren’t competing for hardware resources.
- Increased efficiency: Quicker data access and processing.
- Reduced latency: Applications have direct access to the underlying hardware.
- Better observability: More fine-grained access to kernel logs and Node behavior.
CKS clusters
CKS clusters grant you privileged access to Kubernetes resources. You run workloads on your own dedicated hardware. These servers operate independently of other operations and are connected via isolated Virtual Private Clouds (VPCs). Host permissions to Nodes are also securely managed within VPCs, so you don’t need to manage the underlying infrastructure. CoreWeave Kubernetes Service leverages clusters to provide efficient, scalable, secure Kubernetes workload orchestration. Containers access server hardware directly, making CKS ideal for model training and VFX rendering. The architecture and components are purpose-built to offer maximum efficiency, minimal latency, and extensive customization.
Cluster architecture
CKS clusters are created by leveraging two planes:
- the Managed Data Plane, and
- the Managed Control Plane.
Learn more about the specific components within the Managed Control Plane and Managed Data Plane, and how they support CKS cluster performance.
Region-level image proxy
CoreWeave operates a region-level registry proxy that accelerates container image pulls and reduces exposure to public registry rate limits for your cluster’s Nodes. The proxy has the following advantages:
- Caches some image layers to improve startup time and lower bandwidth consumption
- Caches image metadata (manifests) to mitigate external rate limiting
Avoid mutable tags
Because the proxy caches manifests, mutable tags behave as if they are immutable until the cache expires. For example, using :latest might not pull the latest image, because the proxy serves the cached manifest, which can be out of date. Avoid reusing tags like :latest across different image builds; doing so can result in Nodes pulling an older cached manifest until the cache expires.
For production workloads, use immutable tags or pin by digest so that Nodes load the intended image and avoid “sticky” results from proxy metadata caching.
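The "sticky" behavior can be illustrated with a toy model of a manifest cache. This is only a sketch: the `ManifestCache` class, the 300-second TTL, and the short `sha256:aaa` / `sha256:bbb` values are hypothetical stand-ins, and the actual caching policy of the region-level proxy is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestCache:
    """Toy model of a proxy that caches tag -> digest lookups for a fixed TTL."""
    ttl_seconds: float
    _entries: dict = field(default_factory=dict)  # tag -> (digest, cached_at)

    def resolve(self, tag: str, upstream: dict, now: float) -> str:
        cached = self._entries.get(tag)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]  # served from cache; upstream is not consulted
        digest = upstream[tag]  # cache miss or expired entry: ask upstream
        self._entries[tag] = (digest, now)
        return digest


upstream = {"app:latest": "sha256:aaa"}  # hypothetical registry state
proxy = ManifestCache(ttl_seconds=300)   # TTL value is an assumption

first = proxy.resolve("app:latest", upstream, now=0)
upstream["app:latest"] = "sha256:bbb"    # a new build reuses the same tag
stale = proxy.resolve("app:latest", upstream, now=60)   # still within the TTL
fresh = proxy.resolve("app:latest", upstream, now=600)  # after the TTL expires
print(first, stale, fresh)
```

In this model, the second lookup still returns the old digest even though the tag has moved upstream, which is the situation that immutable tags and digest pinning avoid.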
Example: digest pinning
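A digest-pinned image reference has the form `name@sha256:<64 hex characters>`, and the same string can be used as the `image` value in a Pod spec. The sketch below shows one way to tell a pinned reference from a tag-based one. The `is_digest_pinned` helper, the `registry.example.com/team/app` name, and the all-zero digest are illustrative placeholders, not real CoreWeave tooling or a real image.

```python
import re

# A sha256 digest reference ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_REF = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")


def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_REF.match(image) is not None


placeholder_digest = "0" * 64  # placeholder, not a real image digest
pinned = f"registry.example.com/team/app@sha256:{placeholder_digest}"
by_tag = "registry.example.com/team/app:latest"

print(is_digest_pinned(pinned))  # True
print(is_digest_pinned(by_tag))  # False
```

A check like this could run in CI to flag production manifests that still reference mutable tags.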
Pinning by digest ensures every Node pulls the exact same artifact, independent of tag changes.
Advantages
CKS clusters feature…
- Exceptional security. CKS clusters exist in their own VPCs, created by on-site networking. There’s built-in isolation from start to finish. Each server is equipped with a DPU (Data Processing Unit) that aids in VPC orchestration, further enhancing security and performance.
- Ideal for compute-intensive workloads. Our clusters provide Reserved Node Pools, ideal for running large jobs with static Node requirements, such as model training. Nodes are available when and where they’re needed.
- Deep visibility into systems. CoreWeave Kubernetes Service allows you to run your own metrics exporters, gather detailed logs, and leverage low-level access for a high degree of insight into system health.
- Highly observable infrastructure. CKS oversees all aspects of your compute infrastructure, taking care of the heavy lifting while still allowing you to manage and observe resources at the Control Plane level.