CoreWeave Kubernetes Service (CKS) is a managed Kubernetes service that runs clusters on bare-metal servers in CoreWeave Cloud. CKS is built from the ground up for granular control, high performance, enhanced security, high reliability, and high visibility into cluster metrics. It is designed as a managed workload orchestration solution for High-Performance Computing (HPC) workloads, taking advantage of bare-metal performance and HPC networking. On CKS, clusters use DPU technology to provide enhanced isolation and performance. Each CKS cluster runs in its own Virtual Private Cloud (VPC), a level of security and acceleration that is unique among the managed Kubernetes offerings of major cloud providers.
How is CKS different?
High-performance Kubernetes on high-performance compute
CKS is engineered specifically to orchestrate and serve the world's most computationally intensive workloads, such as model training, inference, and HPC tasks.
- CKS harnesses CoreWeave's extensive fleet of high-performance GPU and CPU servers and advanced HPC networking infrastructure to maximize throughput and minimize latency.
- CKS runs Kubernetes directly on bare-metal Nodes, without a hypervisor; customer clusters do not run Virtual Machines.
- CKS clusters use NVIDIA® BlueField® Data Processing Units (DPUs), attached to each Node, to offload processing tasks so that the Node can focus primarily on executing application workloads.
- CKS Nodes are stateless: at each boot, they load a clean Operating System image. This allows Nodes to be scaled and re-provisioned rapidly, and ensures that all Nodes run the correct software versions.
- Clusters integrate seamlessly with an InfiniBand fabric that uses a non-blocking Fat-Tree architecture with NVIDIA® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ optimizations, capable of meeting the demands of tasks such as training Large Language Models (LLMs) across clusters composed of thousands of GPU instances.
Do not install the NVIDIA GPU Operator on CKS clusters
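Workloads on GPU Nodes typically request GPUs through the standard Kubernetes extended-resource mechanism. The sketch below assumes CKS Nodes advertise GPUs under `nvidia.com/gpu`, the resource name used by NVIDIA's device plugin; confirm the name on your own cluster with `kubectl describe node`. The function name, Pod name, and image tag are illustrative, not part of CKS.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that requests `gpus` whole GPUs.

    Assumption: GPUs are exposed as the extended resource "nvidia.com/gpu".
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Print the GPUs visible inside the container, then exit.
                    "command": ["nvidia-smi"],
                    # Extended resources are set in limits; the request defaults to the limit.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Illustrative image; substitute an image your cluster can pull.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04", 8)
    print(json.dumps(manifest, indent=2))
```

`kubectl apply -f` accepts JSON as well as YAML, so the printed output can be saved to a file and applied directly.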
Hyper-secure infrastructure
CKS is engineered with a strong emphasis on security.
- The DPU-based architecture used by CKS enables advanced security features, including custom network and security policies, dedicated Virtual Private Clouds (VPCs), and privileged network access controls.
- CoreWeave houses physical infrastructure in high-surveillance data centers, providing comprehensive security. Within the CKS platform, your data is operationally isolated, ensuring complete separation and confidentiality.
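As an illustration of the custom network policies mentioned above, the sketch below builds a common default-deny baseline. It assumes the pre-installed CNI enforces the standard Kubernetes `NetworkPolicy` API (`networking.k8s.io/v1`); that is an assumption, so check CoreWeave's networking documentation before relying on it. The namespace `team-a` and the policy names are hypothetical.

```python
import json

API = "networking.k8s.io/v1"


def default_deny_ingress(namespace: str) -> dict:
    """Select every Pod in `namespace` and allow no inbound traffic.

    An empty podSelector matches all Pods; declaring "Ingress" in policyTypes
    with no ingress rules means nothing is admitted.
    """
    return {
        "apiVersion": API,
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_same_namespace(namespace: str) -> dict:
    """Re-admit traffic from Pods in the same namespace.

    A `from` peer with only a podSelector refers to Pods in the policy's own
    namespace; policies are additive, so this widens the default deny.
    """
    return {
        "apiVersion": API,
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    ns = "team-a"  # hypothetical namespace
    # A v1 List lets `kubectl apply -f` create both objects from one file.
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": [default_deny_ingress(ns), allow_same_namespace(ns)]}
    print(json.dumps(bundle, indent=2))
```

Starting from deny-all and adding narrow allow rules keeps the effective policy easy to audit, because every permitted path is listed explicitly.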
Data Plane flexibility
CKS provides a managed Control Plane alongside configurable Data Plane elements, striking a balance between operational simplicity and the flexibility to tailor the environment to specific workload requirements.
- CKS clusters include a pre-installed Container Storage Interface (CSI) driver and Container Network Interface (CNI) plugin to standardize storage and networking across container environments.
- Unlike other managed Kubernetes services, CKS extends your control beyond the Control Plane by allowing direct management of Data Plane components. This approach minimizes your infrastructure management burden while still offering extensive customization possibilities.
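Because a CSI driver is pre-installed, workloads can request storage with an ordinary `PersistentVolumeClaim` and let the driver provision the volume. In the minimal sketch below, the StorageClass name `example-shared-storage` is a placeholder, not a real CKS class; list the classes actually available with `kubectl get storageclass`. The claim name, namespace, and size are also hypothetical.

```python
import json

ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod"}


def pvc_manifest(name: str, namespace: str, size_gi: int,
                 storage_class: str, access_mode: str = "ReadWriteOnce") -> dict:
    """Build a PersistentVolumeClaim to be provisioned through a StorageClass."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    if size_gi <= 0:
        raise ValueError("size_gi must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            # The StorageClass's provisioner names the CSI driver that creates the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Placeholder StorageClass name; replace it with one from your cluster.
    claim = pvc_manifest("training-data", "team-a", 500,
                         "example-shared-storage", "ReadWriteMany")
    print(json.dumps(claim, indent=2))
```

Whether `ReadWriteMany` is supported depends on the backing storage class, so pick the access mode that matches the class you use.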
Additional advantages
Privilege and access management
CKS offers comprehensive privilege management, balancing managed solutions with the flexibility of self-managed ones, giving you the freedom to choose the most suitable option for your security needs.
- CKS Managed Auth provides a managed, RBAC-backed solution for cluster-wide access control and organization management. Third-party RBAC providers are supported for granular in-cluster permissions.
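Granular in-cluster permissions are usually expressed with the standard Kubernetes RBAC objects, `Role` and `RoleBinding`. The sketch below grants one group read-only access to Pods and their logs in a single namespace. The namespace `team-a` and group `ml-researchers` are hypothetical, and how group names reach the API server (through CKS Managed Auth or a third-party provider) is an assumption to verify for your setup.

```python
import json

RBAC_API = "rbac.authorization.k8s.io/v1"
RBAC_GROUP = "rbac.authorization.k8s.io"


def pod_reader_rbac(namespace: str, group: str) -> list:
    """Return a Role and RoleBinding granting read-only Pod access in one namespace."""
    role = {
        "apiVersion": RBAC_API,
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group, where Pods live
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": RBAC_API,
        "kind": "RoleBinding",
        "metadata": {"name": "pod-reader-binding", "namespace": namespace},
        # The group name must match a group your authenticator reports for the user.
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": "pod-reader", "apiGroup": RBAC_GROUP},
    }
    return [role, binding]


if __name__ == "__main__":
    items = pod_reader_rbac("team-a", "ml-researchers")  # hypothetical names
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Binding to a group rather than to individual users means access follows group membership in your identity provider, so onboarding and offboarding need no manifest changes.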
Metrics and observability
CKS provides comprehensive support for auditing and compliance requirements. You can also integrate performance and data metrics with your existing infrastructure for a flexible observability solution.
- For enhanced monitoring, CKS grants access to the CKS cluster API server, which gives you access to Control Plane audit logs.
- This access also lets you deploy your own metrics stack with custom collection tools, such as DaemonSet-based collectors or Loki logging.
- You can also monitor Node logs, GPU utilization, and other Node-level metrics through CoreWeave Grafana.
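Once Control Plane audit logs are available, even a small script can surface useful signals. The sketch below assumes the logs arrive as JSON lines in the standard Kubernetes audit `Event` format (`audit.k8s.io/v1`); how CKS delivers them is not covered here, so treat that format as an assumption. It counts denied (HTTP 403) requests per user, a quick way to spot misconfigured RBAC. The function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def forbidden_by_user(lines: Iterable[str]) -> Counter:
    """Count HTTP 403 responses per username in audit.k8s.io/v1 Event JSON lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # One request can produce several events (one per stage); count each once.
        if event.get("stage") != "ResponseComplete":
            continue
        if event.get("responseStatus", {}).get("code") == 403:
            counts[event.get("user", {}).get("username", "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"kind":"Event","stage":"ResponseComplete","verb":"list",'
        '"user":{"username":"alice"},"responseStatus":{"code":403}}',
        '{"kind":"Event","stage":"ResponseComplete","verb":"get",'
        '"user":{"username":"bob"},"responseStatus":{"code":200}}',
    ]
    print(forbidden_by_user(sample).most_common())
```

Because the function takes any iterable of strings, an open file handle can be passed in directly to stream a large log without loading it into memory.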