Introduction to CoreWeave Kubernetes Service (CKS)

CoreWeave Kubernetes Service (CKS) is a managed Kubernetes service that lets you run clusters on bare metal servers in CoreWeave Cloud. CKS is built to offer granular control, high performance, strong security, and high reliability, along with detailed visibility into cluster metrics. It is designed as a managed workload orchestration solution for High-Performance Computing (HPC) workloads, taking advantage of bare-metal performance and HPC networking. CKS clusters use Data Processing Unit (DPU) technology to provide enhanced isolation and performance, and each cluster operates within its own Virtual Private Cloud (VPC). This offers a level of security and acceleration that is unique among the managed Kubernetes solutions of major cloud providers.

How CKS is different

The following sections describe how CKS differs from other managed Kubernetes services.

High-performance Kubernetes on high-performance compute

CKS is engineered to orchestrate and serve computationally intensive workloads such as model training, inference, and HPC tasks.

CKS harnesses CoreWeave’s extensive fleet of high-performance GPU and CPU servers and advanced HPC networking infrastructure to maximize throughput and minimize latency.
CKS runs Kubernetes directly on bare metal Nodes, without a hypervisor. Customer clusters don’t run Virtual Machines.
CKS clusters use NVIDIA® BlueField® Data Processing Units (DPUs) attached to each Node to offload processing tasks. This allows the Node to focus primarily on executing application workloads.
CKS Nodes are stateless. At each boot, they load a clean Operating System image. This allows Nodes to be rapidly scaled and re-provisioned, and ensures that all Nodes use the correct software versions.
Clusters integrate with an InfiniBand fabric that features a non-blocking Fat-Tree architecture with NVIDIA® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ optimizations. These optimizations can support the demands of tasks such as training Large Language Models (LLMs) across clusters composed of thousands of GPU instances.
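
As an illustration of scheduling a workload onto the GPU Nodes described above, the following is a minimal sketch that builds a Pod manifest requesting whole GPUs. It assumes that GPUs are exposed under the standard NVIDIA device plugin resource name `nvidia.com/gpu`. The Pod name and image are hypothetical placeholders, and nothing here is specific to CKS. The script prints JSON, which `kubectl apply -f` accepts.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that requests `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are declared under
                    # limits; Kubernetes applies the same value as the request.
                    # Assumption: GPUs are advertised as "nvidia.com/gpu".
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    manifest = gpu_pod_manifest("train-job", "example.com/trainer:latest", 8)
    print(json.dumps(manifest, indent=2))
```

Because the platform manages the GPU software stack, a manifest like this only declares how many GPUs the container needs.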

Do not install the NVIDIA GPU Operator on CKS clusters

CoreWeave manages the NVIDIA GPU Operator on your behalf. Installing it yourself conflicts with the platform-managed deployment and is not supported.

Hyper-secure infrastructure

CKS is engineered with a strong emphasis on security.

The DPU-based architecture used by CKS enables advanced security features including custom network and security policies, dedicated Virtual Private Clouds (VPCs), and privileged network access controls.
CoreWeave houses physical infrastructure in high-surveillance data centers, providing comprehensive security. Within the CKS platform, your data is operationally isolated, ensuring complete separation and confidentiality.

Data Plane flexibility

CKS provides a managed Control Plane alongside configurable Data Plane elements. This strikes a balance between operational simplicity and the flexibility to tailor the environment to specific workload requirements.

CKS clusters include a pre-installed Container Storage Interface (CSI) and Container Network Interface (CNI) to standardize storage and networking across container environments.
Unlike other managed Kubernetes services, CKS extends your control beyond the Control Plane by allowing direct management of Data Plane components. This approach minimizes your infrastructure management burden while still offering extensive customization possibilities.

Additional advantages

The following sections describe additional advantages that CKS offers.

Privilege and access management

CKS offers comprehensive privilege management, balancing managed solutions with the flexibility of self-managed ones. This gives you the freedom to choose the most suitable option for your security needs. CKS Managed Auth provides a managed solution backed by role-based access control (RBAC) for cluster-wide access control and organization management. CKS supports third-party RBAC providers for granular in-cluster permissions.
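
To make granular in-cluster permissions concrete, the following is a minimal sketch using standard Kubernetes RBAC objects: a namespaced Role that grants read-only access to Pods and their logs, and a RoleBinding that grants that Role to a group. The namespace, Role name, and group name are hypothetical. How group names are supplied by CKS Managed Auth or a third-party provider is not covered here and is an assumption.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_pod_role(namespace: str, role_name: str) -> dict:
    """Role that allows reading Pods and Pod logs in one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_role_to_group(namespace: str, role_name: str, group: str) -> dict:
    """RoleBinding that grants `role_name` to every member of `group`."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }


if __name__ == "__main__":
    # Hypothetical namespace, Role, and group names, for illustration only.
    items = [
        read_only_pod_role("research", "pod-reader"),
        bind_role_to_group("research", "pod-reader", "ml-engineers"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Keeping the Role namespaced rather than cluster-wide limits the grant to a single team's workloads.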

Metrics and observability

CKS provides comprehensive support for auditing and compliance requirements. You can also integrate performance and data metrics with your existing infrastructure, enabling a flexible observability solution.

For enhanced monitoring, CKS grants access to the cluster's API server, which makes Control Plane audit logs available.
This access also lets you deploy your own metrics stacks with custom collection tools, such as DaemonSet-based collectors or Loki logging.
You can also monitor Node logs, GPU utilization, and other Node-level metrics through CoreWeave Grafana.
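
As a sketch of what a custom, DaemonSet-based collector might look like, the following builds a standard Kubernetes DaemonSet manifest that runs one collector Pod per Node and mounts the Node's log directory read-only. The collector name, namespace, and image are hypothetical. The `/var/log` path and the availability of `hostPath` volumes on CKS Nodes are assumptions to verify against your cluster's policies.

```python
import json


def log_collector_daemonset(name: str, namespace: str, image: str) -> dict:
    """Build a DaemonSet that runs one log collector Pod on every Node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The selector must match the Pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "volumeMounts": [
                                {
                                    "name": "varlog",
                                    "mountPath": "/var/log",
                                    "readOnly": True,
                                }
                            ],
                        }
                    ],
                    # Assumption: Node logs live under /var/log and
                    # hostPath volumes are permitted in this cluster.
                    "volumes": [
                        {"name": "varlog", "hostPath": {"path": "/var/log"}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names and image, for illustration only.
    ds = log_collector_daemonset("log-collector", "observability", "example.com/collector:1.0")
    print(json.dumps(ds, indent=2))
```

A DaemonSet fits this use case because Kubernetes schedules its Pods onto Nodes as they join the cluster, so newly provisioned Nodes are covered without extra steps.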
