Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.coreweave.com/llms.txt

Use this file to discover all available pages before exploring further.

The changelog encompasses any and all changes to CoreWeave products and new products or features. This page surfaces all customer-facing product changes with links to relevant documentation and more detailed release notes, where applicable.
For changelog entries prior to December 2024, please see the CoreWeave Classic documentation.
May 14, 2026
UpdatePlatform
The supported GPU drivers have changed. See Supported driver versions for the full compatibility table.
April 30, 2026
UpdatePlatform
The supported GPU drivers have changed. See Supported driver versions for the full compatibility table.
April 27, 2026
UpdateObservability
The Cabinet Wrangler and Cabinet Visualizer dashboards now display rack name as the primary label for filtering and identification, replacing NVLink domain. Both metrics remain available in the dashboards. See the Cabinet Wrangler release note for more information.
April 21, 2026
UpdateCKS
Version 1.18.0 of the CoreWeave cert-manager Helm chart switches the bundled Let’s Encrypt ClusterIssuers from HTTP01 to DNS01 challenges, resolved through a CoreWeave webhook at acme.coreweave.com. An ingress controller is no longer required for certificate issuance, and wildcard certificates are now supported. See the cert-manager DNS01 release notes for more information.
April 21, 2026
NewBilling
The Cloud Console now includes a native Billing insights page that shows billable usage, measured usage, and exclusions across your workloads without leaving the console. See the Billing insights release notes for more information.
April 8, 2026
NewObservability
CoreWeave Grafana now opens to a home page that gives you an immediate view of your environment without navigating to individual dashboards. The home page includes the latest platform announcement, an environment overview with GPU node counts and allocation, quick links to top dashboards, and a live feed from the CoreWeave status page. See the CoreWeave Grafana home page release notes for more information.
April 7, 2026
UpdateStorage
CoreWeave AI Object Storage’s Local Object Transport Accelerator (LOTA) now runs on CPU Nodes in addition to GPU Nodes. CPU-only CKS clusters can now use the LOTA endpoint for accelerated object storage access, and cache capacity scales with cluster size across all Node types. See the LOTA on CPU Nodes release notes for more information.
April 3, 2026
NewPlatform
CoreWeave Alerts is now available, delivering real-time notifications about your clusters, deployments, and operations. Route alerts to Slack through an OAuth integration or incoming webhook, or to any HTTPS endpoint using a generic webhook with optional signature verification. Setting up these integrations requires the new Notifications Admin IAM role, while viewing the notifications in Cloud Console requires the new Notifications Viewer role. See the CoreWeave Alerts release notes for more information.
March 31, 2026
NewInference
CoreWeave Inference is now available, providing multiple ways to deploy and serve AI models on CoreWeave GPU infrastructure. Serverless Inference lets you deploy models without managing infrastructure. Dedicated Inference lets you deploy custom model weights on dedicated GPU infrastructure with OpenAI-compatible API endpoints, using runtimes such as vLLM or SGLang. Inference on CKS gives you full control over your inference deployment stack using CoreWeave Kubernetes Service. See the CoreWeave Inference release notes for more information.
March 31, 2026
NewCKS
Two new tutorials are available for running GPU workloads in interactive marimo notebooks on CKS: a JAX training tutorial that streams live loss charts to the browser as training progresses, and a TensorRT-LLM inference tutorial with an interactive model picker supporting models including TinyLlama, Phi-3.5-mini, Mistral 7B, and Llama-3.1 8B FP8. Both tutorials use the kubectl-marimo CLI plugin and require the marimo operator installed on your cluster. See the marimo JAX and TensorRT-LLM release notes for more information.
March 30, 2026
NewStorage
Documentation is now available for Dedicated VAST Storage, CoreWeave’s single-tenant VAST clusters co-located with your GPU infrastructure. Each cluster is physically isolated to a single tenant with direct access to the VAST Management System (VMS), multi-protocol support (NFS, S3, and SQL), and advanced data services including VAST Catalog, DataBase, DataEngine, and cross-cluster replication. See the Dedicated VAST Storage release note for more information.
March 26, 2026
NewSecurity
Support Access Management gives you visibility and control over CoreWeave employee access to your CKS environment. All CoreWeave support access is request-based, requiring approval from a member of your organization with the Access Request Approver role. Approved access automatically expires after 8 hours, and all sessions are fully auditable. Teleport audit logs and Kubernetes audit logs can be forwarded automatically through CoreWeave Telemetry Relay. See the Support Access Management release notes for more information.
March 23, 2026
NewStorage
CoreWeave AI Object Storage now supports conditional writes. Attach HTTP precondition headers (If-None-Match or If-Match) to PutObject, CompleteMultipartUpload, and CopyObject requests to make writes atomic and prevent accidental overwrites without client-side locking. See the conditional writes release notes for more information.
March 20, 2026
NewPlatform
CoreWeave Omni is now available. CoreWeave Omni is a cloud-as-a-service model in which CoreWeave deploys and operates the full CoreWeave cloud stack inside your data center. You retain ownership of the facility and hardware while CoreWeave delivers a managed region, the CoreWeave Cloud Platform, and day-to-day operational management. For availability, sizing, and pricing, contact your CoreWeave account team. See the CoreWeave Omni release notes for more information.
March 20, 2026
UpdateObservability
Logs are now available from Super Regional data sources in US East, US West, and EU South, alongside the Global logs source. Super Regional Grafana data sources let you query application and platform logs in the region where they were generated, with separate Super Regional sources for CKS audit logs. See Metrics and logs data sources for endpoints, data source names, and when to use Global versus Super Regional queries.
March 20, 2026
NewCKS
A new tutorial is available for deploying Spegel, a stateless peer-to-peer OCI registry mirror, on CKS. Spegel speeds up container image pulls by sharing image layers across cluster nodes using a distributed hash table, reducing external registry traffic. CKS clusters are pre-configured with the required containerd settings. See the Spegel release note for more information.
March 16, 2026
NewInstances
B300 (InfiniBand) instances are now available. These instances are powered by eight NVIDIA B300 Blackwell GPUs and deliver higher performance per GPU, 50% more GPU memory, and double the InfiniBand speed compared to B200 systems. B300 instances are available in US-EAST-13A and US-WEST-01A. Contact us for pricing.
March 16, 2026
UpdateCKS
The ncore image and GPU driver compatibility table has been updated. B300 (InfiniBand) is now included with latest supported ncore image ncore-image-2.33.0 and compatible GPU drivers 580 and 590. See GPU driver management and Update GPU driver version for details.
March 16, 2026
NewStorage
CoreWeave AI Object Storage now supports pre-staging objects into the LOTA cache. A single HeadObject call triggers LOTA to fetch the complete object from backend storage and place it in the distributed NVMe cache, eliminating cold-start latency for training, inference, and checkpoint-restore workloads. See the pre-stage cache release note for more information.
March 12, 2026
NewSUNK
SUNK v7.3.0 has been released. This release enables explicit naming of Slurm nodes through configurable Kubernetes labels. Additionally, nodes in drain with the duplicate job id reason will now be picked up by the automatic HPC verification workflow. This release also adds the ability to capture logs from slurmd and slurmstepd, and includes several bug fixes. For more information, see the SUNK v7.3.0 release note.
March 10, 2026
NewCKS
Spot Node Pools are now available in CKS, providing pay-as-you-go access to high-performance, preemptible compute resources without long-term commitments. A new Capacity plans overview page compares all four CKS capacity models side by side. See the Spot Node Pools release note for more information.
February 23, 2026
NewCKS
A new tutorial is available for deploying an OCI container registry on CKS using Zot, with image storage in CoreWeave AI Object Storage and LOTA for in-cluster performance. CoreWeave has validated Zot against the OCI Distribution Specification conformance suite. See the Deploy a container registry on CKS with Zot release notes for more information.
February 4, 2026
NewCKS
CKS Node Pools now support comprehensive configuration management with staged updates, rollback capabilities, and full visibility into configuration history.Node Pool configurations define the desired state for Nodes, including ncore image, GPU driver, and Kubernetes versions. The Node Pool Status now tracks the active configuration, staged pending configurations awaiting user approval, and a history of all applied configurations.Use the CoreWeave Intelligent CLI (cwic) to upgrade Node Pools to pending configurations or roll back to previous configurations. See the Node configuration visibility and management release notes for more information.
January 30, 2026
NewStorage
CoreWeave AI Object Storage now supports the RenameObject API for atomic, server-side object renaming.The RenameObject API provides atomic rename operations within the same bucket, completing in milliseconds regardless of object size. Unlike copy-and-delete workflows, RenameObject updates metadata only, no data is copied and no temporary storage duplication occurs.For more information, see the RenameObject release notes.
January 29, 2026
UpdateObservability
The Usage by Product and Zone dashboard now shows two usage layers for each resource type, making the difference between total metered usage and billing-basis usage visible.The Usage by Product and Zone dashboard in CoreWeave Observe™ now displays two sets of cards for each resource type (GPUs, CPUs, Storage, IP Addresses):
  • Measured usage: All metered usage before any exclusions are applied.
  • Net usage: Usage after CoreWeave-level exclusions and adjustments. This is the basis for billing, subject to your contract terms (rates, discounts, and credits).
For more information, see the Usage by Product and Zone dashboard release notes.
January 27, 2026
UpdateStorage
Distributed File Storage documentation now includes instructions for rebinding PVCs to new namespaces using the rebind-pvc.sh utility script. This automated tool simplifies making persistent volumes available across different namespaces. See the PVC namespace rebinding release notes for more information.
January 20, 2026
NewSUNK
A new tutorial is now available for deploying self-hosted GitHub Actions runners on SUNK.The Run GitHub Actions Runners on SUNK guide walks through the process of using the Actions Runner Controller (ARC) to create and manage runners for both CPU and GPU workloads. See the release notes for more information.
January 13, 2026
UpdateCKS
CKS now supports Kubernetes v1.35.For all new CKS clusters, v1.35 is now the default version.Support for v1.32 for new clusters has been deprecated, but existing clusters running v1.32 will continue to work. See Cluster Components for the full list of supported versions.
January 12, 2026
UpdateCKS
CKS documentation now includes a tutorial on running marimo notebooks. The tutorial covers deploying and managing marimo notebooks on CKS and connecting them to AI Object Storage for data access. To get started, see Run marimo notebooks on CKS.
January 8, 2026
UpdateSUNK
SUNK v7.2.0 has been released. This release adds a default timeout for MySQL probes and adds the ability to configure image registries. Images are now published to the new default registry at ghcr.io/coreweave/slurm-containers; images are no longer published to docker.artifacts.coreweave.com. See the SUNK v7.2.0 release notes for more information.
December 26, 2025
UpdateNetworking
The Direct Connect documentation has been updated with new CoreWeave DX locations in North America and Europe. For more information, see CoreWeave DX locations.
December 22, 2025
UpdateSUNK
SUNK v7.1.1 has been released. This release improves error handling when syncing users, enables reporting for nil plugin types, and fixes other bugs within the Slurm chart.
December 19, 2025
UpdateNetworking
The Direct Connect documentation has been enhanced with more comprehensive information about Dedicated and Virtual DX options. Updates include:
  • Detailed comparison table between Dedicated DX and Virtual DX connectivity options
  • New sections on distinguishing features for each connectivity option
  • Updated connection process information for Virtual DX through Equinix Fabric and Megaport
  • Added information about on-demand provisioning and network expansion
For more information, see Direct Connect.
December 17, 2025
UpdateObservability
December 17, 2025
UpdateStorage
Non-admin users can now perform AI Object Storage actions in the Cloud Console when granted specific permissions via organization access policies. See the AI Object Storage Console Access release notes for more information, and the Console Permissions Reference for the list of permissions required.
December 10, 2025
UpdateSUNK
SUNK v6.10.0 has been released. This release adds additional metrics, updates the Slurm container version, updates queue counts to include array jobs, and includes multiple bug fixes. For more information, see SUNK releases v6.10.0 and v7.1.0.
December 10, 2025
UpdateSUNK
SUNK v7.1.0 has been released. This release adds additional metrics, updates queue counts to include array jobs, and includes multiple bug fixes. For more information, see SUNK releases v6.10.0 and v7.1.0.
December 9, 2025
UpdateObservabilitySUNK
CoreWeave introduces three new features:
  • CoreWeave Mission Control Agent is in Private Preview.
  • Telemetry Relay is now Generally Available.
  • Mission Control’s GPU Straggler Detection is now in Private Preview.
For more information, see the Mission Control and Telemetry Relay release note.
November 20, 2025
UpdateCKS
CKS no longer maintains the allocatedNodes Node PoolStatus. See Node PoolStatus for the list of Node PoolStatus fields.
November 20, 2025
ReleaseSecurity
IAM Access Policies are now available. This feature allows you to control access to resources in the CoreWeave platform. For more information, see IAM Access Policies.Automated User Provisioning (AUP) is now available. This feature allows you to synchronize users and groups from an Identity Provider (IdP) to the CoreWeave platform. For more information, see Automated User Provisioning.SUNK User Provisioning (SUP) is now available. This feature allows you to synchronize users and groups from an Identity Provider (IdP) or directly from CoreWeave IAM to a SUNK cluster. For more information, see SUNK User Provisioning.OIDC Workload Identity Federation is now available. This feature allows you to authenticate CKS workloads to external cloud services and AI Object Storage using OIDC tokens. For more information, see OIDC Workload Identity Federation.
November 19, 2025
UpdatePlatform
Flexify Inc. has been added to CoreWeave’s Sub-processors list.
November 13, 2025
UpdateSUNK
SUNK v7.0.0 has been released. This release introduces a major version upgrade to Slurm 25.05.3, more consistent node scaling behavior, new default Slurm chart configurations, improved memory management, and bug fixes. For more information, see SUNK v7.0.0 release notes.
November 12, 2025
UpdateObservability
CoreWeave Observe™ documentation now lists Node alerts in the Kubernetes Training Jobs and Slurm Job Metrics pages.You can also view the list of Node alerts in the Node Pool reference documentation.
November 11, 2025
UpdateCKS
CKS no longer maintains the following Node Pool conditions:
  • Accepted
  • Allocated
  • SufficientCapacity
For the list of Node conditions, go to Node Pool Conditions.
November 6, 2025
UpdateCKS
CKS can enable workloads to use IMEX with Dynamic Resource Allocation (DRA). This is a limited availability feature. To learn more, go to Enabling IMEX Compute Domains with Dynamic Resource Allocation. See the IMEX with DRA release notes for more information.
October 31, 2025
FixSUNK
SUNK v6.9.1. has been released. This is a patch release that fixes an issue related to excessive reconfigures triggered by watching the topology.conf file. This patch also improves handling of large numbers of jobs in the completing state, and increases the time a job is allowed to be in the completing state to prevent early termination of completing jobs. For more information, see SUNK v6.9.1 release notes.
October 31, 2025
UpdateStorage
CoreWeave AI Object Storage Usage-Based Billing is now available. This introduces a third tier of storage pricing for AI Object Storage: Hot, Warm, and Cold. Usage-Based Billing replaces the previous Automated Archive feature, and is enabled by default starting October 31, 2025.
October 31, 2025
UpdateStorage
CoreWeave AI Object Storage Inventory Reports are now available. This feature allows you to view and download reports on your AI Object Storage usage and inventory. Learn how to generate inventory reports for your AI Object Storage buckets.
October 22, 2025
UpdateSUNK
SUNK documentation now contains a tutorial on running torchforge on SUNK. To view and complete the tutorial, go to Run torchforge on SUNK.
October 20, 2025
UpdateSUNK
SUNK now includes a new tutorial on running Ray on SUNK. To access and complete the tutorial, go to Run Ray on SUNK.
October 17, 2025
UpdatePlatform
General Access Region EU-SOUTH-04 in Alava, Spain is now available and supports CoreWeave AI Object Storage.
October 15, 2025
UpdateSUNK
SUNK v6.9.0 has been released. This release introduces automatic job requeueing during rolling upgrades, new configuration options for slurmrestd, improved resource optimization, and a new command alias. For more information, see SUNK v6.9.0 release notes.
October 15, 2025
UpdatePlatform
Dedicated Access Region US-EAST-11 in North Carolina, USA is now available.
October 9, 2025
UpdateStorage
CoreWeave AI Object Storage quota limits have been increased to 100 TiB per Availability Zone.
October 1, 2025
UpdatePlatform
Dedicated Access Region US-CENTRAL-04 in Texas, USA is now available.
October 1, 2025
UpdateStorage
CoreWeave AI Object Storage Automated Archive is now available. This feature automatically archives inactive objects after 30 days.
September 17, 2025
UpdateSUNK
SUNK v6.8.0 has been released. This release adds a cleanup script for jobs stuck in a completing state, introduces a cache-dropper sidecar for compute pods, updates SCIM parameters to filter inactive users, fixes a race condition in slurmd startup when using cgroupv2, and enhances the syncer to more appropriately issue a scontrol reconfigure when nodes are added, as well as a number of bug fixes. For more information, see SUNK v6.8.0 release notes.
September 12, 2025
UpdatePlatform
CoreWeave Security documentation has been added. See CoreWeave Security for more information.
September 8, 2025
UpdatePlatform
Dedicated Access Region CA-EAST-01 in Ontario, Canada is now available and supports CoreWeave AI Object Storage.
September 5, 2025
UpdateCKS
CKS documentation now includes instructions for setting up and running Kubeflow on CKS.
September 3, 2025
UpdateCKS
CKS now supports Kubernetes v1.34, bringing the latest features and security updates to CKS clusters. v1.34 is now the default version for all new CKS clusters. Support for v1.31 for new clusters has been deprecated, but existing clusters running v1.31 will continue to work.
August 29, 2025
UpdateCKS
All newly created CKS clusters will have Cilium v1.18.1 as their default Container Network Interface (CNI).
August 28, 2025
UpdateCKS
CKS documentation now contains instructions for using third-party frameworks on CKS. For more information, go to Introduction to Third-Party Frameworks.
August 28, 2025
UpdateObservability
CoreWeave Observe™ includes two new dashboards: Slurm Block Topology and Kueue Metrics. See CoreWeave Observe™: Slurm Block Topology and Kueue Metrics for more information.
August 26, 2025
UpdateCKS
CKS now supports Kubernetes v1.33, bringing the latest features and security updates to your Kubernetes clusters. This release also includes cgroup v2 as the default control group version. See the August 26, 2025 release notes for detailed information.
August 25, 2025
UpdateCKS
The CKS External Hostname Controller has changed the way it reports DNS names for services running on CKS. See CKS External Hostname Controller changes for more information.
August 19, 2025
UpdateCKS
In preview: CKS now offers cluster autoscaling. For more information, see Autoscale Node Pools.
August 19, 2025
UpdateInstances
CoreWeave’s GB300 NVL72-powered cloud instances are now available in select Regions.
August 15, 2025
UpdateCKS
CKS now includes a new tutorial on scaling vLLM inference workloads. To access and complete the tutorial, go to Deploy vLLM for Inference.GPU driver management features are now available in CKS Node Pools, allowing you to specify and target specific GPU driver versions for your workloads. This feature provides better control over driver compatibility and enables homogeneous driver environments across your clusters. See GPU driver management features release notes for detailed information.
August 14, 2025
UpdateSUNK
SUNK v6.7.0 has been released, introducing support for CUDA 12.9, enhanced SCIM and nsscache functionality - such as filtering and home directory overrides - HDF5 plugin support, and various bug fixes for directory service integration, GPU detection, and Slurm task management. See SUNK v6.7.0 release notes for more information.
August 11, 2025
UpdateObservability
CoreWeave Observe™ includes three new dashboards and reorganized folders. See CoreWeave Observe™: New dashboards and reorganized folders for more information.
August 6, 2025
UpdateStorage
The CoreWeave Terraform provider now supports AI Object Storage, enabling infrastructure-as-code management of buckets, policies, lifecycle configurations, and versioning. See CoreWeave AI Object Storage Terraform Provider Support for more information.
August 4, 2025
UpdateStorage
CoreWeave AI Object Storage now supports server-side encryption with customer keys (SSE-C), providing enhanced data security and control for stored objects. See CoreWeave AI Object Storage SSE-C support for detailed information.
July 31, 2025
UpdateCKS
CKS now supports Kubernetes upgrades to take advantage of the latest features and security updates. See CKS Kubernetes upgrade support for detailed information.Default control group version changed to v2 for CKS clusters targeting Kubernetes v1.33, aligning with upstream Kubernetes support policy. See CKS Kubernetes upgrade support for more information.
July 31, 2025
UpdateObservability
CoreWeave Telecaster™ is now available in CoreWeave Observe™, providing fully-managed log and metric forwarding to external destinations. See the Telecaster release notes for more information.
July 12, 2025
FixSUNK
Added default value for pool size in Helm charts to prevent configuration issues.Segment-calc now skips Nodes already in DRAIN state to prevent skewed capacity charts.
July 12, 2025
UpdateSUNK
SCIM provisioning for SUNK is now available via nsscache. This enables automated, standards-based user and group management from your IdP to CoreWeave clusters. See SUNK v6.6.0 release notes for detailed information.
July 12, 2025
UpdateObservability
Slurm job and Node outputs now include direct links to their corresponding Grafana dashboards, giving operators one-click visibility into live job metrics. See CoreWeave Grafana for more information.
July 12, 2025
UpdateInstances
Added two new compute definitions: rtxp8x (NVIDIA RTX Pro 6000 Blackwell Server Edition). See Instances and the RTX Pro 6000 release notes for detailed specifications.
July 12, 2025
ChangeObservability
Slurm metrics now carry the slurm_cluster label, simplifying multi-cluster dashboards. See CoreWeave Grafana for monitoring capabilities.MySQL exporter metrics are automatically scraped and ingested. See CoreWeave Logs and Metrics for querying capabilities.
July 12, 2025
ChangeSUNK
NCCL-test base image updated to nccl-tests/d5a135d, ensuring compatibility with the latest CUDA toolchain.CoreWeave IAM is now fully integrated with the Slurm Helm chart. See SUNK for more information.Optional SSSD mounts are intelligently gated, reducing unnecessary container overhead. See Directory Services for configuration details.
July 12, 2025
ChangePlatform
Nodes that stay “busy” inside a Reservation are automatically re-evaluated after 30 minutes, reducing orphaned allocations. See Node Lifecycle for more information.
July 12, 2025
FixPlatform
Disabled NVIDIA device-plugin health checks that could cause false Nodedrains.Multiple operator dependencies updated (chi v5, viper v2, Go Slurm) to incorporate upstream security and stability patches.
July 12, 2025
FixObservability
PodMonitor and VMPodScrape templates now use consistent relabeling syntax.
July 12, 2025
FixInstances
Removed the InfiniBand requirement for A100-based Nodes where it is not present. See Instances for A100 specifications.
July 11, 2025
UpdateSUNK
SUNK v6.6.0 has been released with SCIM provisioning via nsscache, enhanced monitoring with dashboard links, improved node reconciliation, new GPU compute definitions (rtxp8x), metrics improvements, segment-calc script enhancements, and base image upgrades. This release also includes automatic scraping of MySQL metrics, fixes for metrics labeling, and improved segment-calc handling for DRAIN nodes. See SUNK v6.6.0 release notes for detailed information.
July 9, 2025
UpdateInstances
RTX Pro 6000 Blackwell Server Edition cloud instances are now available in select CoreWeave Availability Zones. These instances combine NVIDIA’s RTX Pro 6000 Blackwell Server Edition with CoreWeave’s managed services, observability, and high-performance networking. See RTX Pro 6000 Blackwell Server Edition release notes for detailed information.
July 9, 2025
ChangeCKS
Encryption at rest for Kubernetes Secrets is now enabled by default in all CoreWeave Kubernetes Service (CKS) clusters. This feature uses a KMS-backed integration to encrypt etcd data automatically. See the CKS encryption at rest release notes for more information.
July 7, 2025
UpdateAPIs
New Kubernetes API endpoint for unmanaged auth is now available in CKS, enabling custom authentication workflows. See the Unmanaged auth API release notes for more information.
July 7, 2025
UpdateCKS
Control Plane Node Pools are no longer provisioned in CKS clusters. These changes improve cluster provisioning speed and reliability while enabling custom authentication workflows. See the July 7, 2025 release notes for detailed information.
June 30, 2025
ChangeCKS
Node Pool condition transition improvements for better cluster management and monitoring. See Node Pool condition transition release notes for detailed information.
June 17, 2025
UpdateSUNK
Support for NVSHMEM and GDRCopy is now available, enabling high-performance GPU-to-GPU communication. See NVSHMEM and GDRCopy release notes for detailed information.
June 15, 2025
UpdateCKS
CKS cluster management improvements with enhanced Node Pool management. See CKS Clusters for more information.
June 13, 2025
UpdateSUNK
SUNK v6.5.0 has been released with major improvements to monitoring, system stability, and resource management. This release introduces enhanced dashboard integration for Slurm jobs and nodes, improved metrics labeling, automatic MySQL metrics scraping, and new compute definitions. It also includes fixes for NVIDIA device-plugin health checks, segment-calc handling for DRAIN nodes, and updates to operator dependencies. See SUNK v6.5.0 release notes for detailed information.Slurm upgraded to 24.11.05, bringing in the latest upstream fixes and enhancements. See SUNK for more information.NCCL bumped to 2.26.5, improving GPU communication performance.Added new CUDA runtime images for 12.8.1 and 12.9.0.Introduced nsscache as an alternative option to SSSD for user caching. See Directory Services for configuration details.Enabled timeout-based forced deletion of compute pods (disabled by default), allowing cleanup even when jobs are still running.Backported Slurm 25.05 SlurmdSpecOverride and container awareness features to correctly configure CPUSpecList and MemSpecList, so static pod workloads no longer enter an invalid state after scontrol reconfigure.Enhanced controller.etcConfigMap to accept either a single string or a list of multiple ConfigMaps.Added the segment-calc script for visualizing block-topology segment allocations. See Topology/Block Scheduling in Slurm for more information.
June 13, 2025
UpdateCKS
CKS now supports Kubernetes v1.32.
June 13, 2025
ChangePlatform
Defaulted PasswordAuthentication to no in sshd for improved security.Charts now manage the ns.coreweave.cloud/managed namespace label.
June 13, 2025
UpdateObservability
Added support for VMPodScrape as an alternative to PodMonitor for metric gathering.
June 3, 2025
UpdateObservability
Cabinet Wrangler is now available for managing cabinet-level operations and monitoring. See Cabinet Wrangler release notes for detailed information.
June 2, 2025
UpdateSUNK
SUNK v6.4.1 has been released as a patch release with critical memory parsing fixes, improved MOTD script handling, container runtime enhancements, and RDMA configuration cleanup. This release addresses important issues discovered in v6.4.0, including a critical memory parsing fix, improved login template configuration, and enhanced container runtime stability. All v6.4.0 deployments should upgrade to v6.4.1 to resolve these issues. See SUNK v6.4.1 release notes for detailed information.
May 29, 2025
UpdateInstances
NVIDIA HGX B200 instances are now Generally Available, providing next-generation AI compute capabilities. See NVIDIA HGX B200 instances GA release notes for detailed information.
May 26, 2025
UpdateSUNK
SUNK v6.4.0 has been released with significant improvements to login pod management, configuration capabilities, and user experience. This release introduces external MySQL database configuration in the Slurm Helm chart, improved hostname resolution for login pods, customizable MOTD display, user-controlled pod reboot, enhanced error handling, and dashboard integration features. See SUNK v6.4.0 release notes for detailed information.
May 20, 2025
UpdateObservability
Internet Transit Dashboard is now available, providing real-time visibility into network traffic and performance. See Internet Transit Dashboard release notes for detailed information.
May 15, 2025
ChangeCKS
New Node Pool UI enhancements for improved cluster management experience. See Node Pool UI enhancements release notes for detailed information.
April 25, 2025
UpdateSUNK
New features in SUNK v6.3.0 including enhanced Slurm functionality and performance improvements. See SUNK v6.3.0 release notes for detailed information.
April 25, 2025
ChangePlatform
Node ID Format Change implemented for improved system identification and management. See Node Lifecycle for more information.
April 17, 2025
UpdateSUNK
SUNK v6.2.0 has been released with Device Plugin chart integration, Slurm upgrade to v24.11.4, AllowGaps patch for improved scheduling, and configurable operator log levels. See SUNK v6.2.0 release notes for detailed information.Slurm Device Plugin Helm chart has been integrated as a subchart in SUNK, simplifying GPU resource provisioning within clusters managed by Slurm.Slurm has been patched to support the AllowGaps setting in topology.conf, allowing for non-contiguous Node groupings in block topology mode.The SUNK operator now includes configurable log levels, which can be set through Helm values for fine-grained control over log verbosity.
April 17, 2025
UpdateObservability
A new drain_time_seconds metric has been added for Slurm nodes, reporting how long a Node has been in the DRAIN or DRAINING state.
April 17, 2025
UpdateInstances
A new compute Node type for CPU-only Nodes has been defined in the Helm charts, enabling deployment scenarios that do not require GPU-specific configurations. See CPU Instances for available options.
April 9, 2025
UpdateObservability
“Explore” Now Available in CoreWeave Observe™, providing enhanced data exploration capabilities. See Grafana Explore release notes for detailed information.
April 4, 2025
UpdateSUNK
New features in SUNK v6.1.0 including enhanced Slurm functionality and performance improvements. See SUNK release notes for more information.
March 31, 2025
UpdateStorage
CoreWeave AI Object Storage is now Generally Available, providing high-performance object storage optimized for AI workloads. See CoreWeave AI Object Storage GA release notes for detailed information.AI Object Storage now supported in an additional Availability Zone - US-EAST-01A. See CoreWeave AI Object Storage GA release notes for availability details.
March 31, 2025
UpdatePlatform
Brand new Cloud Console UI for AI Object Storage with enhanced user experience and streamlined data management. See CoreWeave AI Object Storage GA release notes for detailed information.
March 20, 2025
UpdateStorage
Introducing CoreWeave AI Object Storage, a new high-performance object storage solution designed specifically for AI and machine learning workloads. See CoreWeave AI Object Storage for more information.
March 14, 2025
UpdateSUNK
SUNK v6.0.0 has been released with significant new features and breaking changes. See SUNK v6.0.0 release notes for detailed information.
February 21, 2025
UpdateAPIs
CoreWeave Kubernetes Service (CKS) API is now Generally Available, enabling programmatic deployment, management, and scaling of HPC applications using Kubernetes on CoreWeave’s high-performance infrastructure. See CKS API and Terraform provider release notes for detailed information.CoreWeave Terraform provider is now available, allowing customers to deploy and manage VPCs and CKS clusters as code. See CKS API and Terraform provider release notes for detailed information.
February 21, 2025
ChangePlatform
Enhanced Cloud Console design and user experience with improved usability and creation flows for faster cluster deployment and better resource management. See CKS API and Terraform provider release notes for detailed information.
February 6, 2025
UpdateSUNK
SUNK v5.7.0 has been released with a change to using direct RPCs to the Slurm controller instead of the REST API. The REST API is now an optional component and must be explicitly enabled if required. See SUNK v5.7.0 release notes for detailed information.SUNK v5.6.0 released with enhanced Slurm login functionality and improved compute definitions. See SUNK for more information.Added individual Slurm login pods implementation with user cache controller for improved authentication management.Added GB200 compute definition to support the latest NVIDIA hardware.Added CUDA 12.8 image builds for enhanced GPU support.Upgraded Slurm to 24.05.05 with latest upstream fixes and improvements.Enhanced block topology configuration with automatic generation from labels for improved GPU scheduling.Added readiness probe to slurmd for better health monitoring.Fixed syncer cluster role binding name to prevent deployment issues.
February 6, 2025
FixSUNK
Removed default CPU limit for login pods to improve performance.Updated directory-cache image to include OS suffix for better compatibility.Fixed nvlink domain handling to skip domains labeled “0” (no domain).Improved resource usage calculation by ignoring completed pods.
February 3, 2025
UpdateInstances
GB200 NVL72-powered cloud instances are now available in selected CoreWeave Regions, combining NVIDIA’s GB200 Superchips in a 72-GPU NVLink-connected fabric with CoreWeave’s managed services. See GB200 NVL72 instances release notes for detailed information.
January 13, 2025
ChangeInstances
H100 and H200 based instances now support NV HGX 1.5.0 firmware, delivering enhanced GPU stability and improved troubleshooting capabilities. See H100 and H200 firmware update release notes for detailed information.
December 27, 2024
UpdateSUNK
SUNK v5.5.0 released with improved resource cleanup and Slurm login chart implementation. See SUNK for more information.
December 17, 2024
UpdateSUNK
SUNK v5.4.0 released with enhanced Slurm login functionality and improved compute definitions. See SUNK for more information.Added single projected volume for SSSD to simplify configuration and improve security.Added dynamic feature prefixing for flexible feature configuration.Added H200 compute definition to support the latest NVIDIA hardware.Enhanced Slurm login chart with improved pod specification handling.Added cleanup of Slurm nodes following removal from NodeSlices for better resource management.Implemented LoginReconciler for improved login pod management.Updated NCCL base images to newer versions with HPC-X 2.21 for enhanced performance.Added OwnerReference for resource cleanups to improve resource management and prevent orphaned resources.Implemented slurm-login chart for better login node management.
December 17, 2024
FixSUNK
Fixed affinity configuration in compute base definitions.Added InfiniBand support to H200 compute definitions.Removed Ubuntu 20.04 image builds to focus on supported versions.Fixed lock annotation removal when nodes are removed from nodesets.Upgraded to Go 1.23.2 for improved performance and security.Fixed ignore_group_members configuration by renaming to ignoreGroupMembers for consistency.Corrected login pod template indentation to prevent deployment issues.Updated LDAP secret key defaults to use ldap-password.conf for better compatibility.
October 25, 2024
UpdateSUNK
SUNK v5.3.0 released with enhanced Slurm functionality and improved monitoring capabilities. See SUNK for more information.Added GH200 compute definition to support the latest NVIDIA hardware.Enhanced login SSH daemon liveness probe for better health monitoring.Added scripts for deleting NVIDIA hooks on CPU nodes to prevent conflicts.Allowed list of prolog/epilog configmaps in Helm values for flexible configuration.Exposed all probes for all containers in Helm values for comprehensive monitoring.Moved Slurm secret manifests to secret job for improved security.Enhanced Node Extras handling to prevent overwriting of extra fields.Improved condition synchronization from pods to nodes for better state management.Added SSSD config reload capability for dynamic configuration changes.Upgraded Slurm to 24.05.4 with latest upstream fixes and improvements.
October 25, 2024
FixSUNK
Fixed Slurm probe indentation in Helm charts.Corrected MySQL resource defaults for better performance.Made MySQL secret immutable and persistent for improved security.Removed defunct CgroupAutomount option to prevent configuration errors.Enhanced persistent connections to slurmctld for improved stability.Fixed Slurm completion script permissions for proper execution.Updated Slurm image dependencies for better compatibility.Upgraded Ubuntu images to newer tags for security and performance.Fixed array job merge behavior for metrics collection.Corrected scheduler hook bug when pods are deleted before hook execution.Improved condition update handling on pods for better state management.Fixed termination grace error handling for improved reliability.
September 10, 2024
UpdateSUNK
SUNK v5.2.0 released with enhanced Slurm PAM module support and improved monitoring. See SUNK for more information.Added packages to support Slurm PAM module for enhanced authentication capabilities.Added host aliases to Slurm chart for improved networking configuration.Added Slurm not responding condition for better health monitoring.Enhanced operator syncer and scheduler configuration for improved performance.Switched to cgroup process tracking as default in Helm charts for better resource management.
September 10, 2024
FixSUNK
Fixed leader election configuration to not force it by default for SUNK.Fixed missing volumes on REST deployment for proper functionality.Updated disk space check for MySQL init container to prevent deployment issues.Used templates for operator scheduler and syncer configs for consistency.Moved hooksapi out of syncer and scheduler configs for better separation of concerns.Reevaluated Slurm controller liveness probe for improved health checking.Reduced noisy messages from user-lookup container for cleaner logs.
August 9, 2024
UpdateSUNK
SUNK v5.1.0 released with enhanced monitoring capabilities and improved configuration options. See SUNK for more information.Added additional slurmdbd.conf lines to Helm values for flexible database configuration.Allowed additional DNS config searches for improved name resolution.Added custom plugstack.conf entries support for enhanced Slurm configuration.Exposed compute liveness probe configuration for better health monitoring.Made field labels and metrics consistent across the platform.Added Slurm job uptime metrics for better job monitoring.Exposed Slurm RPC stats for Prometheus metrics collection.Renamed diagnostic metrics and fixed pointer checks for improved monitoring.Updated base images for all image builds to latest versions.
August 9, 2024
FixSUNK
Fixed user-lookup enablement to only activate when canary users are set.Added missing labels to resources for better organization.Adjusted slurmd default timeout to 60 seconds for better performance.Fixed scheduler script to prevent Slurm bug in job handling.Included topology.conf in watched files again for proper configuration monitoring.Properly added additional configuration to plugstack.conf for enhanced functionality.Set default max_rpc_cnt for SchedulerParameters to prevent issues.Unified approach to labels on SUNK chart for consistency.Patched default show flags in REST API for nodes to improve visibility.Removed unrecognized configure options from Slurm Dockerfile to prevent build issues.Implemented deduplication of Get requests in Slurm client for improved performance.Injected missing SLURM_CLUSTER_NAME environment variable in compute nodes.Corrected URLs to documentation for better user experience.Fixed nil features clearing issue in operator for better state management.Excluded pods not in Ready state from auto cleanup to prevent data loss.Started API health check after Slurm client initialization for proper startup sequence.
July 10, 2024
UpdateSUNK
SUNK v5.0.0 released with major upgrade to Slurm 24.05.x and enhanced security features. See SUNK for more information.Upgraded to Slurm 24.05.0 with latest upstream features and improvements.Added leader election for operator to improve reliability in multi-instance deployments.Enabled pyxis and security capabilities by default for enhanced container security.Upgraded default resources for Slurm components to improve performance.Enhanced plugstack.conf customization in Helm for flexible configuration.Upgraded enroot and pyxis to latest versions for improved container management.Updated exported node metrics from Slurm for better monitoring.Added orphaned pod checking for better resource cleanup.Updated controller-runtime to 0.18.3 for improved Kubernetes integration.
July 10, 2024
FixSUNK
Changed default Munged UID/GID and allowed configuration for better security.Bumped scrape timeout for syncer to prevent monitoring issues.Added temporary fix for inode locking issue to prevent file system problems.Dropped CUDA version in values-cw.yaml back to 12.2 for compatibility.Updated JWT secret to use infinite lifespan for better security.Corrected license dates in documentation header templates.Added condition delay check for nodes in tests to improve reliability.Added replica check for e2e tests to ensure proper deployment.Corrected scaleDeployment bug for checking incorrect pods.Upgraded Slurm to 24.05.1 with latest patch fixes.Made micromamba executable for proper package management.Properly handled node info updates in nodeslice for better state management.Changed array job merge behavior for improved job handling.Corrected minor behavior in nodeset scaling for better resource management.Fixed pod node assignment handling to prevent errors when nodes are not assigned.Improved clarity of scheduler errors for better troubleshooting.Moved kubectl installation from script to Dockerfile for better build process.Stopped prestop lifecycle hook from overriding existing reasons for better job management.Moved test setup into BeforeAll block for improved test organization.Added --load-images flag to skaffold for better development workflow.
Last modified on May 14, 2026