Changelog

What's new at CoreWeave

The changelog covers all changes to CoreWeave products, including new products and features. This page lists customer-facing product changes with links to relevant documentation and more detailed release notes, where applicable.

August 2025

August 4, 2025

Update Storage CoreWeave AI Object Storage now supports server-side encryption with customer keys (SSE-C), providing enhanced data security and control for stored objects. See CoreWeave AI Object Storage SSE-C support for detailed information.
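As a rough illustration of what SSE-C involves on the client side, the sketch below builds the three request headers used by the S3 SSE-C convention: the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key. It assumes CoreWeave AI Object Storage follows that S3 header convention; the function name `sse_c_headers` is hypothetical, and the linked SSE-C documentation remains the authoritative reference.

```python
import base64
import hashlib
from typing import Dict


def sse_c_headers(key: bytes) -> Dict[str, str]:
    """Build SSE-C request headers for a 256-bit customer-provided key.

    Assumption: the storage endpoint accepts the standard S3 SSE-C headers.
    """
    if len(key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte key")
    # The key itself is sent base64-encoded on every request.
    encoded_key = base64.b64encode(key).decode("ascii")
    # The MD5 digest lets the server check the key arrived intact.
    key_md5 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": encoded_key,
        "x-amz-server-side-encryption-customer-key-MD5": key_md5,
    }
```

With SSE-C the same key must accompany every later read of the object, so in practice the key would be generated once (for example with `os.urandom(32)`) and stored securely by the customer.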

July 2025

July 31, 2025

Update CKS CKS now supports Kubernetes upgrades to take advantage of the latest features and security updates. See CKS Kubernetes upgrade support and CoreWeave Telecaster™ availability for detailed information.

Change CKS Default control group version changed to v2 for CKS clusters targeting Kubernetes v1.33, aligning with upstream Kubernetes support policy. See CKS Kubernetes upgrade support and CoreWeave Telecaster™ availability for more information.

Update Observability CoreWeave Telecaster™ is now available in CoreWeave Observe™, providing fully-managed log and metric forwarding to external destinations. See CKS Kubernetes upgrade support and CoreWeave Telecaster™ availability for detailed information.

July 12, 2025

Fix SUNK Added default value for pool size in Helm charts to prevent configuration issues.

Update SUNK SCIM provisioning for SUNK is now available via nsscache. This enables automated, standards-based user and group management from your IdP to CoreWeave clusters. See SCIM provisioning for SUNK release notes for detailed information.

Update Observability Slurm job and Node outputs now include direct links to their corresponding Grafana dashboards, giving operators one-click visibility into live job metrics. See CoreWeave Grafana for more information.

Update Instances Added two new compute definitions: rtxp6000-8x (NVIDIA RTX Pro 6000 Blackwell Server Edition) and gb300-4x (NVIDIA GB300). See Instances for detailed specifications.

Change Observability Slurm metrics now carry the slurm_cluster label, simplifying multi-cluster dashboards. See CoreWeave Grafana for monitoring capabilities.

Change Observability MySQL exporter metrics are automatically scraped and ingested. See CoreWeave Logs and Metrics for querying capabilities.

Change SUNK NCCL-test base image updated to nccl-tests/d5a135d, ensuring compatibility with the latest CUDA toolchain.

Change Platform Nodes that stay "busy" inside a reservation are automatically re-evaluated after 30 minutes, reducing orphaned allocations. See Node Lifecycle for more information.

Change SUNK CoreWeave IAM is now fully integrated with the Slurm Helm chart. See SUNK for more information.

Change SUNK Optional SSSD mounts are intelligently gated, reducing unnecessary container overhead. See Directory Services for configuration details.

Fix Platform Disabled NVIDIA device-plugin health checks that could cause false Node drains.

Fix SUNK Segment-calc now skips Nodes already in DRAIN state to prevent skewed capacity charts.

Fix Observability PodMonitor and VMPodScrape templates now use consistent relabeling syntax.

Fix Instances Removed the InfiniBand requirement for A100-based Nodes where it is not present. See Instances for A100 specifications.

Fix Platform Multiple operator dependencies updated (chi v5, viper v2, Go Slurm) to incorporate upstream security and stability patches.

July 11, 2025

Update SUNK SUNK v6.6.0 has been released with SCIM provisioning via nsscache, enhanced monitoring with dashboard links, improved node reconciliation, new GPU compute definitions (rtxp6000-8x and gb300-4x), metrics improvements, segment-calc script enhancements, and base image upgrades. This release also includes automatic scraping of MySQL metrics, fixes for metrics labeling, and improved segment-calc handling for DRAIN nodes. See SUNK v6.6.0 release notes for detailed information.

June 13, 2025

Update SUNK SUNK v6.5.0 has been released with major improvements to monitoring, system stability, and resource management. This release introduces enhanced dashboard integration for Slurm jobs and nodes, improved metrics labeling, automatic MySQL metrics scraping, and new compute definitions. It also includes fixes for NVIDIA device-plugin health checks, segment-calc handling for DRAIN nodes, and updates to operator dependencies. See SUNK v6.5.0 release notes for detailed information.

July 9, 2025

Update Instances RTX Pro 6000 Blackwell Server Edition cloud instances are now available in select CoreWeave Availability Zones. These instances combine NVIDIA's RTX Pro 6000 Blackwell Server Edition with CoreWeave's managed services, observability, and high-performance networking. See RTX Pro 6000 Blackwell Server Edition release notes for detailed information.

Change CKS CoreWeave enables encryption at rest for Kubernetes Secrets by default in all CoreWeave Kubernetes Service (CKS) clusters. This feature uses a KMS-backed integration to encrypt etcd data automatically. See CKS encryption at rest release notes for detailed information.

July 7, 2025

Update CKS Control Plane Node Pools are now available for CKS clusters, providing dedicated compute resources for Kubernetes Control Plane components. See Control Plane Node Pools and unmanaged auth API release notes for detailed information.

Update APIs A new Kubernetes API endpoint for unmanaged auth is now available in CKS, enabling custom authentication workflows. See Control Plane Node Pools and unmanaged auth API release notes for detailed information.

June 2025

June 15, 2025

Update CKS Improved CKS cluster management with enhanced Node Pool management. See CKS Clusters for more information.

June 30, 2025

Change CKS Improved Node Pool condition transitions for better cluster management and monitoring. See CKS Clusters for more information.

June 17, 2025

Update SUNK Support for NVSHMEM and GDRCopy is now available, enabling high-performance GPU-to-GPU communication. See NVSHMEM and GDRCopy for detailed information.

June 13, 2025

Update CKS CKS now supports Kubernetes v1.32. See CKS Kubernetes v1.32 support and SUNK enhancements release notes for detailed information.

Update SUNK Slurm upgraded to 24.11.05, bringing in the latest upstream fixes and enhancements. See SUNK for more information.

Update SUNK NCCL bumped to 2.26.5, improving GPU communication performance.

Update SUNK Added new CUDA runtime images for 12.8.1 and 12.9.0.

Update SUNK Introduced nsscache as an alternative option to SSSD for user caching. See Directory Services for configuration details.

Update SUNK Enabled timeout-based forced deletion of compute pods (disabled by default), allowing cleanup even when jobs are still running.

Update SUNK Backported Slurm 25.05 SlurmdSpecOverride and container awareness features to correctly configure CPUSpecList and MemSpecList, so static pod workloads no longer enter an invalid state after scontrol reconfigure.

Update SUNK Enhanced controller.etcConfigMap to accept either a single string or a list of multiple ConfigMaps.

Update SUNK Added the segment-calc script for visualizing block-topology segment allocations. See Topology/Block Scheduling in Slurm for more information.

Change Platform Defaulted PasswordAuthentication to no in sshd for improved security.

Change Platform Charts now manage the ns.coreweave.cloud/managed namespace label.

Update Observability Added support for VMPodScrape as an alternative to PodMonitor for metric gathering.
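The June 13 entry about controller.etcConfigMap accepting either a single string or a list can be pictured as a small normalization step. The sketch below is illustrative only: the helper name `normalize_config_maps` is hypothetical, and it does not reproduce how the SUNK Helm chart handles the value internally.

```python
from typing import List, Optional, Union


def normalize_config_maps(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Return a list of ConfigMap names from a string-or-list value.

    Hypothetical helper: mirrors the documented behavior that the setting
    may be one name or several, not the chart's actual template logic.
    """
    if value is None or value == "":
        return []
    if isinstance(value, str):
        # A single name becomes a one-element list.
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("expected a ConfigMap name or a list of names")
```

Normalizing to a list up front means downstream code only needs to handle one shape, regardless of which form the values file used.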

June 3, 2025

Update Observability Cabinet Wrangler is now available for managing cabinet-level operations and monitoring. See Cabinet Wrangler release notes for detailed information.

June 2, 2025

Update SUNK SUNK v6.4.1 has been released as a patch release with critical memory parsing fixes, improved MOTD script handling, container runtime enhancements, and RDMA configuration cleanup. This release addresses important issues discovered in v6.4.0, including a critical memory parsing fix, improved login template configuration, and enhanced container runtime stability. All v6.4.0 deployments should upgrade to v6.4.1 to resolve these issues. See SUNK v6.4.1 release notes for detailed information.

May 2025

May 29, 2025

Update Instances NVIDIA HGX B200 instances are now Generally Available, providing next-generation AI compute capabilities. See NVIDIA HGX B200 instances GA release notes for detailed information.

May 26, 2025

Update SUNK SUNK v6.4.0 has been released with significant improvements to login pod management, configuration capabilities, and user experience. This release introduces external MySQL database configuration in the Slurm Helm chart, improved hostname resolution for login pods, customizable MOTD display, user-controlled pod reboot, enhanced error handling, and dashboard integration features. See SUNK v6.4.0 release notes for detailed information.

May 20, 2025

Update Observability Internet Transit Dashboard is now available, providing real-time visibility into network traffic and performance. See Internet Transit Dashboard release notes for detailed information.

May 15, 2025

Change CKS Enhanced the Node Pool UI for an improved cluster management experience. See CKS Clusters for more information.

April 2025

April 25, 2025

Update SUNK SUNK v6.3.0 has been released with enhanced Slurm functionality and performance improvements. See SUNK for more information.

Change Platform Node ID format changed to improve system identification and management. See Node Lifecycle for more information.

April 17, 2025

Update SUNK SUNK v6.2.0 has been released with Device Plugin chart integration, Slurm upgrade to v24.11.4, AllowGaps patch for improved scheduling, and configurable operator log levels. See SUNK v6.2.0 release notes for detailed information.

Update SUNK Slurm Device Plugin Helm chart has been integrated as a subchart in SUNK, simplifying GPU resource provisioning within clusters managed by Slurm.

Update SUNK Slurm has been patched to support the AllowGaps setting in topology.conf, allowing for non-contiguous Node groupings in block topology mode.

Update Observability A new drain_time_seconds metric has been added for Slurm nodes, reporting how long a Node has been in the DRAIN or DRAINING state.

Update SUNK The SUNK operator now includes configurable log levels, which can be set through Helm values for fine-grained control over log verbosity.

Update Instances A new compute Node type for CPU-only Nodes has been defined in the Helm charts, enabling deployment scenarios that do not require GPU-specific configurations. See CPU Instances for available options.
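To make the drain_time_seconds entry above more concrete, the following sketch computes a duration of that kind from a node's state and the time it began draining. The function signature, the use of a start timestamp, and the choice to report 0 for nodes that are not draining are all assumptions for illustration; the actual exporter may behave differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# States the changelog entry names as counting toward drain time.
DRAIN_STATES = {"DRAIN", "DRAINING"}


def drain_time_seconds(
    state: str,
    drain_started_at: Optional[datetime],
    now: datetime,
) -> float:
    """Seconds a node has spent draining, or 0.0 if it is not draining.

    Hypothetical helper: reporting 0.0 for non-draining nodes is an
    assumption, not documented behavior of the real metric.
    """
    if state not in DRAIN_STATES or drain_started_at is None:
        return 0.0
    # Clamp at zero in case of clock skew between sources.
    return max(0.0, (now - drain_started_at).total_seconds())
```

A value like this is typically useful in alerts, for example flagging nodes that have been draining for longer than an operator-chosen threshold.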

April 9, 2025

Update Observability "Explore" is now available in Managed Grafana, providing enhanced data exploration capabilities. See CoreWeave Logs and Metrics for more information.

April 4, 2025

Update SUNK SUNK v6.1.0 has been released with enhanced Slurm functionality and performance improvements. See SUNK for more information.

March 2025

March 31, 2025

Update Storage CoreWeave AI Object Storage is now Generally Available, providing high-performance object storage optimized for AI workloads. See CoreWeave AI Object Storage GA release notes for detailed information.

March 20, 2025

Update Storage Introducing CoreWeave AI Object Storage, a new high-performance object storage solution designed specifically for AI and machine learning workloads. See CoreWeave AI Object Storage for more information.

March 14, 2025

Update SUNK SUNK v6.0.0 has been released with significant new features and breaking changes. See SUNK v6.0.0 release notes for detailed information.

February 2025

February 6, 2025

Update SUNK SUNK v5.7.0 has been released with a change to using direct RPCs to the Slurm controller instead of the REST API. The REST API is now an optional component and must be explicitly enabled if required. See SUNK v5.7.0 release notes for detailed information.

Update SUNK SUNK v5.6.0 released with enhanced Slurm login functionality and improved compute definitions. See SUNK for more information.

Update SUNK Added individual Slurm login pods implementation with user cache controller for improved authentication management.

Update SUNK Added GB200 compute definition to support the latest NVIDIA hardware.

Update SUNK Added CUDA 12.8 image builds for enhanced GPU support.

Update SUNK Upgraded Slurm to 24.05.05 with latest upstream fixes and improvements.

Update SUNK Enhanced block topology configuration with automatic generation from labels for improved GPU scheduling.

Update SUNK Added readiness probe to slurmd for better health monitoring.

Fix SUNK Fixed syncer cluster role binding name to prevent deployment issues.

Fix SUNK Removed default CPU limit for login pods to improve performance.

Fix SUNK Updated directory-cache image to include OS suffix for better compatibility.

Fix SUNK Fixed nvlink domain handling to skip domains labeled "0" (no domain).

Fix SUNK Improved resource usage calculation by ignoring completed pods.
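The nvlink domain fix listed above (skipping domains labeled "0") can be sketched as a grouping step that ignores the "no domain" value. The input shape, a mapping from node name to domain label, and the helper name `group_by_nvlink_domain` are hypothetical; this is not the operator's actual code.

```python
from collections import defaultdict
from typing import Dict, List

# Per the changelog entry, a domain labeled "0" means "no domain".
NO_DOMAIN = "0"


def group_by_nvlink_domain(node_domains: Dict[str, str]) -> Dict[str, List[str]]:
    """Group node names by NVLink domain, skipping nodes with no domain."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Sort for deterministic output regardless of input order.
    for node, domain in sorted(node_domains.items()):
        if domain == NO_DOMAIN:
            continue
        groups[domain].append(node)
    return dict(groups)
```

Without the skip, every node lacking a domain would be lumped into one artificial "0" group, which would misrepresent the topology.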

February 21, 2025

Update APIs CoreWeave Kubernetes Service (CKS) API is now Generally Available, enabling programmatic deployment, management, and scaling of HPC applications using Kubernetes on CoreWeave's high-performance infrastructure. See CKS API Reference for details.

Update APIs CoreWeave Terraform provider is now available, allowing customers to deploy and manage VPCs and CKS clusters as code. See CoreWeave Terraform Provider for more information.

Change Platform Enhanced Cloud Console design and user experience with improved usability and creation flows for faster cluster deployment and better resource management.

February 3, 2025

Update Instances GB200 NVL72-powered cloud instances are now available in selected CoreWeave Regions, combining NVIDIA's GB200 Superchips in a 72-GPU NVLink-connected fabric with CoreWeave's managed services. See GB200 NVL72-powered instances for specifications.

January 2025

January 13, 2025

Change Instances H100- and H200-based instances now support NV HGX 1.5.0 firmware, delivering enhanced GPU stability and improved troubleshooting capabilities. See H100 with InfiniBand and H200 with InfiniBand for specifications.

December 2024

December 27, 2024

Update SUNK SUNK v5.5.0 released with improved resource cleanup and Slurm login chart implementation. See SUNK for more information.

December 17, 2024

Update SUNK SUNK v5.4.0 released with enhanced Slurm login functionality and improved compute definitions. See SUNK for more information.

Update SUNK Added single projected volume for SSSD to simplify configuration and improve security.

Update SUNK Added dynamic feature prefixing for flexible feature configuration.

Update SUNK Added H200 compute definition to support the latest NVIDIA hardware.

Update SUNK Enhanced Slurm login chart with improved pod specification handling.

Update SUNK Added cleanup of Slurm nodes following removal from NodeSlices for better resource management.

Update SUNK Implemented LoginReconciler for improved login pod management.

Update SUNK Updated NCCL base images to newer versions with HPC-X 2.21 for enhanced performance.

Fix SUNK Fixed affinity configuration in compute base definitions.

Fix SUNK Added InfiniBand support to H200 compute definitions.

Fix SUNK Removed Ubuntu 20.04 image builds to focus on supported versions.

Fix SUNK Fixed lock annotation removal when nodes are removed from nodesets.

Fix SUNK Upgraded to Go 1.23.2 for improved performance and security.

Update SUNK Added OwnerReference for resource cleanups to improve resource management and prevent orphaned resources.

Update SUNK Implemented slurm-login chart for better login node management.

Fix SUNK Fixed ignore_group_members configuration by renaming to ignoreGroupMembers for consistency.

Fix SUNK Corrected login pod template indentation to prevent deployment issues.

Fix SUNK Updated LDAP secret key defaults to use ldap-password.conf for better compatibility.

December 20, 2024

Update Platform Welcome to the new CoreWeave Documentation Hub! The new documentation site is designed to provide a more streamlined and user-friendly experience, making it easier to find the information you need.

October 2024

October 25, 2024

Update SUNK SUNK v5.3.0 released with enhanced Slurm functionality and improved monitoring capabilities. See SUNK for more information.

Update SUNK Added GH200 compute definition to support the latest NVIDIA hardware.

Update SUNK Enhanced login SSH daemon liveness probe for better health monitoring.

Update SUNK Added scripts for deleting NVIDIA hooks on CPU nodes to prevent conflicts.

Update SUNK Allowed a list of prolog/epilog ConfigMaps in Helm values for flexible configuration.

Update SUNK Exposed all probes for all containers in Helm values for comprehensive monitoring.

Update SUNK Moved Slurm secret manifests to secret job for improved security.

Update SUNK Enhanced Node Extras handling to prevent overwriting of extra fields.

Update SUNK Improved condition synchronization from pods to nodes for better state management.

Update SUNK Added SSSD config reload capability for dynamic configuration changes.

Update SUNK Upgraded Slurm to 24.05.4 with latest upstream fixes and improvements.

Fix SUNK Fixed Slurm probe indentation in Helm charts.

Fix SUNK Corrected MySQL resource defaults for better performance.

Fix SUNK Made MySQL secret immutable and persistent for improved security.

Fix SUNK Removed defunct CgroupAutomount option to prevent configuration errors.

Fix SUNK Enhanced persistent connections to slurmctld for improved stability.

Fix SUNK Fixed Slurm completion script permissions for proper execution.

Fix SUNK Updated Slurm image dependencies for better compatibility.

Fix SUNK Upgraded Ubuntu images to newer tags for security and performance.

Fix SUNK Fixed array job merge behavior for metrics collection.

Fix SUNK Corrected scheduler hook bug when pods are deleted before hook execution.

Fix SUNK Improved condition update handling on pods for better state management.

Fix SUNK Fixed termination grace error handling for improved reliability.

September 2024

September 10, 2024

Update SUNK SUNK v5.2.0 released with enhanced Slurm PAM module support and improved monitoring. See SUNK for more information.

Update SUNK Added packages to support Slurm PAM module for enhanced authentication capabilities.

Update SUNK Added host aliases to Slurm chart for improved networking configuration.

Update SUNK Added Slurm not responding condition for better health monitoring.

Update SUNK Enhanced operator syncer and scheduler configuration for improved performance.

Update SUNK Switched to cgroup process tracking as default in Helm charts for better resource management.

Fix SUNK Fixed leader election configuration to not force it by default for SUNK.

Fix SUNK Fixed missing volumes on REST deployment for proper functionality.

Fix SUNK Updated disk space check for MySQL init container to prevent deployment issues.

Fix SUNK Used templates for operator scheduler and syncer configs for consistency.

Fix SUNK Moved hooksapi out of syncer and scheduler configs for better separation of concerns.

Fix SUNK Reevaluated Slurm controller liveness probe for improved health checking.

Fix SUNK Reduced noisy messages from user-lookup container for cleaner logs.

August 2024

August 9, 2024

Update SUNK SUNK v5.1.0 released with enhanced monitoring capabilities and improved configuration options. See SUNK for more information.

Update SUNK Added additional slurmdbd.conf lines to Helm values for flexible database configuration.

Update SUNK Allowed additional DNS config searches for improved name resolution.

Update SUNK Added custom plugstack.conf entries support for enhanced Slurm configuration.

Update SUNK Exposed compute liveness probe configuration for better health monitoring.

Update SUNK Made field labels and metrics consistent across the platform.

Update SUNK Added Slurm job uptime metrics for better job monitoring.

Update SUNK Exposed Slurm RPC stats for Prometheus metrics collection.

Update SUNK Renamed diagnostic metrics and fixed pointer checks for improved monitoring.

Update SUNK Updated base images for all image builds to latest versions.

Fix SUNK Fixed user-lookup enablement to only activate when canary users are set.

Fix SUNK Added missing labels to resources for better organization.

Fix SUNK Adjusted slurmd default timeout to 60 seconds for better performance.

Fix SUNK Fixed scheduler script to prevent Slurm bug in job handling.

Fix SUNK Included topology.conf in watched files again for proper configuration monitoring.

Fix SUNK Properly added additional configuration to plugstack.conf for enhanced functionality.

Fix SUNK Set default max_rpc_cnt for SchedulerParameters to prevent issues.

Fix SUNK Unified approach to labels on SUNK chart for consistency.

Fix SUNK Patched default show flags in REST API for nodes to improve visibility.

Fix SUNK Removed unrecognized configure options from Slurm Dockerfile to prevent build issues.

Fix SUNK Implemented deduplication of Get requests in Slurm client for improved performance.

Fix SUNK Injected missing SLURM_CLUSTER_NAME environment variable in compute nodes.

Fix SUNK Corrected URLs to documentation for better user experience.

Fix SUNK Fixed nil features clearing issue in operator for better state management.

Fix SUNK Excluded pods not in Ready state from auto cleanup to prevent data loss.

Fix SUNK Started API health check after Slurm client initialization for proper startup sequence.

July 2024

July 10, 2024

Update SUNK SUNK v5.0.0 released with major upgrade to Slurm 24.05.x and enhanced security features. See SUNK for more information.

Update SUNK Upgraded to Slurm 24.05.0 with latest upstream features and improvements.

Update SUNK Added leader election for operator to improve reliability in multi-instance deployments.

Update SUNK Enabled pyxis and security capabilities by default for enhanced container security.

Update SUNK Upgraded default resources for Slurm components to improve performance.

Update SUNK Enhanced plugstack.conf customization in Helm for flexible configuration.

Update SUNK Upgraded enroot and pyxis to latest versions for improved container management.

Update SUNK Updated exported node metrics from Slurm for better monitoring.

Update SUNK Added orphaned pod checking for better resource cleanup.

Update SUNK Updated controller-runtime to 0.18.3 for improved Kubernetes integration.

Fix SUNK Changed default Munged UID/GID and allowed configuration for better security.

Fix SUNK Bumped scrape timeout for syncer to prevent monitoring issues.

Fix SUNK Added temporary fix for inode locking issue to prevent file system problems.

Fix SUNK Dropped CUDA version in values-cw.yaml back to 12.2 for compatibility.

Fix SUNK Updated JWT secret to use infinite lifespan for better security.

Fix SUNK Corrected license dates in documentation header templates.

Fix SUNK Added condition delay check for nodes in tests to improve reliability.

Fix SUNK Added replica check for e2e tests to ensure proper deployment.

Fix SUNK Corrected scaleDeployment bug for checking incorrect pods.

Fix SUNK Upgraded Slurm to 24.05.1 with latest patch fixes.

Fix SUNK Made micromamba executable for proper package management.

Fix SUNK Properly handled node info updates in nodeslice for better state management.

Fix SUNK Changed array job merge behavior for improved job handling.

Fix SUNK Corrected minor behavior in nodeset scaling for better resource management.

Fix SUNK Fixed pod node assignment handling to prevent errors when nodes are not assigned.

Fix SUNK Improved clarity of scheduler errors for better troubleshooting.

Fix SUNK Moved kubectl installation from script to Dockerfile for better build process.

Fix SUNK Stopped prestop lifecycle hook from overriding existing reasons for better job management.

Fix SUNK Moved test setup into BeforeAll block for improved test organization.

Fix SUNK Added --load-images flag to skaffold for better development workflow.

Note

For changelog entries prior to December 2024, please see the CoreWeave Classic documentation.