CoreWeave

Release Notes

Feature Updates and Release Notes for CoreWeave Cloud

December 2023

Email notifications are configured on the User Settings page
With the new alert notifications feature in User Settings, organization members may configure whether or not to receive emails for specific critical status alerts. Alerts may be configured for failing jobs, quotas nearing or reaching their capacity limits, high Persistent Volume usage, and more.
Important
Organizations that already exist on CoreWeave Cloud will need to explicitly turn on any desired email notifications.
Organizations that are created after November 14, 2023 will have all email notifications enabled by default. If specific alerts are not desired, they will need to be manually turned off from the User Settings page.

Introducing Intel Xeon Ice Lake CPU nodes

Intel Xeon Ice Lake CPU nodes are now available in LAS1! For more information on how to use Ice Lake nodes, see Node Types.

October 2023

When inviting new members to your organization, you can set their permissions right from the invitation modal. Configurable permissions now include:
  • access to specific namespaces,
  • organization admin permissions, and
  • access to view billing information.
All permissions may be configured in the invitation to the new member, or the permissions may be adjusted from the Organization Management page once a new user has accepted their invitation and created their account.
Datadog's CoreWeave integration leverages CoreWeave Prometheus metrics to create robust Datadog monitors for CoreWeave infrastructure and to track usage patterns, providing insight into platform activity and into how your organization is billed based on resource usage.
Generate alerts based on specific queries, allowing your team to quickly address any issues and identify areas of high use.
With the Datadog integration for CoreWeave, users can:
  • leverage Prometheus metrics for robust reports and monitoring
  • track infrastructure usage patterns for insights into platform use
  • monitor billing based on resource use with precision
and more!
Knative Serving, the serverless runtime for CoreWeave Cloud's inference stack, manages autoscaling, revision control, and canary deployments. By following our new Best Practices guide, you can ensure you're getting the best performance out of Knative Serving for inference tasks.

July 2023

CoreWeave Tensorizer is a tool for fast PyTorch module, model, and tensor serialization and deserialization, making it possible to load models extremely quickly from HTTP/HTTPS and S3 endpoints. It also speeds up loading from network and local disk volumes.
By speeding up model loading for LLMs and reducing GPU memory utilization, Tensorizer helps accelerate model instance spin-up times while reducing the overall cost of serving inference.
Tensorizer is S3/HTTP-compatible, enabling models to be streamed directly from S3 into the container without first downloading them to the container's local filesystem.
When scaling from zero, Tensorizer's average latency per request was more than 5x lower than Hugging Face's, and Tensorizer required fewer pod spin-ups and less RAM.
In addition to a brand new blog post about Tensorizer's performance benchmarks, a new tutorial for running a real-world benchmark test is now available to try yourself!
The CoreWeave Cloud UI is now even easier and more intuitive to use! Manage all your resources and account information right from your browser. Additionally, a new guide exploring all of the features of the updated Cloud UI has been added to better introduce you to this feature-rich GUI.
With new namespace access controls, organization administrators can create access tokens with specific namespace permissions, allowing for a greater level of security for organization members. A token with no specified namespace permissions can also be created, granting the organization administrator the ability to create Kubernetes custom RBAC policies.

👋
Support dropped for Ubuntu 18.04

Following the end-of-life (EOL) notice for Ubuntu 18.04 issued at the end of May, CoreWeave no longer supports Ubuntu 18.04. Existing images will not be deleted yet, but no new 18.04 images will be built.

May 2023

CoreWeave's Tensorizer is an S3 and local filesystem compatible module, model, and tensor serializer and deserializer that makes it possible to load models in less than five seconds, making it easier, more flexible, and more cost-efficient to serve models at scale. Reduce resource usage with flexible iterations.
Single Sign-On, commonly referred to as SSO, is an authentication scheme that allows the users in an organization to authenticate to CoreWeave Cloud with the same identity provider (IDP) they use to log in to other organization-wide apps. Single Sign-On enhances security and makes for a smoother log-in experience for your team.
CoreWeave currently supports Okta, JumpCloud, and generic IDP configurations.
Our sleek new Cloud UI overhaul for Virtual Servers makes creating high-performance virtual machines easier than ever! And for those who want even finer-grained control, the new YAML editor lets users edit the Custom Resource Definition (CRD) directly for extreme flexibility.
The new Virtual Server UI features a side-by-side YAML editor
With new per-namespace user access controls, your organization admin can now grant members of the organization access to one or more namespaces, allowing them to easily spin up new Virtual Servers, allocate storage, and more!
Resource Pools are groups of hardware selections plus memory requests and limits that make it simple to select resource groups for Determined AI deployments, helping users get their Determined AI experiments up and running faster.
Our Fine-tune Stable Diffusion Models demo now incorporates details for working with DreamBooth!
DreamBooth is a technique used to teach novel concepts to Stable Diffusion. The DreamBooth method allows you to fine-tune Stable Diffusion on a small number of examples to produce images containing a specific object or person. This method for fine-tuning diffusion models was introduced in a paper published in 2022, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. A lighter introduction was also released alongside the paper in a blog post.
The DreamBooth method is a way to teach a diffusion model about a specific object or style using approximately three to five example images. After the model is fine-tuned on a specific object using DreamBooth, it can produce images containing that object in new settings.
Zeet is a software platform that runs on top of your Cloud account, making it simple for developers to deploy code on production-grade infrastructure. With CoreWeave's Kubernetes-native infrastructure and Zeet's team of Kubernetes engineers, we're helping our clients scale and realize value faster without having to build an entire infrastructure engineering team of their own.
Our partnership allows companies to tap into the industry’s broadest selection of on-demand GPU compute resources and DevOps expertise.

March 2023

New this month on CoreWeave Cloud...
Big news! We are proud to announce that CoreWeave has become the first Cloud provider in the world to bring the super powerful NVIDIA HGX H100 nodes online!
The NVIDIA HGX H100 enables up to seven times more efficient high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference than the NVIDIA HGX A100.
This speed, combined with the lowest NVIDIA GPUDirect network latency in the market with the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months." With AI permeating nearly every industry today, this speed and efficiency has never been more vital for HPC applications.

Introducing SUNK: Slurm on Kubernetes

Slurm is the de-facto scheduler for large HPC jobs in supercomputer centers around the world. CoreWeave's Slurm implementation, SUNK ("SlUrm oN Kubernetes"), integrates Slurm with Kubernetes, allowing compute to transition between distributed training in Slurm and applications such as online inference in Kubernetes.
As an implementation of Slurm on Kubernetes deployed on CoreWeave Cloud, SUNK comes complete with options for:
  • external Directory Services such as Active Directory
  • Slurm Accounting, backed by a MySQL database
  • dynamic Slurm node scaling to match your Workload requirements
In SUNK, Slurm images are derived from OCI container images, which execute on bare metal, and compute node resources are allocated using Kubernetes.
Note
CoreWeave maintains several base images for different CUDA versions, including all dependencies for InfiniBand and SHARP. If you'd like to implement SUNK in your cluster, please contact CoreWeave Support for engineering assistance with cluster design and deployment.
Embedding machine learning models directly into images has become a popular ease-of-use technique, but the resulting larger container images have made image pulls slower. As a result, pulling images is often the most time-consuming part of spinning up new containers, and for those who rely on fast autoscaling to respond to changes in demand, the time it takes to create new containers can pose a major hurdle.
It's for this reason that CoreWeave Cloud now supports using Nydus, the external plugin for containerd, for shorter container image pull times.
Leveraging its own container image service, Nydus implements a content-addressable filesystem on top of the RAFS container image format. This format allows for major improvements over the current OCI image specification in terms of container launching speed, image space, network bandwidth efficiency, and data integrity. The result: significantly faster container image pull times.
Important
Nydus on CoreWeave is currently an alpha offering, with limited, node-specific release.
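Content addressing is the core idea behind this: data is split into chunks, and each chunk is stored under a hash of its own bytes, so identical chunks shared between images are stored only once and every chunk can be integrity-checked when it is read. The following is a minimal, generic sketch of that idea; it is not Nydus's RAFS implementation, and the fixed chunk size, the SHA-256 digests, and the `ChunkStore` class are illustrative assumptions.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Toy content-addressable store: each chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Split data into fixed-size chunks and return their digests (the 'manifest')."""
        manifest = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks hash to the same key, so they are stored only once.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: List[str]) -> bytes:
        """Reassemble data from its manifest, verifying each chunk against its digest."""
        out = bytearray()
        for digest in manifest:
            chunk = self.chunks[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError(f"chunk {digest} is corrupt")
            out += chunk
        return bytes(out)


store = ChunkStore()
layer_a = store.put(b"AAAABBBBCCCC")
layer_b = store.put(b"AAAABBBBDDDD")
print(len(store.chunks))  # 4 unique chunks stored for 6 chunk references
```

Real systems use much larger chunks and add metadata for lazy, on-demand fetching, but the deduplication and integrity properties come from the same hash-keyed layout.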
The Kubeflow project is dedicated to making deployments of Machine Learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
CoreWeave is pleased to present new tutorials on using Kubeflow training operators for distributed training on CoreWeave Cloud! Follow along with these walkthroughs to train ResNet-50 with ImageNet, or fine-tune GPT-NeoX-20B with Argo Workflows!
Disk images may be imported from external URLs to be used as source images for root or additional disks for Virtual Servers. In addition to qcow2, raw and iso formatted images are also supported, and may be compressed with either gz or xz.
Following our newly published guide, an image stored locally can easily be uploaded to CoreWeave Object Storage, then imported to a DataVolume.
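Because qcow2, iso, gz, and xz files carry recognizable signatures while raw images do not, a local file's format can be checked before uploading it. The sketch below is a hypothetical pre-upload helper, not part of CoreWeave's tooling. It relies on well-known file signatures (qcow2 files begin with `QFI\xfb`, gzip with bytes `1f 8b`, xz with `fd 37 7a 58 5a 00`, and ISO 9660 images carry `CD001` at byte offset 0x8001) and assumes that any uncompressed file without a recognized signature is raw.

```python
import gzip
import lzma
from typing import Optional, Tuple

QCOW2_MAGIC = b"QFI\xfb"
GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"
ISO9660_MAGIC = b"CD001"
ISO9660_OFFSET = 0x8001  # primary volume descriptor (sector 16 * 2048 bytes) + 1 type byte


def detect_compression(header: bytes) -> Optional[str]:
    """Return "gz", "xz", or None for an uncompressed file."""
    if header.startswith(GZIP_MAGIC):
        return "gz"
    if header.startswith(XZ_MAGIC):
        return "xz"
    return None


def detect_disk_format(header: bytes) -> str:
    """Classify an uncompressed header as "qcow2", "iso", or (by assumption) "raw"."""
    if header.startswith(QCOW2_MAGIC):
        return "qcow2"
    if header[ISO9660_OFFSET:ISO9660_OFFSET + len(ISO9660_MAGIC)] == ISO9660_MAGIC:
        return "iso"
    return "raw"  # raw images carry no signature


def describe_image(path: str) -> Tuple[str, Optional[str]]:
    """Return (disk format, compression) for a local image file."""
    with open(path, "rb") as f:
        compression = detect_compression(f.read(len(XZ_MAGIC)))
    # Read through the matching decompressor (if any) so the inner disk format can be checked.
    opener = {"gz": gzip.open, "xz": lzma.open}.get(compression, open)
    with opener(path, "rb") as f:
        header = f.read(ISO9660_OFFSET + len(ISO9660_MAGIC))
    return detect_disk_format(header), compression
```

For example, an xz-compressed qcow2 file would be reported as `("qcow2", "xz")`, while an uncompressed file with no known signature would be reported as `("raw", None)`.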
Hosting your own containerized applications on CoreWeave Cloud is simple! With our new guide for deploying custom containers, you can have your applications running in CoreWeave Cloud in minutes!

December 2022
❄️

New on CoreWeave Cloud this month:

Welcome NVIDIA HGX H100s to the CoreWeave fleet!
💪

CoreWeave's infrastructure has always been purpose-built for large-scale GPU-accelerated workloads. Since the beginning, CoreWeave Cloud has been specialized to serve the most demanding AI and machine learning applications. So it only makes sense that CoreWeave will soon be one of the only Cloud platforms in the world offering NVIDIA's most powerful end-to-end AI supercomputing platform.
NVIDIA HGX H100s enable...
  • up to seven times more efficient high-performance computing (HPC) applications,
  • up to nine times faster AI training on large models,
  • and up to thirty times faster AI inference than the NVIDIA HGX A100!
This speed, combined with the lowest NVIDIA GPUDirect network latency in the market with the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months."
HGX H100s will be available in Q1 of 2023!

Launch GPT DeepSpeed Models using Determined AI
🧠

DeepSpeed is an open source deep learning optimization library for PyTorch, designed for low latency and high throughput training while reducing compute power and memory use for the purpose of training large distributed models.
In our new walkthrough, a minimal GPT-NeoX DeepSpeed distributed training job is launched without the additional features that Determined AI offers, such as tracking, metrics, and visualization.

Multi-namespace support
🎡

CoreWeave Cloud now supports multiple namespaces for organizations!
Kubernetes namespaces provide logical separations of resources within a Kubernetes cluster. While it is typical for CoreWeave client resources to be run inside a single namespace, there are sometimes cases in which more than one namespace within the same organization is required.
Multi-namespace support is enabled by default!

Accelerated Object Storage

Accelerated Object Storage provides local caching for frequently accessed objects across all CoreWeave data centers. Accelerated Object Storage is especially useful for large scale, multi-region rendering, or for inference auto-scaling where the same data needs to be loaded by hundreds or even thousands of compute nodes.
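The benefit of caching here comes from many nodes requesting the same objects: only the first request has to go back to the origin. The sketch below illustrates that pattern with a generic least-recently-used (LRU) read-through cache. CoreWeave's actual caching and eviction policy is not described in these notes, so the LRU policy, the `ReadThroughCache` class, and the `fetch_from_origin` stand-in are all assumptions for illustration.

```python
from collections import OrderedDict
from typing import Callable, List


class ReadThroughCache:
    """Toy LRU read-through cache in front of a slower origin object store."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int):
        self.fetch = fetch  # hypothetical call to the origin object store
        self.capacity = capacity
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


origin_reads: List[str] = []


def fetch_from_origin(key: str) -> bytes:
    origin_reads.append(key)  # record every trip to the origin
    return f"contents of {key}".encode()


cache = ReadThroughCache(fetch_from_origin, capacity=2)
# 100 requests for the same model weights reach the origin only once.
for _ in range(100):
    cache.get("models/weights.bin")
print(cache.hits, cache.misses, len(origin_reads))
```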
Import Disk Images from CoreWeave Object Storage
Did you know you can import your own Virtual Disk Images for Virtual Servers right from CoreWeave Object Storage? With the help of our new guide, you can learn how to do just that!

Introducing CoreWeave CoSchedulers
⏱️

In Machine Learning, it is often necessary for all pieces of a project to begin at the same time. In the context of Kubernetes, this means that all Pods must be deployed at the same time.
With CoreWeave CoSchedulers, you can ensure that your Pods are all deployed at once, and that deployments only occur if required resources are already available, thereby eliminating the possibility of partial deployments!
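All-or-nothing placement of this kind is usually called gang scheduling. The toy function below shows the core rule: tentatively place every Pod against a copy of the cluster state, and commit nothing if any Pod fails to fit. It is a sketch that assumes GPU count is the only resource and uses a simple first-fit-decreasing heuristic; it is not the CoreWeave CoScheduler's algorithm, and the function, Pod, and node names are hypothetical.

```python
from typing import Dict, Optional


def gang_schedule(pods: Dict[str, int], free_gpus: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Place every Pod, or none of them.

    pods:      Pod name -> GPUs requested
    free_gpus: node name -> GPUs currently free
    Returns a Pod -> node mapping, or None if the whole group cannot fit.
    """
    remaining = dict(free_gpus)  # plan against a copy so a failed attempt commits nothing
    placement: Dict[str, str] = {}
    # First-fit decreasing: place the largest requests first.
    for pod, need in sorted(pods.items(), key=lambda item: item[1], reverse=True):
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # a single Pod that cannot fit blocks the entire group
        remaining[node] -= need
        placement[pod] = node
    return placement


workers = {"worker-0": 8, "worker-1": 8}
print(gang_schedule(workers, {"node-a": 8, "node-b": 8}))  # both workers placed
print(gang_schedule(workers, {"node-a": 8, "node-b": 4}))  # None: no partial deployment
```

Because first-fit decreasing is a heuristic, it can occasionally reject a group that some other placement would fit; a production scheduler would also account for CPU, memory, and network topology.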

September 2022
🍁

New on CoreWeave Cloud this month:

Self-serve signup for CoreWeave Cloud
✍️

Signing up for an account on CoreWeave Cloud is now easier than ever! With self-serve signup, you can create your own account without additional approval.
Note
Some features are only available through an upgrade request. To increase your quota, or access Kubernetes, log in to your CoreWeave account and navigate to Upgrade Quotas.
NVIDIA Mellanox Quantum leaf switches in the CoreWeave LAS1 datacenter
A100 80GB NVLINK SXM4 GPUs are now available in the LAS1 region. These GPUs are provisioned in large clusters, intended for distributed training and inference of LLMs such as BLOOM 176B.
Connectivity between compute hardware, as well as storage, plays a major role in overall system performance for applications of Neural Net Training, Rendering, and Simulation. Certain workloads, such as those used for training massive language models of over 100 billion parameters over hundreds or thousands of GPUs, require the fastest, lowest-latency interconnect.
CoreWeave provides highly optimized IP-over-Ethernet connectivity across all GPUs, and an industry-leading, non-blocking InfiniBand fabric for our top-of-the-line A100 NVLINK GPU fleet. CoreWeave has partnered with NVIDIA on the design of the interconnect for its A100 HGX training clusters. All CoreWeave A100 NVLINK GPUs offer GPUDirect RDMA over InfiniBand, in addition to standard IP/Ethernet networking.
CoreWeave's InfiniBand topology is fully SHARP compliant, and all components to leverage SHARP are implemented in the network control-plane, such as Adaptive Routing and Aggregation Managers, effectively doubling the performance of a compliant InfiniBand network as compared to a network with similar specifications without in-network computing such as RDMA over Converged Ethernet (RoCE).
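A common back-of-the-envelope model shows where a roughly 2x gain from in-network reduction can come from. In a ring all-reduce, each of p participants sends 2(p-1)/p times the buffer size N (a reduce-scatter pass followed by an all-gather pass); when switches perform the reduction, as SHARP does, each participant sends its buffer only once (N bytes) and receives the reduced result. The sketch below computes that ratio. It assumes ring all-reduce as the host-based baseline and ignores latency, congestion, and switch capacity limits, so treat it as intuition rather than a benchmark.

```python
def ring_allreduce_bytes_sent(buffer_bytes: float, participants: int) -> float:
    """Bytes each participant sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (participants - 1) / participants * buffer_bytes


def in_network_allreduce_bytes_sent(buffer_bytes: float) -> float:
    """Bytes each participant sends when switches reduce the data (SHARP-style).

    Each participant sends its buffer up once; the reduced result comes back down.
    """
    return buffer_bytes


gradient_bytes = 1e9  # illustrative: 1 GB of gradients per training step
for p in (2, 8, 64, 512):
    ratio = ring_allreduce_bytes_sent(gradient_bytes, p) / in_network_allreduce_bytes_sent(gradient_bytes)
    print(f"{p:4d} participants: ring all-reduce sends {ratio:.2f}x as much data per participant")
```

The ratio is 1 for two participants and approaches 2 as the group grows, which is consistent with the intuition behind an "effectively doubled" all-reduce throughput on large training clusters.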
A100 NVLINK 80GB GPUs with InfiniBand are now available in the LAS1 (Las Vegas) data center region. A100 NVLINK 40GB GPUs with InfiniBand are available in the ORD1 (Chicago) data center region!
Read more about HPC Interconnect and SHARP on CoreWeave Cloud!

CoreWeave's Private Docker Registry 📦

Customers can now deploy their own private Docker registry from the application Catalog!
Images being hosted inside CoreWeave means no requirement for any subscriptions to external services such as Docker Hub, GitHub or GitLab. Additionally, credentials to pull images are automatically provisioned to a customer's namespace, alleviating the need to fiddle with “image pull secrets” that trip up many first-timers.
As usual with CoreWeave services, there is no charge except for the storage used for images and the minimal compute resources needed to run the registry server.
Head over to the Cloud applications Catalog to deploy a private Docker registry to your namespace!

Rocky Linux is now supported on CoreWeave Cloud
⛰️

Rocky Linux is a premier, open-source enterprise operating system designed to be completely compatible with Red Hat Enterprise Linux®. Tipped by the Visual Effects Society survey to replace CentOS 7 as the leading VFX workstation OS of choice, Rocky Linux provides a stable platform with a 10-year upstream support lifecycle.

Determined AI is now available in the Applications Catalog
🧠

Determined AI is an open-source deep learning training platform that makes building models fast and easy. Determined AI can now be deployed directly onto CoreWeave Cloud by deploying the application from the application Catalog. With Determined AI, you can launch Jupyter notebooks, interactive shells with VSCode support, and distributed training experiments right from the Web UI and CLI tools. Deploying Determined AI from the CoreWeave applications Catalog makes spinning up an instance fast and easy, and when running, the platform consumes minimal resources and incurs minimal cost.
Find Determined AI in the apps Catalog to learn more about it or deploy an instance to your namespace!

vCluster is now available in the Applications Catalog

For those of you who require or desire more control over your Kubernetes Control Plane, the vCluster application is a great solution. With vCluster, you can install your own cluster-wide controllers and manage your own custom resource definitions, all without sacrificing the benefits of running on CoreWeave Cloud's bare metal environment.
Find vCluster in the apps Catalog to learn more about it or deploy an instance to your namespace!

New machine learning walkthroughs on CoreWeave Cloud
🧪

It's never been easier to deploy, train, and fine-tune machine learning models on the Cloud. Our new walkthroughs and examples demonstrate just some of the ways CoreWeave's state-of-the-art compute power can be leveraged for model training, so you can start today!

Introducing Layer 2 VPC
☁️

CoreWeave Cloud Networking (CCNN) is built to handle workloads requiring up to 100Gbps of network connectivity at scale, and it also handles firewalls and Load Balancing via Network Policies. Certain use cases, however, require a deeper level of network control than what is offered by a traditional Cloud network stack. For these users, we are now introducing the CoreWeave Cloud Layer 2 VPC (L2VPC).
L2VPC provides fine-grained customization by handing full control of DHCP servers and VPN gateways to the user. Virtual firewalls are also supported and are configured by the user; most KVM-compatible firewall images work, allowing you to install your own firewall from the ground up. Installation guides for some of the most popular third-party choices, such as Fortinet's FortiGate, are also provided.
L2VPC is built on top of SR-IOV hardware virtualization technology, retaining the high performance and low latency customers have come to expect from CoreWeave Cloud.

CoreWeave Object Storage is now in beta

Object Storage is coming to CoreWeave! CoreWeave's S3-compatible Object Storage provides an easy place to store and reference things like Docker images, machine learning models, and any other kinds of objects right within CoreWeave Cloud, streamlining your project workflows! Object Storage is priced at only $0.03/GB/mo, with no access or egress fees!
Accelerated object storage provides local caching for frequently accessed objects across all CoreWeave datacenters. Accelerated object storage is especially useful for large scale multi region rendering or inference auto-scaling where the same data needs to be loaded by hundreds or thousands of compute-nodes.
This feature is currently in beta, but you can learn more now, and contact your CoreWeave Support Specialist to try it out!
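Because only stored capacity is billed, estimating a monthly bill under the beta pricing quoted above is a single multiplication: for example, 2 TB (2,048 GB) works out to 2,048 × $0.03 = $61.44 per month, no matter how often the data is read or downloaded. The sketch below assumes the $0.03/GB/mo beta rate, which may not reflect current pricing, and uses `Decimal` to avoid floating-point rounding in currency math; the function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.03")  # USD per GB per month (beta pricing, may have changed)


def monthly_object_storage_cost(stored_gb: int, egress_gb: int = 0) -> Decimal:
    """Estimate a monthly bill under the quoted beta pricing.

    Only stored capacity is billed. egress_gb is accepted to show that reads and
    downloads add nothing, since there are no access or egress fees.
    """
    capacity_cost = PRICE_PER_GB_MONTH * stored_gb
    egress_cost = Decimal("0") * egress_gb  # no egress fees
    return capacity_cost + egress_cost


print(monthly_object_storage_cost(500))                  # 500 GB stored
print(monthly_object_storage_cost(2048, egress_gb=10_000))  # 2 TB stored, 10 TB downloaded
```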

Introducing The Workload Activity Tracker dashboard
📈

The Workload Activity Tracker in action
It's an all too common experience to leave research shells or experiments idling in your namespace after you're done with them, only to come back later and realize they've been consuming resources unnecessarily. Now, with the Workload Activity Tracker dashboard for Grafana, you can quickly answer the question "is everything deployed in my namespace doing something?"
The Workload Activity Tracker displays which of your Workloads have had activity in the past 24 hours, which are inactive, how many resources they are consuming, and how much cost they're incurring, all in a convenient and concise overview format.
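The dashboard's core check, whether a workload has shown any activity in the past 24 hours and what it costs while idle, can be sketched as follows. This is an illustrative model only: the `Workload` fields, how "activity" is measured, and the cost figures are assumptions, and the actual dashboard is built on Grafana rather than code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

ACTIVITY_WINDOW = timedelta(hours=24)


@dataclass
class Workload:
    name: str
    last_active: datetime  # hypothetical: last time the workload showed CPU/GPU activity
    hourly_cost: float     # hypothetical: what the workload costs per hour, in USD


def idle_workloads(workloads: List[Workload], now: datetime) -> List[Workload]:
    """Workloads with no recorded activity within the last 24 hours."""
    return [w for w in workloads if now - w.last_active > ACTIVITY_WINDOW]


def idle_cost_per_day(workloads: List[Workload], now: datetime) -> float:
    """Daily spend on workloads that are sitting idle."""
    return sum(w.hourly_cost * 24 for w in idle_workloads(workloads, now))


now = datetime(2022, 9, 1, 12, 0, tzinfo=timezone.utc)
fleet = [
    Workload("research-notebook", now - timedelta(days=3), hourly_cost=1.5),
    Workload("training-job", now - timedelta(hours=1), hourly_cost=10.0),
]
print([w.name for w in idle_workloads(fleet, now)])  # ['research-notebook']
print(idle_cost_per_day(fleet, now))                  # 36.0
```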

May 2022
🌻

The Release Notes for May 2022 are inclusive of many new features launched since January 2022.

Say Hello to LGA1
🎉

We are pleased to announce the general availability of the CoreWeave LGA1 data center, providing extremely low-latency, high-performance cloud compute resources to the broader New York City market. Richly connected into the global Tier 1 internet backbone, LGA1 is built for low-latency, compute-intensive use cases that require ultimate reliability and security.
Like all CoreWeave data centers, LGA1 is packed with a broad range of state-of-the-art NVIDIA GPU-accelerated cloud compute instances, including the Quadro RTX series, the newest RTX Ampere workstation GPUs, and A40 data center GPUs. In addition to GPU compute, LGA1 offers CPU-only instances and high-performance Block and Shared File System storage.
LGA1 is housed in an ISO 27001 certified, SSAE 18 SOC 2 compliant, Energy Star Certified campus, providing the utmost in security and efficiency for your critical workloads.
Try it today by launching a Virtual Server from the CoreWeave Cloud UI!

Increased A100 80GB Capacity
📈

CoreWeave now offers the NVIDIA A100 80GB PCIe, which delivers unprecedented acceleration to power the world’s highest-performing AI, data analytics, and HPC applications. The NVIDIA A100 80GB PCIe accelerator is now available for Kubernetes deployments in ORD1 using the gpu.nvidia.com/model label selector A100_PCIE_80GB.
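In a Kubernetes manifest, a label selector like this is typically expressed as a required node affinity. The sketch below builds a minimal Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The label key `gpu.nvidia.com/model` and value `A100_PCIE_80GB` come from the text above; the Pod name, the container image, and the `nvidia.com/gpu` resource name (the name used by NVIDIA's Kubernetes device plugin) are assumptions to adapt to your own deployment.

```python
import json


def pod_for_gpu_model(gpu_model: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that may only be scheduled on nodes with the given GPU model."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "a100-80gb-example"},  # hypothetical Pod name
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [{
                            "matchExpressions": [{
                                "key": "gpu.nvidia.com/model",
                                "operator": "In",
                                "values": [gpu_model],
                            }]
                        }]
                    }
                }
            },
            "containers": [{
                "name": "main",
                "image": image,
                # Assumed GPU resource name from NVIDIA's device plugin.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical image reference; replace with your own.
manifest = pod_for_gpu_model("A100_PCIE_80GB", image="registry.example.com/my-app:latest")
print(json.dumps(manifest, indent=2))
```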
Coming Soon: CoreWeave is bringing NVIDIA A100 80GB support to the LAS1 region with a deployment of NVIDIA HGX A100 80GB NVLINK servers, built with GPUDirect InfiniBand RDMA connectivity for blazing fast GPU-to-GPU communication.
Reach out to CoreWeave today to reserve space on our newest distributed training infrastructure!

View and Manage Storage Volumes
💾

Managing cloud native storage has never been easier. CoreWeave Cloud now provides an easy to use UI to manage your Storage Volumes. Expand and clone your volumes with the click of a button. Learn more about CoreWeave Cloud Storage.

Organization Management
👯

By popular demand, we’ve added support for multiple users per organization and an Organization Management UI to invite and manage these users. Keep an eye on this page - we are regularly updating it with additional improvements and functionality.
Since the start of the year, we've added:
👫 Multi-User Support: Invite users to your Organization and manage them.
🔢 Resource Quotas: See the number of Pods, the number of GPUs, and the storage capacity allocated at any time.
Features coming soon:
🔐 RBAC: Permissions and granular control over user access
💼 Multiple Namespaces: Provision multiple namespaces per Organization

Apps Catalog Additions
📋

🕹️ Scalable Pixel Streaming: Stream your Unreal Engine projects to the masses quickly and easily.
🌐 Traefik: Custom ingresses, for use with your own domains.
🚚 ArgoCD: Access to a declarative, GitOps continuous delivery tool for Kubernetes.
🔥 Backblaze: Automate your volume backups to safeguard your data.
Launch any of these new Applications via apps.coreweave.com

Fine-tune Your ML Models
📊

Looking to fine-tune your own ML model on CoreWeave? Check out our new reference tools and examples for models such as GPT-Neo, GPT-J-6B, and Fairseq. Learn how to collect your dataset, tokenize it, fine-tune a model on it with the parameters you choose, and even set up an endpoint to test your work.

Kubernetes Log Forwarding

Forward logs from all your containers to popular aggregation tools such as Loki and Datadog.

Better Track API Access Tokens
🗝️

Need to organize your access tokens by user or track what they are being used for? You can now label them at creation from the CoreWeave Cloud UI.

Virtual Server Enhancements
💻

With CloudInit, you can choose your preferred settings in advance and they'll be set up during your instance launch. Plus, we now offer Static MAC Addresses and Serial Number support.

Upgrades to Global Connectivity
🌎

We’ve invested heavily in networking to start 2022, with upgrades to 200Gbps+ Tier 1 transit in each region.
Direct connects up to 100Gbps are now available at all of our data centers, and we’ve installed a CoreWeave Cloud On Ramp in downtown Los Angeles at CoreSite LA2 to accept cross connects back to LAS1.
We’ve also joined the Megaport network at LAS1 and LGA1 for direct, quick software defined connectivity to CoreWeave Cloud.
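To put these link speeds in perspective, the best-case transfer time is simply the data size in bits divided by the link rate. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and ignores protocol overhead unless an efficiency factor is supplied; under those assumptions, moving 1 TB takes about 80 seconds over a 100 Gbps direct connect. The function name is illustrative.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case time to move size_bytes over a link of link_gbps.

    efficiency (0 < efficiency <= 1) scales the usable bandwidth to model overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_gbps * 1e9 * efficiency)


ONE_TB = 1e12  # decimal terabyte
for gbps in (10, 100, 200):
    print(f"1 TB over {gbps:>3} Gbps: {transfer_seconds(ONE_TB, gbps):.0f} s")
```

Real-world throughput will be lower than these figures once TCP, encryption, and storage bottlenecks come into play, which is what the `efficiency` parameter is meant to approximate.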