Release Notes
Feature Updates and Release Notes for CoreWeave Cloud
CoreWeave Tensorizer is a tool for fast PyTorch module, model, and tensor serialization and deserialization, making it possible to load models extremely quickly from HTTP/HTTPS and S3 endpoints. It also speeds up loading from network and local disk volumes.
With faster model loading times for LLMs and reduced GPU memory utilization, Tensorizer helps accelerate model instance spin-up times while reducing the overall cost of serving inference.
Tensorizer is S3/HTTP-compatible, enabling model streams directly from S3 into the container without having to download the model to the container's local filesystem.
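As a minimal sketch of what that looks like in practice (the bucket URI and model name below are hypothetical placeholders), a tensorized model can be streamed straight from S3-compatible storage into an empty PyTorch module:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from tensorizer import TensorDeserializer
from tensorizer.utils import no_init_or_tensor

# Hypothetical S3 URI; HTTP/HTTPS endpoints work the same way.
model_uri = "s3://my-bucket/gpt-j-6b/model.tensors"

# Build the model skeleton without allocating or initializing weights.
config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
with no_init_or_tensor():
    model = AutoModelForCausalLM.from_config(config)

# Stream tensors from object storage directly into the module on GPU.
deserializer = TensorDeserializer(model_uri, device="cuda")
deserializer.load_into_module(model)
deserializer.close()
```

Because the weights never touch the container's local filesystem, the load path scales with network throughput rather than disk I/O.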
In benchmark tests, the average latency per request when scaling from zero was more than 5x lower with Tensorizer than with Hugging Face, while requiring fewer Pod spin-ups and less RAM.
In addition to a brand new blog post about Tensorizer's performance benchmarks, a new tutorial for running a real-world benchmark test is now available to try yourself!

The CoreWeave Cloud UI is now even easier and more intuitive to use! Manage all your resources and account information right from your browser. Additionally, a new guide exploring all of the features of the updated Cloud UI has been added to better introduce you to this feature-rich GUI.

With new namespace access controls, organization administrators can create access tokens with specific namespace permissions, allowing for a greater level of security for organization members. A token with no specified namespace permissions can also be created, granting the organization administrator the ability to create Kubernetes custom RBAC policies.
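For illustration, a minimal namespace-scoped read-only Role created with the Kubernetes Python client might look like the following (the namespace and role names are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()

namespace = "tenant-example"  # hypothetical tenant namespace

# A Role granting read-only access to Pods in a single namespace.
role = client.V1Role(
    metadata=client.V1ObjectMeta(name="pod-reader", namespace=namespace),
    rules=[client.V1PolicyRule(
        api_groups=[""],  # "" is the core API group
        resources=["pods"],
        verbs=["get", "list", "watch"],
    )],
)

client.RbacAuthorizationV1Api().create_namespaced_role(namespace, role)
```

A RoleBinding can then attach this Role to the subjects associated with a given access token.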
In accordance with the Ubuntu 18.04 end-of-life (EOL) notice that took effect at the end of May, CoreWeave no longer supports Ubuntu 18.04. Existing images will not yet be deleted, but no new 18.04 images will be built.
CoreWeave's Tensorizer is an S3- and local-filesystem-compatible module, model, and tensor serializer and deserializer that makes it possible to load models in less than five seconds, making it easier, more flexible, and more cost-efficient to serve models at scale while reducing resource usage.
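As a rough sketch of the serialization side (the model and output path below are placeholders), writing a module out with Tensorizer is a few lines:

```python
from transformers import AutoModelForCausalLM
from tensorizer import TensorSerializer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

# Serialize every tensor in the module to a single .tensors file;
# an s3:// URI may be used here as well to write directly to object storage.
serializer = TensorSerializer("model.tensors")
serializer.write_module(model)
serializer.close()
```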
Single Sign-On, more commonly referred to as SSO, is an authentication scheme that allows the users in an organization to authenticate to CoreWeave Cloud from the same identity provider (IDP) used to log in to other organization-wide apps. Single Sign-On enhances security, and makes for a smoother log-in experience for your team.
CoreWeave currently supports Okta, JumpCloud, and generic IDP configurations.
Our sleek new Cloud UI overhaul for Virtual Servers makes creating high-performance virtual machines easier than ever! And for those who want even finer-grained control, the new YAML editor lets users edit the Custom Resource Definition (CRD) directly, allowing for extreme flexibility.

The new Virtual Server UI features a side-by-side YAML editor
With new per-namespace user access controls, your organization admin can now grant users in the organization access to one or more namespaces, allowing them to easily spin up new Virtual Servers, allocate storage, and more!
Resource Pools are groups of hardware selections plus memory requests and limits that make it simple to select resource groups for Determined AI deployments, helping users get their Determined AI experiments up and running faster.
DreamBooth is a technique used to teach novel concepts to Stable Diffusion. The DreamBooth method allows you to fine-tune Stable Diffusion on a small number of examples to produce images containing a specific object or person. This method for fine-tuning diffusion models was introduced in a paper published in 2022, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. A lighter introductory text was also released along with the paper in an accompanying blog post.
The DreamBooth method is a way to teach a diffusion model about a specific object or style using approximately three to five example images. After the model is fine-tuned on a specific object using DreamBooth, it can produce images containing that object in new settings.
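Once a DreamBooth fine-tune has produced a checkpoint, generating images of the learned subject takes a few lines with the diffusers library. In this sketch, the checkpoint path and the "sks" identifier token are illustrative placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Path to a hypothetical DreamBooth fine-tuned checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-output", torch_dtype=torch.float16
).to("cuda")

# "sks" stands in for the rare identifier token the model was fine-tuned on.
image = pipe("a photo of sks dog on the moon", num_inference_steps=50).images[0]
image.save("sks-dog-moon.png")
```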
Zeet is a software platform that runs on top of your Cloud account, making it simple for developers to deploy code on production-grade infrastructure. With CoreWeave's Kubernetes-native infrastructure and Zeet's team of Kubernetes engineers, we're helping our clients scale and realize value faster without having to build an entire infrastructure engineering team of their own.
Our partnership allows companies to tap into the industry’s broadest selection of on-demand GPU compute resources and DevOps expertise.
New this month on CoreWeave Cloud...
Big news! We are proud to announce that CoreWeave has become the first Cloud provider in the world to bring the super powerful NVIDIA HGX H100 nodes online!
The NVIDIA HGX H100 enables up to seven times more efficient high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference than the NVIDIA HGX A100.
This speed, combined with the lowest NVIDIA GPUDirect network latency in the market with the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months." With AI permeating nearly every industry today, this speed and efficiency has never been more vital for HPC applications.
Slurm is the de-facto scheduler for large HPC jobs in supercomputer centers around the world. CoreWeave's Slurm implementation, SUNK ("SlUrm oN Kubernetes"), integrates Slurm with Kubernetes, allowing compute to transition between distributed training in Slurm and applications such as online inference in Kubernetes.
As an implementation of Slurm on Kubernetes deployed on CoreWeave Cloud, SUNK comes complete with options for:
- external Directory Services such as Active Directory
- Slurm Accounting, backed by a MySQL database
- dynamic Slurm node scaling to match your Workload requirements
In SUNK, Slurm images are derived from OCI container images, which execute on bare metal, and compute node resources are allocated using Kubernetes.
Note
CoreWeave maintains several base images for different CUDA versions, including all dependencies for InfiniBand and SHARP. If you'd like to implement SUNK in your cluster, please contact CoreWeave for engineering support with cluster design and deployment.
Embedding machine learning models directly into container images has become a popular ease-of-use technique, but it has made image pulls slower due to the increased size of the images. As a result, pulling images is often the most time-consuming aspect of spinning up new containers, and for those who rely on fast autoscaling to respond to changes in demand, the time it takes to create new containers can pose a major hurdle.
It's for this reason that CoreWeave Cloud now supports using Nydus, the external plugin for containerd, for shorter container image pull times.
Leveraging its own container image service, Nydus implements a content-addressable filesystem on top of its RAFS format for container images. This format allows for major improvements over the current OCI image specification in terms of container launch speed, image space, network bandwidth efficiency, and data integrity. The result: significantly faster container image pull times.
Important
Nydus on CoreWeave is currently an alpha offering, with a limited, node-specific release.
The Kubeflow project is dedicated to making deployments of Machine Learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
CoreWeave is pleased to present new tutorials on using Kubeflow training operators for distributed training on CoreWeave Cloud! Follow along with these walkthroughs to train ResNet-50 with ImageNet, or fine-tune GPT-NeoX-20B with Argo Workflows!
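As a minimal sketch of what submitting a distributed job through a Kubeflow training operator involves (the job name, container image, and namespace below are hypothetical), a two-worker PyTorchJob can be created with the Kubernetes Python client:

```python
from kubernetes import client, config

config.load_kube_config()

# Minimal PyTorchJob with one master and two workers; image and namespace
# are placeholders, not the walkthrough's actual values.
replica = {
    "restartPolicy": "OnFailure",
    "template": {"spec": {"containers": [{
        "name": "pytorch",  # the PyTorch operator expects this container name
        "image": "my-registry/resnet50-train:latest",
        "resources": {"limits": {"nvidia.com/gpu": 1}},
    }]}},
}

pytorchjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "resnet50-demo"},
    "spec": {"pytorchReplicaSpecs": {
        "Master": {"replicas": 1, **replica},
        "Worker": {"replicas": 2, **replica},
    }},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="tenant-example",
    plural="pytorchjobs", body=pytorchjob,
)
```

The operator then wires up the distributed process group environment for every replica automatically.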
Disk images may be imported from external URLs to be used as source images for root or additional disks for Virtual Servers. In addition to `qcow2`, `raw` and `iso` formatted images are also supported, and may be compressed with either `gz` or `xz`. Following our newly published guide, an image stored locally can easily be uploaded to CoreWeave Object Storage, then imported into a `DataVolume`.
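As a sketch of the import step (the URL, names, size, and storage class below are placeholders), a `DataVolume` that pulls a compressed `qcow2` image over HTTP can be created through the CDI custom resource API:

```python
from kubernetes import client, config

config.load_kube_config()

# DataVolume pulling a (hypothetical) gz-compressed qcow2 image via CDI.
datavolume = {
    "apiVersion": "cdi.kubevirt.io/v1beta1",
    "kind": "DataVolume",
    "metadata": {"name": "imported-root-disk"},
    "spec": {
        "source": {"http": {"url": "https://example.com/images/ubuntu.qcow2.gz"}},
        "pvc": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "40Gi"}},
            "storageClassName": "block-nvme-ord1",  # placeholder storage class
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="cdi.kubevirt.io", version="v1beta1", namespace="tenant-example",
    plural="datavolumes", body=datavolume,
)
```

The resulting PVC can then be referenced as the root or additional disk of a Virtual Server.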
Hosting your own containerized applications on CoreWeave Cloud is simple! With our new guide for deploying custom containers, you can have your applications running in CoreWeave Cloud in minutes!
New on CoreWeave Cloud this month:
CoreWeave's infrastructure has always been purpose-built for large-scale GPU-accelerated workloads. Since the beginning, CoreWeave Cloud has been specialized to serve the most demanding AI and machine learning applications. So it only makes sense that CoreWeave will soon be one of the only Cloud platforms in the world offering NVIDIA's most powerful end-to-end AI supercomputing platform.
NVIDIA HGX H100s enable...
- up to seven times more efficient high-performance computing (HPC) applications,
- up to nine times faster AI training on large models,
- up to thirty times faster AI inference than the NVIDIA HGX A100.
This speed, combined with the lowest NVIDIA GPUDirect network latency in the market with the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months."
HGX H100s will be available in Q1 of 2023!
DeepSpeed is an open-source deep learning optimization library for PyTorch, designed for low-latency, high-throughput training that reduces compute and memory use when training large distributed models.
In our new walkthrough, a minimal GPT-NeoX DeepSpeed distributed training job is launched without the additional features, such as tracking, metrics, and visualization, that Determined AI offers.
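For orientation, wiring a model into DeepSpeed boils down to a config plus one initialize call. The model and config values here are a hypothetical minimal setup (not the walkthrough's exact settings), and assume a GPU environment started via the deepspeed launcher:

```python
import deepspeed
import torch.nn as nn

model = nn.Linear(1024, 1024)  # stand-in for a real transformer model

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    # ZeRO stage 2 partitions optimizer state and gradients across workers.
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Returns an engine wrapping the model that handles the distributed details.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```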
CoreWeave Cloud now supports multiple namespaces for organizations!
Kubernetes namespaces provide logical separations of resources within a Kubernetes cluster. While it is typical for CoreWeave client resources to be run inside a single namespace, there are sometimes cases in which more than one namespace within the same organization is required.
Multiple namespaces are now enabled by default for all organizations!
Accelerated Object Storage provides local caching for frequently accessed objects across all CoreWeave data centers. Accelerated Object Storage is especially useful for large scale, multi-region rendering, or for inference auto-scaling where the same data needs to be loaded by hundreds or even thousands of compute nodes.
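Because the service is S3-compatible, any S3 client works unchanged. Here is a hedged sketch with boto3; the accelerated endpoint URL and bucket name are assumptions based on CoreWeave's naming, so check the documentation for your region:

```python
import boto3

# Assumed accelerated endpoint naming; verify the URL for your region.
s3 = boto3.client(
    "s3",
    endpoint_url="https://accel-object.ord1.coreweave.com",
    aws_access_key_id="<ACCESS_KEY>",
    aws_secret_access_key="<SECRET_KEY>",
)

# Hundreds of nodes can issue this same GET; hot objects are served
# from the local cache in each data center.
s3.download_file("my-models", "gpt-j-6b/model.tensors", "/tmp/model.tensors")
```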
Import Disk Images from CoreWeave Object Storage
Did you know you can import your own Virtual Disk Images for Virtual Servers right from CoreWeave Object Storage? With the help of our new guide, you can learn how to do just that!
In Machine Learning, it is often necessary for all pieces of a project to begin at the same time. In the context of Kubernetes, this means that all Pods must be deployed at the same time.
With CoreWeave CoSchedulers, you can ensure that your Pods are all deployed at once, and that deployments only occur if required resources are already available, thereby eliminating the possibility of partial deployments!
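Purely as an illustration of the gang-scheduling pattern, the sketch below uses the upstream Kubernetes scheduler-plugins PodGroup CRD and label; this is an assumption for illustration only, and CoreWeave's own CoScheduler configuration may differ:

```python
from kubernetes import client, config

config.load_kube_config()

# A PodGroup that only schedules when all 8 member Pods can be placed at once
# (upstream scheduler-plugins coscheduling CRD, used here as an illustration).
podgroup = {
    "apiVersion": "scheduling.x-k8s.io/v1alpha1",
    "kind": "PodGroup",
    "metadata": {"name": "training-job"},
    "spec": {"minMember": 8},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="scheduling.x-k8s.io", version="v1alpha1", namespace="tenant-example",
    plural="podgroups", body=podgroup,
)

# Member Pods then carry the label
#   scheduling.x-k8s.io/pod-group: training-job
# and are only bound once all 8 can start together.
```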
New on CoreWeave Cloud this month:
Signing up for an account on CoreWeave Cloud is now easier than ever! With self-serve signup, you can create your own account without additional approval.
Note
Some features are only available through an upgrade request. To increase your quota, or access Kubernetes, log in to your CoreWeave account and navigate to Upgrade Quotas.

NVIDIA Mellanox Quantum leaf switches in the CoreWeave LAS1 datacenter
A100 80GB NVLINK SXM4 GPUs are now available in the LAS1 region. These GPUs are provisioned in large clusters, intended for distributed training and inference of LLMs such as BLOOM 176B.
Connectivity between compute hardware, as well as storage, plays a major role in overall system performance for applications such as neural net training, rendering, and simulation. Certain workloads, such as training massive language models with over 100 billion parameters across hundreds or thousands of GPUs, require the fastest, lowest-latency interconnect.
CoreWeave provides highly optimized IP-over-Ethernet connectivity across all GPUs, and an industry-leading, non-blocking InfiniBand fabric for our top-of-the-line A100 NVLINK GPU fleet. CoreWeave has partnered with NVIDIA in its design of interconnect for A100 HGX training clusters. All CoreWeave A100 NVLINK GPUs offer GPUDirect RDMA over InfiniBand, in addition to standard IP/Ethernet networking.
CoreWeave's InfiniBand topology is fully SHARP compliant, and all components needed to leverage SHARP, such as Adaptive Routing and Aggregation Managers, are implemented in the network control plane. In-network computing effectively doubles the performance of a compliant InfiniBand network compared to a network of similar specifications without it, such as RDMA over Converged Ethernet (RoCE).
A100 NVLINK 80GB GPUs with InfiniBand are now available in the LAS1 (Las Vegas) data center region. A100 NVLINK 40GB GPUs with InfiniBand are available in the ORD1 (Chicago) data center region!
Hosting images inside CoreWeave means there is no need for subscriptions to external services such as Docker Hub, GitHub, or GitLab. Additionally, credentials to pull images are automatically provisioned to a customer's namespace, alleviating the need to fiddle with the “image pull secrets” that trip up many first-timers.
As usual with CoreWeave services, there is no charge except for the storage used for images and the minimal compute resources needed to run the registry server.
Rocky Linux is a premier open-source enterprise Operating System, designed to be completely compatible with Red Hat Enterprise Linux®. Tipped by the Visual Effects Society survey to replace CentOS 7 as the leading VFX workstation OS of choice, Rocky Linux provides a stable platform with a 10-year upstream support lifecycle.

Determined AI is an open-source deep learning training platform that makes building models fast and easy. Determined AI can now be deployed directly onto CoreWeave Cloud from the application Catalog. With Determined AI, you can launch Jupyter notebooks, interactive shells with VSCode support, and distributed training experiments right from the Web UI and CLI tools. Deploying Determined AI from the CoreWeave applications Catalog makes spinning up an instance quick and simple, and when running, the platform consumes minimal resources and incurs minimal cost.
Find Determined AI in the apps Catalog to learn more about it or deploy an instance to your namespace!
For those of you who require or desire more custom control over your Kubernetes Control Plane, the vCluster application is a great solution. With vCluster, you can install your own cluster-wide controllers and manage your own custom resource definitions, all without sacrificing the benefits of running on CoreWeave Cloud's bare metal environment.
It's never been easier to deploy, train, and fine-tune machine learning models on the Cloud for some incredible results. With our new walkthroughs and examples demonstrating just some of the ways CoreWeave's state-of-the-art compute power can be leveraged for model training, you can start today:
- PyTorch Hugging Face Diffusers - Stable Diffusion Text to Image: Generating high-quality, photorealistic images from nothing but a text prompt used to be the stuff of science fiction. Now, using the open-source model built by our friends at Stability AI, you can leverage CoreWeave Cloud's compute power to do precisely that with just a few clicks and commands in our latest walkthrough of this cutting-edge AI technology.
- PyTorch Hugging Face Transformers BigScience BLOOM: In the PyTorch Hugging Face Transformers BigScience BLOOM walkthrough, you'll learn how to use the autoregressive Large Language Model (LLM), trained on vast amounts of text data using industrial-scale computational resources, to continue text from a prompt. BLOOM can output coherent text in 46 languages - and 13 programming languages - that is hardly distinguishable from text written by humans, and it can even be instructed to perform text tasks it hasn't been explicitly trained for by casting them as text generation tasks (see the minimal generation sketch after this list).
- Triton Inference Server for GPT-J with FasterTransformer: GPT-J is one of the most popular open-source NLP models. Its size and performance make it a perfect fit for cost-sensitive NLP use cases. In our Triton Inference Server for GPT-J FasterTransformer walkthrough, you'll learn how to leverage FasterTransformer for up to 40% faster GPT-J inference over a vanilla Hugging Face Transformers-based implementation.
- Triton Inference Server for GPT-NeoX 20B with FasterTransformer: Together with EleutherAI, CoreWeave trained and released the open-source GPT-NeoX 20B model in January. We are now taking self-hosted inference of this Large Language Model to the next level by offering an NVIDIA FasterTransformer-based inference option. In our Triton Inference Server for GPT-NeoX 20B walkthrough, you'll learn how to leverage FasterTransformer for up to 40% faster GPT-NeoX inference over a vanilla Hugging Face Transformers-based implementation.
- GPT-NeoX fine-tuning: In our new GPT-NeoX fine-tuning walkthrough, using the Determined AI MLOps platform to run distributed fine-tuning jobs, you'll learn how to fine-tune a 20B parameter autoregressive model trained on the Pile dataset to generate text based on context or unconditionally for use cases such as story generation, chat bots, summarization, and more.
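The BLOOM walkthrough serves the full 176B-parameter model; purely as a hedged flavor of the Transformers API involved, here is a generation sketch using BLOOM's small 560M checkpoint (the model ID and prompt are illustrative, not the walkthrough's serving setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The small checkpoint is used so the sketch runs on a single GPU;
# the walkthrough itself targets the full bigscience/bloom model.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("CoreWeave Cloud is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```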
CoreWeave Cloud Native Networking (CCNN) is built to handle workloads requiring up to 100Gbps of network connectivity at scale, and it also handles firewalls and Load Balancing via Network Policies. Certain use cases, however, require a deeper level of network control than a traditional Cloud network stack offers. For these users, we are now introducing the CoreWeave Cloud Layer 2 VPC (L2VPC).
L2VPC provides fine-grained customization by relinquishing all control over DHCP servers and VPN gateways to the user. User-configured virtual firewalls are also supported; most KVM-compatible firewall images will work, allowing you to install your own firewall from the ground up. Installation guides for some of the most popular third-party choices, such as Fortinet's FortiGate, are also provided.
L2VPC is built on top of SR-IOV hardware virtualization technology, retaining the high performance and low latency customers have come to expect from CoreWeave Cloud.
Object Storage is coming to CoreWeave! CoreWeave's S3-compatible Object Storage provides an easy place to store and reference things like Docker images, machine learning models, and any other kind of object right within CoreWeave Cloud, streamlining your project workflows! Object Storage is priced at only $0.03/GB/mo with no access or egress fees!
Accelerated Object Storage provides local caching for frequently accessed objects across all CoreWeave data centers. It is especially useful for large-scale, multi-region rendering, or for inference auto-scaling where the same data needs to be loaded by hundreds or thousands of compute nodes.
This feature is currently in beta, but you can learn more now, and contact your CoreWeave Support Specialist to try it out!
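Since the endpoint is S3-compatible, existing tooling works unchanged. As a hedged upload sketch with boto3 (the endpoint URL and bucket are placeholders; see the docs for your region's actual endpoint):

```python
import boto3

# Placeholder endpoint; CoreWeave provides per-region S3-compatible endpoints.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object.ord1.coreweave.com",
    aws_access_key_id="<ACCESS_KEY>",
    aws_secret_access_key="<SECRET_KEY>",
)

# Create a bucket and push a model artifact into it.
s3.create_bucket(Bucket="my-models")
s3.upload_file("model.tensors", "my-models", "gpt-j-6b/model.tensors")
```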

The Workload Activity Tracker in action
It's an all-too-common experience to leave research shells or experiments idle in your namespace after you're done working with them, only to come back later and realize you've been consuming resources unnecessarily. Now, with the Workload Activity Tracker dashboard for Grafana, "is everything deployed in my namespace doing something?" is never a question you have to worry about.
The Workload Activity Tracker displays which of your Workloads have had activity in the past 24 hours, which are inactive, how many resources they are consuming, and how much cost they're incurring, all in a convenient and concise overview format.
The Release Notes for May 2022 cover the many new features launched since January 2022.
We are pleased to announce the general availability of the CoreWeave LGA1 data center, providing extremely low latency, high performance cloud compute resources to the broader New York City market. Richly connected into the global Tier 1 internet backbone, LGA1 is built for low latency compute intensive use cases that require ultimate reliability and security.
Like all CoreWeave data centers, LGA1 offers a broad range of state-of-the-art NVIDIA GPU-accelerated cloud compute instances, including the Quadro RTX series and the newest RTX Ampere workstation and A40 data center GPUs. In addition to GPU compute, LGA1 provides CPU-only instances and high-performance Block and Shared File System storage.
LGA1 is housed in an ISO 27001 certified, SSAE 18 SOC 2 compliant, Energy Star Certified campus, providing the utmost in security and efficiency for your critical workloads.
CoreWeave now offers the NVIDIA A100 80GB PCIe, which delivers unprecedented acceleration to power the world’s highest-performing AI, data analytics, and HPC applications. The NVIDIA A100 80GB PCIe accelerator is now available for Kubernetes deployments in ORD1 using the `gpu.nvidia.com/model` label selector value `A100_PCIE_80GB`.
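To target these GPUs, a Pod selects on that label. Here is a minimal sketch with the Kubernetes Python client (the container image and namespace are placeholders):

```python
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="a100-smoke-test"),
    spec=client.V1PodSpec(
        # Schedule onto A100 80GB PCIe nodes via the model label.
        node_selector={"gpu.nvidia.com/model": "A100_PCIE_80GB"},
        restart_policy="Never",
        containers=[client.V1Container(
            name="nvidia-smi",
            image="nvidia/cuda:11.8.0-base-ubuntu22.04",
            command=["nvidia-smi"],
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"},
            ),
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod("tenant-example", pod)
```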
Coming Soon: CoreWeave is bringing NVIDIA A100 80GB support to the LAS1 region with a deployment of NVIDIA HGX A100 80GB NVLINK servers, built with GPUDirect InfiniBand RDMA connectivity for blazing-fast GPU-to-GPU communication.
Reach out to [email protected] today to reserve space on our newest distributed training infrastructure!
Managing cloud native storage has never been easier. CoreWeave Cloud now provides an easy to use UI to manage your Storage Volumes. Expand and clone your volumes with the click of a button. Learn more about CoreWeave Cloud Storage.
By popular demand, we’ve added support for multiple users per organization and an Organization Management UI to invite and manage these users. Keep an eye on this page - we are regularly updating it with additional improvements and functionality.
Since the start of the year, we've added:
👫 Multi-User Support: Invite and manage users in your Organization.
🔢 Resource Quotas: See the number of Pods, GPUs, and the amount of storage allocated at any time.
Features coming soon:
🔐 RBAC: Permissions and granular control over user access.
💼 Multiple Namespaces: Provision multiple namespaces per Organization.
🕹️ Scalable Pixel Streaming: Stream your Unreal Engine projects to the masses quickly and easily.
🌐 Traefik: Custom ingresses, for use with your own domains.
🚚 ArgoCD: Access to a declarative, GitOps continuous delivery tool for Kubernetes.
🔥 Backblaze: Automate your volume backups to safeguard your data.
Looking to fine-tune your own ML model on CoreWeave? Check out our new reference tools and examples for models such as GPT-Neo, GPT-J-6B, and Fairseq. Learn how to collect your dataset, tokenize it, fine-tune the model on it with the parameters you provide, and even set up an endpoint to test your work.
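As a flavor of the dataset step (the model ID and file name are placeholders; the reference tools handle chunking and packing for you), tokenizing a text corpus with transformers looks like:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# Placeholder corpus file.
with open("dataset.txt", "r", encoding="utf-8") as f:
    text = f.read()

tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
print(tokens["input_ids"].shape)
```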
Ship logs from all your containers to popular aggregation tools such as Loki and DataDog.
Need to organize your access tokens by user or track what they are being used for? You can now label them at creation from the CoreWeave Cloud UI.
With CloudInit, you can choose your preferred settings in advance and they'll be set up during your instance launch. Plus, we now offer Static MAC Addresses and Serial Number support.
We’ve invested heavily in networking to start 2022, with upgrades to 200Gbps+ Tier 1 transit in each region.
Direct connects up to 100Gbps are now available at all of our data centers, and we’ve installed a CoreWeave Cloud On Ramp in downtown Los Angeles at CoreSite LA2 to accept cross connects back to LAS1.
We’ve also joined the Megaport network at LAS1 and LGA1 for direct, quick, software-defined connectivity to CoreWeave Cloud.