March 2023
New this month on CoreWeave Cloud...
🎉 HGX H100 nodes are now online!
Big news! We are proud to announce that CoreWeave is the first cloud provider in the world to bring the powerful NVIDIA HGX H100 nodes online!
Compared to the NVIDIA HGX A100, the NVIDIA HGX H100 delivers up to seven times better efficiency in high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference.
Combined with the lowest NVIDIA GPUDirect network latency on the market, delivered by the NVIDIA Quantum-2 InfiniBand platform, this speed reduces the training time of AI models to "days or hours, instead of months." With AI permeating nearly every industry today, this speed and efficiency have never been more vital for HPC applications.
⚓ Introducing SUNK: Slurm on Kubernetes
Slurm is the de facto scheduler for large HPC jobs in supercomputing centers around the world. CoreWeave's Slurm implementation, SUNK ("SlUrm oN Kubernetes"), integrates Slurm with Kubernetes, allowing compute to move between distributed training in Slurm and applications such as online inference in Kubernetes.
As an implementation of Slurm on Kubernetes deployed on CoreWeave Cloud, SUNK comes complete with options for:
- external Directory Services such as Active Directory
- Slurm Accounting, backed by a MySQL database
- dynamic Slurm node scaling to match your workload requirements
In SUNK, Slurm images are derived from OCI container images, which execute on bare metal, and compute node resources are allocated using Kubernetes.
CoreWeave maintains several base images for different CUDA versions, including all dependencies for InfiniBand and SHARP. If you'd like to implement SUNK in your cluster, please contact CoreWeave support for engineering assistance with cluster design and deployment.
⚡ Nydus is now on CoreWeave!
Embedding machine learning models directly into images has become a popular ease-of-use technique, but the larger container images that result take longer to pull. Pulling images is therefore often the most time-consuming part of spinning up new containers, and for those who rely on fast autoscaling to respond to changes in demand, the time it takes to create new containers can pose a major hurdle.
For this reason, CoreWeave Cloud now supports Nydus, an external plugin for containerd, to shorten container image pull times.
Leveraging its own container image service, Nydus implements a content-addressable filesystem on top of the RAFS format for container images. This format allows for major improvements over the current OCI image specification in terms of container launch speed, image space, network bandwidth efficiency, and data integrity. The result: significantly faster container image pull times.
Nydus on CoreWeave is currently an alpha offering, with a limited, node-specific release.
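To make the idea of content-addressable storage more concrete, here is a minimal toy sketch in Python. It is not Nydus's actual implementation or on-disk format: the ChunkStore class, the tiny fixed chunk size, and the choice of SHA-256 are all assumptions made for illustration. It only shows why keying chunks by their digest lets images share identical data and verify integrity on read.

```python
import hashlib

# Tiny chunk size so the example is easy to follow (an assumption for
# illustration; real systems use far larger chunks).
CHUNK_SIZE = 4


class ChunkStore:
    """Toy content-addressable store: each chunk is keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def put_blob(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks, store each once, return the digests."""
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks map to the same key, so they are stored only once.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        return digests

    def get_blob(self, digests: list[str]) -> bytes:
        """Reassemble a blob, verifying every chunk against its digest."""
        parts = []
        for digest in digests:
            chunk = self.chunks[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError(f"chunk {digest} failed integrity check")
            parts.append(chunk)
        return b"".join(parts)


store = ChunkStore()
layer_a = store.put_blob(b"AAAABBBBCCCC")
layer_b = store.put_blob(b"AAAABBBBDDDD")  # shares its first two chunks with layer_a
```

In this sketch the two blobs contain six chunks in total, but the store holds only the distinct ones, and a blob can be rebuilt from its digest list alone. A client that already has a chunk has no need to fetch it again, which is the intuition behind the space and bandwidth savings described above.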
💪 Distributed training using Kubeflow operators
The Kubeflow project is dedicated to making deployments of Machine Learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
CoreWeave is pleased to present new tutorials on using Kubeflow training operators for distributed training on CoreWeave Cloud! Follow along with these walkthroughs to train ResNet-50 with ImageNet, or fine-tune GPT-NeoX-20B with Argo Workflows!
💽 Import disk images using CoreWeave Object Storage
Disk images may be imported from external URLs to be used as source images for root or additional disks for Virtual Servers. In addition to qcow2, raw and iso formatted images are also supported, and may be compressed with either gz or xz.
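As a quick pre-flight check before importing, the supported combinations can be sketched in Python. The helper below is hypothetical (classify_image_url is not part of any CoreWeave tool), and it assumes the format and compression can be guessed from the file extensions in the URL, which the import process itself may not rely on.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

IMAGE_FORMATS = {"qcow2", "raw", "iso"}  # formats named in this release note
COMPRESSIONS = {"gz", "xz"}              # compression types named in this release note


def classify_image_url(url: str):
    """Return (image_format, compression) guessed from the URL's file extensions.

    compression is None when the file does not end in a known compression
    suffix. Raises ValueError when the image format is not a supported one.
    """
    # Only the path component matters; query strings and fragments are ignored.
    name = PurePosixPath(urlparse(url).path)
    suffixes = [s.lstrip(".").lower() for s in name.suffixes]

    compression = None
    if suffixes and suffixes[-1] in COMPRESSIONS:
        compression = suffixes.pop()

    if not suffixes or suffixes[-1] not in IMAGE_FORMATS:
        raise ValueError(f"unsupported image format in URL: {url}")
    return suffixes[-1], compression
```

For example, a placeholder URL ending in debian.qcow2.xz is classified as a qcow2 image compressed with xz, while a URL ending in disk.vmdk raises a ValueError.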
Following our newly published guide, an image stored locally can easily be uploaded to CoreWeave Object Storage, then imported to a DataVolume.
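The import step can be sketched as a manifest built in Python. This is a minimal sketch, assuming the DataVolume here is the KubeVirt Containerized Data Importer (CDI) resource with an HTTP source; the resource name, URL, and size are placeholders, and any storage class, volume mode, or other settings the published guide calls for are intentionally left out.

```python
import json


def datavolume_manifest(name: str, image_url: str, size: str) -> dict:
    """Build a CDI DataVolume manifest that imports a disk image over HTTP(S)."""
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataVolume",
        "metadata": {"name": name},
        "spec": {
            # The importer downloads the image from this URL.
            "source": {"http": {"url": image_url}},
            # Claim for the volume that receives the imported image.
            "pvc": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
            },
        },
    }


manifest = datavolume_manifest(
    name="my-imported-image",                           # hypothetical name
    image_url="https://example.com/my-image.qcow2.xz",  # placeholder URL
    size="40Gi",                                        # hypothetical size
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with kubectl apply -f once the placeholders are replaced with real values.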
🚢 Deploy custom containers on CoreWeave Cloud
Hosting your own containerized applications on CoreWeave Cloud is simple! With our new guide for deploying custom containers, you can have your applications running in CoreWeave Cloud in minutes!