Get Started with ML and AI
Learn what makes CoreWeave special for machine learning and AI applications
CoreWeave Cloud's infrastructure is purpose-built from the ground up for machine learning use cases. Model training, hosting, fine-tuning, and inference serving are all made simple on CoreWeave.
Because CoreWeave is optimized for ML and AI workloads, we highly recommend following our onboarding and best practices guides for both model training and inference to get the most out of CoreWeave infrastructure.
Specialized hardware
💪 NVIDIA HGX H100
The NVIDIA HGX H100 delivers up to seven times more efficiency for high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference than the NVIDIA HGX A100.
This speed, combined with the lowest NVIDIA GPUDirect network latency in the market via the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months." With AI permeating nearly every industry today, this speed and efficiency have never been more vital for HPC applications.
Specialized infrastructure
For model training and fine-tuning
Training Machine Learning models, especially models of modern Deep Neural Networks, is at the center of CoreWeave Cloud's architecture. The entire CoreWeave Cloud stack is purpose-built to enable highly scalable, cost-efficient model training.
- Bare-metal nodes sport a wide range of NVIDIA GPUs, offering top-of-the-line compute power for intensive workloads.
- The CoreWeave network stack features InfiniBand Interconnect, allowing for extremely fast, low-latency network connections.
- High-performance, network-attached storage loads and writes checkpoints at terabit speeds, and our software control plane enables large distributed training jobs to be scaled up in seconds. (A minimal training sketch follows this list.)
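To make this concrete, the sketch below shows a minimal distributed training loop of the kind such a cluster is built for, assuming PyTorch with the NCCL backend (which uses InfiniBand/GPUDirect RDMA when the fabric exposes it). The model, hyperparameters, and the `/mnt/checkpoints` mount path are illustrative placeholders, not CoreWeave-specific values.

```python
# Minimal sketch of a distributed training loop on a multi-GPU cluster.
# Assumes launch via `torchrun --nproc_per_node=<gpus> train.py`;
# the model and checkpoint path are placeholders for illustration.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL rides on InfiniBand/GPUDirect RDMA when available.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        batch = torch.randn(32, 4096, device=local_rank)
        loss = model(batch).square().mean()
        optimizer.zero_grad()
        loss.backward()          # gradients are all-reduced across ranks here
        optimizer.step()

        # Periodically checkpoint to shared, network-attached storage.
        if step % 50 == 0 and dist.get_rank() == 0:
            torch.save(model.state_dict(), "/mnt/checkpoints/model.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```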
In addition to its core tech stack, CoreWeave has a history of supporting our customers with cutting-edge Machine Learning research through in-house expertise, industry partnerships, and contributions to research organizations. Our team has extensive experience in training large transformer architectures, optimizing Hugging Face code, and selecting the right hardware for any given job.
For inference
CoreWeave Cloud's inference engine autoscales containers based on demand in order to swiftly fulfill user requests. Containers are then automatically scaled down as load decreases, freeing GPU resources and reducing costs for inactive workloads.
This rapid autoscaling makes for a significantly more responsive service than other Cloud providers can offer - on CoreWeave, allocating new resources and scaling up a container takes just seconds. For example, spinning up the 6B-parameter GPT-J model can take as little as fifteen seconds on CoreWeave Cloud.
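To illustrate what that responsiveness looks like from the client side, here is a small, hypothetical client that sends a request to an autoscaled inference endpoint and times the first (cold-start) response. The URL and payload are placeholders, and the `{"instances": [...]}` wire format follows a common prediction-API convention rather than any specific CoreWeave endpoint.

```python
# Hypothetical client: time a cold-start request against an autoscaled endpoint.
# The URL and payload below are placeholders for illustration only.
import time
import requests

ENDPOINT = "https://gpt-j.example-tenant.knative.example.com/v1/models/gpt-j:predict"
payload = {"instances": [{"text": "CoreWeave makes inference"}]}

start = time.monotonic()
# The first request may trigger a scale-up from zero replicas;
# subsequent requests hit an already-warm container.
response = requests.post(ENDPOINT, json=payload, timeout=120)
elapsed = time.monotonic() - start

print(f"status={response.status_code} first-response latency={elapsed:.1f}s")
```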
Additionally, CoreWeave Cloud's inference stack is backed by well-supported, industry standard Open Source tools.
- Knative Serving acts as our serverless runtime, managing autoscaling, revision control, and canary deployments.
- KServe provides an easy-to-use interface via Kubernetes resource definitions for deploying models without the fuss of correctly configuring the underlying framework (such as TensorFlow). A brief deployment sketch follows this list.
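As a rough sketch of what the KServe interface looks like in practice, the snippet below creates an `InferenceService` custom resource with the official Kubernetes Python client. The namespace, model name, and `storageUri` are placeholders, and the exact `predictor` schema depends on the KServe version in use.

```python
# Sketch: deploy a model by creating a KServe InferenceService object.
# Namespace, names, and storageUri are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-demo"},
    "spec": {
        "predictor": {
            # KServe pulls the model from the given storage location and
            # configures the serving framework for you.
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://my-bucket/models/demo",
            }
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="tenant-example",
    plural="inferenceservices",
    body=inference_service,
)
```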
Specialized tooling
On top of cutting-edge infrastructure, CoreWeave also provides a suite of powerful, flexible tooling for ML use cases.
🖥 Virtual Servers
Virtual Servers are virtual desktops accessible from anywhere. These are the most "vanilla" of CoreWeave's compute offerings, and are ideal for experiments that use only a few GPUs in a familiar environment. However, administrative and performance overheads make them less desirable for distributed tasks.
⛵ Kubernetes
Our Kubernetes offering differs from that of most other leading Cloud providers: we provide a fully managed cluster with thousands of GPUs pre-populated and ready to use. Kubernetes access gives experienced MLOps teams the power to deploy their own stacks in a bare-metal container environment. With RDMA GPUDirect InfiniBand, CoreWeave's Kubernetes environment fully supports massive distributed training on our NVIDIA A100 HGX clusters. Plus, there's no need to worry about cluster scaling or idle virtual machines incurring costs while inactive - charges are incurred only for what is actually used.
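For a sense of what running directly on the managed cluster looks like, here is a minimal, hypothetical Pod that requests a GPU through the standard `nvidia.com/gpu` resource, created with the Kubernetes Python client. The image, namespace, and any node-selection labels are placeholders; consult the CoreWeave node-selection docs for the exact labels to target specific GPU classes.

```python
# Sketch: schedule a GPU Pod on the managed Kubernetes cluster.
# Image, namespace, and GPU count are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",
                command=["nvidia-smi"],
                # Billing follows the resources actually requested and used.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

v1.create_namespaced_pod(namespace="tenant-example", body=pod)
```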
⚓ SUNK: Slurm on Kubernetes
Slurm is the de facto scheduler for large HPC jobs in supercomputing centers, government laboratories, universities, and companies worldwide. It performs workload management on more than half of the ten fastest systems on the TOP500 list.
SUNK (Slurm on Kubernetes) is an implementation of Slurm which is deployed on Kubernetes via a Helm chart.
SUNK is currently available for reserved instance customers only. Please contact support for more information.
🗄 High-performance storage
CoreWeave makes it easy to host models from a range of highly performant storage backends, including S3-compatible object storage, HTTP, and persistent Storage Volumes of several types, offering flexible configurations tailored to your use case.
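As a short sketch of the object-storage path, the snippet below uses `boto3` pointed at an S3-compatible endpoint to upload and retrieve a model artifact. The endpoint URL, bucket name, and credentials are placeholders; substitute the values from your own CoreWeave Object Storage configuration.

```python
# Sketch: store and fetch a model artifact on S3-compatible object storage.
# Endpoint, bucket, and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object-storage.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

bucket = "model-artifacts"
s3.upload_file("model.pt", bucket, "gpt-j/model.pt")         # push a checkpoint
s3.download_file(bucket, "gpt-j/model.pt", "local-copy.pt")  # pull it back for serving
```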