
Get Started with ML and AI

Learn what makes CoreWeave special for machine learning and AI applications

CoreWeave Cloud's infrastructure is purpose-built from the ground up for machine learning use cases: model training, fine-tuning, hosting, and inference serving are all made simple on CoreWeave.

Tip

Because CoreWeave is optimized for ML and AI workloads, we highly recommend following our onboarding and best-practices guides for both model training and inference to get the most out of CoreWeave infrastructure.

Specialized hardware

💪 NVIDIA HGX H100

Compared to the NVIDIA HGX A100, the NVIDIA HGX H100 delivers up to seven times higher performance for high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference.

This speed, combined with the lowest NVIDIA GPUDirect network latency on the market via the NVIDIA Quantum-2 InfiniBand platform, reduces the training time of AI models to "days or hours, instead of months." With AI permeating nearly every industry today, this speed and efficiency have never been more vital for HPC applications.

Specialized infrastructure

For model training and fine-tuning

Training Machine Learning models, especially models of modern Deep Neural Networks, is at the center of CoreWeave Cloud's architecture. The entire CoreWeave Cloud stack is purpose-built to enable highly scalable, cost-efficient model training.

In addition to its core tech stack, CoreWeave has a history of supporting our customers with cutting-edge Machine Learning research through in-house expertise, industry partnerships, and contributions to research organizations. Our team has extensive experience in training large transformer architectures, optimizing Hugging Face code, and selecting the right hardware for any given job.

For inference

CoreWeave Cloud's inference engine autoscales containers based on demand to fulfill user requests swiftly, then automatically scales them back down as load subsides, freeing GPU resources and avoiding charges for idle capacity.

This rapid autoscaling makes for a significantly more responsive service than other cloud providers offer: on CoreWeave, allocating new resources and scaling up a container can take just seconds. For example, spinning up the 6-billion-parameter GPT-J model can take as little as fifteen seconds on CoreWeave Cloud.

Additionally, CoreWeave Cloud's inference stack is backed by well-supported, industry-standard open-source tools:

  • Knative Serving acts as our serverless runtime, managing autoscaling, revision control, and canary deployments.
  • KServe provides an easy-to-use interface via Kubernetes resource definitions for deploying models without the fuss of correctly configuring the underlying framework (such as TensorFlow); a sketch of such a definition follows this list.
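
As a sketch of what such a resource definition can look like, the following KServe InferenceService manifest requests a GPU-backed predictor with scale-to-zero enabled. The service name, model format, and S3 path are hypothetical placeholders rather than CoreWeave defaults:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: example-model              # hypothetical service name
    spec:
      predictor:
        minReplicas: 0                 # allow scale-to-zero while the service is idle
        maxReplicas: 3                 # cap request-driven autoscaling
        model:
          modelFormat:
            name: tensorflow           # tells KServe which serving runtime to configure
          storageUri: s3://example-bucket/models/example-model  # hypothetical model location
          resources:
            limits:
              nvidia.com/gpu: 1        # request one GPU for inference

Knative Serving handles the revisioning and request-driven scaling behind this resource, so no autoscaler needs to be configured by hand.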

Specialized tooling

On top of cutting-edge infrastructure, CoreWeave also provides a suite of powerful, flexible tooling for ML use cases.

🖥 Virtual Servers

Virtual Servers are virtual desktops accessible from anywhere. These are the most "vanilla" of CoreWeave's compute offerings, and are ideal for experiments that use a small number of GPUs in a familiar environment. However, administrative and performance overheads make them less desirable for distributed tasks.

Kubernetes

Our Kubernetes offering differs from that of most other leading cloud providers: we provide a fully managed cluster with thousands of GPUs pre-populated and ready to use. Kubernetes access gives experienced MLOps teams the power to deploy their own stacks in a bare-metal container environment. With RDMA GPUDirect InfiniBand, CoreWeave's Kubernetes environment fully supports massive distributed training on our NVIDIA A100 HGX clusters. Plus, there is no need to worry about cluster scaling or idle virtual machines incurring costs while inactive: charges are incurred only for what is actually used.
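
To illustrate how GPU capacity is requested in this environment, the sketch below defines a minimal single-GPU Pod. The Pod name, image, and GPU count are placeholders, and CoreWeave-specific node selection (for a particular GPU class or region) is omitted:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-smoke-test                            # hypothetical Pod name
    spec:
      restartPolicy: Never
      containers:
        - name: cuda
          image: nvidia/cuda:12.2.0-base-ubuntu22.04  # public CUDA base image
          command: ["nvidia-smi"]                     # print the GPUs visible to the container
          resources:
            limits:
              nvidia.com/gpu: 1                       # schedule onto a node with one free GPU

Because charges follow actual usage, a Pod like this incurs costs only while it runs.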

SUNK: Slurm on Kubernetes

Slurm is the de facto scheduler for large HPC jobs at supercomputing centers, government laboratories, universities, and companies worldwide. It performs workload management for more than half of the ten fastest systems on the TOP500 list.

SUNK (Slurm on Kubernetes) is an implementation of Slurm deployed on Kubernetes via a Helm chart.
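
Because SUNK exposes a standard Slurm control plane, jobs can be submitted with the familiar sbatch tooling. The script below is a minimal sketch; the job name, node and GPU counts, time limit, and train.py entry point are all hypothetical:

    #!/bin/bash
    #SBATCH --job-name=example-train   # hypothetical job name
    #SBATCH --nodes=2                  # request two nodes
    #SBATCH --gpus-per-node=8          # request eight GPUs on each node
    #SBATCH --time=04:00:00            # wall-clock limit of four hours

    # Launch the (placeholder) training script across the allocation.
    srun python train.py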

Note

SUNK is currently available for reserved instance customers only. Please contact support for more information.

🗄 High-performance storage

CoreWeave makes it easy to host models from a range of high-performance storage backends, including S3-compatible object storage, HTTP endpoints, and persistent Storage Volumes of various types, offering flexible configurations you can tailor to your use case.
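
For example, a persistent Storage Volume to hold model weights can be requested with an ordinary Kubernetes PersistentVolumeClaim. This is a minimal sketch: the claim name and capacity are placeholders, and the storage class name is an assumption, so check CoreWeave's storage documentation for the classes available for your volume type and region:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-weights            # hypothetical claim name
    spec:
      accessModes:
        - ReadWriteOnce              # mounted read-write by a single node
      storageClassName: block-nvme   # assumed class name; actual names vary by type and region
      resources:
        requests:
          storage: 100Gi             # hypothetical capacity for the weights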