Welcome to Inference on CoreWeave
Machine learning is one of the most popular applications of CoreWeave Cloud's state-of-the-art infrastructure. Models are easily hosted on CoreWeave, and can be sourced from a range of storage backends including S3-compatible object storage, HTTP, or persistent Storage Volumes.
CoreWeave Cloud's Inference engine autoscales containers based on demand in order to swiftly fulfill user requests, then scales down according to load so as to preserve GPU resources. Allocating new resources and scaling up a container can be as fast as fifteen seconds for the 6B GPT-J model, enabling a significantly more responsive service than that of other Cloud providers.