March 31, 2026 - CoreWeave Inference

CoreWeave Inference is now available, providing multiple ways to deploy and serve AI models on CoreWeave GPU infrastructure.

Serverless Inference

CoreWeave Serverless Inference lets you deploy and serve AI models without provisioning or managing the underlying infrastructure. CoreWeave handles scaling, routing, and resource allocation automatically. To learn more, go to About Serverless Inference.

Dedicated Inference

CoreWeave Dedicated Inference lets you deploy custom model weights on dedicated GPU infrastructure through managed OpenAI-compatible API endpoints. Choose your GPU type and inference runtime, such as vLLM or SGLang, and CoreWeave manages deployment, scaling, and request routing. To learn more, go to About Dedicated Inference.

Inference on CKS

Inference on CKS gives you full control over your inference deployment stack using CoreWeave Kubernetes Service. Deploy inference runtimes, configure networking, and manage scaling directly through Kubernetes resources on CoreWeave GPU infrastructure. To learn more, go to About Inference on CKS.

​Serverless Inference

​Dedicated Inference

​Inference on CKS

Serverless Inference

Dedicated Inference

Inference on CKS