Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.coreweave.com/llms.txt

Use this file to discover all available pages before exploring further.

CoreWeave Inference is now available, providing multiple ways to deploy and serve AI models on CoreWeave GPU infrastructure.

Serverless Inference

CoreWeave Serverless Inference lets you deploy and serve AI models without provisioning or managing the underlying infrastructure. CoreWeave handles scaling, routing, and resource allocation automatically. To learn more, go to About Serverless Inference.

Dedicated Inference

CoreWeave Dedicated Inference lets you deploy custom model weights on dedicated GPU infrastructure through managed OpenAI-compatible API endpoints. Choose your GPU type and inference runtime, such as vLLM or SGLang, and CoreWeave manages deployment, scaling, and request routing. To learn more, go to About Dedicated Inference.

Inference on CKS

Inference on CKS gives you full control over your inference deployment stack using CoreWeave Kubernetes Service. Deploy inference runtimes, configure networking, and manage scaling directly through Kubernetes resources on CoreWeave GPU infrastructure. To learn more, go to About Inference on CKS.
Last modified on April 8, 2026