Documentation Index
Fetch the complete documentation index at: https://docs.coreweave.com/llms.txt
Use this file to discover all available pages before exploring further.
CoreWeave Inference is now available, providing multiple ways to deploy and serve AI models on CoreWeave GPU infrastructure.
Serverless Inference
CoreWeave Serverless Inference lets you deploy and serve AI models without provisioning or managing the underlying infrastructure. CoreWeave handles scaling, routing, and resource allocation automatically. To learn more, go to About Serverless Inference.
Dedicated Inference
CoreWeave Dedicated Inference lets you deploy custom model weights on dedicated GPU infrastructure through managed OpenAI-compatible API endpoints. Choose your GPU type and inference runtime, such as vLLM or SGLang, and CoreWeave manages deployment, scaling, and request routing. To learn more, go to About Dedicated Inference.
Inference on CKS
Inference on CKS gives you full control over your inference deployment stack using CoreWeave Kubernetes Service. Deploy inference runtimes, configure networking, and manage scaling directly through Kubernetes resources on CoreWeave GPU infrastructure. To learn more, go to About Inference on CKS.