Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.coreweave.com/llms.txt

Use this file to discover all available pages before exploring further.

CoreWeave Inference is a family of model-serving products that run on CoreWeave GPU infrastructure. Choose the deployment option that matches how much control you want over the serving stack: from fully managed (Serverless) to fully self-managed (CKS), with a managed-but-configurable middle ground (Dedicated). All three options expose OpenAI API-compatible endpoints, so existing tools and applications that work with the OpenAI API can connect with minimal changes.
CoreWeave Inference is available as a private preview. To request access, contact your CoreWeave representative.

Choose a deployment option

Serverless Inference

Run inference without managing infrastructure. CoreWeave handles provisioning, scaling, and operations automatically. Choose from a CoreWeave-managed catalog of models.

Dedicated Inference

Bring your own model weights and deploy on dedicated GPU infrastructure. Full control over gateways, scaling, and capacity reservations, while CoreWeave manages the underlying clusters.

Inference on CKS

Run inference workloads on CoreWeave Kubernetes Service with complete control over your deployment stack, runtimes, and networking.

Key capabilities

CoreWeave Inference provides the following capabilities across all options:
  • NVIDIA GPU access: Run on H100, B200, A100, and other NVIDIA GPU types.
  • OpenAI-compatible endpoints: Connect with existing OpenAI client libraries, agents, and tooling.
  • CoreWeave platform integration: Use the same accounts, IAM, observability, and billing as the rest of CoreWeave.
The deployment options differ in how much of the stack CoreWeave manages, what model catalog is available, and how you configure scaling and routing. See each product page for details.

Access the API

Dedicated Inference is configured through the CoreWeave Inference management API, available over:
  • REST/JSON: HTTP/1.1 with JSON, available at api.coreweave.com.
  • gRPC: Protocol buffers over HTTP/2.
  • Terraform: Manage inference resources through the CoreWeave Terraform Provider.
For API details, see the Inference API reference. For Serverless Inference and Inference on CKS, see the linked product pages for their configuration surfaces.

Pricing

CoreWeave Inference has no additional platform fees. You are billed only for the underlying GPU or node usage consumed by your workloads. The exact billing model depends on the deployment option you choose:
  • Serverless Inference: Pay-per-token billing for catalog models.
  • Dedicated Inference: GPU-hour or node-based billing, depending on your contract. See Dedicated Inference pricing for details.
  • Inference on CKS: Standard CKS billing (GPU-hour or reserved node).
For the latest published rates, see the CoreWeave pricing page.

Get started

For Dedicated Inference, follow Getting started with Inference to create a gateway, deploy a model, and send your first inference request. For Serverless Inference and Inference on CKS, see the linked product pages above.
Last modified on May 6, 2026