About CoreWeave Inference - CoreWeave Docs

CoreWeave Inference is a family of model-serving products that run on CoreWeave GPU infrastructure. Choose the deployment option that matches how much control you want over the serving stack: from fully managed (Serverless) to fully self-managed (CKS), with a managed-but-configurable middle ground (Dedicated). All three options expose OpenAI API-compatible endpoints, so existing tools and applications that work with the OpenAI API can connect with minimal changes.

Choose a deployment option

Serverless Inference

Run inference without managing infrastructure. CoreWeave handles provisioning, scaling, and operations automatically. Choose from a CoreWeave-managed catalog of models.

Dedicated Inference

Bring your own model weights and deploy on dedicated GPU infrastructure. Full control over gateways, scaling, and capacity reservations, while CoreWeave manages the underlying clusters.

Inference on CKS

Run inference workloads on CoreWeave Kubernetes Service with complete control over your deployment stack, runtimes, and networking.

Key capabilities

CoreWeave Inference provides the following capabilities across all options:

NVIDIA GPU access: Run on H100, B200, A100, and other NVIDIA GPU types.
OpenAI-compatible endpoints: Connect with existing OpenAI client libraries, agents, and tooling.
CoreWeave platform integration: Use the same accounts, Identity and Access Management (IAM), observability, and billing as the rest of CoreWeave.

The deployment options differ in how much of the stack CoreWeave manages, what model catalog is available, and how you configure scaling and routing. For details, see each product page.

Access the API

You configure Dedicated Inference through the CoreWeave Inference management API, available over:

REST/JSON: HTTP/1.1 with JSON, available at api.coreweave.com.
gRPC: Protocol buffers over HTTP/2.
Terraform: Manage inference resources through the CoreWeave Terraform Provider.

For API details, see the Inference API reference. For Serverless Inference and Inference on CKS, see the linked product pages for their configuration surfaces.

Pricing

The billing model depends on the deployment option you choose:

Serverless Inference: pay-per-token billing for catalog models.
Dedicated Inference: GPU-hour or node-based billing, depending on your contract. For details, see Dedicated Inference pricing.
Inference on CKS: standard CKS billing (GPU-hour or reserved node).

For the latest published rates, see the CoreWeave pricing page.

Get started

For Dedicated Inference, follow Getting started with Inference to create a gateway, deploy a model, and send your first inference request. For Serverless Inference and Inference on CKS, see the linked product pages in Choose a deployment option.

Last modified on July 14, 2026

​Choose a deployment option

Serverless Inference

Dedicated Inference

Inference on CKS

​Key capabilities

​Access the API

​Pricing

​Get started

Choose a deployment option

Key capabilities

Access the API

Pricing

Get started