

The CoreWeave Inference API provides programmatic control over inference gateways, model deployments, and capacity claims. The API is available at api.coreweave.com. This page covers cross-cutting topics: authentication, protocols, status values, error formats, and the OpenAI-compatible inference endpoint. For per-endpoint request and response schemas, see the per-operation pages under each service in the left sidebar.
The Inference API is versioned as v1alpha1. APIs may change before general availability.

Authentication

All API requests must include a CoreWeave API access token in the Authorization header as a Bearer token. The token must belong to a user with the Inference Viewer or Inference Admin role, depending on the operation.
curl "https://api.coreweave.com/v1alpha1/inference/gateways" \
  -H "Authorization: Bearer [API-TOKEN]"
For details on obtaining an API token, see Manage API access tokens.
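The same authenticated request can be built programmatically. The sketch below uses Python's standard library; the token value is a placeholder you must supply yourself.

```python
import urllib.request

API_TOKEN = "..."  # your CoreWeave API access token (placeholder)

# Build the same request as the curl example above, attaching the
# token as a Bearer credential in the Authorization header.
req = urllib.request.Request(
    "https://api.coreweave.com/v1alpha1/inference/gateways",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)

# To send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```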

Protocol support

The Inference API supports multiple protocols:
| Protocol | Description |
| --- | --- |
| REST/JSON | Standard HTTP/1.1 with JSON request and response bodies. All examples in this documentation use REST. |
| gRPC | Protocol buffers over HTTP/2 for high-performance programmatic access. |
| Connect | gRPC-compatible protocol with improved browser and HTTP/1.1 support. |

Query parameters

List endpoints support the following query parameter:
| Parameter | Type | Description |
| --- | --- | --- |
| updatedAfter | date-time | Filter resources to those updated after the specified timestamp (ISO 8601 format). |
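As a sketch, a filtered list URL can be assembled like this; the timestamp shown is an arbitrary example value, and percent-encoding the colons is handled by the standard library.

```python
from urllib.parse import urlencode

base = "https://api.coreweave.com/v1alpha1/inference/gateways"

# Filter to resources updated after an ISO 8601 timestamp
# (example value). urlencode percent-escapes the colons.
params = {"updatedAfter": "2025-01-01T00:00:00Z"}
url = f"{base}?{urlencode(params)}"
```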

Status values

All resources share a common set of status values:
| Status | Description |
| --- | --- |
| STATUS_CREATING | Resource is being provisioned. |
| STATUS_READY | Resource is active and operational. |
| STATUS_UPDATING | Resource configuration is being updated. |
| STATUS_DELETING | Resource is being removed. |
| STATUS_ERROR | Resource encountered a recoverable error. |
| STATUS_FAILED | Resource encountered a terminal error. |
Each resource includes a conditions array in its status with detailed information about the current state, including timestamps, reasons, and human-readable messages.
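A client polling a resource typically treats these values in two groups: states to keep waiting through, and states to stop on. The helpers below are a minimal sketch; the `is_settled` grouping and the shape assumed for the conditions array (a list of objects with a `message` field) are this example's assumptions, not part of the API contract.

```python
# Grouping assumed by this sketch: STATUS_READY and STATUS_FAILED end
# polling; the others (including the recoverable STATUS_ERROR) do not.
SETTLED = {"STATUS_READY", "STATUS_FAILED"}

def is_settled(status: str) -> bool:
    """Return True once a resource has reached a state a poller can stop on."""
    return status in SETTLED

def latest_message(resource: dict) -> str:
    """Read the human-readable message from the most recent entry in the
    resource's status.conditions array (field layout assumed)."""
    conditions = resource.get("status", {}).get("conditions", [])
    return conditions[-1]["message"] if conditions else ""
```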

Error responses

Error responses follow the standard format:
{
  "code": 400,
  "message": "name is required",
  "details": []
}
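A client can unpack this format generically. The helper below is a small sketch assuming only the `code` and `message` fields shown above.

```python
import json

def parse_error(body: str) -> tuple[int, str]:
    """Extract the numeric code and message from an error response body."""
    err = json.loads(body)
    return err["code"], err["message"]

code, message = parse_error(
    '{"code": 400, "message": "name is required", "details": []}'
)
```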

OpenAI-compatible endpoint

Deployed models expose an OpenAI-compatible completions endpoint through their associated gateway. The endpoint URL depends on the gateway’s routing strategy. See Gateways for details on how each routing strategy constructs the request URL. For a complete walkthrough, see the Getting started guide.
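To illustrate the shape of a request against such an endpoint, here is a hedged sketch using the standard OpenAI chat-completions payload. The gateway URL and model name are placeholders; the real URL depends on your gateway's routing strategy as described in Gateways.

```python
import json
import urllib.request

# Placeholders: substitute your gateway's URL (per its routing
# strategy) and your deployment's model name.
GATEWAY_URL = "https://my-gateway.example.com/v1/chat/completions"
API_TOKEN = "..."  # CoreWeave API access token

# Standard OpenAI-style chat-completions request body.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# To send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```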
Last modified on May 6, 2026