
A gateway is the entry point for inference requests in CoreWeave Inference. Each gateway provides a routable endpoint that handles authentication, request routing, load balancing, and traffic splitting across one or more model deployments. This page explains how to configure gateway authentication, routing strategies, zones, and traffic splitting.

What gateways do

When you create a gateway, CoreWeave provisions an external-facing endpoint that clients use to send inference requests. The gateway manages several responsibilities:
  • Authentication: Validates incoming requests using the configured authentication provider before forwarding them to model deployments.
  • Request routing: Directs requests to the correct deployment based on the configured routing strategy.
  • Load balancing: Distributes requests across multiple replicas of a deployment to optimize throughput and latency.
  • Traffic splitting: Splits traffic across multiple deployments that share the same model name, enabling A/B testing and canary rollouts between model versions.

Authentication

Every gateway must be configured with exactly one authentication type.

CoreWeave IAM authentication

CoreWeave IAM authentication (coreWeaveAuth) validates requests using CoreWeave API access tokens. This is the same token used to manage inference resources through the Inference API. No additional configuration is required.
{
  "coreWeaveAuth": {}
}
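Because coreWeaveAuth takes no fields, the only client-side work is attaching the API access token to each request. A minimal sketch (the helper name is ours, not part of any SDK):

```python
def auth_headers(token: str) -> dict:
    # CoreWeave IAM auth: requests carry the same CoreWeave API access
    # token used for the Inference API, as a Bearer token.
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```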

W&B authentication

W&B authentication (weightsAndBiasesAuth) validates requests using W&B SaaS credentials. This option supports optional usage reporting and rate limiting through the W&B platform.
W&B authentication requires a W&B SaaS account. W&B self-hosted (non-SaaS) isn’t currently supported. W&B SaaS authentication is in preview. Contact CoreWeave support if you encounter errors.
{
  "weightsAndBiasesAuth": {
    "enableUsageReports": true,
    "enableRateLimiting": false
  }
}
  • enableUsageReports: Send inference usage data to W&B for tracking.
  • enableRateLimiting: Enable W&B-controlled rate limiting for inference requests.
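The "exactly one authentication type" rule can be checked before submitting a gateway spec. The sketch below is our own client-side validation, not a CoreWeave API behavior, assuming the spec is a JSON object with one of the two auth keys shown above:

```python
def validate_auth(gateway_spec: dict) -> str:
    # A gateway spec must carry exactly one authentication type.
    auth_types = ("coreWeaveAuth", "weightsAndBiasesAuth")
    present = [t for t in auth_types if t in gateway_spec]
    if len(present) != 1:
        raise ValueError(f"expected exactly one auth type, got {present}")
    return present[0]
```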

Routing strategies

Every gateway must be configured with exactly one routing strategy. The routing strategy determines how the gateway identifies which deployment should handle each request.
  • Body-based routing: Reads the model field in the JSON request body. Use when you want OpenAI-compatible behavior. This is the default.
  • Path-based routing: Reads the first segment of the URL path. Use when you need to route without inspecting the request body.
  • Header-based routing: Reads a custom HTTP header you specify on the gateway. Use when you want to route on a header value separate from the request body or URL.

Body-based routing

Body-based routing extracts the model name from the model field in the JSON request body. This is the default routing strategy and is compatible with OpenAI API conventions.
{
  "bodyBasedRouting": {
    "apiType": "API_TYPE_OPENAI"
  }
}
With body-based routing, clients send requests to the gateway endpoint directly. The gateway reads the model field to determine which deployment handles the request:
curl -X POST "${CW_GATEWAY_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
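Conceptually, the gateway's body-based lookup amounts to parsing the request body and reading the model field, as in this sketch (the helper name is ours, for illustration only):

```python
import json

def model_from_body(raw_body: bytes) -> str:
    # Body-based routing: the gateway parses the JSON request body and
    # reads the "model" field to pick the target deployment.
    return json.loads(raw_body)["model"]
```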

Path-based routing

Path-based routing uses the model name as the first segment of the URL path. This approach is useful when you need to route requests without inspecting the request body.
{
  "pathBasedRouting": {}
}
With path-based routing, the model name appears in the URL. If the model name contains special characters, URL-encode it before inserting it into the path (for example, my%2Fmodel for my/model).
curl -X POST "${CW_GATEWAY_ENDPOINT}/my-model/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
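The URL-encoding rule above can be applied with the standard library. Passing safe="" makes quote percent-encode "/" as well, so a model name like my/model becomes the single path segment my%2Fmodel (the helper function is our own sketch):

```python
from urllib.parse import quote

def path_routed_url(endpoint: str, model: str) -> str:
    # Path-based routing: the model name is the first URL path segment.
    # safe="" also percent-encodes "/" so "my/model" -> "my%2Fmodel".
    return f"{endpoint}/{quote(model, safe='')}/v1/chat/completions"
```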

Header-based routing

Header-based routing uses a custom HTTP header to identify the target deployment. You specify the header name when creating the gateway.
{
  "headerBasedRouting": {
    "headerName": "X-Model-Name"
  }
}
The headerName field accepts values between 1 and 100 characters. With header-based routing, clients include the model name in the specified header:
curl -X POST "${CW_GATEWAY_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-Model-Name: my-model" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
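On the gateway side, header-based routing amounts to a case-insensitive header lookup (HTTP header names are case-insensitive) plus the 1-100 character constraint on headerName. A sketch of that logic, with names of our own choosing:

```python
def model_from_headers(headers: dict, header_name: str = "X-Model-Name") -> str:
    # Header-based routing: the gateway reads the configured header.
    # The headerName field accepts values between 1 and 100 characters.
    if not 1 <= len(header_name) <= 100:
        raise ValueError("headerName must be between 1 and 100 characters")
    # HTTP header names are case-insensitive, so normalize the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered[header_name.lower()]
```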

Zones

When you create a gateway, you must specify the CoreWeave Availability Zone where the gateway is deployed. The zones field accepts an array, but gateways are currently limited to one zone. To list the available zones, query the gateway parameters endpoint:
curl "${CW_BASE_URL}/v1alpha1/inference/gateways/parameters" \
  -H "Authorization: Bearer ${CW_API_TOKEN}"

Gateway endpoints

After a gateway reaches STATUS_READY, it exposes one or more endpoint URLs in the status.endpoints field. The endpoint URL follows the pattern:
https://api.[GATEWAY-ID].gw.cwinference.com
Retrieve the endpoint URL by querying the gateway:
curl "${CW_BASE_URL}/v1alpha1/inference/gateways/${CW_GATEWAY_ID}" \
  -H "Authorization: Bearer ${CW_API_TOKEN}"
To configure additional DNS names for a gateway, use the endpointConfiguration.additionalDns field. If you configure additional DNS names, you must manually point them to the gateway endpoint.
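Extracting the endpoint from the gateway response can be sketched as below. The exact key that holds the state string is an assumption on our part; the docs only state that endpoints appear in status.endpoints once the gateway reaches STATUS_READY:

```python
def gateway_endpoint(gateway: dict) -> str:
    # Endpoint URLs appear in status.endpoints once the gateway is
    # STATUS_READY. The "state" key name is an assumption; check the
    # API reference for the actual response schema.
    status = gateway.get("status", {})
    if status.get("state") != "STATUS_READY":
        raise RuntimeError("gateway is not ready yet")
    return status["endpoints"][0]
```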

Traffic splitting

When multiple deployments share the same model name and are associated with the same gateway, the gateway splits traffic between them based on the traffic.weight field on each deployment. Weights are normalized into percentages. For example, the following weights configure a canary deployment:
  • Production deployment: traffic.weight = 900 (receives 90% of traffic)
  • Canary deployment: traffic.weight = 100 (receives 10% of traffic)
For more information about traffic weights, see Models and deployments.
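The normalization of traffic weights into percentages works out as in this sketch: each deployment's share is its weight divided by the sum of all weights for that model name (function name is ours):

```python
def traffic_split(weights: dict) -> dict:
    # Weights are normalized into percentages across all deployments
    # that share a model name on the same gateway.
    total = sum(weights.values())
    return {name: 100 * w / total for name, w in weights.items()}
```

With the canary example above, weights of 900 and 100 normalize to 90% and 10% of traffic.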

Gateway lifecycle

Gateways go through the following states:
  • STATUS_CREATING: CoreWeave is provisioning the gateway.
  • STATUS_READY: The gateway is ready to receive traffic.
  • STATUS_UPDATING: CoreWeave is updating the gateway configuration.
  • STATUS_DELETING: CoreWeave is removing the gateway.
  • STATUS_ERROR: The gateway has an error. Check status.conditions for details.
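A client that creates a gateway typically polls until it leaves the transitional states. This sketch takes a caller-supplied callable that fetches the gateway JSON (e.g. a GET on the gateways endpoint); the "state" key name is an assumption, and the polling parameters are arbitrary:

```python
import time

def wait_until_ready(fetch_gateway, poll_seconds=5.0, timeout=600.0):
    # Poll the gateway until STATUS_READY, failing fast on STATUS_ERROR.
    # fetch_gateway is a caller-supplied callable returning the gateway
    # object as a dict; the "state" key name is an assumption.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        gw = fetch_gateway()
        state = gw.get("status", {}).get("state")
        if state == "STATUS_READY":
            return gw
        if state == "STATUS_ERROR":
            raise RuntimeError(f"gateway error: {gw['status'].get('conditions')}")
        time.sleep(poll_seconds)
    raise TimeoutError("gateway did not become ready in time")
```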

Manage gateways

Manage gateways through the CoreWeave Inference API. For per-operation request and response schemas, see the GatewayService pages in the API reference. For a complete walkthrough of creating a gateway, see the Getting started guide.
Last modified on May 6, 2026