A gateway is the entry point for inference requests in CoreWeave Inference. Each gateway provides a routable endpoint that handles authentication, request routing, load balancing, and traffic splitting across one or more model deployments. This page explains how gateways work and how to configure their authentication, routing strategies, zones, endpoints, and traffic splitting. Use it to decide which options fit your inference workload.

What gateways do

When you create a gateway, CoreWeave provisions an external-facing endpoint that clients use to send inference requests. The gateway manages several responsibilities:
  • Authentication: Validates incoming requests using the configured authentication provider before forwarding them to model deployments.
  • Request routing: Directs requests to the correct deployment based on the configured routing strategy.
  • Load balancing: Distributes requests across multiple replicas of a deployment to optimize throughput and latency.
  • Traffic splitting: Splits traffic across multiple deployments that share the same model name, enabling A/B testing and canary rollouts between model versions.

Authentication

Configure each gateway with exactly one authentication type. Choose the option that matches how your clients already obtain credentials.

CoreWeave IAM authentication

CoreWeave IAM authentication (coreWeaveAuth) validates requests using a CoreWeave API access token, the same token you use to manage inference resources through the Inference API. No additional configuration is required.
{
  "coreWeaveAuth": {}
}

W&B authentication

W&B authentication (weightsAndBiasesAuth) validates requests using W&B SaaS credentials. This option supports optional usage reporting and rate limiting through the W&B platform.
W&B authentication requires a W&B SaaS account. W&B self-hosted (non-SaaS) isn’t supported. W&B SaaS authentication is in preview. If you encounter errors, contact CoreWeave support.
{
  "weightsAndBiasesAuth": {
    "enableUsageReports": true,
    "enableRateLimiting": false
  }
}
Field | Description
enableUsageReports | Send inference usage data to W&B for tracking.
enableRateLimiting | Enable W&B-controlled rate limiting for inference requests.

Routing strategies

Configure each gateway with exactly one routing strategy. The routing strategy determines how the gateway identifies which deployment handles each request.
Strategy | Where the model name comes from | Use when
Body-based routing | The model field in the JSON request body. | You want OpenAI-compatible behavior. This is the default.
Path-based routing | The first segment of the URL path. | You need to route without inspecting the request body.
Header-based routing | A custom HTTP header you specify on the gateway. | You want to route on a header value separate from the request body or URL.
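
The "exactly one" rules for authentication and routing can be checked on the client before a create request is sent. The following is a minimal sketch, not part of the Inference API: it assumes the authentication and routing objects sit side by side in a single configuration dictionary, and the helper name pick_exactly_one is hypothetical. It also requires the routing strategy to be stated explicitly, even though body-based routing is described as the default.

```python
AUTH_KEYS = ("coreWeaveAuth", "weightsAndBiasesAuth")
ROUTING_KEYS = ("bodyBasedRouting", "pathBasedRouting", "headerBasedRouting")


def pick_exactly_one(config: dict, keys: tuple) -> str:
    """Return the single key from `keys` present in `config`, or raise ValueError."""
    present = [k for k in keys if k in config]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {keys}, found {present}")
    return present[0]


# Hypothetical configuration assembled from the fragments shown on this page.
# The flat layout is an assumption; check the API reference for the real schema.
gateway_config = {
    "coreWeaveAuth": {},
    "headerBasedRouting": {"headerName": "X-Model-Name"},
}

auth_type = pick_exactly_one(gateway_config, AUTH_KEYS)
routing_type = pick_exactly_one(gateway_config, ROUTING_KEYS)
print(auth_type, routing_type)
```

A configuration that sets both coreWeaveAuth and weightsAndBiasesAuth, or neither, raises ValueError in this sketch.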

Body-based routing

Body-based routing extracts the model name from the model field in the JSON request body. This is the default routing strategy and is compatible with OpenAI API conventions.
{
  "bodyBasedRouting": {
    "apiType": "API_TYPE_OPENAI"
  }
}
With body-based routing, clients send requests to the gateway endpoint directly. The gateway reads the model field to determine which deployment handles the request:
curl -X POST "${CW_GATEWAY_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
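
The same request can be assembled from Python using only the standard library. This is a rough sketch that mirrors the curl command above: the function name build_chat_request and the placeholder endpoint and token defaults are illustrative assumptions, and the request is built but not sent.

```python
import json
import os
import urllib.request


def build_chat_request(endpoint: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed by the body's model field."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder defaults (assumptions) keep the sketch runnable without real credentials.
endpoint = os.environ.get("CW_GATEWAY_ENDPOINT", "https://gateway.example.invalid")
token = os.environ.get("CW_API_TOKEN", "placeholder-token")
request = build_chat_request(endpoint, token, "my-model", "Hello")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```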

Path-based routing

Path-based routing uses the model name as the first segment of the URL path. This approach is useful when you need to route requests without inspecting the request body.
{
  "pathBasedRouting": {}
}
With path-based routing, the model name appears in the URL. If the model name contains special characters, URL-encode it before inserting it into the path (for example, my%2Fmodel for my/model).
curl -X POST "${CW_GATEWAY_ENDPOINT}/my-model/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
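
The URL-encoding step can be sketched in Python with urllib.parse.quote. The helper name path_routed_url and the example endpoint are hypothetical. Passing safe="" matters here, because quote leaves "/" unencoded by default.

```python
from urllib.parse import quote


def path_routed_url(endpoint: str, model: str) -> str:
    """Place the percent-encoded model name in the first path segment."""
    # safe="" makes quote() encode "/" as well; by default it leaves "/" untouched.
    encoded_model = quote(model, safe="")
    return f"{endpoint.rstrip('/')}/{encoded_model}/v1/chat/completions"


print(path_routed_url("https://gateway.example.invalid", "my/model"))
# prints "https://gateway.example.invalid/my%2Fmodel/v1/chat/completions"
```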

Header-based routing

Header-based routing uses a custom HTTP header to identify the target deployment. You specify the header name when creating the gateway.
{
  "headerBasedRouting": {
    "headerName": "X-Model-Name"
  }
}
The headerName value must be between 1 and 100 characters long. With header-based routing, clients include the model name in the specified header:
curl -X POST "${CW_GATEWAY_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${CW_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-Model-Name: my-model" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
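
As a small illustration, a client can build the request headers and enforce the documented 1 to 100 character limit on the header name locally. The function name routing_headers is hypothetical, and the sketch checks only the length; any other rules the gateway applies to header names are not covered.

```python
def routing_headers(header_name: str, model: str, token: str) -> dict:
    """Return request headers for a gateway that uses header-based routing."""
    # The page states that headerName accepts 1 to 100 characters.
    if not 1 <= len(header_name) <= 100:
        raise ValueError("headerName must be between 1 and 100 characters long")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        header_name: model,
    }


headers = routing_headers("X-Model-Name", "my-model", "placeholder-token")
print(headers["X-Model-Name"])  # prints "my-model"
```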

Zones

When you create a gateway, you must specify the CoreWeave Availability Zone where the gateway is deployed. Placing a gateway close to its model deployments reduces request latency. Although the zones field is an array, each gateway supports only one zone, so supply a single entry. To list the available zones, query the gateway parameters endpoint:
curl "${CW_BASE_URL}/v1alpha1/inference/gateways/parameters" \
  -H "Authorization: Bearer ${CW_API_TOKEN}"

Gateway endpoints

After a gateway reaches STATUS_READY, the status.endpoints field lists one or more endpoint URLs that CoreWeave generates for you. The endpoint URL follows the pattern:
https://api.[GATEWAY-ID].gw.cwinference.com
Retrieve the endpoint URL by querying the gateway:
curl "${CW_BASE_URL}/v1alpha1/inference/gateways/${CW_GATEWAY_ID}" \
  -H "Authorization: Bearer ${CW_API_TOKEN}"
To configure additional DNS names for a gateway, use the endpointConfiguration.additionalDns field. If you configure additional DNS names, you must manually point them to the gateway endpoint.

Traffic splitting

Traffic splitting lets you route a portion of requests to different deployments of the same model, which is useful for A/B testing and canary rollouts. When multiple deployments share the same model name and use the same gateway, the gateway splits traffic between them based on the traffic.weight field on each deployment. The gateway normalizes weights into percentages, so each deployment's share is its weight relative to the total weight of the deployments that share the model name. For example, the following weights configure a canary deployment:
  • Production deployment: traffic.weight = 900 (receives 90% of traffic).
  • Canary deployment: traffic.weight = 100 (receives 10% of traffic).
For more information about traffic weights, see Models and deployments.
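
The arithmetic behind the canary example can be sketched as follows. This assumes that normalization means dividing each weight by the sum of the weights, which matches the 900 and 100 example above; the function name traffic_shares and the deployment labels are illustrative only.

```python
def traffic_shares(weights: dict) -> dict:
    """Convert per-deployment traffic weights into percentages of total traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one deployment needs a positive weight")
    return {name: 100 * weight / total for name, weight in weights.items()}


shares = traffic_shares({"production": 900, "canary": 100})
print(shares)  # prints {'production': 90.0, 'canary': 10.0}
```

Because only the ratio matters under this assumption, weights of 9 and 1 would give the same split as 900 and 100.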

Gateway lifecycle

A gateway moves through several states from creation to deletion. Use the current status to determine whether a gateway is ready to receive traffic or requires attention:
Status | Description
STATUS_CREATING | CoreWeave is provisioning the gateway.
STATUS_READY | The gateway is ready to receive traffic.
STATUS_UPDATING | CoreWeave is updating the gateway configuration.
STATUS_DELETING | CoreWeave is removing the gateway.
STATUS_ERROR | The gateway has an error. Check status.conditions for details.
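
A client that creates a gateway typically waits for STATUS_READY before sending traffic. The following is a minimal polling sketch under stated assumptions: get_status is a hypothetical callable that returns the current status string (how that string is read from the API response is not shown on this page), and the timeout and interval defaults are arbitrary.

```python
import time

# Statuses from which this sketch assumes the gateway will not become ready.
TERMINAL_FAILURE = {"STATUS_ERROR", "STATUS_DELETING"}


def wait_until_ready(get_status, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Poll get_status() until it returns STATUS_READY, or raise on error or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "STATUS_READY":
            return status
        if status in TERMINAL_FAILURE:
            raise RuntimeError(f"gateway entered {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gateway still in {status} after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for a real API call: replays a fixed sequence of statuses.
fake_statuses = iter(["STATUS_CREATING", "STATUS_CREATING", "STATUS_READY"])
result = wait_until_ready(lambda: next(fake_statuses), sleep=lambda _: None)
print(result)  # prints "STATUS_READY"
```

Passing sleep as a parameter keeps the loop testable without real delays. In real use, get_status would wrap the gateway query shown under Gateway endpoints.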

Manage gateways

Manage gateways through the CoreWeave Inference API. For per-operation request and response schemas, see the GatewayService pages in the API reference. For a complete walkthrough of creating a gateway, see the Getting started guide.
Last modified on June 10, 2026