A gateway is the entry point for inference requests in CoreWeave Inference. Each gateway provides a routable endpoint that handles authentication, request routing, load balancing, and traffic splitting across one or more model deployments. This page explains how to configure gateway authentication, routing strategies, zones, and traffic splitting.
## What gateways do
When you create a gateway, CoreWeave provisions an external-facing endpoint that clients use to send inference requests. The gateway manages several responsibilities:

- Authentication: Validates incoming requests using the configured authentication provider before forwarding them to model deployments.
- Request routing: Directs requests to the correct deployment based on the configured routing strategy.
- Load balancing: Distributes requests across multiple replicas of a deployment to optimize throughput and latency.
- Traffic splitting: Splits traffic across multiple deployments that share the same model name, enabling A/B testing and canary rollouts between model versions.
## Authentication
Every gateway must be configured with exactly one authentication type.

### CoreWeave IAM authentication
CoreWeave IAM authentication (coreWeaveAuth) validates requests using CoreWeave API access tokens. This is the same token used to manage inference resources through the Inference API. No additional configuration is required.
### W&B authentication
W&B authentication (weightsAndBiasesAuth) validates requests using W&B SaaS credentials. This option supports optional usage reporting and rate limiting through the W&B platform.
W&B authentication requires a W&B SaaS account. W&B self-hosted (non-SaaS) isn’t currently supported. W&B SaaS authentication is in preview. Contact CoreWeave support if you encounter errors.
| Field | Description |
|---|---|
| `enableUsageReports` | Send inference usage data to W&B for tracking. |
| `enableRateLimiting` | Enable W&B-controlled rate limiting for inference requests. |
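As a sketch, a gateway spec that uses W&B authentication might contain a block like the following. The surrounding object shape is an assumption for illustration; only the auth type name and the two field names come from this page:

```json
{
  "weightsAndBiasesAuth": {
    "enableUsageReports": true,
    "enableRateLimiting": false
  }
}
```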
## Routing strategies
Every gateway must be configured with exactly one routing strategy. The routing strategy determines how the gateway identifies which deployment should handle each request.

| Strategy | Where the model name comes from | Use when |
|---|---|---|
| Body-based routing | The model field in the JSON request body. | You want OpenAI-compatible behavior. This is the default. |
| Path-based routing | The first segment of the URL path. | You need to route without inspecting the request body. |
| Header-based routing | A custom HTTP header you specify on the gateway. | You want to route on a header value separate from the request body or URL. |
### Body-based routing
Body-based routing extracts the model name from the `model` field in the JSON request body. This is the default routing strategy and is compatible with OpenAI API conventions. The gateway inspects the `model` field to determine which deployment handles the request.
### Path-based routing
Path-based routing uses the model name as the first segment of the URL path. This approach is useful when you need to route requests without inspecting the request body. URL-encode model names that contain special characters (for example, `my%2Fmodel` for `my/model`).
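The percent-encoding rule can be checked with Python's standard library; the URL shape at the end is a hypothetical example, not a documented pattern:

```python
from urllib.parse import quote

model = "my/model"
# safe="" forces "/" to be encoded, so the model name fits in one path segment.
path_segment = quote(model, safe="")
print(path_segment)  # my%2Fmodel

# Hypothetical URL shape with the model name as the first path segment.
url = f"https://example-gateway.example.com/{path_segment}/v1/completions"
```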
### Header-based routing
Header-based routing uses a custom HTTP header to identify the target deployment. You specify the header name when creating the gateway; the `headerName` field accepts values between 1 and 100 characters. With header-based routing, clients include the model name in the specified header.
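A minimal sketch of the client side, assuming a hypothetical header name and model name (only the 1-to-100-character constraint on `headerName` comes from this page):

```python
def validate_header_name(name: str) -> str:
    # The headerName field accepts values between 1 and 100 characters.
    if not 1 <= len(name) <= 100:
        raise ValueError("headerName must be between 1 and 100 characters")
    return name

# Hypothetical header name chosen when the gateway was created.
HEADER_NAME = validate_header_name("X-Model-Name")

# Clients send the model name in that header instead of the body or path.
headers = {HEADER_NAME: "llama-3-8b", "Content-Type": "application/json"}
```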
## Zones
When you create a gateway, you must specify the CoreWeave Availability Zone where the gateway is deployed. The `zones` field accepts an array, but gateways are currently limited to one zone.
To list the available zones, query the gateway parameters endpoint.
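The response shape below is illustrative only (the zone names and JSON structure are assumptions, not taken from this page); it shows how you might pull zone names out of a gateway-parameters response once you have it as JSON:

```python
import json

# Hypothetical response body from the gateway parameters endpoint.
raw = '{"zones": [{"name": "US-EAST-04A"}, {"name": "US-WEST-01A"}]}'

params = json.loads(raw)
zone_names = [zone["name"] for zone in params["zones"]]
print(zone_names)
```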
## Gateway endpoints
After a gateway reaches `STATUS_READY`, it exposes one or more endpoint URLs in the `status.endpoints` field.

You can attach additional DNS names to a gateway through the `endpointConfiguration.additionalDns` field. If you configure additional DNS names, you must manually point them to the gateway endpoint.
## Traffic splitting
When multiple deployments share the same model name and are associated with the same gateway, the gateway splits traffic between them based on the `traffic.weight` field on each deployment. Weights are normalized into percentages.
For example, the following weights configure a canary deployment:

- Production deployment: `traffic.weight` = 900 (receives 90% of traffic)
- Canary deployment: `traffic.weight` = 100 (receives 10% of traffic)
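The normalization is simple proportional weighting, which can be sketched as:

```python
def split_percentages(weights: dict[str, int]) -> dict[str, float]:
    """Normalize raw traffic.weight values into percentages."""
    total = sum(weights.values())
    return {name: 100 * w / total for name, w in weights.items()}

shares = split_percentages({"production": 900, "canary": 100})
print(shares)  # {'production': 90.0, 'canary': 10.0}
```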
## Gateway lifecycle
Gateways go through the following states:

| Status | Description |
|---|---|
| `STATUS_CREATING` | CoreWeave is provisioning the gateway. |
| `STATUS_READY` | The gateway is ready to receive traffic. |
| `STATUS_UPDATING` | CoreWeave is updating the gateway configuration. |
| `STATUS_DELETING` | CoreWeave is removing the gateway. |
| `STATUS_ERROR` | The gateway has an error. Check `status.conditions` for details. |
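One way to consume these states is to poll until the gateway leaves its transitional states. The sketch below injects the status-fetching function so it stays independent of any particular client library; the canned status sequence stands in for real API calls:

```python
import time
from typing import Callable

# States during which the gateway is still converging.
TRANSITIONAL = {"STATUS_CREATING", "STATUS_UPDATING"}

def wait_for_terminal(fetch_status: Callable[[], str],
                      interval: float = 5.0,
                      attempts: int = 60) -> str:
    """Poll fetch_status until the gateway reaches a non-transitional state."""
    for _ in range(attempts):
        status = fetch_status()
        if status not in TRANSITIONAL:
            return status  # e.g. STATUS_READY or STATUS_ERROR
        time.sleep(interval)
    raise TimeoutError("gateway did not leave a transitional state")

# Canned sequence standing in for repeated calls to the Inference API.
sequence = iter(["STATUS_CREATING", "STATUS_CREATING", "STATUS_READY"])
final = wait_for_terminal(lambda: next(sequence), interval=0.0)
print(final)  # STATUS_READY
```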