A gateway is the entry point for inference requests in CoreWeave Inference. Each gateway provides a routable endpoint that handles authentication, request routing, load balancing, and traffic splitting across one or more model deployments. This page explains how to configure gateway authentication, routing strategies, zones, and traffic splitting.
## What gateways do
When you create a gateway, CoreWeave provisions an external-facing endpoint that clients use to send inference requests. The gateway manages several responsibilities:

- Authentication: Validates incoming requests using the configured authentication provider before forwarding them to model deployments.
- Request routing: Directs requests to the correct deployment based on the configured routing strategy.
- Load balancing: Distributes requests across multiple replicas of a deployment to optimize throughput and latency.
- Traffic splitting: Splits traffic across multiple deployments that share the same model name, enabling A/B testing and canary rollouts between model versions.
## Authentication
Every gateway must be configured with exactly one authentication type.

### CoreWeave IAM authentication
CoreWeave IAM authentication (coreWeaveAuth) validates requests using CoreWeave API access tokens. This is the same token used to manage inference resources through the Inference API. No additional configuration is required.
### W&B authentication
W&B authentication (weightsAndBiasesAuth) validates requests using W&B SaaS credentials. This option supports optional usage reporting and rate limiting through the W&B platform.
W&B authentication requires a W&B SaaS account. W&B self-hosted (non-SaaS) isn’t currently supported. W&B SaaS authentication is in preview. Contact CoreWeave support if you encounter errors.
| Field | Description |
|---|---|
| `enableUsageReports` | Send inference usage data to W&B for tracking. |
| `enableRateLimiting` | Enable W&B-controlled rate limiting for inference requests. |
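As a sketch, a gateway spec that uses W&B authentication might contain a block like the following. The surrounding object shape is an assumption for illustration; only the auth type name and the two field names come from this page:

```json
{
  "weightsAndBiasesAuth": {
    "enableUsageReports": true,
    "enableRateLimiting": false
  }
}
```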
## Routing strategies
Every gateway must be configured with exactly one routing strategy. The routing strategy determines how the gateway identifies which deployment should handle each request.

| Strategy | Where the model name comes from | Use when |
|---|---|---|
| Body-based routing | The model field in the JSON request body. | You want OpenAI-compatible behavior. This is the default. |
| Path-based routing | The first segment of the URL path. | You need to route without inspecting the request body. |
| Header-based routing | A custom HTTP header you specify on the gateway. | You want to route on a header value separate from the request body or URL. |
### Body-based routing
Body-based routing extracts the model name from the `model` field in the JSON request body. This is the default routing strategy and is compatible with OpenAI API conventions. The gateway inspects the `model` field to determine which deployment handles the request.
### Path-based routing
Path-based routing uses the model name as the first segment of the URL path. This approach is useful when you need to route requests without inspecting the request body. URL-encode model names that contain special characters (for example, `my%2Fmodel` for `my/model`).
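The percent-encoding rule can be checked with Python's standard library; the URL shape at the end is a hypothetical example, not a documented pattern:

```python
from urllib.parse import quote

model = "my/model"
# safe="" forces "/" to be encoded, so the model name fits in one path segment.
path_segment = quote(model, safe="")
print(path_segment)  # my%2Fmodel

# Hypothetical URL shape with the model name as the first path segment.
url = f"https://example-gateway.example.com/{path_segment}/v1/completions"
```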
### Header-based routing
Header-based routing uses a custom HTTP header to identify the target deployment. You specify the header name when creating the gateway; the `headerName` field accepts values between 1 and 100 characters. With header-based routing, clients include the model name in the specified header.
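A minimal sketch of the client side, assuming a hypothetical header name and model name (only the 1-to-100-character constraint on `headerName` comes from this page):

```python
def validate_header_name(name: str) -> str:
    # The headerName field accepts values between 1 and 100 characters.
    if not 1 <= len(name) <= 100:
        raise ValueError("headerName must be between 1 and 100 characters")
    return name

# Hypothetical header name chosen when the gateway was created.
HEADER_NAME = validate_header_name("X-Model-Name")

# Clients send the model name in that header instead of the body or path.
headers = {HEADER_NAME: "llama-3-8b", "Content-Type": "application/json"}
```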
## Zones
When you create a gateway, you must specify the CoreWeave Availability Zone where the gateway is deployed. The `zones` field accepts an array, but gateways are currently limited to one zone.
To list the available zones, query the gateway parameters endpoint.
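The response shape below is illustrative only (the zone names and JSON structure are assumptions, not taken from this page); it shows how you might pull zone names out of a gateway-parameters response once you have it as JSON:

```python
import json

# Hypothetical response body from the gateway parameters endpoint.
raw = '{"zones": [{"name": "US-EAST-04A"}, {"name": "US-WEST-01A"}]}'

params = json.loads(raw)
zone_names = [zone["name"] for zone in params["zones"]]
print(zone_names)
```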
## Gateway endpoints
After a gateway reaches `STATUS_READY`, it exposes one or more endpoint URLs in the `status.endpoints` field.

You can attach additional DNS names to a gateway through the `endpointConfiguration.additionalDns` field. If you configure additional DNS names, you must manually point them to the gateway endpoint.
## Traffic splitting
When multiple deployments share the same model name and are associated with the same gateway, the gateway splits traffic between them based on the `traffic.weight` field on each deployment. Weights are normalized into percentages.
For example, the following weights configure a canary deployment:

- Production deployment: `traffic.weight` = 900 (receives 90% of traffic)
- Canary deployment: `traffic.weight` = 100 (receives 10% of traffic)
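The normalization is simple proportional weighting, which can be sketched as:

```python
def split_percentages(weights: dict[str, int]) -> dict[str, float]:
    """Normalize raw traffic.weight values into percentages."""
    total = sum(weights.values())
    return {name: 100 * w / total for name, w in weights.items()}

shares = split_percentages({"production": 900, "canary": 100})
print(shares)  # {'production': 90.0, 'canary': 10.0}
```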
## Gateway lifecycle
Gateways go through the following states:

| Status | Description |
|---|---|
| `STATUS_CREATING` | CoreWeave is provisioning the gateway. |
| `STATUS_READY` | The gateway is ready to receive traffic. |
| `STATUS_UPDATING` | CoreWeave is updating the gateway configuration. |
| `STATUS_DELETING` | CoreWeave is removing the gateway. |
| `STATUS_ERROR` | The gateway has an error. Check `status.conditions` for details. |
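One way to consume these states is to poll until the gateway leaves its transitional states. The sketch below injects the status-fetching function so it stays independent of any particular client library; the canned status sequence stands in for real API calls:

```python
import time
from typing import Callable

# States during which the gateway is still converging.
TRANSITIONAL = {"STATUS_CREATING", "STATUS_UPDATING"}

def wait_for_terminal(fetch_status: Callable[[], str],
                      interval: float = 5.0,
                      attempts: int = 60) -> str:
    """Poll fetch_status until the gateway reaches a non-transitional state."""
    for _ in range(attempts):
        status = fetch_status()
        if status not in TRANSITIONAL:
            return status  # e.g. STATUS_READY or STATUS_ERROR
        time.sleep(interval)
    raise TimeoutError("gateway did not leave a transitional state")

# Canned sequence standing in for repeated calls to the Inference API.
sequence = iter(["STATUS_CREATING", "STATUS_CREATING", "STATUS_READY"])
final = wait_for_terminal(lambda: next(sequence), interval=0.0)
print(final)  # STATUS_READY
```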