What gateways do
When you create a gateway, CoreWeave provisions an external-facing endpoint that clients use to send inference requests. The gateway manages several responsibilities:- Authentication: Validates incoming requests using the configured authentication provider before forwarding them to model deployments.
- Request routing: Directs requests to the correct deployment based on the configured routing strategy.
- Load balancing: Distributes requests across multiple replicas of a deployment to optimize throughput and latency.
- Traffic splitting: Splits traffic across multiple deployments that share the same model name, enabling A/B testing and canary rollouts between model versions.
Authentication
Configure each gateway with exactly one authentication type. Choose the option that matches how your clients already obtain credentials.CoreWeave IAM authentication
CoreWeave IAM authentication (coreWeaveAuth) validates requests using CoreWeave API access tokens. This is the same token you use to manage inference resources through the Inference API. No additional configuration is required.
W&B authentication
W&B authentication (weightsAndBiasesAuth) validates requests using W&B SaaS credentials. This option supports optional usage reporting and rate limiting through the W&B platform.
W&B authentication requires a W&B SaaS account. W&B self-hosted (non-SaaS) isn’t supported. W&B SaaS authentication is in preview. If you encounter errors, contact CoreWeave support.
| Field | Description |
|---|---|
enableUsageReports | Send inference usage data to W&B for tracking. |
enableRateLimiting | Enable W&B-controlled rate limiting for inference requests. |
Routing strategies
Configure each gateway with exactly one routing strategy. The routing strategy determines how the gateway identifies which deployment handles each request.| Strategy | Where the model name comes from | Use when |
|---|---|---|
| Body-based routing | The model field in the JSON request body. | You want OpenAI-compatible behavior. This is the default. |
| Path-based routing | The first segment of the URL path. | You need to route without inspecting the request body. |
| Header-based routing | A custom HTTP header you specify on the gateway. | You want to route on a header value separate from the request body or URL. |
Body-based routing
Body-based routing extracts the model name from themodel field in the JSON request body. This is the default routing strategy and is compatible with OpenAI API conventions.
model field to determine which deployment handles the request:
Path-based routing
Path-based routing uses the model name as the first segment of the URL path. This approach is useful when you need to route requests without inspecting the request body.my%2Fmodel for my/model).
Header-based routing
Header-based routing uses a custom HTTP header to identify the target deployment. You specify the header name when creating the gateway.headerName field accepts values between 1 and 100 characters. With header-based routing, clients include the model name in the specified header:
Zones
When you create a gateway, you must specify the CoreWeave Availability Zone where the gateway is deployed. Placing a gateway close to its model deployments reduces request latency. Thezones field accepts an array, but each gateway supports only one zone.
To list the available zones, query the gateway parameters endpoint:
Gateway endpoints
After a gateway reachesSTATUS_READY, the status.endpoints field lists one or more endpoint URLs that CoreWeave generates for you. The endpoint URL follows the pattern:
endpointConfiguration.additionalDns field. If you configure additional DNS names, you must manually point them to the gateway endpoint.
Traffic splitting
Traffic splitting lets you route a portion of requests to different deployments of the same model, which is useful for A/B testing and canary rollouts. When multiple deployments share the same model name and use the same gateway, the gateway splits traffic between them based on thetraffic.weight field on each deployment. The gateway normalizes weights into percentages.
For example, the following weights configure a canary deployment:
- Production deployment:
traffic.weight= 900 (receives 90% of traffic). - Canary deployment:
traffic.weight= 100 (receives 10% of traffic).
Gateway lifecycle
A gateway moves through several states from creation to deletion. Use the current status to determine whether a gateway is ready to receive traffic or requires attention:| Status | Description |
|---|---|
STATUS_CREATING | CoreWeave is provisioning the gateway. |
STATUS_READY | The gateway is ready to receive traffic. |
STATUS_UPDATING | CoreWeave is updating the gateway configuration. |
STATUS_DELETING | CoreWeave is removing the gateway. |
STATUS_ERROR | The gateway has an error. Check status.conditions for details. |