CoreWeave Inference incurs no additional platform fees. You are billed only for the underlying compute resources consumed by your deployments.
Billing models
Node-based billing is for customers who already have reserved node capacity. You can redirect existing reserved nodes, whether provisioned for training or other workloads, to the inference platform at your existing rates. There are no additional charges beyond the existing reservation cost.

GPU-based billing is for on-demand workloads. You pay per GPU-hour based on the instance type. Inference compute is measured in GPU-hours at the deployment level. Pricing is transparent and available on the CoreWeave pricing page.

Cost optimization
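The GPU-hour model above comes down to simple arithmetic: GPUs per replica, times replica count, times hours, times the instance's hourly rate. A minimal sketch follows; the rate used here is a hypothetical placeholder, not an actual CoreWeave price.

```python
# Illustrative sketch of GPU-hour billing at the deployment level.
# The $2.50/GPU-hour rate is a hypothetical placeholder, not an
# actual CoreWeave price; see the CoreWeave pricing page.

def deployment_cost(rate_per_gpu_hour: float, gpus_per_replica: int,
                    replicas: int, hours: float) -> float:
    """Cost = hourly rate x GPUs per replica x replicas x hours."""
    return rate_per_gpu_hour * gpus_per_replica * replicas * hours

# Example: 2 replicas, 1 GPU each, running for 24 hours.
cost = deployment_cost(2.50, 1, 2, 24)
print(f"${cost:.2f}")  # $120.00
```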
Follow these guidelines to reduce inference costs.

- Right-size your GPU selection. Choose the smallest instance type that meets your model's memory and throughput requirements.
- Use autoscaling to match demand. Scaling down during low-traffic periods reduces costs. Set `min` to the lowest value that meets your latency requirements.
- Consider reserved capacity for steady-state workloads. Capacity claims with reserved nodes offer predictable pricing for workloads with consistent demand.
- Monitor replica utilization. If replicas are consistently underutilized, consider reducing `max` or switching to a smaller instance type.
- Use scaling priority. When multiple deployments share reserved capacity, set `priority` so that higher-value workloads scale first.