CoreWeave AI Object Storage now supports pre-staging objects into the LOTA cache. When you issue a HeadObject call with a Range: bytes=0-0 header against any object, LOTA fetches the complete object from the storage backend and places it in the distributed NVMe cache on your GPU Nodes without transferring a response body to your client.
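
The call described above could be issued from Python roughly as follows. This is a minimal sketch, assuming boto3 is installed and that credentials (and any addressing-style settings your account needs) are already configured in your environment. The helper names make_lota_client and prestage_object are hypothetical, chosen only for illustration.

```python
LOTA_ENDPOINT = "http://cwlota.com"  # LOTA endpoint named in this document
PRESTAGE_RANGE = "bytes=0-0"         # Range value that requests a cache fill


def make_lota_client(endpoint_url=LOTA_ENDPOINT):
    """Build an S3 client pointed at the LOTA endpoint (hypothetical helper).

    boto3 is imported lazily so prestage_object() can be used with any
    S3-compatible client object.
    """
    import boto3

    return boto3.client("s3", endpoint_url=endpoint_url)


def prestage_object(s3_client, bucket, key):
    """Send HeadObject with Range: bytes=0-0 for one object.

    Only response headers come back to the caller; no object body is
    transferred to the client.
    """
    return s3_client.head_object(Bucket=bucket, Key=key, Range=PRESTAGE_RANGE)
```

Usage might look like prestage_object(make_lota_client(), "my-bucket", "models/weights.safetensors"), where the bucket and key are placeholders.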

Overview

Pre-staging eliminates the “cold start” latency penalty that occurs when objects are read for the first time. With a warm cache, training, inference, and checkpoint-restore workloads access data at full cache speed from the very first byte. Key benefits include:
  • Faster first epoch: Training jobs that start with a warm cache can see a 30-50% faster first epoch compared to cold reads.
  • Lower Time-To-First-Token: Inference services load models from cache instead of backend storage, reducing TTFT for large models to sub-second levels.
  • Faster checkpoint restores: Resuming training from the LOTA cache rather than remote storage minimizes downtime after failures.
  • Zero client bandwidth: HeadObject returns only headers. All cache-fill traffic stays inside the backend CoreWeave network.
  • No SDK changes required: Works with any S3-compatible tool or SDK (AWS CLI, boto3, s3cmd).
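
To warm a whole dataset or checkpoint directory before a job starts, the same HeadObject request can be repeated for every key under a prefix. The sketch below is one possible approach, not a CoreWeave-provided tool: the function names are hypothetical, s3_client is assumed to be a boto3 S3 client already pointed at the LOTA endpoint, and it assumes object listing is available through that client.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_keys(s3_client, bucket, prefix):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def prestage_prefix(s3_client, bucket, prefix, max_workers=16):
    """Issue HeadObject with Range: bytes=0-0 for each key under a prefix.

    Returns the keys that were requested, in listing order. max_workers
    is an arbitrary illustrative default, not a recommended value.
    """

    def warm(key):
        s3_client.head_object(Bucket=bucket, Key=key, Range="bytes=0-0")
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(warm, iter_keys(s3_client, bucket, prefix)))
```

Because each request returns headers only, the client-side cost is dominated by request latency, which is why the sketch issues requests concurrently.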

Limitations

  • Pre-staging caches whole objects only. Byte-range pre-fetching is not supported.
  • Pre-staging only works through the LOTA endpoint (http://cwlota.com). HeadObject calls to the primary endpoint (https://cwobject.com) return metadata but do not trigger a cache fill.
  • A HeadObject request with a Range: bytes=0-0 header counts as an object access and triggers the transition of the object from the Cold or Warm storage tier to the Hot pricing tier. If the Range: bytes=0-0 header is not included in the HeadObject request, no tier transition occurs.
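
The endpoint and header rules above can be summarized in a small client-side helper. This sketch only restates the documented rules in code and does not query the service. The function names are hypothetical, and the exact-match endpoint comparison is a simplifying assumption.

```python
LOTA_ENDPOINT = "http://cwlota.com"        # triggers cache fills
PRIMARY_ENDPOINT = "https://cwobject.com"  # returns metadata only
PRESTAGE_RANGE = "bytes=0-0"


def head_request_kwargs(bucket, key, prestage):
    """Build HeadObject arguments.

    With prestage=False the Range header is omitted, so the request is a
    plain metadata lookup and does not cause a tier transition.
    """
    kwargs = {"Bucket": bucket, "Key": key}
    if prestage:
        kwargs["Range"] = PRESTAGE_RANGE
    return kwargs


def expects_cache_fill(endpoint_url, kwargs):
    """Return True only for a bytes=0-0 HeadObject sent to the LOTA endpoint."""
    on_lota = endpoint_url.rstrip("/") == LOTA_ENDPOINT
    return on_lota and kwargs.get("Range") == PRESTAGE_RANGE
```

A helper like this can be useful for tooling that needs to inspect object metadata without unintentionally moving objects to the Hot pricing tier.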

Last modified on April 13, 2026