The Pre-stage Cache feature allows you to proactively warm the LOTA (Local Object Transport Accelerator) cache for CoreWeave AI Object Storage. By issuing a HeadObject call against any object, you instruct LOTA to fetch the complete object from backend storage and place it into the distributed NVMe cache on your Nodes without transferring a response body to your client. This eliminates the “cold start” latency penalty that occurs on first reads. Training jobs with a warm cache can see 30-50% faster first epochs, inference services can achieve sub-second Time-To-First-Token for large models, and checkpoint restores complete at full cache speed from the very first byte. Other benefits include:
- Zero client bandwidth. HeadObject returns only headers. All cache-fill traffic stays inside the backend CoreWeave network.
- No SDK changes required. Works with any S3-compatible tool or SDK (AWS CLI, Boto3, s3cmd) with no custom headers or proprietary extensions.
Use cases
The following use cases benefit from pre-staging the LOTA cache:

| Use Case | Description |
|---|---|
| Pre-warm datasets before distributed training | Issue HeadObject calls for every shard of the training dataset before launching distributed training across hundreds of GPUs. When the first epoch starts, all data-loading workers read directly from cache. |
| Pre-load model weights for inference scale-up | Pre-stage model files while compute pods are initializing during an inference deployment scale-up. By the time pods are ready to serve traffic, model weights are already in cache, ensuring optimal Time-To-First-Token. |
| Accelerate checkpoint-restore after failure | After a node failure interrupts training, pre-stage the most recent checkpoint shards into cache before replacement pods start. This reduces downtime between failure and resumed training. |
| Warm data for multi-region workflows | Before running a production workload in a specific CoreWeave region, pre-stage required objects into that region’s LOTA cache. This enables cache-speed performance from the first read, even if data resides in another region. |
How it works
Pre-staging uses a HeadObject call with a Range: bytes=0-0 header as the trigger. When LOTA receives this request:
- The object storage gateway authenticates the request.
- The object storage service returns the object’s metadata (size, content type, ETag, last modified date) in the response headers, exactly as a normal HeadObject call does.
- LOTA fetches the complete object from the backend storage and writes it into the distributed NVMe cache across the Nodes in your CKS cluster.
- Future GET requests for that object are served directly from the LOTA cache.
Whole object caching
Pre-staging always caches the entire object: a HeadObject call with a Range: bytes=0-0 header triggers a fill of the complete object, not just the first byte. Byte-range pre-fetching of partial objects is not supported. However, range reads are served from cache as usual.
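As a rough illustration of whole-object caching followed by a partial read, the sketch below pre-stages an object and then requests a byte range with a normal GET. It assumes `client` is a Boto3 S3 client pointed at the LOTA endpoint; the function names are hypothetical and only the standard `head_object` and `get_object` calls are used.

```python
def range_header(start: int, end: int) -> str:
    """Build an HTTP Range header value for an inclusive byte range."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"bytes={start}-{end}"


def prestage_then_read_range(client, bucket: str, key: str,
                             start: int, end: int) -> bytes:
    """Pre-stage the whole object, then read only part of it.

    `client` is assumed to be a Boto3 S3 client created with
    endpoint_url="http://cwlota.com" (hypothetical setup, not shown here).
    """
    # The bytes=0-0 range is only the trigger; per this document, LOTA
    # caches the complete object, not just the first byte.
    client.head_object(Bucket=bucket, Key=key, Range=range_header(0, 0))

    # A later ranged GET asks for the slice the workload actually needs.
    response = client.get_object(Bucket=bucket, Key=key,
                                 Range=range_header(start, end))
    return response["Body"].read()
```

The pre-stage call and the ranged read are shown together only for brevity; in practice the read would typically happen later, from the training or inference process.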
LOTA endpoint required
Pre-staging only works through the LOTA endpoint (http://cwlota.com). HeadObject calls sent to the primary endpoint (https://cwobject.com) return metadata but do not trigger a cache fill.
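Because a HeadObject call to the primary endpoint succeeds without filling the cache, a misconfigured client can fail silently. One possible guard, sketched below, checks the endpoint before any pre-stage request is sent. The helper names are hypothetical; `client.meta.endpoint_url` is the attribute Boto3 clients expose for their configured endpoint.

```python
from urllib.parse import urlparse

LOTA_HOST = "cwlota.com"  # LOTA endpoint host named in this document


def is_lota_endpoint(endpoint_url: str) -> bool:
    """Return True when the URL's host is the LOTA endpoint."""
    host = urlparse(endpoint_url).hostname or ""
    return host == LOTA_HOST


def require_lota(client) -> None:
    """Raise if a Boto3-style client is not pointed at the LOTA endpoint."""
    url = client.meta.endpoint_url
    if not is_lota_endpoint(url):
        raise RuntimeError(
            f"pre-staging needs the LOTA endpoint, but the client uses {url!r}"
        )
```

Calling `require_lota(client)` once at startup is usually enough, since the endpoint does not change for the lifetime of a client.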
Billing and storage tier behavior
A HeadObject request that includes the Range: bytes=0-0 header counts as an object access, which transitions the object from the Cold or Warm tier to the Hot Storage Pricing tier. If the bytes=0-0 range is not included in the HeadObject request, no tier transition occurs.
Finite cache capacity
Total LOTA cache capacity scales with cluster size, with each Node contributing 1 TiB to the distributed cache by default (for example, a 30-Node cluster has approximately 30 TiB of cache). Pre-stage only the objects your workload will access in the near term. The cache uses LRU (Least Recently Used) eviction, so pre-staging more data than the cache can hold evicts objects that other active workloads may depend on. Use the CAIOS LOTA dashboard to monitor cache utilization and hit rates. To request a larger cache allocation, contact CoreWeave support.

Prerequisites
- An active CoreWeave organization with at least one AI Object Storage bucket containing objects.
- A valid API access key pair (Access Key ID and Secret Access Key) configured for your organization. See Get Started with AI Object Storage for setup instructions.
- A CKS cluster where your workload will run. Your workload must run inside a CoreWeave CKS cluster to reach the LOTA endpoint at http://cwlota.com.
- An S3-compatible client, such as AWS CLI, Boto3 (Python), or s3cmd, installed and configured for CoreWeave AI Object Storage.
Configure CoreWeave credentials
Using a separate profile for CoreWeave AI Object Storage is recommended to avoid conflicts with your other AWS profiles and S3-compatible services. If you do not set up this configuration, you may encounter errors when using AI Object Storage.
- Create a new credentials file and profile in your CoreWeave configuration directory.
- When prompted for information, provide the following values:
  - AWS Access Key ID: The Access Key ID of your CoreWeave AI Object Storage Access Key.
  - AWS Secret Access Key: The Secret Key of your CoreWeave AI Object Storage Access Key.
  - Default region name: Optional. To set a default region, refer to the CoreWeave Availability Zones.
  - Default output format: Use json for JSON output.
- Set the default endpoint URL to the appropriate endpoint for your use case:
  - The primary endpoint, https://cwobject.com, for use when running outside of a CoreWeave cluster.
  - The LOTA endpoint, http://cwlota.com, for use when running inside a CoreWeave cluster. The LOTA endpoint routes to the LOTA path for best performance.
- Set the S3 addressing_style to virtual.
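For Boto3 users, the endpoint and addressing-style settings above can also be applied when the client is created. The following is a minimal sketch, not the documented setup procedure: the profile name `cw` and the helper names are assumptions, and the profile is assumed to be discoverable by Boto3 (for example through the standard `AWS_SHARED_CREDENTIALS_FILE` and `AWS_CONFIG_FILE` environment variables).

```python
LOTA_ENDPOINT = "http://cwlota.com"        # inside a CoreWeave cluster
PRIMARY_ENDPOINT = "https://cwobject.com"  # outside a CoreWeave cluster


def endpoint_for(inside_cluster: bool) -> str:
    """Pick the endpoint URL that matches where the code is running."""
    return LOTA_ENDPOINT if inside_cluster else PRIMARY_ENDPOINT


def make_client(profile: str = "cw", inside_cluster: bool = True):
    """Create an S3 client for CoreWeave AI Object Storage.

    "cw" is a hypothetical profile name; use whichever profile holds
    your CoreWeave access key pair.
    """
    # Imported here so the helpers above stay usable without Boto3.
    import boto3
    from botocore.config import Config

    session = boto3.session.Session(profile_name=profile)
    return session.client(
        "s3",
        endpoint_url=endpoint_for(inside_cluster),
        # Virtual-hosted addressing, matching the addressing_style step.
        config=Config(s3={"addressing_style": "virtual"}),
    )
```

Passing `inside_cluster=False` selects the primary endpoint, which is useful for local development but does not trigger cache fills.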
Pre-stage a single object
To pre-stage an object, send a HeadObject request with a Range: bytes=0-0 header to the LOTA endpoint for the target bucket and key. Run these commands from within the same CKS cluster where your training or inference Pods will run, so that data is cached on the correct set of Nodes. A response is returned after the object is written into the LOTA cache. Before completing and running these commands, make sure you have configured your CoreWeave credentials.

To pre-stage a single object, fill in the following parameters:
- [BUCKET-NAME] with the name of the bucket containing the object you want to pre-stage.
- [OBJECT-KEY] with the key of the object you want to pre-stage, for example, datasets/imagenet/shard-00001.tar.

The command returns the object’s metadata and triggers a cache fill.
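A minimal Boto3 sketch of a single-object pre-stage follows. It assumes `client` is an S3 client already configured for the LOTA endpoint; `prestage_object` is a hypothetical helper name, and the set of metadata fields it returns is a choice made for illustration.

```python
def prestage_object(client, bucket: str, key: str) -> dict:
    """Trigger a LOTA cache fill for one object and return its metadata.

    `client` is assumed to be a Boto3 S3 client created with
    endpoint_url="http://cwlota.com".
    """
    # The bytes=0-0 range is what marks this HeadObject as a pre-stage
    # request; no response body is transferred to the caller.
    response = client.head_object(Bucket=bucket, Key=key, Range="bytes=0-0")

    # Keep only the descriptive fields; anything missing is skipped.
    wanted = ("ContentLength", "ContentType", "ETag", "LastModified")
    return {name: response[name] for name in wanted if name in response}


# Example call, using the placeholder values from this section:
#   metadata = prestage_object(client, "[BUCKET-NAME]", "[OBJECT-KEY]")
```

Since the response arrives only after the object has been written into the cache, a very large object may take longer than botocore's default 60-second read timeout. Raising `read_timeout` in `botocore.config.Config` is one way to allow for that.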
Pre-stage multiple objects in parallel
Parallelizing the pre-stage requests significantly reduces the total time to warm a large dataset. For guidance on tuning connection pool size and concurrency for high-throughput workloads, see Maximize parallelism. Before completing and running these commands, make sure you have configured your CoreWeave credentials.

To pre-stage all objects under a prefix, fill in the following parameters:
- [BUCKET-NAME] with the name of the bucket containing the objects you want to pre-stage.
- [PREFIX] with the prefix of the objects you want to pre-stage, for example, datasets/imagenet/.
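One way to parallelize pre-staging from Python is a thread pool over the keys under a prefix, sketched below. It assumes `client` is a Boto3 S3 client for the LOTA endpoint and that listing objects works through the same client. The function names and the default of 32 workers are illustrative assumptions, not tuned recommendations.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def list_keys(client, bucket: str, prefix: str):
    """Yield every object key under a prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def prestage_prefix(client, bucket: str, prefix: str, max_workers: int = 32):
    """Pre-stage all objects under a prefix concurrently.

    Returns (succeeded_keys, failures), where failures maps each failed
    key to the exception it raised, so the caller can retry those keys.
    """
    def _head(key: str) -> str:
        client.head_object(Bucket=bucket, Key=key, Range="bytes=0-0")
        return key

    succeeded, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(_head, key): key
                   for key in list_keys(client, bucket, prefix)}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                succeeded.append(key)
            except Exception as exc:  # collect the error, keep going
                failures[key] = exc
    return succeeded, failures
```

Botocore's connection pool defaults to 10 connections per client, so when using more workers than that, consider raising `max_pool_connections` in `botocore.config.Config` to match.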
Pre-stage before your training job starts
Use a Kubernetes Job as an initial step in your training pipeline. The Job pre-stages all training data, then your training pods start with a fully warmed cache. Replace [BUCKET-NAME] with the name of the bucket containing your training data and [PREFIX] with the object prefix to pre-stage (for example, datasets/imagenet/). The Job reads credentials from a Kubernetes Secret named storage-credentials.
job-prestage-training-data.yaml
Schedule the pre-stage Job to run before your training pods launch. In an Argo Workflow or a Kubernetes-native pipeline, add a dependency so the training step waits for the pre-stage Job to succeed. Pre-stage data just before the compute step that needs it to balance warm cache benefits against cache capacity.
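To judge whether a dataset is small enough to pre-stage in one pass, the default of 1 TiB of cache per Node can be turned into a quick sizing check. The sketch below is illustrative only: the function names are hypothetical, and the 0.8 headroom factor is an assumption chosen to leave room for other workloads, not a CoreWeave recommendation.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def cache_capacity_bytes(node_count: int, tib_per_node: float = 1.0) -> int:
    """Approximate total LOTA cache size for a cluster.

    The 1 TiB per Node default comes from this document; clusters with a
    custom cache allocation should pass their own value.
    """
    return int(node_count * tib_per_node * TIB)


def fits_in_cache(total_object_bytes: int, node_count: int,
                  headroom: float = 0.8, tib_per_node: float = 1.0) -> bool:
    """Return True if the objects fit within a fraction of the cache.

    headroom=0.8 is an assumed safety margin so that pre-staging does not
    evict data other active workloads depend on.
    """
    budget = headroom * cache_capacity_bytes(node_count, tib_per_node)
    return total_object_bytes <= budget
```

For a 30-Node cluster with the default allocation, `fits_in_cache(20 * TIB, 30)` is true and `fits_in_cache(25 * TIB, 30)` is false, because the assumed budget is 24 TiB.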
Related resources
- About LOTA (Local Object Transport Accelerator)
- Conditional writes: use the ETag returned by HeadObject with If-Match to perform atomic compare-and-swap updates
- Get Started with AI Object Storage
- Manage Buckets
- Storage Pricing
- CAIOS LOTA dashboard