HeadObject call against any object, you instruct LOTA to fetch the complete object from backend storage and place it into the distributed NVMe cache on your Nodes without transferring a response body to your client.
This eliminates the “cold start” latency penalty that occurs on first reads. Training jobs with a warm cache can see 30% to 50% faster first epochs, inference services can achieve sub-second Time-To-First-Token for large models, and checkpoint restores complete at full cache speed from the first byte.
Other benefits include:
- Zero client bandwidth. HeadObject returns only headers. All cache-fill traffic stays inside the backend CoreWeave network.
- No SDK changes required. Works with any S3-compatible tool or SDK (AWS CLI, Boto3, s3cmd) with no custom headers or proprietary extensions.
Use cases
The following use cases benefit from pre-staging the LOTA cache:

| Use case | Description |
|---|---|
| Pre-warm datasets before distributed training | Issue HeadObject calls for every shard of the training dataset before launching distributed training across hundreds of GPUs. When the first epoch starts, all data-loading workers read directly from cache. |
| Pre-load model weights for inference scale-up | Pre-stage model files while compute pods are initializing during an inference deployment scale-up. By the time pods are ready to serve traffic, model weights are already in cache, ensuring optimal Time-To-First-Token. |
| Accelerate checkpoint-restore after failure | After a node failure interrupts training, pre-stage the most recent checkpoint shards into cache before replacement pods start. This reduces downtime between failure and resumed training. |
| Warm data for multi-region workflows | Before running a production workload in a specific CoreWeave region, pre-stage required objects into that region’s LOTA cache. This enables cache-speed performance from the first read, even if data resides in another region. |
How it works
Pre-staging uses a HeadObject call with a Range: bytes=0-0 header as the trigger. When LOTA receives this request:
- The object storage gateway authenticates the request.
- The object storage service returns the object’s metadata (size, content type, ETag, last modified date) in the response headers, exactly as a normal HeadObject call.
- LOTA fetches the complete object from the backend storage and writes it into the distributed NVMe cache across the Nodes in your CKS cluster.
- LOTA serves future GET requests for that object directly from the cache.
Because HeadObject doesn’t return a response body, pre-staging consumes minimal client-side resources and bandwidth. The cache fill happens entirely within the CoreWeave infrastructure.
Whole object caching
Pre-staging always caches the entire object: a HeadObject call with a Range: bytes=0-0 header triggers a fill of the complete object. Byte-range pre-fetching of partial objects isn’t supported. However, the LOTA cache serves range reads as usual.
LOTA endpoint required
Pre-staging works only through the LOTA endpoint (http://cwlota.com). HeadObject calls sent to the primary endpoint (https://cwobject.com) return metadata but don’t trigger a cache fill.
Billing and storage tier behavior
A HeadObject request with bytes=0-0 counts as an object access, triggering the transition of an object from the Cold or Warm tier to the Hot Storage Pricing tier. If bytes=0-0 isn’t included in the HeadObject request, no tier transition occurs.
Finite cache capacity
Total LOTA cache capacity scales with cluster size, with each Node contributing 1 TiB to the distributed cache by default (for example, a 30-Node cluster has approximately 30 TiB of cache). Pre-stage only the objects your workload will access in the near term. The cache uses LRU (Least Recently Used) eviction, so pre-staging more data than the cache can hold evicts objects that other active workloads may depend on. Use the CAIOS LOTA dashboard to monitor cache utilization and hit rates. To request a larger cache allocation, contact CoreWeave support.
Prerequisites
- An active CoreWeave organization with at least one AI Object Storage bucket containing objects.
- A valid API access key pair (Access Key ID and Secret Access Key) configured for your organization. See Get Started with AI Object Storage for setup instructions.
- A CKS cluster where your workload runs. Your workload must run inside a CoreWeave CKS cluster to reach the LOTA endpoint at http://cwlota.com.
- An S3-compatible client, such as AWS CLI, Boto3 (Python), or s3cmd, installed and configured for CoreWeave AI Object Storage.
Configure CoreWeave credentials
We recommend using a separate profile for CoreWeave AI Object Storage to avoid conflicts with your other AWS profiles and S3-compatible services. If you don’t set up this configuration, you might encounter errors when using AI Object Storage.
1. Create a new credentials file and profile in your CoreWeave configuration directory.
2. When prompted, provide the following values:
   - AWS Access Key ID: The Access Key ID of your CoreWeave AI Object Storage Access Key.
   - AWS Secret Access Key: The Secret Key of your CoreWeave AI Object Storage Access Key.
   - Default region name (Optional): To set a default region, see CoreWeave Availability Zones.
   - Default output format: Use json for JSON output.
3. Set the default endpoint URL to the appropriate endpoint for your use case:
   - The primary endpoint, https://cwobject.com, for use outside a CoreWeave cluster.
   - The LOTA endpoint, http://cwlota.com, for use inside a CoreWeave cluster. The LOTA endpoint routes to the LOTA path for best performance.
4. Set the S3 addressing_style to virtual.
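The endpoint and addressing style can also be supplied in code instead of in the profile. The following is a minimal Boto3 sketch under stated assumptions, not a reference implementation: the profile name `cw` and the helper names `client_settings` and `make_s3_client` are hypothetical and chosen only for illustration. The two endpoint URLs are the ones listed above.

```python
LOTA_ENDPOINT = "http://cwlota.com"        # for use inside a CoreWeave cluster
PRIMARY_ENDPOINT = "https://cwobject.com"  # for use outside a CoreWeave cluster


def client_settings(inside_cluster=True):
    """Return the endpoint URL and S3 options the profile would otherwise hold."""
    return {
        "endpoint_url": LOTA_ENDPOINT if inside_cluster else PRIMARY_ENDPOINT,
        "s3": {"addressing_style": "virtual"},
    }


def make_s3_client(profile_name="cw", inside_cluster=True):
    """Build an S3 client from a named profile ("cw" is an assumed name)."""
    # Imported lazily so client_settings() can be used without boto3 installed.
    import boto3
    from botocore.config import Config

    settings = client_settings(inside_cluster)
    session = boto3.Session(profile_name=profile_name)
    return session.client(
        "s3",
        endpoint_url=settings["endpoint_url"],
        config=Config(s3=settings["s3"]),
    )
```

With this approach the profile only has to carry the access key, secret key, and region; calling `make_s3_client()` from a Pod inside the cluster would return a client that targets the LOTA endpoint.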
If you set endpoint_url and s3.addressing_style directly in your code (for example, in a Boto3 Config object), you can skip steps 3 and 4. The profile only needs the access key, secret key, and region.
Pre-stage a single object
To pre-stage an object, send a HeadObject request to the LOTA endpoint for the target bucket and key. Run these commands from within the same CKS cluster where your training or inference Pods run, so that data is cached on the correct set of Nodes. LOTA returns a response after writing the object into the cache.
Before completing and running these commands, make sure you have configured your CoreWeave credentials.
To pre-stage a single object, replace the following placeholders:
- [BUCKET-NAME] with the name of the bucket containing the object you want to pre-stage.
- [OBJECT-KEY] with the key of the object you want to pre-stage, for example, datasets/imagenet/shard-00001.tar.

The command returns the object’s metadata and triggers a cache fill.
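In Boto3, a single-object pre-stage request might look like the sketch below. `head_object` and its `Range` parameter are part of the Boto3 S3 client. The helper name `prestage_object` is hypothetical, and `s3` is assumed to be a client already configured for the LOTA endpoint with your CoreWeave credentials.

```python
def prestage_object(s3, bucket, key):
    """Send HeadObject with Range: bytes=0-0 and return selected metadata.

    `s3` is assumed to be a Boto3 S3 client pointed at the LOTA endpoint.
    """
    # The Range header is what distinguishes a pre-stage request from a
    # plain metadata lookup.
    response = s3.head_object(Bucket=bucket, Key=key, Range="bytes=0-0")
    # HeadObject has no body; metadata arrives as fields of the response.
    return {
        "key": key,
        "etag": response.get("ETag"),
        "last_modified": response.get("LastModified"),
    }
```

For example, `prestage_object(s3, "[BUCKET-NAME]", "[OBJECT-KEY]")` would issue one request and return the object's ETag and last-modified timestamp once the call completes.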
Pre-stage multiple objects in parallel
Parallelizing the pre-stage requests reduces the total time to warm a large dataset. For guidance on tuning connection pool size and concurrency for high-throughput workloads, see Maximize parallelism.
Before completing and running these commands, make sure you have configured your CoreWeave credentials.
To pre-stage all objects under a prefix, replace the following placeholders:
- [BUCKET-NAME] with the name of the bucket containing the objects you want to pre-stage.
- [PREFIX] with the prefix of the objects you want to pre-stage, for example, datasets/imagenet/.
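One way to pre-stage a whole prefix in parallel with Boto3 is sketched below. It lists keys with the `list_objects_v2` paginator and fans the HeadObject calls out over a thread pool. The helper names `list_keys` and `prestage_prefix` and the default of 16 workers are assumptions for illustration, not tuned recommendations, and `s3` is assumed to be a client configured for the LOTA endpoint.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def list_keys(s3, bucket, prefix):
    """Yield every object key under `prefix` via the ListObjectsV2 paginator."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def prestage_prefix(s3, bucket, prefix, max_workers=16):
    """Pre-stage all objects under `prefix` concurrently.

    Returns (staged_keys, failures), where failures maps key -> exception.
    """
    staged, failures = [], {}

    def prestage(key):
        s3.head_object(Bucket=bucket, Key=key, Range="bytes=0-0")
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(prestage, key): key
            for key in list_keys(s3, bucket, prefix)
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                staged.append(future.result())
            except Exception as exc:  # record the failure and keep going
                failures[key] = exc
    return staged, failures
```

Collecting failures instead of raising on the first error lets a caller retry only the keys that were not staged. Boto3's default connection pool holds 10 connections, so when using more workers than that, consider raising `max_pool_connections` in the client's `Config`.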
Pre-stage before your training job starts
Use a Kubernetes Job as an initial step in your training pipeline. The Job pre-stages all training data, then your training pods start with a fully warmed cache. Replace [BUCKET-NAME] with the name of the bucket containing your training data and [PREFIX] with the object prefix to pre-stage (for example, datasets/imagenet/). The Job reads credentials from a Kubernetes Secret named storage-credentials.
job-prestage-training-data.yaml
Schedule the pre-stage Job to run before your training pods launch. In an Argo Workflow or a Kubernetes-native pipeline, add a dependency so the training step waits for the pre-stage Job to succeed. Pre-stage data immediately before the compute step that needs it to balance warm cache benefits against cache capacity.
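Because cache capacity is finite, a pre-stage step can check up front whether the data is likely to fit. The sketch below only does the arithmetic, using the default of 1 TiB of cache per Node stated earlier. The function names and the 80% `headroom` factor are hypothetical choices, not CoreWeave guidance. Object sizes are assumed to be byte counts, such as the `Size` field of each entry returned by `list_objects_v2`.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def cache_capacity_bytes(node_count, tib_per_node=1):
    """Approximate total cache size: Nodes x per-Node contribution."""
    return node_count * tib_per_node * TIB


def fits_in_cache(object_sizes, node_count, headroom=0.8, tib_per_node=1):
    """Return True if the objects fit within `headroom` of the total cache.

    `headroom` (an assumed 80%) leaves space for data that other
    workloads on the cluster may depend on.
    """
    budget = headroom * cache_capacity_bytes(node_count, tib_per_node)
    return sum(object_sizes) <= budget
```

On a 30-Node cluster with the default allocation, this budget is 24 TiB, so a 20 TiB dataset would pass the check and a 25 TiB dataset would not.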
Related resources
- About LOTA (Local Object Transport Accelerator)
- Conditional writes: use the ETag returned by HeadObject with If-Match to perform atomic compare-and-swap updates
- Get Started with AI Object Storage
- Manage Buckets
- Storage Pricing
- CAIOS LOTA dashboard