The Pre-stage Cache feature lets you proactively warm the LOTA (Local Object Transport Accelerator) cache for CoreWeave AI Object Storage. By issuing a HeadObject call with a Range: bytes=0-0 header through the LOTA endpoint, you instruct LOTA to fetch the complete object from backend storage and place it into the distributed NVMe cache on your Nodes, without transferring a response body to your client. This eliminates the “cold start” latency penalty that otherwise occurs on first reads. Training jobs with a warm cache can see 30-50% faster first epochs, inference services can achieve sub-second Time-To-First-Token for large models, and checkpoint restores complete at full cache speed from the very first byte. Other benefits include:
  • Zero client bandwidth. HeadObject returns only headers. All cache-fill traffic stays inside the backend CoreWeave network.
  • No SDK changes required. Works with any S3-compatible tool or SDK (AWS CLI, Boto3, s3cmd) with no custom headers or proprietary extensions.

Use cases

The following use cases benefit from pre-staging the LOTA cache:
  • Pre-warm datasets before distributed training. Issue HeadObject calls for every shard of the training dataset before launching distributed training across hundreds of GPUs. When the first epoch starts, all data-loading workers read directly from cache.
  • Pre-load model weights for inference scale-up. Pre-stage model files while compute pods are initializing during an inference deployment scale-up. By the time pods are ready to serve traffic, model weights are already in cache, ensuring optimal Time-To-First-Token.
  • Accelerate checkpoint-restore after failure. After a node failure interrupts training, pre-stage the most recent checkpoint shards into cache before replacement pods start. This reduces downtime between failure and resumed training.
  • Warm data for multi-region workflows. Before running a production workload in a specific CoreWeave region, pre-stage required objects into that region’s LOTA cache. This enables cache-speed performance from the first read, even if data resides in another region.

How it works

Pre-staging uses a HeadObject call with a Range: bytes=0-0 header as the trigger. When LOTA receives this request:
  1. The object storage gateway authenticates the request.
  2. The object storage service returns the object’s metadata (size, content type, ETag, last modified date) in the response headers, exactly as a normal HeadObject call does.
  3. LOTA fetches the complete object from the backend storage and writes it into the distributed NVMe cache across the Nodes in your CKS cluster.
  4. Future GET requests for that object are served directly from the LOTA cache.
Since HeadObject does not return a response body, pre-staging consumes minimal client-side resources and bandwidth. The cache fill happens entirely within the CoreWeave infrastructure.

Whole object caching

Pre-staging always caches the entire object. The Range: bytes=0-0 header acts only as the trigger and does not limit the cache fill to the first byte. Byte-range pre-fetching of partial objects is not supported. However, range reads of a pre-staged object are served from cache as usual.

LOTA endpoint required

Pre-staging only works through the LOTA endpoint (http://cwlota.com). HeadObject calls sent to the primary endpoint (https://cwobject.com) return metadata but do not trigger a cache fill.
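Because a HeadObject call sent to the primary endpoint still succeeds without warming anything, a script may want to check its endpoint before issuing pre-stage requests. The following is a minimal sketch of such a guard. The `is_lota_endpoint` helper and the `LOTA_ENDPOINT` variable are hypothetical names introduced for illustration and are not part of any CoreWeave or AWS tooling.

```shell
# LOTA endpoint as given on this page.
LOTA_ENDPOINT="http://cwlota.com"

# Hypothetical guard: succeed only if the given URL is the LOTA endpoint.
# A single trailing slash is ignored.
is_lota_endpoint() {
  case "${1%/}" in
    "$LOTA_ENDPOINT") return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call `is_lota_endpoint "$ENDPOINT" || exit 1` before its first pre-stage request, where `ENDPOINT` is whatever variable the script uses to hold its endpoint URL.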

Billing and storage tier behavior

A HeadObject request that includes the Range: bytes=0-0 header counts as an object access and triggers the transition of the object from the Cold or Warm tier to the Hot Storage Pricing tier. If the header is not included in the HeadObject request, no tier transition occurs.
Pre-staging a large number of objects transitions them all to the Hot pricing tier. Review the Storage Pricing documentation to understand the cost implications before pre-staging at scale.

Finite cache capacity

Total LOTA cache capacity scales with cluster size, with each Node contributing 1 TiB to the distributed cache by default (for example, a 30-Node cluster has approximately 30 TiB of cache). Pre-stage only the objects your workload will access in the near term. The cache uses LRU (Least Recently Used) eviction, so pre-staging more data than the cache can hold evicts objects that other active workloads may depend on. Use the CAIOS LOTA dashboard to monitor cache utilization and hit rates. To request a larger cache allocation, contact CoreWeave support.
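The capacity arithmetic above can be sketched as a quick check before pre-staging a prefix. This is an illustration only: the helper names are hypothetical, the calculation assumes the default of 1 TiB per Node, and it ignores any cache space already used by other workloads. The listing helper assumes the prefix contains at least one object and that the JMESPath `sum` query prints an integer.

```shell
# 1 TiB in bytes (2^40).
TIB=1099511627776

# Hypothetical helper: default cache capacity in bytes for a given Node count,
# assuming the default of 1 TiB per Node.
cache_capacity_bytes() {
  echo $(( $1 * TIB ))
}

# Hypothetical helper: succeed if TOTAL_BYTES fits within the default capacity.
# Usage: fits_in_cache TOTAL_BYTES NODE_COUNT
fits_in_cache() {
  [ "$1" -le "$(cache_capacity_bytes "$2")" ]
}

# Hypothetical helper: total size in bytes of all objects under a prefix.
# JSON output is used so that the query is applied to the full listing.
# Usage: prefix_total_bytes BUCKET PREFIX
prefix_total_bytes() {
  aws s3api list-objects-v2 \
    --bucket "$1" \
    --prefix "$2" \
    --endpoint-url http://cwlota.com \
    --query 'sum(Contents[].Size)' \
    --output json
}
```

For a 30-Node cluster, `cache_capacity_bytes 30` prints 32985348833280. A check such as `fits_in_cache "$(prefix_total_bytes [BUCKET-NAME] [PREFIX])" 30` then gives a rough upper bound. Leaving headroom for other active workloads is still advisable because the cache is shared.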

Prerequisites

Configure CoreWeave credentials

Use a separate profile for CoreWeave AI Object Storage to avoid conflicts with your other AWS profiles and S3-compatible services. Without this configuration, you may encounter errors when using AI Object Storage.
  1. Create a new credentials file and profile in your CoreWeave configuration directory.
    Create a new credentials file and profile
    AWS_SHARED_CREDENTIALS_FILE=~/.coreweave/cw.credentials aws configure --profile cw
    
  2. When prompted for information, provide the following values:
    • AWS Access Key ID: The Access Key ID of your CoreWeave AI Object Storage Access Key.
    • AWS Secret Access Key: The Secret Key of your CoreWeave AI Object Storage Access Key.
    • Default region name: Optional. To set a default region, refer to the CoreWeave Availability Zones.
    • Default output format: Use json for JSON output.
  3. Set the default endpoint URL to the appropriate endpoint for your use case:
    • The primary endpoint, https://cwobject.com, for use when running outside of a CoreWeave cluster.
    • The LOTA endpoint, http://cwlota.com, for use when running inside a CoreWeave cluster. The LOTA endpoint routes to the LOTA path for best performance.
    Set the primary endpoint for local development
    AWS_CONFIG_FILE=~/.coreweave/cw.config aws configure set endpoint_url https://cwobject.com --profile cw
    
  4. Set the S3 addressing_style to virtual:
    Set virtual addressing style
    AWS_CONFIG_FILE=~/.coreweave/cw.config aws configure set s3.addressing_style virtual --profile cw
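The commands later on this page do not pass `--profile` or the file locations explicitly. One way to make a shell session use the profile created above is to export the standard AWS CLI environment variables, as in this sketch. It assumes the file paths and the `cw` profile name used in the preceding steps.

```shell
# Point the AWS CLI at the CoreWeave-specific credentials and config files
# created in the steps above, and select the cw profile for this session.
export AWS_SHARED_CREDENTIALS_FILE="$HOME/.coreweave/cw.credentials"
export AWS_CONFIG_FILE="$HOME/.coreweave/cw.config"
export AWS_PROFILE=cw
```

An `--endpoint-url` flag given on the command line takes precedence over the profile's `endpoint_url` setting, so pre-stage commands that pass `--endpoint-url http://cwlota.com` still reach the LOTA endpoint.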
    

Pre-stage a single object

To pre-stage an object, send a HeadObject request to the LOTA endpoint for the target bucket and key. Run these commands from within the same CKS cluster where your training or inference Pods will run, so that data is cached on the correct set of Nodes. A response is returned after the object is written into the LOTA cache. Before completing and running these commands, make sure you have configured your CoreWeave credentials.
To pre-stage a single object, replace the following placeholders:
  • [BUCKET-NAME] with the name of the bucket containing the object you want to pre-stage.
  • [OBJECT-KEY] with the key of the object you want to pre-stage, for example, datasets/imagenet/shard-00001.tar.
Pre-stage a single object
aws s3api head-object \
  --bucket [BUCKET-NAME] \
  --key [OBJECT-KEY] \
  --endpoint-url http://cwlota.com \
  --range "bytes=0-0"
The command returns the object’s metadata and triggers a cache fill:
Response
{
  "ContentLength": 524288000,
  "ContentType": "application/x-tar",
  "ETag": "\"a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6\"",
  "LastModified": "2025-11-01T08:30:00+00:00",
  "Metadata": {}
}
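In scripts, it can help to wrap the request so that failures are reported and only the reported length is logged. The following is a minimal sketch. The `prestage_object` function is a hypothetical helper, and the `--query` and `--output` options are standard AWS CLI output controls rather than anything specific to pre-staging.

```shell
# Hypothetical helper: request a pre-stage for one object and log the result.
# Usage: prestage_object BUCKET KEY
prestage_object() {
  bucket=$1
  key=$2
  # Extract only the ContentLength field from the HeadObject response.
  size=$(aws s3api head-object \
    --bucket "$bucket" \
    --key "$key" \
    --endpoint-url http://cwlota.com \
    --range "bytes=0-0" \
    --query 'ContentLength' \
    --output text) || {
      echo "pre-stage failed: s3://$bucket/$key" >&2
      return 1
    }
  echo "pre-stage requested: s3://$bucket/$key (ContentLength $size)"
}
```

For example, `prestage_object [BUCKET-NAME] [OBJECT-KEY]` prints one line on success and returns a non-zero status if the HeadObject call fails, such as when the key does not exist.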

Pre-stage multiple objects in parallel

Parallelizing the pre-stage requests significantly reduces the total time to warm a large dataset. For guidance on tuning connection pool size and concurrency for high-throughput workloads, see Maximize parallelism.
Pre-staging a large number of objects transitions them all to the Hot pricing tier. Review the Storage Pricing documentation to understand the cost implications before pre-staging at scale.
Before completing and running these commands, make sure you have configured your CoreWeave credentials.
To pre-stage all objects under a prefix, replace the following placeholders:
  • [BUCKET-NAME] with the name of the bucket containing the objects you want to pre-stage.
  • [PREFIX] with the prefix of the objects you want to pre-stage, for example, datasets/imagenet/.
Pre-stage all objects under a prefix
aws s3api list-objects-v2 \
  --bucket [BUCKET-NAME] \
  --prefix [PREFIX] \
  --endpoint-url http://cwlota.com \
  --query 'Contents[].Key' \
  --output text | tr '\t' '\n' | \
{
  # Read each key verbatim (no backslash or whitespace mangling)
  while IFS= read -r key; do
    aws s3api head-object \
      --bucket [BUCKET-NAME] \
      --key "$key" \
      --endpoint-url http://cwlota.com \
      --range "bytes=0-0" &
  done

  # Wait inside the pipeline's subshell, where the background jobs were started
  wait
}
echo "Pre-staging complete."
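The loop above starts one background process per object, which can overwhelm a client when a prefix holds many thousands of keys. The following sketch caps concurrency by working in batches and counts failed requests. The `prestage_keys` function is a hypothetical helper, and its default of 16 concurrent requests is an arbitrary illustrative value, not a CoreWeave recommendation.

```shell
# Hypothetical helper: read object keys from stdin (one per line) and pre-stage
# them in batches of at most MAX_JOBS concurrent HeadObject requests.
# Usage: prestage_keys BUCKET [MAX_JOBS]
prestage_keys() {
  bucket=$1
  max_jobs=${2:-16}   # 16 is an arbitrary illustrative default
  pids=""
  count=0
  failed=0

  while IFS= read -r key; do
    [ -n "$key" ] || continue
    aws s3api head-object \
      --bucket "$bucket" \
      --key "$key" \
      --endpoint-url http://cwlota.com \
      --range "bytes=0-0" > /dev/null &
    pids="$pids $!"
    count=$((count + 1))

    # Batch is full: wait for each job and count the ones that failed.
    if [ "$count" -ge "$max_jobs" ]; then
      for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
      done
      pids=""
      count=0
    fi
  done

  # Wait for the final, partially filled batch.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done

  echo "Pre-staging finished with $failed failed request(s)." >&2
  [ "$failed" -eq 0 ]
}
```

To use it, pipe the key listing into the function in place of the loop, for example `... --output text | tr '\t' '\n' | prestage_keys [BUCKET-NAME] 32`. Batching is simpler than a rolling worker pool but lets one slow object hold up its batch, so a tool with a true worker pool may suit very large datasets better.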

Pre-stage before your training job starts

Use a Kubernetes Job as an initial step in your training pipeline. The Job pre-stages all training data, then your training pods start with a fully warmed cache. Replace [BUCKET-NAME] with the name of the bucket containing your training data and [PREFIX] with the object prefix to pre-stage (for example, datasets/imagenet/). The Job reads credentials from a Kubernetes Secret named storage-credentials.
job-prestage-training-data.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prestage-training-data
spec:
  template:
    spec:
      containers:
        - name: prestage
          image: amazon/aws-cli:latest
          command:
            - "/bin/bash"
            - "-c"
            - |
              aws s3api list-objects-v2 \
                --bucket [BUCKET-NAME] \
                --prefix [PREFIX] \
                --endpoint-url http://cwlota.com \
                --query 'Contents[].Key' \
                --output text | tr '\t' '\n' | \
              {
                while IFS= read -r key; do
                  aws s3api head-object \
                    --bucket [BUCKET-NAME] \
                    --key "$key" \
                    --endpoint-url http://cwlota.com \
                    --range 'bytes=0-0' > /dev/null &
                done
                # Wait inside the pipeline's subshell, where the background jobs were started
                wait
              }
              echo "Pre-staging complete."
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: storage-credentials
                  key: access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: storage-credentials
                  key: secret-access-key
      restartPolicy: Never
  backoffLimit: 3
Schedule the pre-stage Job to run before your training pods launch. In an Argo Workflow or a Kubernetes-native pipeline, add a dependency so the training step waits for the pre-stage Job to succeed. Pre-stage data just before the compute step that needs it to balance warm cache benefits against cache capacity.
Last modified on April 20, 2026