Migrate Data to AI Object Storage
Use s5cmd or Rclone to migrate data to AI Object Storage
This guide explains how to copy data between AI Object Storage buckets using s5cmd and Rclone.
You can also use s3cmd or Cyberduck to copy data to CoreWeave AI Object Storage.
The same approaches can also be used to transfer data from other cloud storage services, such as AWS S3, Azure Blob Storage, or Google Cloud Storage.
Choose a migration tool
| Tool | Best for | Notes |
|---|---|---|
| s5cmd | High-performance parallel transfers; transfers involving Local Storage | Recommended for most use cases. High default concurrency (256 workers). |
| Rclone | Complex sync operations; transfers between diverse cloud providers | Do not use with Local Storage due to known kernel panic issues. Safe for AI Object Storage, Distributed File Storage, and external systems. |
Prerequisites
Before you start, ensure you have the following:
- The primary endpoint for AI Object Storage, https://cwobject.com, requires TLS v1.3. Ensure your S3-compatible tools and OpenSSL support TLS v1.3 (see the example check after this list).
- An organization access policy configured.
- An access key and secret key for each bucket.

  For CoreWeave AI Object Storage, use either of these methods:

  - Create keys with a SAML assertion and Workload Identity Federation
  - Create keys with a Cloud Console token.

  If a bucket is located on a different platform, such as AWS S3, Azure Blob Storage, or Google Cloud Storage, refer to that provider's documentation to generate access keys.
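To confirm that your local OpenSSL build can negotiate TLS v1.3 with the endpoint, a quick handshake test may help. This is a generic OpenSSL check rather than a CoreWeave-specific step; the `-tls1_3` and `-brief` options require OpenSSL 1.1.1 or later:

```
# Attempt a TLS 1.3-only handshake with the AI Object Storage endpoint.
# A successful connection reports TLSv1.3 as the negotiated protocol.
$ openssl s_client -connect cwobject.com:443 -tls1_3 -brief </dev/null
```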
Migrate data with s5cmd
s5cmd is a high-performance, parallel S3 and local filesystem execution tool. CoreWeave maintains a fork of s5cmd that defaults to virtual-style addressing for cwobject.com and cwlota.com, which is required for AI Object Storage.
The upstream s5cmd defaults to path-style addressing, which AI Object Storage does not support. You must use the CoreWeave fork for compatibility with AI Object Storage.
Install s5cmd
1. Download the latest release binary for your platform from the CoreWeave s5cmd releases page.

   The CoreWeave fork is based on the latest upstream s5cmd and the only difference is that it defaults to virtual-style addressing for AI Object Storage. Other S3-compatible backends are unaffected, so this fork can safely replace any existing s5cmd installation.

2. After downloading, make the binary executable and move it to a directory in your `PATH`:

   ```
   $ chmod +x s5cmd && sudo mv s5cmd /usr/local/bin/
   ```

3. Verify the installation:

   ```
   $ s5cmd version
   ```
Configure credentials
s5cmd reads credentials from environment variables or the AWS shared credentials file. Set the following environment variables with your AI Object Storage access keys:
```
$ export AWS_ACCESS_KEY_ID=your_access_key_id
$ export AWS_SECRET_ACCESS_KEY=your_secret_access_key
```
Alternatively, if you have already configured credentials for AI Object Storage following the Get Started guide, you can use your existing credentials file:
```
$ export AWS_SHARED_CREDENTIALS_FILE=~/.coreweave/cw.credentials
```
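If you create the credentials file by hand rather than through the Get Started guide, it follows the standard AWS shared-credentials INI format. The sketch below assumes the `default` profile; if your file uses a named profile instead, select it with s5cmd's `--profile` flag:

```
# ~/.coreweave/cw.credentials (standard AWS shared credentials format)
[default]
aws_access_key_id = your_access_key_id
aws_secret_access_key = your_secret_access_key
```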
Copy objects between buckets
Copy objects from a source bucket to a target bucket:
```
$ s5cmd --endpoint-url https://cwobject.com \
    cp 's3://source-bucket/*' s3://target-bucket/
```
This copies all objects from `source-bucket` to `target-bucket`.
s5cmd preserves the source directory structure by default. To flatten it instead, use the `--flatten` flag:
```
$ s5cmd --endpoint-url https://cwobject.com \
    cp --flatten 's3://source-bucket/*' s3://target-bucket/
```
In shells like zsh, the * character is treated as a file globbing wildcard. Wrap wildcard expressions in single quotes to prevent unexpected behavior:
```
# Correct - wildcards are passed to s5cmd
s5cmd cp 's3://bucket/*.gz' ./local-dir/

# Incorrect - shell expands the wildcard first
s5cmd cp s3://bucket/*.gz ./local-dir/
```
To copy a specific file:
```
$ s5cmd --endpoint-url https://cwobject.com \
    cp s3://source-bucket/path/to/file.txt s3://target-bucket/path/to/file.txt
```
To copy from a local directory to a bucket:
```
$ s5cmd --endpoint-url https://cwobject.com \
    cp '/local/path/*' s3://target-bucket/
```
Optimize s5cmd performance
s5cmd provides two main options for tuning parallelism:
- `--numworkers`: Sets the size of the global worker pool (default: 256). This controls how many files are transferred concurrently.
- `--concurrency`: A `cp` command option that sets the number of parts uploaded or downloaded in parallel for a single file (default: 5). This is useful for large files using multipart transfers.
For many small files, increase --numworkers to maximize parallel file transfers:
```
$ s5cmd --endpoint-url https://cwobject.com \
    --numworkers 512 \
    cp 's3://source-bucket/*' s3://target-bucket/
```
For a few large files, keep --numworkers low and increase --concurrency to maximize multipart upload parallelism:
```
$ s5cmd --endpoint-url https://cwobject.com \
    --numworkers 4 \
    cp --concurrency 16 's3://source-bucket/*' s3://target-bucket/
```
For mixed workloads, balance both options:
```
$ s5cmd --endpoint-url https://cwobject.com \
    --numworkers 64 \
    cp --concurrency 8 's3://source-bucket/*' s3://target-bucket/
```
Additional s5cmd operations
s5cmd supports many S3 operations beyond copying. Some useful commands include:
```
# List objects in a bucket
$ s5cmd --endpoint-url https://cwobject.com ls s3://bucket-name/

# Remove objects
$ s5cmd --endpoint-url https://cwobject.com rm 's3://bucket-name/prefix/*'

# Sync directories (copy only changed files)
$ s5cmd --endpoint-url https://cwobject.com sync 's3://source-bucket/*' s3://target-bucket/
```
For the full list of commands, run s5cmd --help.
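For migrations that span many distinct prefixes, it can also be convenient to batch copy commands into a file and let s5cmd execute them in parallel with its `run` command. The file name and prefixes below are placeholders for illustration:

```
# migrate-commands.txt - one s5cmd command per line
cp 's3://source-bucket/datasets/*' s3://target-bucket/datasets/
cp 's3://source-bucket/checkpoints/*' s3://target-bucket/checkpoints/
```

```
$ s5cmd --endpoint-url https://cwobject.com run migrate-commands.txt
```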
Migrate data with Rclone
Rclone is a versatile tool for managing files across cloud storage providers. It supports complex sync operations and works with a wide variety of storage backends.
Rclone has known kernel panic issues when reading from and writing to Local Storage volumes. When moving data between Local Storage and AI Object Storage, use s5cmd, s3cmd, or aws s3 cp instead.
Rclone is safe to use with AI Object Storage and other systems, including Distributed File Storage, other Object Storage buckets, or external systems.
Install Rclone
Follow the Rclone installation guide to install Rclone. This guide uses Rclone version v1.69.
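To confirm the installed version:

```
$ rclone version
```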
Configure Rclone
1. Create source and destination profiles in your Rclone config. To locate the config file, run:

   ```
   $ rclone config file
   ```

   If the config file doesn't exist, Rclone reports the default location where the config file should be created:

   ```
   Configuration file doesn't exist, but rclone will use this path:
   /home/username/.config/rclone/rclone.conf
   ```

   If an active config file exists, Rclone reports its actual location with a similar message.

2. Edit the config file with your preferred text editor and add the following profiles, replacing the placeholders with your actual access key ID and secret key:

   `~/.config/rclone/rclone.conf`

   ```
   [source]
   type = s3
   provider = Other
   access_key_id = your_access_key_id
   secret_access_key = your_secret_access_key
   endpoint = https://cwobject.com
   force_path_style = false
   no_check_bucket = true

   [target]
   type = s3
   provider = Other
   access_key_id = your_access_key_id
   secret_access_key = your_secret_access_key
   endpoint = https://cwobject.com
   force_path_style = false
   no_check_bucket = true
   ```

3. Save and close the file.
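To confirm that both profiles authenticate correctly, listing the buckets visible to each one is a quick sanity check (this assumes the `source` and `target` profile names configured above):

```
$ rclone lsd source:
$ rclone lsd target:
```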
Copy objects between buckets
Copy objects from the source to the target:
```
$ rclone copy source:source-bucket target:target-bucket \
    --progress --stats 15s
```
This copies all objects:

- from `source-bucket` on the `source` profile
- to `target-bucket` on the `target` profile
The Rclone options --progress --stats 15s print a progress bar with estimated time to completion and detailed transfer statistics every 15 seconds.
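After a copy finishes, you can verify that the target matches the source with `rclone check`, which compares sizes and hashes where available. The `--one-way` flag reports only files that are missing or differ on the target, which is typically what matters after a one-directional copy:

```
$ rclone check source:source-bucket target:target-bucket --one-way
```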
When working with versioned buckets, Rclone copies the latest version of each object by default. See Use Rclone with Versioned Buckets and Objects for more details, including guidance on working with specific object versions and delete markers.
Optimize Rclone performance
To optimize Rclone throughput with AI Object Storage, use flags to fine-tune parallelism and chunking for large files. The most significant flags are:
- `--transfers`: Sets the number of concurrent file transfers. Adjusting this helps fully utilize the available network bandwidth between the customer and CoreWeave, or between CoreWeave regions.
- `--checkers`: Controls the number of concurrent file checks for equality. This is useful to adjust when transferring many small files.
- `--s3-chunk-size`: Defines the chunk size used for uploading files larger than the `upload_cutoff` or files with unknown sizes. Larger chunks reduce HTTP requests but increase memory usage.
- `--s3-upload-concurrency`: Sets the level of concurrency for multipart uploads.
To check the default values for each flag, use rclone help flags. Rclone has many flags, so grep is useful to filter the output.
```
$ rclone help flags | grep -E -- '--checkers|--transfers|--s3-chunk-size|--s3-upload-concurrency'
      --checkers int                 Number of checkers to run in parallel (default 8)
      --transfers int                Number of file transfers to run in parallel (default 4)
      --s3-chunk-size SizeSuffix     Chunk size to use for uploading (default 5Mi)
      --s3-upload-concurrency int    Concurrency for multipart uploads and copies (default 4)
```
Optimization guidelines
Use these guidelines to optimize Rclone throughput when copying data to and from CoreWeave AI Object Storage:
- To maximize migration throughput, consider the combined effects of `--transfers` and `--s3-upload-concurrency` as multiplicative:

  Total streams ≈ `--transfers` × `--s3-upload-concurrency`

  - For transfers dominated by many small or medium-sized files (KBs to MBs), increase `--transfers` to a value between 8 and 32 to move multiple files in parallel, but keep `--s3-upload-concurrency` at a lower value, between 1 and 4, because small files don't benefit from multipart uploads.
  - For transfers with a few large files (hundreds of GB), do the opposite: set `--transfers` to 1 or 2 to avoid initiating too many multipart uploads, while increasing `--s3-upload-concurrency` to a large value, between 8 and 16, to upload multiple parts of each large file in parallel and saturate available bandwidth.

- Increase `--s3-chunk-size` to 50MB for best performance. The default is 5MB.

- Monitor memory use. Estimate Rclone's RAM needs with this calculation (see the worked example after this list):

  RAM ≈ `--transfers` × (`--s3-upload-concurrency` × (`--s3-chunk-size` + `--buffer-size`))

  - Each active stream uses `--buffer-size` (the default is 16MB).
  - Each multipart chunk consumes `--s3-chunk-size`.

- Increase one flag at a time while monitoring with `rclone --progress --stats 15s`.

- Stop tuning when throughput plateaus or retries increase. This method ensures you maximize the available bandwidth without overloading local I/O, system memory, or remote service limits.
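As a worked example of the RAM estimate above (an upper-bound estimate from the formula, not measured usage): with Rclone's defaults of 4 transfers, an upload concurrency of 4, 5 MiB chunks, and a 16 MiB buffer, the formula gives roughly 4 × (4 × (5 MiB + 16 MiB)) ≈ 336 MiB. With the tuned values in the usage example below (64 transfers, upload concurrency 10, 50 MB chunks), it gives roughly 64 × (10 × (50 MB + 16 MB)) ≈ 42 GB, so confirm that the host running Rclone has enough free memory before raising both flags together.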
Usage example
The following command copies data from source-bucket to target-bucket, using the recommended flags.
Use --s3-chunk-size 50M for best performance, while adjusting the other flags based on your data size, number of files, and available bandwidth.
```
$ rclone copy source:source-bucket target:target-bucket \
    --progress \
    --stats 15s \
    --transfers 64 \
    --checkers 128 \
    --s3-chunk-size 50M \
    --s3-upload-concurrency 10
```
Check bucket usage
If you have s3cmd installed, you can check the usage of the target bucket with:
```
$ s3cmd du --human-readable-sizes s3://target-bucket
```
The output shows the total size of the bucket and the number of objects it contains.
```
238M     2303 objects s3://target-bucket/
```
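If you configured Rclone as described above, `rclone size` reports comparable totals for the bucket (using the placeholder profile and bucket names from this guide):

```
$ rclone size target:target-bucket
```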