Migrate Data to AI Object Storage
Use Rclone to migrate data to AI Object Storage
You can use various replication tools to copy data to CoreWeave AI Object Storage, including Rclone, S3cmd, and Cyberduck. This guide explains how to copy data between two AI Object Storage buckets in different regions using Rclone. The same approach can also be used to transfer data from other storage services that Rclone supports, such as AWS S3, Azure Blob Storage, or Google Cloud Storage.
For the purposes of this guide, the source and target buckets are in different CoreWeave Availability Zones.
Prerequisites
- You need an access key and secret key for each bucket's location.

  For CoreWeave AI Object Storage, use either of these methods:

  - Create keys with a SAML assertion and Workload Identity Federation.
  - Create keys with a Cloud Console token.

  If a bucket is located on a different platform, such as AWS S3, Azure Blob Storage, or Google Cloud Storage, refer to that provider's documentation to generate access keys.
- Ensure Rclone is installed by following the Rclone installation guide. This example uses Rclone v1.69.
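
  To confirm the installation and version before you start, you can run:

  $ rclone version

  The first line of the output shows the installed version, for example rclone v1.69.x.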
Configure Rclone
- Create source and destination profiles in your Rclone config. To locate the config file, run:

  $ rclone config file

  If the config file doesn't exist, Rclone reports the default location where the config file should be created:

  Configuration file doesn't exist, but rclone will use this path:
  /home/username/.config/rclone/rclone.conf

  If an active config file exists, Rclone reports its actual location with a similar message.

- Edit the config file with your preferred text editor and add the following profiles, replacing the placeholders with your actual access key ID and secret key:

  ~/.config/rclone/rclone.conf

  [source]
  type = s3
  provider = Other
  access_key_id = your_access_key_id
  secret_access_key = your_secret_access_key
  endpoint = https://cwobject.com
  force_path_style = false
  no_check_bucket = true

  [target]
  type = s3
  provider = Other
  access_key_id = your_access_key_id
  secret_access_key = your_secret_access_key
  endpoint = https://cwobject.com
  force_path_style = false
  no_check_bucket = true

- Save and close the file.
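
To confirm that the new profiles work before copying any data, you can list them and, if your keys allow bucket listing, list the buckets each one can see. A quick sketch using the profile names defined above:

$ rclone listremotes
source:
target:

$ rclone lsd source:
$ rclone lsd target: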
Copy objects between buckets
Copy objects from the source to the target:
$ rclone copy source:source-bucket target:target-bucket \
    --progress --stats 15s
This copies all objects:
- from source-bucket on the source profile
- to target-bucket on the target profile
The Rclone options --progress and --stats 15s print a progress bar with estimated time to completion and detailed transfer statistics every 15 seconds.
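
After the copy finishes, you can compare the two buckets with rclone check. A minimal sketch using the profiles from this guide; the --one-way flag only confirms that every object in the source also exists in the target:

$ rclone check source:source-bucket target:target-bucket --one-way --progress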
Check target bucket usage
If you have S3cmd installed, you can check the usage of the target bucket with:
$ s3cmd du --human-readable-sizes s3://target-bucket
The output shows the total size of the bucket and the number of objects it contains.
238M    2303 objects    s3://target-bucket/
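
If you prefer to stay within Rclone instead of S3cmd, rclone size reports the same information, assuming the target profile configured earlier in this guide:

$ rclone size target:target-bucket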
Useful flags
To optimize Rclone throughput with AI Object Storage, use flags to fine-tune parallelism and chunking for large files. The most significant flags when copying data to and from CoreWeave AI Object Storage are:
- --transfers: Sets the number of concurrent file transfers. Adjusting this helps fully utilize the available network bandwidth between the customer and CoreWeave, or between CoreWeave regions.
- --checkers: Controls the number of concurrent file checks for equality. This is useful to adjust when transferring many small files.
- --s3-chunk-size: Defines the chunk size used for uploading files larger than the upload_cutoff or files with unknown sizes. Larger chunks reduce HTTP requests but increase memory usage.
- --s3-upload-concurrency: Sets the level of concurrency for multipart uploads.
To check the default values for each flag, use rclone help flags. Rclone has many flags, so grep is useful to filter the output.
$ rclone help flags | grep -E -- '--checkers|--transfers|--s3-chunk-size|--s3-upload-concurrency'
--checkers int                     Number of checkers to run in parallel (default 8)
--transfers int                    Number of file transfers to run in parallel (default 4)
--s3-chunk-size SizeSuffix         Chunk size to use for uploading (default 5Mi)
--s3-upload-concurrency int        Concurrency for multipart uploads and copies (default 4)
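
If you prefer not to repeat these flags on every command, Rclone also reads each flag from an equivalent RCLONE_* environment variable. The values below are illustrative, not recommendations:

$ export RCLONE_TRANSFERS=8
$ export RCLONE_CHECKERS=16
$ export RCLONE_S3_CHUNK_SIZE=50M
$ export RCLONE_S3_UPLOAD_CONCURRENCY=4
$ rclone copy source:source-bucket target:target-bucket --progress --stats 15s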
Optimization guidelines
Use these guidelines to optimize Rclone throughput when copying data to and from CoreWeave AI Object Storage:
- To maximize migration throughput, consider the combined effects of --transfers and --s3-upload-concurrency as multiplicative:

  Total streams ≈ --transfers × --s3-upload-concurrency

  - For transfers dominated by many small or medium-sized files (KBs to MBs), increase --transfers to a value between 8 and 32 to move multiple files in parallel, but keep --s3-upload-concurrency at a lower value, between 1 and 4, because small files don't benefit from multipart uploads.
  - For transfers with a few large files (hundreds of GB), do the opposite: set --transfers to 1 or 2 to avoid initiating too many multipart uploads, while increasing --s3-upload-concurrency to a larger value, between 8 and 16, to upload multiple parts of each large file in parallel and saturate available bandwidth.

- Increase --s3-chunk-size to 50MB for best performance. The default is 5MB.

- Monitor memory use. Estimate Rclone's RAM needs with this calculation (see the worked example after this list):

  RAM ≈ --transfers × (--s3-upload-concurrency × (--s3-chunk-size + --buffer-size))

  - Each active stream uses --buffer-size (default 16MB).
  - Each multipart chunk consumes --s3-chunk-size.

- Increase one flag at a time while monitoring with rclone --progress --stats 15s.

- Stop tuning when throughput plateaus or retries increase. This method ensures you maximize the available bandwidth without overloading local I/O, system memory, or remote service limits.
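As a worked example of the RAM estimate, take illustrative values of --transfers 8, --s3-upload-concurrency 4, --s3-chunk-size 50M, and the default --buffer-size of 16M:

RAM ≈ 8 × (4 × (50 MB + 16 MB)) = 8 × 264 MB ≈ 2.1 GB

If the estimate exceeds the memory you can spare, reduce --transfers or --s3-chunk-size first, since both multiply directly into the total.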
Usage example
The following command copies data from source-bucket to target-bucket, using the recommended flags. Use --s3-chunk-size 50M for best performance, while adjusting the other flags based on your data size, number of files, and available bandwidth.
$ rclone copy source:source-bucket target:target-bucket \
    --progress \
    --stats 15s \
    --transfers 64 \
    --checkers 128 \
    --s3-chunk-size 50M \
    --s3-upload-concurrency 10
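
For a job dominated by a few very large files, the guidelines above suggest shifting parallelism from --transfers to --s3-upload-concurrency. The values below are illustrative and should be tuned against your own measurements:

$ rclone copy source:source-bucket target:target-bucket \
    --progress \
    --stats 15s \
    --transfers 2 \
    --checkers 8 \
    --s3-chunk-size 50M \
    --s3-upload-concurrency 16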