Use s5cmd with CoreWeave Object Storage
s5cmd is a fast and efficient command-line utility for interacting with S3-compatible Object Storage services. According to its authors' benchmarks, it is more than an order of magnitude faster than s3cmd or aws-cli.
This guide explains how to install, configure, and use s5cmd with CoreWeave Object Storage. Before proceeding, you'll need a Linux, macOS, or Windows machine with terminal access and a CoreWeave Object Storage Token.
Installation
To install s5cmd for your platform:
- Linux: Download the latest release for your platform from the GitHub repository, unpack the archive, and move the executable to /usr/local/bin.
- macOS: Install with Homebrew:
brew install peak/tap/s5cmd
- Windows: Download the latest release for your platform from the GitHub repository, unpack the archive, and place the .exe file in a directory in your system path.
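After installing, it can be useful to confirm that the shell can find the executable. The following is a minimal sketch; check_s5cmd is a hypothetical helper name, not part of s5cmd.

```shell
# Report whether s5cmd is reachable on PATH.
# check_s5cmd is a hypothetical helper, not part of s5cmd itself.
check_s5cmd() {
  if command -v s5cmd >/dev/null 2>&1; then
    echo "s5cmd found at $(command -v s5cmd)"
  else
    echo "s5cmd not found on PATH" >&2
    return 1
  fi
}
```

If the helper succeeds, running s5cmd version prints the installed version.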
Configuration
Credentials for s5cmd can be provided in a default file or through the environment.
Default file
The default credentials file is located at ~/.aws/credentials. Create this file if it doesn't exist and add your Object Storage credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
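The file can also be created from the shell. The sketch below assumes a hypothetical helper named write_s5cmd_credentials; it refuses to overwrite an existing file and creates the new one readable only by its owner, since it holds a secret key.

```shell
# Write a [default] profile to a credentials file.
# write_s5cmd_credentials is a hypothetical helper, not part of s5cmd.
# Usage: write_s5cmd_credentials <file> <access-key> <secret-key>
write_s5cmd_credentials() {
  creds_file="$1"; access_key="$2"; secret_key="$3"
  if [ -e "$creds_file" ]; then
    echo "refusing to overwrite $creds_file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$creds_file")"
  # The subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$access_key" "$secret_key" > "$creds_file"
  )
}
```

For example, write_s5cmd_credentials ~/.aws/credentials YOUR_ACCESS_KEY YOUR_SECRET_KEY produces the file shown above.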
Environment
If environment variables are preferred, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead:
# Export your AWS access key and secret pair
export AWS_ACCESS_KEY_ID='<your-access-key-id>'
export AWS_SECRET_ACCESS_KEY='<your-secret-access-key>'
Required Endpoint URL
The endpoint URL must be defined to use s5cmd with CoreWeave Object Storage. It can be defined on the command line or in the local environment.
These examples use the LGA1 endpoint. Please see the Object Storage documentation for a list of available endpoints.
On the command line
As an example, when copying files with the cp command, specify the endpoint URL on the command line as shown:
s5cmd --endpoint-url=https://object.lga1.coreweave.com \
cp local-file.txt s3://bucket-name/remote-file.txt
In the environment
To set the endpoint URL in the local environment:
export S3_ENDPOINT_URL="https://object.lga1.coreweave.com"
After S3_ENDPOINT_URL is set, s5cmd reads the endpoint from the environment:
s5cmd cp local-file.txt s3://bucket-name/remote-file.txt
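Because s5cmd falls back to its default S3 host when no endpoint is given, a script may want to stop early if the variable is missing. A minimal sketch, using a hypothetical helper named require_endpoint:

```shell
# Fail early when S3_ENDPOINT_URL is unset or empty.
# require_endpoint is a hypothetical helper, not part of s5cmd.
require_endpoint() {
  if [ -z "${S3_ENDPOINT_URL:-}" ]; then
    echo "S3_ENDPOINT_URL is not set" >&2
    return 1
  fi
  echo "Using endpoint: $S3_ENDPOINT_URL"
}
```

A script could then run require_endpoint && s5cmd ls so that the listing only happens once the endpoint is defined.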
As an alias (optional)
In some cases it may be helpful to define an alias in your profile with the endpoint URL:
alias s5="s5cmd --endpoint-url https://object.lga1.coreweave.com"
After defining the alias, the copy command can be run as:
s5 cp local-file.txt s3://bucket-name/remote-file.txt
Alternate Profiles
A credentials file can contain multiple profiles. For example, the default and my_alternate profiles are defined like this:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
[my_alternate]
aws_access_key_id = ALTERNATE_ACCESS_KEY
aws_secret_access_key = ALTERNATE_SECRET_KEY
To specify an alternate profile, use the --profile option:
s5cmd --profile my_alternate \
cp local-file.txt s3://bucket-name/remote-file.txt
An alternate credentials file can be used with the --credentials-file option:
s5cmd --credentials-file ~/.alternate-credentials-file \
cp local-file.txt s3://bucket-name/remote-file.txt
The options can also be combined.
s5cmd --credentials-file ~/.alternate-credentials-file \
--profile my_alternate \
cp local-file.txt s3://bucket-name/remote-file.txt
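To see which names are valid for --profile, the section headers of a credentials file can be listed with standard tools. This is a sketch only; list_profiles is a hypothetical helper, and it assumes each profile header sits on its own line in the [name] form shown above.

```shell
# Print the profile names defined in a credentials file, one per line.
# list_profiles is a hypothetical helper, not part of s5cmd.
# Usage: list_profiles <credentials-file>
list_profiles() {
  # Keep only lines of the form "[name]" and print the name.
  sed -n 's/^\[\([^]]*\)].*$/\1/p' "$1"
}
```

For the example file above, list_profiles ~/.aws/credentials would print default and my_alternate.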
Basic usage
These are basic commands for using s5cmd. All examples in this guide assume that the endpoint URL is defined in the local environment as described above.
List buckets
s5cmd ls
Upload a file
s5cmd cp local-file.txt s3://bucket-name/remote-file.txt
Download a file
s5cmd cp s3://bucket-name/remote-file.txt local-file.txt
Delete a file
s5cmd rm s3://bucket-name/remote-file.txt
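The upload, download, and delete commands above can be combined into a quick round-trip check of a new setup. The sketch below is illustrative: roundtrip_check is a hypothetical helper, and it deletes the remote object when it finishes, so it should only be pointed at a scratch key.

```shell
# Upload a file, download it again, compare the bytes, then delete the
# remote copy. roundtrip_check is a hypothetical helper, not part of s5cmd.
# Usage: roundtrip_check <local-file> <s3-url-of-a-scratch-object>
roundtrip_check() {
  src="$1"; remote="$2"
  copy="$(mktemp)" || return 1
  status=1
  if s5cmd cp "$src" "$remote"; then
    # Only compare if the download itself succeeded.
    s5cmd cp "$remote" "$copy" && cmp -s "$src" "$copy"
    status=$?
    s5cmd rm "$remote"
  fi
  rm -f "$copy"
  return "$status"
}
```

For example, roundtrip_check local-file.txt s3://bucket-name/scratch-file.txt returns 0 when the downloaded copy matches the original.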
Advanced features
s5cmd supports a number of advanced features. Please see the official documentation for more information about these:
- Shell auto-completion
- Include and exclude filters
- Count objects and determine total size
- Use command files to run multiple commands in parallel
- Sync buckets
- Perform dry runs
- Pipe the contents of a remote object to other commands
- Select JSON object content using SQL
Best Practices
Use bulk operations
Use wildcards to operate on multiple files at once. This is faster than running individual commands. Quote the wildcard so that s5cmd, rather than the shell, expands it.
s5cmd cp '*.txt' s3://bucket-name/
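When the files to transfer cannot be matched by a single wildcard, a command file for s5cmd run is another way to batch the work. The sketch below generates such a file; make_upload_commands is a hypothetical helper, and it assumes file names contain no whitespace or quote characters.

```shell
# Print one "cp" line per .txt file in a directory, for use with "s5cmd run".
# make_upload_commands is a hypothetical helper, not part of s5cmd.
# Assumes file names contain no whitespace or quote characters.
# Usage: make_upload_commands <source-dir> <s3-prefix-ending-in-slash>
make_upload_commands() {
  src_dir="$1"; dest_prefix="$2"
  for path in "$src_dir"/*.txt; do
    # Skip the unexpanded pattern when the directory has no .txt files.
    [ -e "$path" ] || continue
    echo "cp $path ${dest_prefix}$(basename "$path")"
  done
}
```

The output can be saved with make_upload_commands ./data s3://bucket-name/ > commands.txt and then executed with s5cmd run commands.txt.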
Error handling and logging
When some files fail to transfer, s5cmd reports the failures and continues processing the remaining files. Use the --log option to set the log level (trace, debug, info, or error), and redirect the output to a log file for better traceability.
s5cmd --log error \
cp '*.txt' s3://bucket-name/ > operation.log 2>&1
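Independently of s5cmd's own options, shell redirection can capture everything a command prints and record how it exited. The sketch below uses a hypothetical wrapper named run_logged and assumes that s5cmd exits with a non-zero status when an operation fails.

```shell
# Run a command, append its stdout and stderr to a log file, record the
# exit status in the log, and return that status to the caller.
# run_logged is a hypothetical wrapper, not part of s5cmd.
# Usage: run_logged <log-file> <command> [args...]
run_logged() {
  log_file="$1"; shift
  rc=0
  "$@" >>"$log_file" 2>&1 || rc=$?
  echo "exit status: $rc" >>"$log_file"
  return "$rc"
}
```

For example, run_logged operation.log s5cmd cp '*.txt' s3://bucket-name/ leaves both the transfer output and the final exit status in operation.log.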
Adjust concurrency
Adjust the --numworkers and --concurrency options to improve performance for commands such as cp, select, and run.
- --numworkers is a global option that sets the size of the global worker pool. The default value is 256.
- --concurrency is an option of commands such as cp that sets the number of parts that will be uploaded or downloaded in parallel for a single file. The default value is 5.
The options can be used together. Place --numworkers before the command and --concurrency after it:
s5cmd --numworkers 10 \
cp --concurrency 10 \
'*.txt' s3://bucket-name/
On extremely large filesystems with more than one million files, s5cmd may exhibit unwanted behavior. In these cases, reducing concurrency may help:
s5cmd cp --concurrency 2 '*.txt' s3://bucket-name/
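The rule of thumb above can be scripted. The sketch below is illustrative only: pick_concurrency is a hypothetical helper, the one-million threshold and the value 2 come from the text above, 5 is the documented default, and it assumes --concurrency is passed as an option of cp, which is how s5cmd's help lists it.

```shell
# Suggest a --concurrency value from a file count.
# pick_concurrency is a hypothetical helper, not part of s5cmd.
# Usage: pick_concurrency <file-count>
pick_concurrency() {
  file_count="$1"
  if [ "$file_count" -gt 1000000 ]; then
    echo 2   # reduced value for very large file sets
  else
    echo 5   # s5cmd's default
  fi
}

# Example use (commented out because it needs s5cmd and a configured endpoint):
# count=$(( $(find . -type f | wc -l) ))
# s5cmd cp --concurrency "$(pick_concurrency "$count")" '*.txt' s3://bucket-name/
```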
Additional Resources
To learn more about s5cmd, see the following resources: