Use s5cmd with CoreWeave Object Storage

s5cmd is a fast command-line utility for interacting with S3-compatible object storage services. According to the author's published benchmarks, it is substantially faster than s3cmd or aws-cli.

This guide explains how to install, configure, and use s5cmd with CoreWeave Object Storage. Before proceeding, you'll need a Linux, macOS, or Windows machine with terminal access and a CoreWeave Object Storage Token.

Installation

Install s5cmd for your platform by following the installation instructions in the s5cmd README.

Configuration

s5cmd can read credentials from a credentials file or from environment variables.

Default file

The default credentials file is located at ~/.aws/credentials. Create this file if it doesn't exist and add your Object Storage credentials:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
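
The file creation step above can be scripted. The following is a minimal sketch, not an official procedure: init_credentials is a hypothetical helper name, and the key values are the same placeholders used above. It writes a template only if the file does not already exist, and restricts the file's permissions to the current user because the file holds secrets.

```shell
# init_credentials is a hypothetical helper, not part of s5cmd.
# Usage: init_credentials [PATH]   (PATH defaults to ~/.aws/credentials)
init_credentials() {
  cred_file="${1:-$HOME/.aws/credentials}"
  mkdir -p "$(dirname "$cred_file")" || return 1

  # Never overwrite an existing credentials file.
  if [ ! -e "$cred_file" ]; then
    cat > "$cred_file" <<'EOF'
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
EOF
  fi

  # Make the file readable and writable by the owner only.
  chmod 600 "$cred_file"
}
```

After running init_credentials, edit the file and replace the placeholder values with your Object Storage keys.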

Environment

If environment variables are preferred, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead:

# Export your Object Storage access key and secret key
export AWS_ACCESS_KEY_ID='<your-access-key-id>'
export AWS_SECRET_ACCESS_KEY='<your-secret-access-key>'

Required Endpoint URL

The endpoint URL must be defined to use s5cmd with CoreWeave Object Storage. It can be defined on the command line or in the local environment.

tip

These examples use the LGA1 endpoint. Please see the Object Storage documentation for a list of available endpoints.

On the command line

As an example, when copying files with the cp command, specify the endpoint URL on the command line as shown:

$ s5cmd --endpoint-url=https://object.lga1.coreweave.com \
cp local-file.txt s3://bucket-name/remote-file.txt

In the environment

To set the endpoint URL in the local environment:

$ export S3_ENDPOINT_URL="https://object.lga1.coreweave.com"

After S3_ENDPOINT_URL is set, s5cmd will fetch the endpoint from the environment.

$ s5cmd cp local-file.txt s3://bucket-name/remote-file.txt
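
When the configuration is supplied entirely through the environment, a script can confirm that the three variables named in this guide are set before it calls s5cmd. This is a minimal sketch under that assumption: check_s5cmd_env is a hypothetical helper, and it does not apply if your credentials come from the credentials file instead.

```shell
# check_s5cmd_env is a hypothetical helper, not part of s5cmd.
# It returns 0 if all three variables are set and non-empty, 1 otherwise.
check_s5cmd_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_ENDPOINT_URL; do
    # Read the variable whose name is stored in $var (POSIX-compatible indirection).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could then guard its transfers with a line such as: check_s5cmd_env || exit 1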

As an alias (optional)

In some cases it may be helpful to define an alias that includes the endpoint URL in your shell profile:

$ alias s5="s5cmd --endpoint-url https://object.lga1.coreweave.com"

After defining the alias, the copy command can be run as:

$ s5 cp local-file.txt s3://bucket-name/remote-file.txt
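
Aliases are generally not expanded in non-interactive shell scripts, so a shell function may be a better fit there. The following is a sketch rather than a recommended setup: s5cw is a hypothetical name (chosen so it does not collide with the s5 alias), and falling back to the LGA1 endpoint when S3_ENDPOINT_URL is unset is an assumption made for this example.

```shell
# s5cw is a hypothetical wrapper name, not part of s5cmd.
# It passes all of its arguments through to s5cmd with the endpoint URL prepended.
s5cw() {
  s5cmd --endpoint-url "${S3_ENDPOINT_URL:-https://object.lga1.coreweave.com}" "$@"
}
```

With the function defined, the copy command becomes: s5cw cp local-file.txt s3://bucket-name/remote-file.txt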

Alternate Profiles

A credentials file can contain multiple profiles. For example, default and alternate profiles are defined like this:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

[my_alternate]
aws_access_key_id = ALTERNATE_ACCESS_KEY
aws_secret_access_key = ALTERNATE_SECRET_KEY

To specify an alternate profile, use the --profile option:

$ s5cmd --profile my_alternate \
cp local-file.txt s3://bucket-name/remote-file.txt

An alternate credentials file can be used with the --credentials-file option.

$ s5cmd --credentials-file ~/.alternate-credentials-file \
cp local-file.txt s3://bucket-name/remote-file.txt

The options can also be combined.

$ s5cmd --credentials-file ~/.alternate-credentials-file \
--profile my_alternate \
cp local-file.txt s3://bucket-name/remote-file.txt

Basic usage

These are basic commands for using s5cmd. All examples in this guide assume that the endpoint URL is defined in the local environment as described above.

List buckets

$ s5cmd ls

Upload a file

$ s5cmd cp local-file.txt s3://bucket-name/remote-file.txt

Download a file

$ s5cmd cp s3://bucket-name/remote-file.txt local-file.txt

Delete a file

$ s5cmd rm s3://bucket-name/remote-file.txt
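
The upload and download commands above can be combined into a quick sanity check of a new setup. This is a minimal sketch, assuming the endpoint URL and credentials are already configured: roundtrip_check is a hypothetical helper that uploads a file, downloads it to a scratch directory, and compares the two copies.

```shell
# roundtrip_check is a hypothetical helper, not part of s5cmd.
# Usage: roundtrip_check LOCAL_FILE S3_URL
roundtrip_check() {
  src="$1"
  remote="$2"
  tmpdir="$(mktemp -d)" || return 1

  # Upload, download to a scratch path, then compare byte for byte.
  s5cmd cp "$src" "$remote" &&
    s5cmd cp "$remote" "$tmpdir/downloaded" &&
    cmp -s "$src" "$tmpdir/downloaded"
  status=$?

  # Clean up the scratch directory regardless of the outcome.
  rm -rf "$tmpdir"
  return "$status"
}
```

For example, roundtrip_check local-file.txt s3://bucket-name/remote-file.txt returns a zero exit status only if both transfers succeed and the downloaded copy matches the original. The remote object is left in place, so remove it with s5cmd rm afterwards if it is no longer needed.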

Advanced features

s5cmd supports a number of advanced features. See the official s5cmd documentation for more information about them.

Best Practices

Use bulk operations

Use wildcards to operate on multiple files at once. This is faster than running individual commands. Quote the wildcard so that s5cmd, rather than the shell, expands it.

$ s5cmd cp '*.txt' s3://bucket-name/
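
When the files to transfer cannot be described by a single wildcard, s5cmd's run command can execute a batch of commands listed in a file, one per line. The sketch below only generates such a file. The bucket name, the file names, and the commands.txt file name are placeholders chosen for illustration.

```shell
# Placeholder values for illustration; replace with your own bucket and files.
bucket="bucket-name"
cmd_file="commands.txt"

# Start with an empty command file.
: > "$cmd_file"

# Write one s5cmd command per line, without the leading "s5cmd".
for f in report-a.txt report-b.txt; do
  printf 'cp %s s3://%s/%s\n' "$f" "$bucket" "$f" >> "$cmd_file"
done

cat "$cmd_file"
```

The resulting file can then be executed in a single invocation with: s5cmd run commands.txt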

Error handling and logging

In s5cmd v2, the global --log option sets the log level (trace, debug, info, or error). To keep a record of a bulk transfer for later review, raise the log level and redirect the output to a file.

$ s5cmd --log=debug \
cp '*.txt' s3://bucket-name/ > operation.log 2>&1

Adjust concurrency

Adjust the --numworkers and --concurrency options to improve performance for commands such as cp, select, and run.

--numworkers sets the size of the global worker pool.

--concurrency sets the number of parts that will be uploaded or downloaded in parallel for a single file.

  • The default value for numworkers is 256.
  • The default value for concurrency is 5.

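The options can be used together. Because --numworkers is a global option and --concurrency is an option of the cp command, place the first before cp and the second after it:

$ s5cmd --numworkers 10 \
cp --concurrency 10 \
'*.txt' s3://bucket-name/

tip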

On very large filesystems with more than one million files, s5cmd may exhibit unwanted behavior. In these cases, reducing concurrency may help.

$ s5cmd cp --concurrency 2 '*.txt' s3://bucket-name/

Additional Resources

To learn more about s5cmd, see the README and documentation in the peak/s5cmd repository on GitHub.