Skip to main content

Fine-tune Large Language Models with CoreWeave Cloud

Use Argo Workflows to orchestrate the fine-tuning

Introduction

This guide demonstrates how to fine-tune a Large Language Model (LLM) using a custom dataset with CoreWeave Cloud. It serves as a practical example of a starter project but is not intended to be a comprehensive production application in its current form.

Fine-tuning an LLM can be costly; however, this guide illustrates how to utilize CoreWeave's architecture with Argo Workflows to automatically scale down to zero active pods after completing the fine-tuning process. This releases the billable computation resources, making this a cost-effective system.

Features

The scenario in this guide fine-tunes the EleutherAI Pythia model on a single GPU, and can be extended with several advanced features:

  • Scalable configuration: Run up to 8 GPUs on the same node, and adjust the RAM and number of CPUs.
  • Model variety: This fine-tuner can automatically downloads model from Hugging Face, or use an uploaded model.
  • Prompt testing: Characterize the model during fine-tuning by periodically running a prompt file against it and logging the results.

The fine-tuner uses PyTorch 2.0, CUDA 11.8, and the latest version of DeepSpeed. It also supports CUDA 12.0-based images.

Hardware configuration

The default configuration uses 8 vCPUs, 128GB RAM, and a single NVIDIA A40 or A6000 with 48GB VRAM. This cost-effective and optimized configuration costs about $2.00 per hour with CoreWeave's resource-based pricing and is best suited for models with 155 million to 6 billion parameters.

After fine-tuning, this workflow can optionally deploy an inference endpoint configured with 4 vCPUs, 8GB RAM, and a single NVIDIA RTX A5000 with 24GB VRAM. This setup can efficiently perform inference on models with 6 billion parameters and is much more affordable than the fine-tuning configuration, at approximately $0.85 per hour.

If more performance is desired, use workflow parameters to scale up the number of GPUs, RAM, and vCPUs.

Prerequisite steps

Before proceeding with this demonstration, please complete these preliminary steps:

  1. Configure CoreWeave Kubernetes
  2. Install Argo Workflows
  3. Retrieve the Bearer token and log into the Argo web UI
  4. Install Argo CLI

Once those are complete, proceed with the steps below.

Download the source code repository

In a working folder, download and extract the demonstration code to your local workstation. The following command works in macOS, Linux, and Windows CMD, but not Windows PowerShell.

curl -L https://github.com/coreweave/kubernetes-cloud/archive/refs/heads/master.tar.gz | tar xzf -

Alternatively, clone the kubernetes-cloud GitHub repository with your preferred method.

Then, change to the repository's finetuner-workflow directory.

Create a PVC

The demonstration uses a shared filesystem called a PVC (Persistent Volume Claim) for storing the base model, fine-tuning data, and sharing them between multiple fine-tune runs.

Create the PVC using one of the two options below.

Option 1: Create the PVC with kubectl

To create the PVC, apply the finetune-pvc.yaml manifest with kubectl and confirm the PVC was created successfully.

kubectl apply -f finetune-pvc.yaml

Result:

persistentvolumeclaim/finetune-data created

The manifest has reasonable defaults, but if it requires editing, keep the following points in mind:

  • HDD storage is preferred over NVMe. It's more cost-effective and the fine-tuner does not need high I/O performance.
  • The model checkpoints require between 1024 and 2048 GB of storage. If more space is needed later, it's simple to expand it in place.
  • By default, the workflow expects a PVC named finetune-data. If this changes, remember to adjust the pvc parameter accordingly.

Option 2: Create the PVC with CoreWeave Cloud

For an interactive experience, create the PVC in a web browser.

Navigate to Storage Volumes in CoreWeave Cloud.

Click New Volume, and enter the requested fields:

  • Volume Name: finetune-data
  • Region: Chicago - ORD1
  • Disk Class: HDD
  • Storage Type: Shared Filesystem
  • Size (Gi): Choose a size between 1024 and 2048 Gi
Deploy a PVC

Click Create and verify that the PVC is created successfully.

Optional: Deploy a Filebrowser app

The filebrowser app provides an easy way to transfer data to and from the PVC created in the previous step.

note

It's also possible to use a Virtual Server to load files onto the PVC. This flexibility represents one of CoreWeave's key advantages.

  1. Navigate to CoreWeave Applications, and then click the Catalog tab.
  2. Locate filebrowser and click Deploy.
  3. Assign it a short name, such as finetune, to avoid SSL CNAME issues
  4. Select the ORD1 location.
  5. Hover over finetune-data in Available Volumes to reveal the + icon, and click it to attach the volume. Ensure that it is mounted, as demonstrated in the example below.
  6. Click Deploy.
Screenshot of filebrowser deployment

Load the fine-tuning dataset

Connect to the PVC using the filebrowser application, or use a virtual server.

Create a directory on the PVC called dataset, then load the fine-tuning files into that directory.

The data should be in individual plain text files with a .txt extension and in the exact format desired for prompts and responses. When the workflow runs, Argo will tokenize the dataset using a fast Golang component.

Partition the data into separate files to make it easier to add and remove subsets. When loading multiple datasets, create a meaningful directory name for each.

If using a different top-level directory name than dataset, change the dataset parameter to match before invoking the workflow.

About the repository files

This is an inventory of the files in the finetuner-workflow directory of the coreweave/kubernetes-cloud repository, cloned earlier in this guide.

FilenameDescription
finetune-workflow.yamlThe Argo workflow itself. Many parameters at the top of this file are discussed later.
finetune-pvc.yamlThe PVC declaration applied in the prerequisites section of this guide.
finetune-role.yamlThe ServiceAccount role used by Argo.
finetuner/DockerfileA Dockerfile used to build a fine-tuner image after modifying the finetuner.py code.
finetuner/ds_config.jsonThe deepspeed configuration placed in the container. We recommend not modifying this.
finetuner/evaluator.pyPrompt testing functions
finetuner/finetuner.pyThe simple reference example fine-tune training code.
finetuner/inference.pyInference endpoint
finetuner/requirements.txtThe Python requirements and versions. It's possible to create a venv, but this is mainly for the Dockerfile build.
finetuner/utils.pyUtility classes for the fine-tuner.

Set up the ServiceAccount

The workflow requires a ServiceAccount, Role, and a RoleBinding. Create those by applying the finetune-role.yaml manifest.

kubectl apply -f finetune-role.yaml

Output:

serviceaccount/finetune created
role.rbac.authorization.k8s.io/role:finetune created
rolebinding.rbac.authorization.k8s.io/rolebinding:finetune-finetune created

Run Argo Workflows

Most of the parameters defined in finetune-workflow.yaml have default values which can be overridden as needed and don't need to be specified.

Required parameter

The run_name parameter is required. Make this unique and short because it's used for the InferenceService DNS name.

Here's an example that sets run_name to my-example and runs an inference service when fine-tuning is complete.

argo submit finetune-workflow.yaml \
-p run_name=my-example \
-p run_inference=true \
--serviceaccount finetune

Expand to see the example output:

Click to expand - Example output
Name:                finetune-bl5f2
Namespace: tenant-96362f-dev
ServiceAccount: inference
Status: Pending
Created: Tue Apr 18 18:15:52 -0400 (now)
Progress:
Parameters:
run_name: my-example
run_inference: true
pvc: finetune-data
model: EleutherAI/pythia-2.8b-deduped
dataset: dataset
retokenize: false
tokenizer:
reorder:
sampling: 100
eot_token:
pad_token:
boundary_token: \n
boundary_index: -1
context: 2048
prompt_file:
prompt_every: 0
prompt_tokens: 200
prompt_samples: 5
top_k: 50
top_p: 0.95
temperature: 1
repetition_penalty: 1.1
train_ratio: 0.9
batch_size: -1
force_fp16: false
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
gradients: 5
zero_stage: 3
no_resume: false
logs: logs
wandb_key:
project_id: huggingface
inference_only: false
region: ORD1
trainer_gpu: A40
trainer_gpus: 1
trainer_cores: 8
trainer_ram: 192
inference_gpu: RTX_A5000
model_downloader_image: ghcr.io/wbrown/gpt_bpe/model_downloader
model_downloader_tag: 73cceb0
tokenizer_image: ghcr.io/wbrown/gpt_bpe/dataset_tokenizer
tokenizer_tag: 73cceb0
finetuner_image: gooseai/finetuner
finetuner_tag: cuda-11-8-torch-2-rc10
inference_image: coreweave/ml-images
inference_tag: pytorch-huggingface-81d5ce11

Common parameters

The finetune-workflow.yaml manifest has many adjustable parameters. The table below lists the parameters that are most commonly adjusted.

ParameterDescriptionDefault Value
run_nameRequired: Used for DNS and inference service names. Make this unique and short. The results are stored in results-<run_name>.None
datasetPVC dataset directorydataset
reorderSort the input text files by size_ascending, size_descending, name_ascending, name_descending, random, none, or shuffle.
Note: shuffle sorts the path using the none format, but also randomizes the order of tokens, which is useful for large sequential datasets, such as novels. This is different from the trainer shuffling, which selects contexts randomly. When using an option other than shuffle or none, we recommend passing -no_shuffle to the trainer.
none
run_inferenceStart an test inference service when the fine-tuning exercise is done. This is not intended for production use.false
inference_onlyIf true, skip fine-tuning and only run the inference endpoint. Set this to false on the first run.false
pvcPVC name to use for the dataset and model artifacts.finetune-data
regionThe CoreWeave region. Generally, this should be ORD1.ORD1
datasetDataset directory relative to the pvc root.dataset
contextTraining context size in tokens. Affects the tokenization process as well.2048
epochsNumber of times the fine-tuner should train on the dataset.1
learn_rateHow quickly the model should learn the fine-tune data. Too high a learn rate can be counterproductive and replace the base model's training.5e-5
wandb_keyStrongly recommended. Use an API key from http://wandb.ai to report on fine-tuning metrics with nice charts.None

Parameters that adjust hardware

To adjust performance, modify these hardware parameters.

ParameterDescriptionDefault Value
trainer_gpuType of training GPUA40
trainer_gpusNumber of training GPUsDefault: 1. Valid Range is 1 to 7.
inference_gpuType of inference GPURTX_A5000
trainer_coresNumber of vCPU cores8
trainer_ramGigabytes of RAM192

How to change base models

The fine-tuner supports models that use the Hugging Face AutoModelForCausalLM transformer class.

The default model is EleutherAI/pythia-2.8-deduped. To change models, specify the Hugging Face model identifier to download. Models are cached in the PVC cache directory.

If a custom model is uploaded, specify the model's path from the PVC root.

ParameterDescriptionDefault Value
modelModel identifier, or path to an uploaded model.EleutherAI/pythia-2.8b-deduped

The fine-tuner assumes the model's tokenization format is BPE (Byte Pair Encoding), a common subword tokenization technique.

Parameters for prompt testing

It's often useful to characterize a model during fine-tuning, so the demonstration accepts a prompt file to periodically run against the model. The results are sent to Weights and Biases (wandb) and logged. The top_k, top_p, temperature, and repetition_penalty parameters are also adjustable.

To use prompt testing, set these parameters.

ParameterDescriptionDefault Value
prompt_filePrompts file to read when testing the model and reporting the outputs. This is useful for monitoring the progress and training effectiveness. One prompt per line, newlines escaped with \n if the prompt is multi-line.none
prompt_everyInterval of step counts to run the prompts tests. If 0, it will perform these tests at each checkpoint.0
prompt_tokensNumber of tokens to generate per prompt.200
prompt_samplesNumber of times to evaluate each prompt.5
top_kSampling parameter used during prompt evaluation.50
top_pSampling parameter used during prompt evaluation.0.95
temperatureSampling parameter used during prompt evaluation.1.0
repetition_penaltySampling parameter used during prompt evaluation.1.1

Other invocation methods

We've shown how to set workflow parameters on the command line, but there are other methods available.

Click to expand - How to use a parameters file

To use a parameters file, create finetune_params.yaml.

finetune_params.yaml
run_name: my-example
dataset: dataset
reorder: random
run_inference: true
inference_only: false
model: EleutherAI/pythia-2.8b-deduped

Then, modify the command line to use the --parameter-file option:

argo submit finetune-workflow.yaml \
--parameter-file finetune_params.yaml \
--serviceaccount finetune
Click to expand - How to use the web UI
  1. Navigate to your Argo Workflows deployment in the Applications Catalog.
  2. Navigate to the Access URL, found at the top of the page.
  3. Paste the Bearer token in the client authentication field, then click Login.
  4. Click Submit new workflow.
  5. Upload finetune-workflow.yaml, or paste the contents into the Workflow text area.
  6. In the Parameters section, modify the values according to your requirements.
  7. In the Service Account field, enter: finetune.
  8. Click Submit.
Click to expand - How to use the API

To use the Argo Workflows API, you'll use a tool like curl to send an HTTP request to the Argo server.

First, convert your workflow from YAML to JSON with argo list.

argo list -o json finetune-workflow.yaml > finetune-workflow.json

Next, create a JSON file with the parameter values you want to submit:

params.json
{
"run_name": "my-example",
"dataset": "dataset",
"reorder": "random",
"run_inference": "true",
"inference_only": "false",
"model": "EleutherAI/pythia-2.8b-deduped",
"serviceaccount": "finetune"
}

Submit the workflow to the API with curl.

curl -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer <the-token>" \
-d @finetune-workflow.json \
--data-urlencode [email protected] \
http://<argo-server-address>/api/v1/workflows/<namespace>/submit

You can find the namespace by running:

kubectl get deployments | grep argo-server
fine-tuner-argo-server 1/1 1 1 3d20h

The namespace in this example is fine-tuner-argo-server.

Click to expand - How to use Helm charts

To use a Helm chart, first create a new directory.

mkdir finetune-helm-chart
cd finetune-helm-chart

Then, initialize the chart. This command generates a basic Helm chart structure in the finetune directory.

helm create finetune

Move your existing finetune-workflow.yaml file into the templates directory within the finetune Helm chart.

mv ../finetune-workflow.yaml finetune/templates/

Create values.yaml in the finetune directory with the same parameters as the original Argo command line, plus the serviceaccount value.

finetune/values.yaml
run_name: my-example
dataset: dataset
reorder: random
run_inference: true
inference_only: false
model: EleutherAI/pythia-2.8b-deduped
serviceaccount: finetune

Modify the finetune-workflow.yaml file in the templates directory to use Helm chart parameters as desired. This is a simplified example of the parameter syntax, do not copy and paste this directly.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: "finetune-"
spec:
entrypoint: main
# Example Helm parameter syntax
arguments:
parameters:
- name: run_name
value: {{ .Values.run_name }}
- name: dataset
value: {{ .Values.dataset }}
- name: reorder
value: {{ .Values.reorder }}
- name: run_inference
value: {{ .Values.run_inference }}
- name: inference_only
value: {{ .Values.inference_only }}
- name: model
value: {{ .Values.model }}
serviceAccountName: {{ .Values.serviceaccount }}

Install the Helm chart with the helm install command:

helm install finetune-release ./finetune

This will deploy the workflow with the values from values.yaml.

More options

To customize the parameter values when deploying the Helm chart, either update the values.yaml file directly or use the --set option with the helm install command. For example:

helm install finetune-release ./finetune \
--set run_name=new-example,dataset=new-dataset

This deploys the workflow and uses the --set option to override the corresponding values in values.yaml.

Use the web UI to observe the workflow

Navigate to the Argo Workflows deployment in the Applications Catalog, then click the Access URL at the top of the page to open the login screen.

note

The deployment may require up to 5 minutes to request a TLS certificate. If an HTTPS security warning occurs, please wait for the certificate to be installed.

Paste the Bearer token, which was retrieved in the prerequisites section, into the client authentication box then click Login.

Screenshot of Argo login screen

Navigate to the Workflows menu to see the progress as it goes through the tokenization, fine-tune, and inference stages.

Argo Workflows HTTPS request, via the web UI

Use the Argo CLI to observe the workflow

To instruct Argo to observe the workflow progress in the CLI, add --watch to the command line.

argo submit finetune-workflow.yaml \
-p run_name=my-example \
-p run_inference=true \
--serviceaccount finetune \
--watch

To attach to a running workflow and observe it, first locate the workflow name.

argo list

Then, watch the workflow. Substitute the workflow name where indicated.

argo watch <workflow name>

The output is similar to this:

Click to expand - Argo watch output
Name:                finetune-bl5f2
Namespace: tenant-96362f-dev
ServiceAccount: inference
Status: Failed
Message: child 'finetune-bl5f2-4194046320' failed
Conditions:
PodRunning False
Completed True
Created: Tue Apr 18 18:15:52 -0400 (2 minutes ago)
Started: Tue Apr 18 18:15:52 -0400 (2 minutes ago)
Finished: Tue Apr 18 18:17:12 -0400 (1 minute ago)
Duration: 1 minute 20 seconds
Progress: 1/3
ResourcesDuration: 1m18s*(1 cpu),4m3s*(100Mi memory)
Parameters:
run_name: my-example
download_dataset: true
run_inference: true
pvc: finetune-data
model: EleutherAI/pythia-2.8b-deduped
dataset: dataset
retokenize: false
tokenizer:
reorder:
sampling: 100
eot_token:
pad_token:
boundary_token: \n
boundary_index: -1
context: 2048
prompt_file:
prompt_every: 0
prompt_tokens: 200
prompt_samples: 5
top_k: 50
top_p: 0.95
temperature: 1
repetition_penalty: 1.1
train_ratio: 0.9
batch_size: -1
force_fp16: false
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
gradients: 5
zero_stage: 3
no_resume: false
logs: logs
wandb_key:
project_id: huggingface
inference_only: false
region: ORD1
trainer_gpu: A40
trainer_gpus: 1
trainer_cores: 8
trainer_ram: 192
inference_gpu: RTX_A5000
model_downloader_image: ghcr.io/wbrown/gpt_bpe/model_downloader
model_downloader_tag: 73cceb0
tokenizer_image: ghcr.io/wbrown/gpt_bpe/dataset_tokenizer
tokenizer_tag: 73cceb0
dataset_downloader_image: ghcr.io/coreweave/dataset-downloader/smashwords-downloader
dataset_downloader_tag: 7dba2c7
finetuner_image: gooseai/finetuner
finetuner_tag: cuda-11-8-torch-2-rc10
inference_image: coreweave/ml-images
inference_tag: pytorch-huggingface-81d5ce11

STEP TEMPLATE PODNAME DURATION MESSAGE
● finetune-bl5f2 main
├───✔ tokenizer(0) model-tokenizer finetune-bl5f2-2169410118 7s
└───● finetuner model-finetuner finetune-bl5f2-3837635091 1h

During fine-tuning, the time elapsed is displayed, along with the expected completion time. Checkpointing and loss reporting are also provided.

View Argo logs

important

If the process appears to hang at the message Loading the model, this is a known bug in the terminal display that can happen while downloading the model.

To fix this, kill the workflow job and resubmit it. The display issue should not reoccur after the first download.

To view the logs, run the following command and substitute the workflow name where indicated.

argo logs <workflow-name>

Artifacts and Inference

After fine-tuning is complete, the tuned model is in the /finetunes/{{run_name}}/final directory of the PVC. It can be downloaded directly, or if the inference service was deployed, query the new model from the inference service URL.

If KNative CLI is installed, discover the service URL with:

kn service list

Otherwise, use kubectl:

kubectl get ksvc
NAME URL LATEST AGE CONDITIONS READY REASON
inference-western-predictor-default http://inference-western-predictor-default.tenant-example.knative.chi.coreweave.com inference-western-predictor-default-00007 2d21h 3 OK / 3 True

Then, with the URL available and jq installed, use curl to run a test query:

curl -XPOST -H "Content-type: application/json" -d '{
"prompt": "She danced with him in the honky-tonk"
}' 'http://inference-western-predictor-default.tenant-example.knative.chi.coreweave.com/completion' |jq .

That should return a result similar to:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload Upload Total Spent Left Speed
100 935 100 828 100 107 147 19 0:00:05 0:00:05 --:--:-- 188

{
"generated_text": "She danced with him in the honky-tonk hall as if to say, \"You got me into this mess. Now I'll get you out of it. Let's split it and go our separate ways. Maybe I'll get lucky and make you my partner.\"\nHe grinned. \"You never know. Don't let anyone stop you. But if someone tries to arrest you, let them worry about that.\"\n\"I'll do that. Now, about that money?\"\n\"Money? What money?\"\n\"The loan they paid to your uncle to buy your brother out of that mine. I'm not sure why they did that.\"\nHe grinned. \"That's what I've been trying to figure out myself. They want more power over the land they're buying so they can put up cattle. But they're not taking long to figure out that I'm onto them, so keep this money safe until we figure out the best way to handle it. Don't try"
}

Summary

These are the basic steps for downloading a base model, tokenizing a dataset, fine-tuning the model, and running test inferences against the new model. Even though this just begins to explore the capabilities of fine-tuning, it provides a solid starting point and a guide to inspire more production projects.