Fine-tune Large Language Models with CoreWeave Cloud
Use Argo Workflows to orchestrate the fine-tuning process
This guide demonstrates how to fine-tune a Large Language Model (LLM) using a custom dataset with CoreWeave Cloud. It serves as a practical example of a starter project but is not intended to be a comprehensive production application in its current form.
Fine-tuning an LLM can be costly; however, this guide illustrates how to utilize CoreWeave's architecture with Argo Workflows to automatically scale down to zero active pods after completing the fine-tuning process. This releases the billable computation resources, making this a cost-effective system.
The scenario in this guide fine-tunes the EleutherAI Pythia model on a single GPU, and can be extended with several advanced features:
- Model variety: The fine-tuner can automatically download a model from Hugging Face, or use an uploaded model.
- Prompt testing: Characterize the model during fine-tuning by periodically running a prompt file against it and logging the results.
The fine-tuner uses PyTorch 2.0, CUDA 11.8, and the latest version of DeepSpeed. It also supports CUDA 12.0-based images.
The default configuration uses 8 vCPUs, 128GB RAM, and a single NVIDIA A40 or A6000 with 48GB VRAM. This optimized configuration is cost-effective at about $2.00 per hour with CoreWeave's resource-based pricing, and is best suited for models with 155 million to 6 billion parameters.
After fine-tuning, this workflow can optionally deploy an inference endpoint configured with 4 vCPUs, 8GB RAM, and a single NVIDIA RTX A5000 with 24GB VRAM. This setup can efficiently perform inference on models with 6 billion parameters and is much more affordable than the fine-tuning configuration, at approximately $0.85 per hour.
If more performance is desired, use workflow parameters to scale up the number of GPUs, RAM, and vCPUs.
Before proceeding with this demonstration, please complete these preliminary steps:

1. Configure access to your CoreWeave Kubernetes namespace, with `kubectl` installed and working.
2. Deploy the Argo Workflows application from the Applications Catalog.
3. Install the Argo CLI on your workstation.
4. Retrieve the Bearer token for the Argo Workflows deployment; it is used to log in to the web UI and to call the API.

Once those are complete, proceed with the steps below.
In a working folder, download and extract the demonstration code to your local workstation. The following command works in macOS, Linux, and Windows CMD, but not Windows PowerShell.
curl -L https://github.com/coreweave/kubernetes-cloud/archive/refs/heads/master.tar.gz | tar xzf -
Then, change to the repository's `finetuner-workflow` directory.

The demonstration uses a shared filesystem PVC (Persistent Volume Claim) to store the base model and fine-tuning data, and to share them between multiple fine-tune runs. Create the PVC using one of the two options below.
To create the PVC, apply the `finetune-pvc.yaml` manifest with `kubectl`, and confirm the PVC was created successfully.

kubectl apply -f finetune-pvc.yaml
Result:
persistentvolumeclaim/finetune-data created
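To double-check, list the claim with `kubectl`; it should report a `Bound` status once the volume is provisioned:

```bash
# Confirm the claim exists and is bound
kubectl get pvc finetune-data
```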
The manifest has reasonable defaults, but if it requires editing, keep the following points in mind:
- HDD storage is preferred over NVMe. It's more cost-effective and the fine-tuner does not need high I/O performance.
- The model checkpoints require between 1024 and 2048 GB of storage. If more space is needed later, it's simple to expand it in place.
- By default, the workflow expects a PVC named `finetune-data`. If this changes, remember to adjust the `pvc` parameter accordingly.
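If the volume does fill up later, it should be possible to expand it in place with a patch like the following; the 2048Gi size is only an example:

```bash
# Grow the claim in place; the resize happens online, without detaching the volume
kubectl patch pvc finetune-data \
  --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"2048Gi"}}}}'
```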
For an interactive experience, create the PVC in a web browser.
Click New Volume, and enter the requested fields:
- Volume Name: finetune-data
- Region: Chicago - ORD1
- Disk Class: HDD
- Storage Type: Shared Filesystem
- Size (Gi): Choose a size between 1024 and 2048 Gi

Deploy a PVC
Click Create and verify that the PVC is created successfully.
The `filebrowser` app provides an easy way to transfer data to and from the PVC created in the previous step.

Note

It's also possible to use a Virtual Server to load files onto the PVC. This flexibility represents one of CoreWeave's key advantages.
1. Navigate to the Applications Catalog.
2. Locate `filebrowser` and click Deploy.
3. Assign it a short name, such as `finetune`, to avoid SSL CNAME issues.
4. Select the ORD1 location.
5. Hover over `finetune-data` in Available Volumes to reveal the `+` icon, and click it to attach the volume. Ensure that it is mounted, as demonstrated in the example below.
6. Click Deploy.

Screenshot of filebrowser deployment
Create a directory on the PVC called `dataset`, then load the fine-tuning files into that directory.

The data should be in individual plain text files with a `.txt` extension, in the exact format desired for prompts and responses. When the workflow runs, Argo tokenizes the dataset using a fast Golang component.

Partition the data into separate files to make it easier to add and remove subsets. When loading multiple datasets, create a meaningful directory name for each.
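As an illustration, here is one way to prepare such a directory locally before uploading it with filebrowser; the `corpus.txt` source file and the chunk size are hypothetical:

```bash
# Gather the fine-tuning text into individual .txt files under dataset/
mkdir -p dataset

# Example: break one large corpus into 1,000-line .txt chunks (GNU split),
# so subsets are easy to add or remove later
split --lines=1000 --additional-suffix=.txt corpus.txt dataset/corpus-part-
```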
If using a different top-level directory name than `dataset`, change the `dataset` parameter to match before invoking the workflow.

This is an inventory of the files in the `finetuner-workflow` directory of the `coreweave/kubernetes-cloud` repository, cloned earlier in this guide.

| Filename | Description |
|---|---|
| `finetune-workflow.yaml` | The Argo workflow itself. Many parameters at the top of this file are discussed later. |
| `finetune-role.yaml` | The ServiceAccount role used by Argo. |
| `Dockerfile` | A Dockerfile used to build a fine-tuner image after modifying the `finetuner.py` code. |
|  | The DeepSpeed configuration placed in the container. We recommend not modifying this. |
|  | Prompt testing functions. |
| `finetuner.py` | The simple reference example fine-tune training code. |
|  | The inference endpoint. |
| `requirements.txt` | The Python requirements and versions. It's possible to create a venv, but this is mainly for the Dockerfile build. |
|  | Utility classes for the fine-tuner. |
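If `finetuner.py` is modified, the Dockerfile can be used to build and push a custom image, and the workflow can then be pointed at it through the `finetuner_image` and `finetuner_tag` parameters. The registry and tag below are placeholders:

```bash
# Build and push a custom fine-tuner image (registry and tag are placeholders)
docker build -t registry.example.com/my-org/finetuner:custom-1 .
docker push registry.example.com/my-org/finetuner:custom-1

# Point the workflow at the custom image
argo submit finetune-workflow.yaml \
  -p run_name=my-example \
  -p finetuner_image=registry.example.com/my-org/finetuner \
  -p finetuner_tag=custom-1 \
  --serviceaccount finetune
```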
The workflow requires a ServiceAccount, a Role, and a RoleBinding. Create those by applying the `finetune-role.yaml` manifest.

kubectl apply -f finetune-role.yaml
Output:
serviceaccount/finetune created
role.rbac.authorization.k8s.io/role:finetune created
rolebinding.rbac.authorization.k8s.io/rolebinding:finetune-finetune created
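To confirm the objects exist before submitting the workflow, list them with `kubectl`:

```bash
# Verify the ServiceAccount, Role, and RoleBinding created by finetune-role.yaml
kubectl get serviceaccount finetune
kubectl get role role:finetune
kubectl get rolebinding rolebinding:finetune-finetune
```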
Most of the parameters defined in `finetune-workflow.yaml` have default values which can be overridden as needed and don't need to be specified.

Required parameter

The `run_name` parameter is required. Make it unique and short, because it's used for the InferenceService DNS name.

Here's an example that sets `run_name` to `my-example` and runs an inference service when fine-tuning is complete.

argo submit finetune-workflow.yaml \
-p run_name=my-example \
-p run_inference=true \
--serviceaccount finetune
Example output:
Name: finetune-bl5f2
Namespace: tenant-96362f-dev
ServiceAccount: inference
Status: Pending
Created: Tue Apr 18 18:15:52 -0400 (now)
Progress:
Parameters:
run_name: my-example
run_inference: true
pvc: finetune-data
model: EleutherAI/pythia-2.8b-deduped
dataset: dataset
retokenize: false
tokenizer:
reorder:
sampling: 100
eot_token:
pad_token:
boundary_token: \n
boundary_index: -1
context: 2048
prompt_file:
prompt_every: 0
prompt_tokens: 200
prompt_samples: 5
top_k: 50
top_p: 0.95
temperature: 1
repetition_penalty: 1.1
train_ratio: 0.9
batch_size: -1
force_fp16: false
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
gradients: 5
zero_stage: 3
no_resume: false
logs: logs
wandb_key:
project_id: huggingface
inference_only: false
region: ORD1
trainer_gpu: A40
trainer_gpus: 1
trainer_cores: 8
trainer_ram: 192
inference_gpu: RTX_A5000
model_downloader_image: ghcr.io/wbrown/gpt_bpe/model_downloader
model_downloader_tag: 73cceb0
tokenizer_image: ghcr.io/wbrown/gpt_bpe/dataset_tokenizer
tokenizer_tag: 73cceb0
finetuner_image: gooseai/finetuner
finetuner_tag: cuda-11-8-torch-2-rc10
inference_image: coreweave/ml-images
inference_tag: pytorch-huggingface-81d5ce11
The `finetune-workflow.yaml` manifest has many adjustable parameters. The table below lists the parameters that are most commonly adjusted.

| Parameter | Description | Default Value |
|---|---|---|
| `run_name` | Required. Used for DNS and inference service names. Make this unique and short. The results are stored in `results-<run_name>`. | None |
| `dataset` | Dataset directory relative to the `pvc` root. | `dataset` |
| `reorder` | Sort the input text files before tokenization. Note: `shuffle` sorts the paths using the `none` format, but also randomizes the order of tokens, which is useful for large sequential datasets, such as novels. This is different from the trainer shuffling, which selects contexts randomly. When using an option other than `shuffle` or `none`, we recommend passing `-no_shuffle` to the trainer. | `none` |
| `run_inference` | Start a test inference service when the fine-tuning exercise is done. This is not intended for production use. | `false` |
| `inference_only` | If `true`, skip fine-tuning and only run the inference endpoint. Set this to `false` on the first run. | `false` |
| `pvc` | PVC name to use for the dataset and model artifacts. | `finetune-data` |
| `region` | The CoreWeave region. Generally, this should be `ORD1`. | `ORD1` |
| `context` | Training context size in tokens. Also affects the tokenization process. | `2048` |
| `epochs` | Number of times the fine-tuner should train on the dataset. | `1` |
| `learn_rate` | How quickly the model should learn the fine-tune data. Too high a learn rate can be counterproductive and replace the base model's training. | `5e-5` |
| `wandb_key` | Strongly recommended. Use an API key from http://wandb.ai to report on fine-tuning metrics with nice charts. | None |
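As an example, a run that overrides several of these values might look like the following; the `wandb_key` shown is a placeholder for a real API key:

```bash
# Fine-tune for two epochs at a lower learn rate, with W&B metric reporting enabled
argo submit finetune-workflow.yaml \
  -p run_name=my-example \
  -p run_inference=true \
  -p epochs=2 \
  -p learn_rate=2e-5 \
  -p wandb_key=0123456789abcdef \
  --serviceaccount finetune
```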
To adjust performance, modify these hardware parameters.
| Parameter | Description | Default Value |
|---|---|---|
| `trainer_gpu` | Type of training GPU | `A40` |
| `trainer_gpus` | Number of training GPUs | `1` (valid range: `1` to `7`) |
| `inference_gpu` | Type of inference GPU | `RTX_A5000` |
| `trainer_cores` | Number of vCPU cores | `8` |
| `trainer_ram` | Gigabytes of RAM | `192` |
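For example, a sketch of scaling the trainer up to four GPUs with proportionally more vCPU and RAM; the exact sizing is illustrative:

```bash
# Request a larger training footprint for bigger models or faster runs
argo submit finetune-workflow.yaml \
  -p run_name=my-example \
  -p trainer_gpus=4 \
  -p trainer_cores=16 \
  -p trainer_ram=384 \
  --serviceaccount finetune
```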
The default model is `EleutherAI/pythia-2.8b-deduped`. To change models, specify the Hugging Face model identifier to download. Models are cached in the PVC's `cache` directory. If a custom model is uploaded, specify the model's path from the PVC root.

| Parameter | Description | Default Value |
|---|---|---|
| `model` | Model identifier, or path to an uploaded model. | `EleutherAI/pythia-2.8b-deduped` |
The fine-tuner assumes the model's tokenization format is BPE (Byte Pair Encoding), a common subword tokenization technique.
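For instance, either a different Hugging Face identifier or the path of a model already uploaded to the PVC can be passed; the uploaded path below is hypothetical:

```bash
# Fine-tune a larger Pythia model pulled from Hugging Face
argo submit finetune-workflow.yaml \
  -p run_name=pythia-6b \
  -p model=EleutherAI/pythia-6.9b-deduped \
  --serviceaccount finetune

# Or fine-tune a model previously uploaded to the PVC (path relative to the PVC root)
argo submit finetune-workflow.yaml \
  -p run_name=custom-model \
  -p model=models/my-custom-model \
  --serviceaccount finetune
```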
It's often useful to characterize a model during fine-tuning, so the demonstration accepts a prompt file to run against the model periodically. The results are sent to Weights and Biases (wandb) and logged. The `top_k`, `top_p`, `temperature`, and `repetition_penalty` parameters are also adjustable.

To use prompt testing, set these parameters.
| Parameter | Description | Default Value |
|---|---|---|
| `prompt_file` | Prompt file to read when testing the model and reporting the outputs. This is useful for monitoring progress and training effectiveness. One prompt per line; newlines escaped with `\n` if a prompt is multi-line. | none |
| `prompt_every` | Interval of step counts at which to run the prompt tests. If `0`, the tests run at each checkpoint. | `0` |
| `prompt_tokens` | Number of tokens to generate per prompt. | `200` |
| `prompt_samples` | Number of times to evaluate each prompt. | `5` |
| `top_k` | Sampling parameter used during prompt evaluation. | `50` |
| `top_p` | Sampling parameter used during prompt evaluation. | `0.95` |
| `temperature` | Sampling parameter used during prompt evaluation. | `1.0` |
| `repetition_penalty` | Sampling parameter used during prompt evaluation. | `1.1` |
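Putting it together, a sketch of a run with prompt testing enabled; it assumes a `prompts.txt` file has been uploaded to the PVC root (for example, through filebrowser), with one prompt per line:

```bash
# Run the prompt tests every 200 steps, generating 100 tokens per prompt
argo submit finetune-workflow.yaml \
  -p run_name=my-example \
  -p prompt_file=prompts.txt \
  -p prompt_every=200 \
  -p prompt_tokens=100 \
  -p temperature=0.8 \
  --serviceaccount finetune
```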
We've shown how to set workflow parameters on the command line, but there are other methods available. One alternative is to collect the parameters in a YAML file:

finetune_params.yaml
run_name: my-example
dataset: dataset
reorder: random
run_inference: true
inference_only: false
model: EleutherAI/pythia-2.8b-deduped
Then, modify the command line to use the `--parameter-file` option:

argo submit finetune-workflow.yaml \
--parameter-file finetune_params.yaml \
--serviceaccount finetune
The workflow can also be submitted through the Argo Workflows web UI:

1. Navigate to the Argo Workflows deployment in the Applications Catalog.
2. Navigate to the Access URL, found at the top of the page.
3. Paste the Bearer token in the client authentication field, then click Login.
4. Click Submit new workflow.
5. Upload `finetune-workflow.yaml`, or paste its contents into the workflow editor.
6. In the Parameters section, modify the values according to your requirements.
7. In the Service Account field, enter `finetune`.
8. Click Submit.
To use the Argo Workflows API, use a tool like `curl` to send HTTP requests to the Argo server.

First, convert the workflow from YAML to JSON. Any YAML-to-JSON converter works; for example, with `yq`:

yq -o=json '.' finetune-workflow.yaml > finetune-workflow.json
Next, create a JSON file with the parameter values you want to submit:
params.json
{
"run_name": "my-example",
"dataset": "dataset",
"reorder": "random",
"run_inference": "true",
"inference_only": "false",
"model": "EleutherAI/pythia-2.8b-deduped",
"serviceaccount": "finetune"
}
Submit the workflow to the API with `curl`:

curl -X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer <the-token>" \
-d @finetune-workflow.json \
--data-urlencode [email protected] \
http://<argo-server-address>/api/v1/workflows/<namespace>/submit
- Replace `<the-token>` with the Bearer token retrieved in the prerequisites.
- Replace `<argo-server-address>` with the Argo server's Access URL.
- Replace `<namespace>` with your workflow namespace. The namespace appears in the `argo submit` output above (for example, `tenant-96362f-dev`), and can also be found with:

kubectl config view --minify --output 'jsonpath={..namespace}'
The workflow can also be packaged and deployed as a Helm chart. First, create a directory for the chart:

mkdir finetune-helm-chart
cd finetune-helm-chart
Then, initialize the chart. This command generates a basic Helm chart structure in the `finetune` directory:

helm create finetune
Move your existing `finetune-workflow.yaml` file into the `templates` directory within the `finetune` Helm chart:

mv ../finetune-workflow.yaml finetune/templates/
Create `values.yaml` in the `finetune` directory with the same parameters as the original Argo command line, plus the `serviceaccount` value.

finetune/values.yaml
run_name: my-example
dataset: dataset
reorder: random
run_inference: true
inference_only: false
model: EleutherAI/pythia-2.8b-deduped
serviceaccount: finetune
Modify the `finetune-workflow.yaml` file in the `templates` directory to use Helm chart parameters as desired. This is a simplified example of the parameter syntax; do not copy and paste it directly.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: "finetune-"
spec:
  entrypoint: main
  # Example Helm parameter syntax
  arguments:
    parameters:
      - name: run_name
        value: "{{ .Values.run_name }}"
      - name: dataset
        value: "{{ .Values.dataset }}"
      - name: reorder
        value: "{{ .Values.reorder }}"
      - name: run_inference
        value: "{{ .Values.run_inference }}"
      - name: inference_only
        value: "{{ .Values.inference_only }}"
      - name: model
        value: "{{ .Values.model }}"
  serviceAccountName: "{{ .Values.serviceaccount }}"
Install the Helm chart with the `helm install` command:

helm install finetune-release ./finetune
This deploys the workflow with the values from `values.yaml`.

To customize the parameter values when deploying the Helm chart, either update the `values.yaml` file directly or use the `--set` option with the `helm install` command. For example:

helm install finetune-release ./finetune \
--set run_name=new-example,dataset=new-dataset
This deploys the workflow and uses the `--set` option to override the corresponding values in `values.yaml`.

Navigate to the Argo Workflows deployment in the Applications Catalog, then click the Access URL at the top of the page to open the login screen.
Note
The deployment may require up to 5 minutes to request a TLS certificate. If an HTTPS security warning occurs, please wait for the certificate to be installed.
Paste the Bearer token, which was retrieved in the prerequisites section, into the client authentication box, then click Login.

Screenshot of Argo login screen
Navigate to the Workflows menu to see the progress as it goes through the tokenization, fine-tune, and inference stages.

Argo Workflows HTTPS request, via the web UI
To observe the workflow progress from the CLI, add `--watch` to the `argo submit` command:

argo submit finetune-workflow.yaml \
-p run_name=my-example \
-p run_inference=true \
--serviceaccount finetune \
--watch
To attach to a running workflow and observe it, first locate the workflow name.
argo list
Then, watch the workflow. Substitute the workflow name where indicated.
argo watch <workflow name>
The output is similar to this:
Name: finetune-bl5f2
Namespace: tenant-96362f-dev
ServiceAccount: inference
Status: Failed
Message: child 'finetune-bl5f2-4194046320' failed
Conditions:
PodRunning False
Completed True
Created: Tue Apr 18 18:15:52 -0400 (2 minutes ago)
Started: Tue Apr 18 18:15:52 -0400 (2 minutes ago)
Finished: Tue Apr 18 18:17:12 -0400 (1 minute ago)
Duration: 1 minute 20 seconds
Progress: 1/3
ResourcesDuration: 1m18s*(1 cpu),4m3s*(100Mi memory)
Parameters:
run_name: my-example
download_dataset: true
run_inference: true
pvc: finetune-data
model: EleutherAI/pythia-2.8b-deduped
dataset: dataset
retokenize: false
tokenizer:
reorder:
sampling: 100
eot_token:
pad_token:
boundary_token: \n
boundary_index: -1
context: 2048
prompt_file:
prompt_every: 0
prompt_tokens: 200
prompt_samples: 5
top_k: 50
top_p: 0.95
temperature: 1
repetition_penalty: 1.1
train_ratio: 0.9
batch_size: -1
force_fp16: false
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
gradients: 5
zero_stage: 3
no_resume: false
logs: logs
wandb_key:
project_id: huggingface
inference_only: false
region: ORD1
trainer_gpu: A40
trainer_gpus: 1
trainer_cores: 8
trainer_ram: 192
inference_gpu: RTX_A5000
model_downloader_image: ghcr.io/wbrown/gpt_bpe/model_downloader
model_downloader_tag: 73cceb0
tokenizer_image: ghcr.io/wbrown/gpt_bpe/dataset_tokenizer
tokenizer_tag: 73cceb0
dataset_downloader_image: ghcr.io/coreweave/dataset-downloader/smashwords-downloader
dataset_downloader_tag: 7dba2c7
finetuner_image: gooseai/finetuner
finetuner_tag: cuda-11-8-torch-2-rc10
inference_image: coreweave/ml-images
inference_tag: pytorch-huggingface-81d5ce11
STEP TEMPLATE PODNAME DURATION MESSAGE
● finetune-bl5f2 main
├───✔ tokenizer(0) model-tokenizer finetune-bl5f2-2169410118 7s
└───● finetuner model-finetuner finetune-bl5f2-3837635091 1h
During fine-tuning, the time elapsed is displayed, along with the expected completion time. Checkpointing and loss reporting are also provided.
Important
If the process appears to hang at the message `Loading the model`, this is a known bug in the terminal display that can happen while downloading the model. To fix this, kill the workflow job and resubmit it. The display issue should not reoccur after the first download.
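As a sketch using the Argo CLI, stopping and resubmitting the hung run looks something like this; substitute the workflow name reported by `argo list`:

```bash
# Stop the hung run, then resubmit it; the model already cached on the PVC is reused
argo stop <workflow-name>
argo resubmit <workflow-name>
```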
To view the logs, run the following command and substitute the workflow name where indicated.
argo logs <workflow-name>
After fine-tuning is complete, the tuned model is in the `/finetunes/{{run_name}}/final` directory of the PVC. It can be downloaded directly, or, if the inference service was deployed, queried through the inference service URL.

To find the service URL, use the Knative CLI:

kn service list

Otherwise, use `kubectl`:

kubectl get ksvc
NAME URL LATEST AGE CONDITIONS READY REASON
inference-western-predictor-default http://inference-western-predictor-default.tenant-example.knative.chi.coreweave.com inference-western-predictor-default-00007 2d21h 3 OK / 3 True
With the service URL, run a test query with `curl`:

curl -XPOST -H "Content-type: application/json" -d '{
  "prompt": "She danced with him in the honky-tonk"
}' 'http://inference-western-predictor-default.tenant-example.knative.chi.coreweave.com/completion' | jq .
That should return a result similar to:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 935 100 828 100 107 147 19 0:00:05 0:00:05 --:--:-- 188
{
"generated_text": "She danced with him in the honky-tonk hall as if to say, \"You got me into this mess. Now I'll get you out of it. Let's split it and go our separate ways. Maybe I'll get lucky and make you my partner.\"\nHe grinned. \"You never know. Don't let anyone stop you. But if someone tries to arrest you, let them worry about that.\"\n\"I'll do that. Now, about that money?\"\n\"Money? What money?\"\n\"The loan they paid to your uncle to buy your brother out of that mine. I'm not sure why they did that.\"\nHe grinned. \"That's what I've been trying to figure out myself. They want more power over the land they're buying so they can put up cattle. But they're not taking long to figure out that I'm onto them, so keep this money safe until we figure out the best way to handle it. Don't try"
}
These are the basic steps for downloading a base model, tokenizing a dataset, fine-tuning the model, and running test inferences against the new model. Although this only begins to explore the capabilities of fine-tuning, it provides a solid starting point and a guide to inspire more production-ready projects.