Finetuning Machine Learning Models

Finetuning and training machine learning models can be computationally expensive. CoreWeave Cloud provides on-demand compute resources for training models, along with the infrastructure to support them.
This guide is intended to be a reference example of how to use Argo Workflows to set up a machine learning pipeline on CoreWeave.
The reference example utilizes GPT-type transformer models with the Hugging Face Transformers library, and assumes that the model's tokenization format is BPE.
This reference example is not intended to be a production application; rather, it is a guide on how to utilize CoreWeave resources to set up a pipeline.
The base model to train can be provided directly in a PVC (PersistentVolumeClaim) or as a model identifier from Hugging Face's model repository. The training dataset must be in the same PVC, in plain text format.
It is recommended that you partition your data into separate files for easy addition and removal of subsets.
Presently, the reference example uses the following container configuration to train models:
  • 8 vCPU (AMD EPYC, usually)
  • 128GB RAM
  • NVIDIA A40/A6000 (48GB VRAM)
This configuration has been found to be optimal for training a variety of GPT models, from 155M to 6B parameters, on a single GPU. It is billed at $2.00/hr through CoreWeave's resource-based pricing model.
An optional test inference endpoint can be enabled and deployed automatically when the model finishes finetuning. This inference container defaults to the following configuration:
  • 4 vCPU
  • 8GB RAM
  • NVIDIA RTX A5000 (24GB VRAM)
This configuration comfortably serves 6B models and, because it requires fewer resources, is less expensive than the finetuner at $0.85/hr.
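To budget a run, you can multiply the expected hours by the hourly rates quoted above. The following Python sketch assumes the $2.00/hr finetuner and $0.85/hr inference figures from this guide and simple per-hour billing; the helper name is hypothetical, and you should check current CoreWeave pricing before relying on the result.

```python
# Hourly rates quoted in this guide (assumed; verify against current pricing).
FINETUNE_RATE_USD_PER_HR = 2.00   # 8 vCPU, 128GB RAM, A40/A6000
INFERENCE_RATE_USD_PER_HR = 0.85  # 4 vCPU, 8GB RAM, RTX A5000


def estimate_cost(finetune_hours: float, inference_hours: float = 0.0) -> float:
    """Return the estimated USD cost of a finetune run plus optional test inference."""
    if finetune_hours < 0 or inference_hours < 0:
        raise ValueError("hours must be non-negative")
    return (finetune_hours * FINETUNE_RATE_USD_PER_HR
            + inference_hours * INFERENCE_RATE_USD_PER_HR)


if __name__ == "__main__":
    # Hypothetical example: a 40-hour finetune, then a day of test inference.
    print(f"${estimate_cost(40, 24):.2f}")  # prints $100.40
```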
The code for this example is available on GitHub in the coreweave/kubernetes-cloud repository, under finetuner-workflow on the master branch.

The following Kubernetes-based components are required:
  • Argo Workflows
    • You can deploy Argo Workflows using the application Catalog. From the application deployment menu, click the Catalog tab, then search for argo-workflows to find and deploy the application.
  • PVC
    • Create a ReadWriteMany PVC storage volume from the Storage menu.
    • 1TB to 2TB is the recommended size for the volume, as the model checkpoints take up a lot of space! These PVCs can be shared between multiple finetune runs. We recommend using HDD-type storage, as the finetuner does not require high random I/O performance.
    • This workflow expects a default PVC name of finetune-data. You can change this name once you are more comfortable configuring the workflow.
Configuring a PVC storage volume from the Cloud UI
Note: It is easy to increase the size of a PVC as needed.
If you prefer, the PVC can also be deployed using the YAML snippet below, applied using kubectl apply -f:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: finetune-data
spec:
  storageClassName: shared-hdd-ord1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2000Gi

Optional components

The following components are optional, but may make your interaction easier:

The filebrowser application lets you share and access your PVC through a simple web interface for uploading and downloading files and folders. You can find and deploy filebrowser from the same application Catalog that you used to deploy Argo Workflows.
It is recommended that the name you give the filebrowser application be very short, or you will run into SSL CNAME issues. We recommend using the name finetune.
Simply select the finetune-data PVC that you created earlier. Make sure that you actually add your PVC to the filebrowser list of mounts!
Some people may prefer to use a Virtual Server to interact with their PVC via ssh or another mechanism. This flexibility is one of the key advantages of CoreWeave.
The filebrowser application

Dataset Setup

At this point, you should have a PVC set up that you can access via the filebrowser application or some other mechanism. For each dataset you want to use, create a directory and give it a meaningful name. By default, however, the workflow reads the finetuning dataset from the dataset directory; use the dataset parameter to point it at a different directory.
The data should be individual plaintext files in the precise format in which you want prompts and responses to appear.

Below, the western-romance directory contains novels in a clean, normalized plaintext format with all extra whitespace removed.
western-romance dataset with text files for each novel.
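If your source text is not yet clean, a small script can normalize it before you upload it. The sketch below is one possible interpretation of "extra whitespace removed": it collapses runs of spaces and tabs, strips each line, and keeps at most one blank line between paragraphs. The function names are hypothetical; adjust the rules to the prompt and response format you actually want.

```python
import re
from pathlib import Path


def normalize_text(text: str) -> str:
    """Collapse extra whitespace while keeping paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs inside lines into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing whitespace from every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def normalize_directory(directory: str) -> int:
    """Normalize every .txt file in directory in place; return the file count."""
    count = 0
    for path in sorted(Path(directory).glob("*.txt")):
        cleaned = normalize_text(path.read_text(encoding="utf-8"))
        path.write_text(cleaned, encoding="utf-8")
        count += 1
    return count
```

For example, normalize_directory("/finetune-data/western-romance") (a hypothetical path on the mounted PVC) would rewrite every .txt file in that directory in place, so keep a copy of the originals.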
The dataset is automatically tokenized by a dataset_tokenizer component, written in Go, as a step in the Argo Workflow. It is quite fast, and has different options for how to partition the data.
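The relationship between token count, context size, and the train/validation split can be checked with simple arithmetic. This sketch assumes the tokens are packed into non-overlapping contexts of context tokens and that the training split is the floor of contexts * train_ratio. These assumptions reproduce the DATASET SIZE, TRAIN_DATASET, and VALUE_DATASET figures in the sample finetuner log later in this guide, but they are not taken from the dataset_tokenizer source; split_contexts is a hypothetical helper.

```python
def split_contexts(total_tokens: int, context: int = 2048, train_ratio: float = 0.9):
    """Return (contexts, train_examples, validation_examples).

    Assumes tokens are packed into non-overlapping, fixed-size contexts and
    that the training split is the floor of contexts * train_ratio.
    """
    contexts = total_tokens // context
    train = int(contexts * train_ratio)
    return contexts, train, contexts - train


if __name__ == "__main__":
    # Token count taken from the sample finetuner log.
    print(split_contexts(101_855_232))  # (49734, 44760, 4974)
```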

Permissions Setup

To automatically create an InferenceService, the Argo Workflow job you submit needs special permissions. The YAML snippet below shows an example ServiceAccount with the corresponding required permissions.
Copy the following into a file named inference-role.yaml:
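apiVersion: v1
kind: ServiceAccount
metadata:
  name: inference
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: role:inference
rules:
  - apiGroups:
      - serving.kubeflow.org  # assumed InferenceService API group; check the repository copy
    resources:
      - inferenceservices
    verbs:
      - '*'
  - apiGroups:
      - serving.knative.dev
    resources:
      - services
      - revisions
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rolebinding:inference-inference
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: role:inference
subjects:
  - kind: ServiceAccount
    name: inference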
Invoking kubectl apply -f inference-role.yaml will apply the permissions detailed above.

Getting and Running the Workflow

The example code is available on GitHub. It is recommended that you use git clone to pull down the latest copy of the code.
This repository includes the following files:
  • finetune-workflow.yaml - The Argo Workflow itself.
  • inference-role.yaml - The role you set up earlier in this document.
  • finetune-pvc.yaml - A model storage volume, as described earlier in this document.
  • finetuner/Dockerfile - A Dockerfile that can be used to build your own finetuner image, should you modify the code.
  • finetuner/ - The simple reference example finetune training code.
  • finetuner/ds_config.json - The DeepSpeed configuration placed in the container. It is recommended that you not modify this.
  • finetuner/requirements.txt - The Python requirements and versions. You can create a venv, but this is mainly for the Dockerfile build.
For reference, a copy of the finetune-workflow.yaml is at the bottom of this document, but the GitHub repository has the authoritative version.
Assuming that you have pulled a copy of finetune-workflow.yaml, the Argo Workflow is invoked as follows:
Argo submit example
$ argo submit finetune-workflow.yaml \
-p run_name=example-gpt-j-6b \
-p dataset=dataset \
-p run_inference=true \
-p model=EleutherAI/gpt-j-6B \
--serviceaccount inference
The parameters included in the above are:
  • run_name - The only absolutely required parameter. It is strongly recommended that it be unique, as it is what is used to name the InferenceService. Consequently, the run_name must meet DNS standards.
  • dataset - The name of the dataset directory on the PVC.
  • run_inference - This parameter explicitly tells the Workflow that we want to run a test inference service when this is done. It is not intended to be a production service, but to provide an end-to-end demonstration, allowing you to test the finetuned model.
  • model - This example uses a Hugging Face model identifier to pull down gpt-j-6B. The model is cached on your PVC, under cache, so subsequent runs do not download it again.
  • --serviceaccount inference - Required for run_inference to work correctly.
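Because run_name must meet DNS standards, it can help to validate it before submitting. The sketch below checks the RFC 1123 label rules that Kubernetes applies to many resource names: lowercase alphanumerics and '-', an alphanumeric character at both ends, and at most 63 characters. The workflow appears to add a prefix and suffix to the run name (compare the inference-western-predictor-default service listed later), so the practical limit may be lower; is_valid_run_name and its max_length argument are hypothetical helpers, not part of the workflow.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending with an
# alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_run_name(run_name: str, max_length: int = 63) -> bool:
    """Return True if run_name is a valid RFC 1123 DNS label of at most max_length chars."""
    # fullmatch (not match with "$") so a trailing newline is rejected.
    return len(run_name) <= max_length and _DNS_LABEL.fullmatch(run_name) is not None
```

Lower max_length if your generated service names turn out to exceed the 63-character limit.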
Note: There are easier ways to parameterize your jobs than the command line, such as:
Once the job is submitted, you should see output that looks very much like the following:
Name: finetune-wtd2k
Namespace: tenant-goosewes-1
ServiceAccount: inference
Status: Pending
Created: Fri Apr 22 12:50:34 -0400 (now)
dataset: dataset
run_name: example-gpt-j-6b
run_inference: true
model: EleutherAI/gpt-j-6B
pvc: finetune-data
retokenize: false
boundary_token: \n
context: 2048
train_ratio: 0.9
batch_size: -1
force_fp16: false
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
gradients: 5
zero_stage: 3
no_resume: false
logs: ./logs
project_id: huggingface
inference_only: false
region: ORD1
inference_image: coreweave/ml-images:pytorch-huggingface-81d5ce1

Observing the Argo Workflow

Now that we have the job's name, finetune-wtd2k, we can observe it through several mechanisms:

Argo commands

Invoking argo watch finetune-wtd2k tells Argo that we want to watch the job as it goes through the following stages:
  • model-tokenizer
  • model-finetuner
  • model-inference

Name: finetune-wtd2k
Namespace: tenant-goosewes-1
ServiceAccount: inference
Status: Running
PodRunning True
Created: Fri Apr 22 11:00:15 -0400 (1 hour ago)
Started: Fri Apr 22 11:00:15 -0400 (1 hour ago)
Duration: 1 hour 54 minutes
Progress: 1/2
ResourcesDuration: 1s*(1 cpu),1s*(100Mi memory)
dataset: dataset
run_name: cassandra-gpt-j-6b-fp16
run_inference: true
model: EleutherAI/gpt-j-6B
zero_stage: 3
gradients: 5
force_fp16: true
pvc: finetune-data
retokenize: false
boundary_token: \n
context: 2048
train_ratio: 0.9
batch_size: -1
batch_size_divisor: 1.0
random_seed: 42
learn_rate: 5e-5
epochs: 1
no_resume: false
logs: logs
project_id: huggingface
inference_only: false
region: ORD1
inference_image: coreweave/ml-images:pytorch-huggingface-81d5ce11
● finetune-ckl8r main
├───✔ tokenizer(0) model-tokenizer finetune-ckl8r-2169410118 7s
└───● finetuner model-finetuner finetune-ckl8r-3837635091 1h

Invoking argo logs -f finetune-wtd2k watches the logs in real time.
Important: If the job appears to hang on Loading the model, this is due to a bug in the terminal display code that occurs when the model is downloaded and cached for the first time. You can kill the pod in question (or the whole job) and resubmit it; progress will then display correctly.

finetune-ckl8r-2169410118: 2022/04/22 15:00:21 Newest source `/finetune-data/dataset/Alastair Reynolds - [Revelation Space] Chasm City.txt` is older than `/finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens`, not retokenizing. Use -retokenize to force retokenization.
finetune-ckl8r-3837635091: RUN_NAME: cassandra-gpt-j-6b-fp16
finetune-ckl8r-3837635091: HOST: finetune-ckl8r-3837635091
finetune-ckl8r-3837635091: CUDA: 11.3
finetune-ckl8r-3837635091: TORCH: 1.10.0a0+git302ee7b
finetune-ckl8r-3837635091: TRANSFORMERS: 4.17.0
finetune-ckl8r-3837635091: CPU: (maxrss: 297mb F: 811,255mb) GPU: (U: 19mb F: 51,033mb T: 51,052mb) TORCH: (R: 0mb/0mb, A: 0mb/0mb)
finetune-ckl8r-3837635091: DATASET: /finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens
finetune-ckl8r-3837635091: DATASET SIZE: 194.27mb, 101,855,232 tokens, 49,734 contexts
finetune-ckl8r-3837635091: TRAIN_DATASET: 44,760 examples
finetune-ckl8r-3837635091: VALUE_DATASET: 4,974 examples
finetune-ckl8r-3837635091: LAST CHECKPOINT: None
finetune-ckl8r-3837635091: RANDOM SEED: 42
finetune-ckl8r-3837635091: FORCE FP16: True
finetune-ckl8r-3837635091: Loading EleutherAI/gpt-j-6B
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 761,345mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 785,595mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
finetune-ckl8r-3837635091: Setting batch size to 4
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,856] [INFO] [] Initializing torch distributed with backend: nccl
finetune-ckl8r-3837635091: Using amp half precision backend
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,863] [INFO] [] [Rank 0] DeepSpeed info: version=0.6.1, git-hash=unknown, git-branch=unknown
4% 99/2238 [1:48:23<35:43:37, 60.13s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,206mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
4% 100/2238 [1:49:59<42:08:58, 70.97s/it]{'loss': 2.6446, 'learning_rate': 5e-05, 'epoch': 0.04}
5% 101/2238 [1:51:00<40:18:00, 67.89s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
5% 103/2238 [1:53:00<37:54:38, 63.92s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
5% 105/2238 [1:55:01<36:50:48, 62.19s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
During finetuning, the elapsed time is displayed alongside the estimated time to completion. Checkpoints and loss values are also reported.
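The step count and remaining time in the progress bar follow from the dataset split and batch settings. Assuming gradients is the number of gradient accumulation steps and training runs on a single GPU, the sketch below mirrors how the Hugging Face Trainer counts optimizer steps per epoch. With the 44,760 training examples and batch size of 4 from the log above and gradients: 5, it yields the 2,238 steps shown in the progress bar. The helper names are hypothetical.

```python
import math


def optimizer_steps(train_examples: int, batch_size: int, gradients: int, epochs: int = 1) -> int:
    """Optimizer steps for a single-GPU run with gradient accumulation.

    Per epoch: ceil(examples / batch_size) micro-batches, integer-divided by the
    number of accumulation steps (at least one step per epoch).
    """
    micro_batches = math.ceil(train_examples / batch_size)
    return max(micro_batches // gradients, 1) * epochs


def eta_hours(steps: int, seconds_per_step: float) -> float:
    """Estimated wall-clock hours for the given number of steps."""
    return steps * seconds_per_step / 3600
```

For instance, eta_hours(optimizer_steps(44_760, 4, 5), 60.13) comes to roughly 37.4 hours, in line with the elapsed plus remaining time in the log.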
You can watch a submitted workflow immediately by passing the --watch option to the submit command: argo submit --watch

You can access your Argo Workflow application via HTTPS to see all the finetuner jobs, and to check their statuses.
Argo Workflows HTTPS request, via the Web UI

Workflow options

The following section outlines some useful workflow parameters. This is not intended to be a complete or exhaustive reference on all exposed parameters.
  • run_name - The run name used to name artifacts and report metrics. Should be unique. This is the only required option; it has no default.
  • pvc - The PVC to use for dataset and model artifacts.
  • region - The region to run the Argo jobs in. Generally, this should be ORD1.
  • dataset - The dataset folder relative to the PVC root.
  • model - The model to train on. It can be a path relative to the PVC root; if it can't be found there, the finetuner will attempt to download the model from Hugging Face.
  • context - Training context size in tokens. This also affects the tokenization process.
  • Tokenizer input sort order - Sort the input text files provided to the tokenizer according to one of the following criteria: size_ascending, size_descending, name_ascending, name_descending, or random. This is different from the trainer's shuffling, which selects contexts randomly. If you use one of these options, you will most likely want to disable trainer shuffling by passing -no_shuffle to the trainer. Default: not set.
  • epochs - The number of times the finetuner should train on the dataset.
  • learn_rate - How quickly the model should learn the finetune data. Too high a learning rate can be counterproductive and overwrite what the base model learned in its original training.
  • Metrics API key - Strongly recommended. Use an API key from … to report finetuning metrics with charts.
  • run_inference - Whether to run the example InferenceService on the finetuned model.
  • inference_only - When true, skip tokenization and finetuning. Intended for quickly running only the InferenceService on a previously trained model.
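If you submit many runs, building the argo submit command programmatically can be less error-prone than typing -p flags by hand. This Python sketch only assembles the argument list from the -p, --serviceaccount, and --watch options already used in this guide; argo_submit_command is a hypothetical helper, and you would pass its result to subprocess.run yourself.

```python
import shlex
from typing import Dict, List, Optional


def argo_submit_command(workflow_file: str, params: Dict[str, object],
                        service_account: Optional[str] = None,
                        watch: bool = False) -> List[str]:
    """Build an argo submit argument list with one -p flag per workflow parameter."""
    if "run_name" not in params:
        raise ValueError("run_name is the only required parameter")
    cmd = ["argo", "submit", workflow_file]
    for key, value in params.items():
        if isinstance(value, bool):
            value = str(value).lower()  # render Python booleans as true/false
        cmd += ["-p", f"{key}={value}"]
    if service_account:
        cmd += ["--serviceaccount", service_account]
    if watch:
        cmd.append("--watch")
    return cmd


if __name__ == "__main__":
    cmd = argo_submit_command(
        "finetune-workflow.yaml",
        {"run_name": "example-gpt-j-6b", "dataset": "dataset",
         "run_inference": True, "model": "EleutherAI/gpt-j-6B"},
        service_account="inference",
    )
    # To actually submit: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Run as a script, it prints the same command as the submit example earlier in this guide.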

Artifacts and Inference

Once the model completes finetuning, the model artifacts should be found under a directory with a name patterned after {{pvc}}/{{run_name}}/final.
You can download the model at this point, or you can run the InferenceService on the model.
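To try the finetuned model outside the InferenceService, you can load the final directory with the Transformers library. This sketch assumes the directory holds a model and tokenizer saved in the standard Hugging Face format and that the PVC is mounted at /finetune-data, as in the sample log; the helper names are hypothetical. A 6B-parameter model needs tens of gigabytes of memory to load, so run it on a machine sized accordingly.

```python
from pathlib import Path


def final_model_dir(pvc_mount: str, run_name: str) -> Path:
    """Return {{pvc}}/{{run_name}}/final for a PVC mounted at pvc_mount."""
    return Path(pvc_mount) / run_name / "final"


def generate(model_dir: Path, prompt: str, min_length: int = 150, max_length: int = 200) -> str:
    """Load the finetuned checkpoint with Transformers and sample one continuation."""
    # Imported lazily so final_model_dir() works without Transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForCausalLM.from_pretrained(str(model_dir))
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, do_sample=True,
                            min_length=min_length, max_length=max_length)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Usage (hypothetical run name):
# path = final_model_dir("/finetune-data", "example-gpt-j-6b")
# print(generate(path, "She danced with him in the honky-tonk"))
```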
If you enabled the InferenceService and have installed the Knative client (kn), you can get the service URL by invoking kn service list.
Services can also be listed without the Knative client by executing kubectl get ksvc:

inference-western-predictor-default inference-western-predictor-default-00007 2d21h 3 OK / 3 True
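Once you have the service URL, the test query can also be sent from Python using only the standard library. This sketch assumes the request and response JSON shapes used in the curl example in this section ({"parameters": ..., "instances": [...]} in, {"predictions": [{"generated_text": ...}]} out). The URL is whatever prediction endpoint you query with curl, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import List


def build_payload(prompts: List[str], **parameters) -> bytes:
    """Encode a request body with generation parameters and prompt instances."""
    return json.dumps({"parameters": parameters, "instances": prompts}).encode("utf-8")


def parse_predictions(body: bytes) -> List[str]:
    """Extract generated_text from each entry of a {"predictions": [...]} response."""
    return [p["generated_text"] for p in json.loads(body)["predictions"]]


def predict(url: str, prompts: List[str], timeout: float = 120.0, **parameters) -> List[str]:
    """POST prompts to the InferenceService and return the generated texts."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompts, **parameters),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())


# Usage (url is a placeholder for your service's prediction endpoint):
# texts = predict(url, ["She danced with him in the honky-tonk"], min_length=150)
```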
We can run curl to do a test query. First set INFERENCE_URL to the prediction endpoint of your InferenceService.
Note: This assumes jq is installed.
curl "$INFERENCE_URL" \
  -H 'Content-Type: application/json; charset=utf-8' \
  --data-binary @- << EOF | jq .
{"parameters": {"min_length": 150},
 "instances": ["She danced with him in the honky-tonk"]}
EOF

The above command should yield a result similar to the following:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 935 100 828 100 107 147 19 0:00:05 0:00:05 --:--:-- 188
{
  "predictions": [
    {
      "generated_text": "She danced with him in the honky-tonk hall as if to say, \"You got me into this mess. Now I'll get you out of it. Let's split it and go our separate ways. Maybe I'll get lucky and make you my partner.\"\nHe grinned. \"You never know. Don't let anyone stop you. But if someone tries to arrest you, let them worry about that.\"\n\"I'll do that. Now, about that money?\"\n\"Money? What money?\"\n\"The loan they paid to your uncle to buy your brother out of that mine. I'm not sure why they did that.\"\nHe grinned. \"That's what I've been trying to figure out myself. They want more power over the land they're buying so they can put up cattle. But they're not taking long to figure out that I'm onto them, so keep this money safe until we figure out the best way to handle it. Don't try"
    }
  ]
}
And there we have it: we took a model and dataset through tokenization and finetuning, then ran test inferences against the new model.
While this barely scratches the surface of finetuning, this is a good place to get started.