Getting Started
Workflows on CoreWeave Cloud run on Argo Workflows, a tool for orchestrating parallel execution of GPU and CPU jobs. Argo Workflows manages retries and parallelism automatically, and workflows can be submitted via the CLI, the REST API, or the Kubernetes API.
Argo Web UI

After logging into CoreWeave Cloud, go to the CoreWeave application Catalog.
The Catalog link on the Cloud UI
A new window will open onto the CoreWeave application Catalog, where you can browse all available applications. In the search field, type argo-workflows. Then, select the argo-workflows application once it appears.
In the upper right-hand corner of the next screen, select the latest version of the Helm chart under Chart Version, then click the Deploy button.
The following deployment form will prompt you to enter a name for the application. The remaining parameters will be set to CoreWeave's suggested defaults, but can be changed to suit your requirements. When ready to deploy, click the Deploy button at the bottom of the page.
Important
The server authentication mode does not require credentials and is strongly discouraged. We suggest using the client mode for better security.
The Argo Workflows configuration screen
After a few minutes, the deployment will be ready. If you selected Expose UI via public Ingress, Argo Workflows will be accessible outside the cluster.
Click the ingress link to open the Argo Workflows UI in a new window.
Important
It can take up to five minutes for a TLS certificate to be issued. Prior to this, you may encounter a TLS certificate error, but once the certificate is issued, this error should disappear on its own.
To run a sample workflow:
  1. Click +SUBMIT NEW WORKFLOW, then Edit using workflow options.
  2. Click +CREATE. After a few minutes, once successful, the workflow indication color will change to green.

Installing Argo CLI
Note
The following assumes you have already obtained CoreWeave Cloud access credentials and set up your kubeconfig file.
The Argo CLI tool can be obtained from the Argo Project GitHub.
To run an example workflow, save the following manifest to the file gpu-say-workflow.yaml.
Example
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-say
spec:
  entrypoint: main
  activeDeadlineSeconds: 300 # Cancel operation if not finished in 5 minutes
  ttlStrategy:
    secondsAfterCompletion: 86400 # Clean out old workflows after a day
  # Parameters can be passed/overridden via the argo CLI.
  # To override the printed message, run `argo submit` with the -p option:
  # $ argo submit gpu-say-workflow.yaml -p messages='["CoreWeave", "Is", "Fun"]'
  arguments:
    parameters:
    - name: messages
      value: '["Argo", "Is", "Awesome"]'
  templates:
  - name: main
    steps:
    - - name: echo
        template: gpu-echo
        arguments:
          parameters:
          - name: message
            value: "{{item}}"
        withParam: "{{workflow.parameters.messages}}"
  - name: gpu-echo
    inputs:
      parameters:
      - name: message
    retryStrategy:
      limit: 1
    script:
      image: nvidia/cuda:11.4.1-runtime-ubuntu20.04
      command: [bash]
      source: |
        nvidia-smi
        echo "Input was: {{inputs.parameters.message}}"
      resources:
        requests:
          memory: 128Mi
          cpu: 500m # Half a core
        limits:
          nvidia.com/gpu: 1 # Allocate one GPU
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          # This will REQUIRE the Pod to be run on a system with a GPU with 8 or 16GB VRAM
          nodeSelectorTerms:
          - matchExpressions:
            - key: gpu.nvidia.com/vram
              operator: In
              values:
              - "8"
              - "16"
Submit the new workflow file (gpu-say-workflow.yaml). As specified above, the workflow takes a JSON array and spins up one Pod per array element, in parallel, each with one GPU allocated. The nvidia-smi output and the parameter entry assigned to each Pod are written to the log:
$ argo submit --watch gpu-say-workflow.yaml -p messages='["Argo", "Is", "Awesome"]'
Name:                gpu-sayzfwxc
Namespace:           tenant-test
ServiceAccount:      default
Status:              Running
Created:             Mon Feb 10 19:31:17 -0500 (15 seconds ago)
Started:             Mon Feb 10 19:31:17 -0500 (15 seconds ago)
Duration:            15 seconds
Parameters:
  messages:          ["Argo", "Is", "Awesome"]

STEP                                  PODNAME                  DURATION  MESSAGE
 ● gpu-sayzfwxc (main)
 └-·-✔ echo(0:Argo)(0) (gpu-echo)     gpu-sayzfwxc-391007373   10s
   ├-● echo(1:Is)(0) (gpu-echo)       gpu-sayzfwxc-3501791705  15s
   └-✔ echo(2:Awesome)(0) (gpu-echo)  gpu-sayzfwxc-3986963301  12s
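The fan-out performed by withParam can be illustrated with a small Python sketch. It is not Argo code: expand_with_param is a hypothetical helper that only mimics how a JSON array parameter becomes one step per element, using labels modeled on the step names in the output above (without the trailing retry-attempt suffix).

```python
import json


def expand_with_param(step_name: str, raw_param: str) -> list[str]:
    """Return one step label per element of a JSON array parameter."""
    items = json.loads(raw_param)
    if not isinstance(items, list):
        raise ValueError("withParam expects a JSON array")
    # Each element becomes its own step, labeled with its index and value.
    return [f"{step_name}({index}:{item})" for index, item in enumerate(items)]


steps = expand_with_param("echo", '["Argo", "Is", "Awesome"]')
for step in steps:
    print(step)
```

Each label corresponds to one Pod that receives its array element as the message input parameter.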
To print the log output from all parallel containers, use the logs command:
$ argo logs -w gpu-sayrbr6z
echo(0:Argo)(0): Tue Feb 11 00:25:30 2020
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): | NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
echo(0:Argo)(0): |-------------------------------+----------------------+----------------------+
echo(0:Argo)(0): | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
echo(0:Argo)(0): | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
echo(0:Argo)(0): |===============================+======================+======================|
echo(0:Argo)(0): | 0 NVIDIA Graphics... Off | 00000000:08:00.0 Off | N/A |
echo(0:Argo)(0): | 28% 51C P5 16W / 180W | 18MiB / 8119MiB | 0% Default |
echo(0:Argo)(0): +-------------------------------+----------------------+----------------------+
echo(0:Argo)(0):
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): | Processes: GPU Memory |
echo(0:Argo)(0): | GPU PID Type Process name Usage |
echo(0:Argo)(0): |=============================================================================|
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): Input was: Argo
echo(1:Is)(0): Tue Feb 11 00:25:30 2020
echo(1:Is)(0): +-----------------------------------------------------------------------------+
echo(1:Is)(0): | NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
echo(1:Is)(0): |-------------------------------+----------------------+----------------------+
...

Security
Argo requires a Service Account token for authentication. The steps below follow the best practices from Argo's access token documentation.
Warning
The server auth mode is strongly discouraged, as it opens up Argo Workflows to public access and is therefore a security risk.

Tailored permissions
First, create a role with minimal permissions:
$ kubectl create role argo-role --verb=list,update --resource=workflows.argoproj.io
To get the full list of resources, use the api-resources command:
$ kubectl api-resources --api-group=argoproj.io --namespaced=true -o wide
The output lists the following resources:
Name                   Shortnames  Kind                  Verbs
cronworkflows          cwf,cronwf  CronWorkflow          [delete deletecollection get list patch create update watch]
workfloweventbindings  wfeb        WorkflowEventBinding  [delete deletecollection get list patch create update watch]
workflows              wf          Workflow              [delete deletecollection get list patch create update watch]
workflowtaskresults                WorkflowTaskResult    [delete deletecollection get list patch create update watch]
workflowtasksets       wfts        WorkflowTaskSet       [delete deletecollection get list patch create update watch]
workflowtemplates      wftmpl      WorkflowTemplate      [delete deletecollection get list patch create update watch]
Creating a tailored role with many resources and verbs on the command line can be cumbersome. Alternatively, define the role in a YAML manifest:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-role
rules:
- apiGroups:
  - argoproj.io
  resources:
  - workflows
  verbs:
  - list
  - update
Then, apply the manifest using kubectl apply:
$ kubectl apply -f <manifest_roles.yaml>
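When a role spans several resources and verbs, the manifest can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of CoreWeave's or Argo's tooling: build_role is a hypothetical helper, and the sketch relies on kubectl apply accepting JSON manifests as well as YAML.

```python
import json


def build_role(name: str, resources: list[str], verbs: list[str]) -> dict:
    """Build a namespaced RBAC Role for resources in the argoproj.io API group."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": ["argoproj.io"],
                "resources": list(resources),
                "verbs": list(verbs),
            }
        ],
    }


# Same role as the manifest above: list and update on workflows only.
role = build_role("argo-role", ["workflows"], ["list", "update"])
print(json.dumps(role, indent=2))
```

The printed JSON could be saved to a file and passed to kubectl apply -f in place of the hand-written manifest.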

Service account
To create a unique Service Account, use kubectl create sa.
For example, here we create a Service Account with the name argo-sa:
$ kubectl create sa argo-sa
Then, create a RoleBinding that binds the new Service Account to the argo-role created earlier:
$ kubectl create rolebinding argo-role-binding --role=argo-role --serviceaccount=<namespace>:argo-sa
where <namespace> is the namespace in which Argo is running.

Generate the token
Generate the token to be used with the Service Account:
$ export SECRET=$(kubectl get sa argo-sa -o=jsonpath='{.secrets[0].name}')
$ export ARGO_TOKEN="Bearer $(kubectl get secret $SECRET -o=jsonpath='{.data.token}' | base64 --decode)"
$ echo $ARGO_TOKEN
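The shell commands above read the token field from the Service Account's Secret, base64-decode it, and prefix it with "Bearer ". The Python sketch below shows only that decoding step on a made-up value; bearer_from_secret_data is a hypothetical helper, and the sample token is a placeholder rather than a real credential.

```python
import base64


def bearer_from_secret_data(encoded_token: str) -> str:
    """Decode a base64-encoded Secret token and format it as a Bearer value."""
    token = base64.b64decode(encoded_token).decode("utf-8")
    return f"Bearer {token}"


# Stand-in for the value kubectl returns from '{.data.token}' (placeholder only).
sample_encoded = base64.b64encode(b"example-token").decode("ascii")
print(bearer_from_secret_data(sample_encoded))
```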
Then, in the Argo UI, paste the newly generated token into the client authentication box:
The Argo Workflow UI with a Bearer token pasted into the client authentication box
Finally, to log in, click the Login button after adding the token.
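The same token can, in principle, authenticate REST API calls as well. The sketch below only constructs a request and does not send it. It assumes the Argo Server exposes workflow listing at /api/v1/workflows/<namespace> and accepts the token in the Authorization header, as described in Argo's access token documentation. The host name argo.example.com and the helper build_list_request are hypothetical; the namespace is taken from the sample output earlier on this page.

```python
import urllib.request


def build_list_request(server_url: str, namespace: str, argo_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists workflows in a namespace."""
    url = f"{server_url.rstrip('/')}/api/v1/workflows/{namespace}"
    # argo_token is expected to already include the "Bearer " prefix.
    return urllib.request.Request(url, headers={"Authorization": argo_token}, method="GET")


request = build_list_request("https://argo.example.com/", "tenant-test", "Bearer example-token")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; with the role defined above, only list and update operations on workflows should be permitted.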

Recommendations
We recommend the following retry strategy for your workflows and steps:
retryStrategy:
  limit: 2
  retryPolicy: Always
  backoff:
    duration: "15s"
    factor: 2
  affinity:
    nodeAntiAffinity: {}
We also recommend setting activeDeadlineSeconds on each step, but not on the entire workflow. This allows a step to retry while preventing it from taking an unreasonably long time to finish.
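To get a feel for the recommended backoff settings, the following Python sketch computes the wait before each retry. It assumes the usual exponential-backoff reading of these fields, where the first retry waits duration and each later retry multiplies the previous wait by factor; backoff_delays is a hypothetical helper, not an Argo API.

```python
def backoff_delays(duration_s: int, factor: int, limit: int) -> list[int]:
    """Return the wait in seconds before each retry, up to `limit` retries."""
    return [duration_s * factor ** attempt for attempt in range(limit)]


# Values from the recommended retryStrategy: duration 15s, factor 2, limit 2.
delays = backoff_delays(15, 2, 2)
print(delays)
```

Under that assumption, the two retries add about 45 seconds of waiting in total, which is worth keeping in mind when choosing a per-step activeDeadlineSeconds.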