Use the Argo Workflows CLI
Use the power of the command line to control Argo Workflows
The Argo Command Line Interface (CLI) is a powerful tool used to create, submit, manage, and monitor workflows directly from the command line. The CLI also allows creating reusable templates, making it easier to define and share common workflow patterns. Many of CoreWeave's example projects use Argo CLI as the primary way to demonstrate different techniques.
Use Argo CLI
To submit a job with the CLI, first verify the CoreWeave Kubernetes environment is set up.
Next, download the latest Argo CLI from the GitHub releases page and follow their Quick Start installation instructions.
To test the installation, run:
$ argo version
The result should be similar to:
argo: v3.4.7
  BuildDate: 2023-04-11T17:19:48Z
  GitCommit: f2292647c5a6be2f888447a1fef71445cc05b8fd
  GitTreeState: clean
  GitTag: v3.4.7
  GoVersion: go1.19.7
  Compiler: gc
  Platform: linux/amd64
Next, create a working folder and a file named workflow.yaml.
$ mkdir example
$ cd example
$ nano workflow.yaml
Copy and paste the contents below into workflow.yaml.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-say
spec:
  entrypoint: main
  activeDeadlineSeconds: 300 # Cancel operation if not finished in 5 minutes
  ttlStrategy:
    secondsAfterCompletion: 86400 # Clean out old workflows after a day
  # Parameters can be passed/overridden via the argo CLI.
  # To override the printed message, run `argo submit` with the -p option:
  # $ argo submit examples/arguments-parameters.yaml -p messages='["CoreWeave", "Is", "Fun"]'
  arguments:
    parameters:
    - name: messages
      value: '["Argo", "Is", "Awesome"]'
    - name: foo
      value: "bar"
  templates:
  - name: main
    steps:
      - - name: echo
          template: gpu-echo
          arguments:
            parameters:
            - name: message
              value: "{{item}}"
          withParam: "{{workflow.parameters.messages}}"
  - name: gpu-echo
    inputs:
      parameters:
      - name: message
    retryStrategy:
      limit: 1
    script:
      image: nvidia/cuda:11.4.1-runtime-ubuntu20.04
      command: [bash]
      source: |
        nvidia-smi
        echo "Input was: {{inputs.parameters.message}}"
      resources:
        requests:
          memory: 128Mi
          cpu: 500m # Half a core
        limits:
          nvidia.com/gpu: 1 # Allocate one GPU
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          # This will REQUIRE the Pod to be run on a system with a GPU with 8 or 16GB VRAM
          nodeSelectorTerms:
          - matchExpressions:
            - key: gpu.nvidia.com/vram
              operator: In
              values:
                - "8"
                - "16"
Then, submit the workflow file.
$ argo submit workflow.yaml
Here's the output.
Name:                gpu-say64d9m
Namespace:           tenant-96362f-dev
ServiceAccount:      unset (will run with the default ServiceAccount)
Status:              Pending
Created:             Thu May 04 17:38:04 -0400 (now)
Progress:
Parameters:
  messages:          ["Argo", "Is", "Awesome"]
  foo:               bar
This workflow passes each element of the messages JSON array to its own step, spinning up three Pods in parallel with one GPU allocated to each.
There are two parameters in this file:
- messages, which is an array of strings
- foo, which has the value bar
How to override parameters
One technique used often in CoreWeave examples is setting default values in the workflow YAML, and then overriding a few of them on the command line or with a parameters file.
Override parameters on the command line
Override workflow parameters on the command line with the -p option:
$ argo submit workflow.yaml \
    -p messages='["CoreWeave", "Is", "Fun"]' \
    -p foo='Something Else'
In this case, the output shows the overridden parameters:
Name:                gpu-sayxgsrv
Namespace:           tenant-96362f-dev
ServiceAccount:      unset (will run with the default ServiceAccount)
Status:              Pending
Created:             Thu May 04 17:40:30 -0400 (now)
Progress:
Parameters:
  messages:          ["CoreWeave", "Is", "Fun"]
  foo:               Something Else
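Quoting a JSON array by hand is the error-prone part of the -p option. The sketch below builds the array string from ordinary shell arguments before handing it to argo submit. The helper name to_json_array is hypothetical and not part of the Argo CLI, and it assumes the items contain no double quotes or backslashes.

```shell
# Hypothetical helper (not part of the Argo CLI): join the arguments into a
# JSON array of strings. Assumes items contain no double quotes or backslashes.
to_json_array() {
  out=""
  for item in "$@"; do
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out\"$item\""
  done
  printf '[%s]\n' "$out"
}

messages=$(to_json_array "CoreWeave" "Is" "Fun")
printf '%s\n' "$messages"

# Submit with the generated value only when the argo CLI is installed.
if command -v argo >/dev/null 2>&1; then
  argo submit workflow.yaml -p messages="$messages" -p foo='Something Else' \
    || echo "argo submit failed"
else
  echo "argo CLI not found; skipping submit"
fi
```

Because the workflow expands one step per array element, passing more or fewer items changes how many Pods are started.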
Override parameters with a YAML file
Another technique is using a YAML file to override parameters. To do that, create params.yaml in the example folder and paste this into it.
messages: ["Use", "Anything", "Here"]
foo: "Another new string"
Then, use the --parameter-file option to pass params.yaml to the workflow.
$ argo submit workflow.yaml --parameter-file params.yaml
The output shows the new parameters:
Name:                gpu-saygppcq
Namespace:           tenant-96362f-dev
ServiceAccount:      unset (will run with the default ServiceAccount)
Status:              Pending
Created:             Thu May 04 17:43:57 -0400 (now)
Progress:
Parameters:
  foo:               Another new string
  messages:          ["Use","Anything","Here"]
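For scripted runs, the parameter file can be written from the shell instead of an editor. This is a minimal sketch that assumes it runs from the example folder next to workflow.yaml; the guard around argo is only there so the script does nothing harmful on a machine without the CLI.

```shell
# Write the override file with a quoted heredoc so the shell does not expand
# anything inside it.
cat > params.yaml <<'EOF'
messages: ["Use", "Anything", "Here"]
foo: "Another new string"
EOF

# Submit with the parameter file only when the argo CLI is installed.
if command -v argo >/dev/null 2>&1; then
  argo submit workflow.yaml --parameter-file params.yaml \
    || echo "argo submit failed"
else
  echo "argo CLI not found; skipping submit"
fi
```

Keeping one parameter file per environment or experiment makes it easy to rerun the same workflow with a known set of values.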
List workflows
After submitting a job, use the argo command to interact with it.
To find a job's name, use the list command.
$ argo list
The output shows the name, gpu-sayn5p6w in this case.
NAME           STATUS      AGE   DURATION   PRIORITY   MESSAGE
gpu-sayn5p6w   Succeeded   41m   37s        0
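When a script needs the workflow name rather than the whole table, the list output can be filtered. The sketch below parses the table with awk; names_with_status is a hypothetical helper, and the sample text is the output shown above rather than a live cluster response. The CLI also offers built-in filters such as argo list --status Succeeded -o name, which avoid parsing altogether.

```shell
# Hypothetical helper: read `argo list` table output on stdin and print the
# names of workflows whose STATUS column equals the first argument.
names_with_status() {
  awk -v want="$1" 'NR > 1 && $2 == want { print $1 }'
}

# Sample table copied from the output above (not a live cluster response).
sample='NAME           STATUS      AGE   DURATION   PRIORITY   MESSAGE
gpu-sayn5p6w   Succeeded   41m   37s        0'

printf '%s\n' "$sample" | names_with_status Succeeded

# Against a real cluster, pipe the live output instead:
#   argo list | names_with_status Succeeded
```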
View Argo logs
To see the logs for a workflow job, use the logs command with the workflow name fetched earlier:
$ argo logs gpu-sayn5p6w
The log output looks like this:
gpu-sayn5p6w-gpu-echo-1273146457: Wed May  3 21:30:56 2023
gpu-sayn5p6w-gpu-echo-1273146457: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-1273146457: | NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
gpu-sayn5p6w-gpu-echo-1273146457: |-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-1273146457: | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
gpu-sayn5p6w-gpu-echo-1273146457: | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
gpu-sayn5p6w-gpu-echo-1273146457: |                               |                      |               MIG M. |
gpu-sayn5p6w-gpu-echo-1273146457: |===============================+======================+======================|
gpu-sayn5p6w-gpu-echo-1273146457: |   0  NVIDIA RTX A4000    On   | 00000000:02:00.0 Off |                  Off |
gpu-sayn5p6w-gpu-echo-1273146457: | 46%   33C    P8    17W / 140W |     65MiB / 16376MiB |      0%      Default |
gpu-sayn5p6w-gpu-echo-1273146457: |                               |                      |                  N/A |
gpu-sayn5p6w-gpu-echo-1273146457: +-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-1273146457:
gpu-sayn5p6w-gpu-echo-1273146457: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-1273146457: | Processes:                                                                  |
gpu-sayn5p6w-gpu-echo-1273146457: |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
gpu-sayn5p6w-gpu-echo-1273146457: |        ID   ID                                                   Usage      |
gpu-sayn5p6w-gpu-echo-1273146457: |=============================================================================|
gpu-sayn5p6w-gpu-echo-1273146457: |    0   N/A  N/A      2181      G                                      63MiB |
gpu-sayn5p6w-gpu-echo-1273146457: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-1273146457: Input was: Is
gpu-sayn5p6w-gpu-echo-1273146457: time="2023-05-03T21:30:57.804Z" level=info msg="sub-process exited" argo=true error="<nil>"
gpu-sayn5p6w-gpu-echo-3614462693: Wed May  3 21:31:16 2023
gpu-sayn5p6w-gpu-echo-3614462693: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-3614462693: | NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
gpu-sayn5p6w-gpu-echo-3614462693: |-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-3614462693: | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
gpu-sayn5p6w-gpu-echo-3614462693: | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
gpu-sayn5p6w-gpu-echo-3614462693: |                               |                      |               MIG M. |
gpu-sayn5p6w-gpu-echo-3614462693: |===============================+======================+======================|
gpu-sayn5p6w-gpu-echo-3614462693: |   0  NVIDIA RTX A4000    On   | 00000000:01:00.0 Off |                  Off |
gpu-sayn5p6w-gpu-echo-3614462693: | 47%   33C    P8    15W / 140W |     65MiB / 16376MiB |      0%      Default |
gpu-sayn5p6w-gpu-echo-3614462693: |                               |                      |                  N/A |
gpu-sayn5p6w-gpu-echo-3614462693: +-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-3614462693:
gpu-sayn5p6w-gpu-echo-3614462693: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-3614462693: | Processes:                                                                  |
gpu-sayn5p6w-gpu-echo-3614462693: |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
gpu-sayn5p6w-gpu-echo-3614462693: |        ID   ID                                                   Usage      |
gpu-sayn5p6w-gpu-echo-3614462693: |=============================================================================|
gpu-sayn5p6w-gpu-echo-3614462693: |    0   N/A  N/A      2290      G                                      63MiB |
gpu-sayn5p6w-gpu-echo-3614462693: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-3614462693: Input was: Awesome
gpu-sayn5p6w-gpu-echo-3614462693: time="2023-05-03T21:31:17.624Z" level=info msg="sub-process exited" argo=true error="<nil>"
gpu-sayn5p6w-gpu-echo-2418828045: Wed May  3 21:31:22 2023
gpu-sayn5p6w-gpu-echo-2418828045: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-2418828045: | NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
gpu-sayn5p6w-gpu-echo-2418828045: |-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-2418828045: | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
gpu-sayn5p6w-gpu-echo-2418828045: | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
gpu-sayn5p6w-gpu-echo-2418828045: |                               |                      |               MIG M. |
gpu-sayn5p6w-gpu-echo-2418828045: |===============================+======================+======================|
gpu-sayn5p6w-gpu-echo-2418828045: |   0  NVIDIA RTX A4000    On   | 00000000:C1:00.0 Off |                  Off |
gpu-sayn5p6w-gpu-echo-2418828045: | 42%   31C    P8    13W / 140W |     94MiB / 16376MiB |      0%      Default |
gpu-sayn5p6w-gpu-echo-2418828045: |                               |                      |                  N/A |
gpu-sayn5p6w-gpu-echo-2418828045: +-------------------------------+----------------------+----------------------+
gpu-sayn5p6w-gpu-echo-2418828045:
gpu-sayn5p6w-gpu-echo-2418828045: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-2418828045: | Processes:                                                                  |
gpu-sayn5p6w-gpu-echo-2418828045: |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
gpu-sayn5p6w-gpu-echo-2418828045: |        ID   ID                                                   Usage      |
gpu-sayn5p6w-gpu-echo-2418828045: |=============================================================================|
gpu-sayn5p6w-gpu-echo-2418828045: |    0   N/A  N/A      2218      G                                      92MiB |
gpu-sayn5p6w-gpu-echo-2418828045: +-----------------------------------------------------------------------------+
gpu-sayn5p6w-gpu-echo-2418828045: Input was: Argo
gpu-sayn5p6w-gpu-echo-2418828045: time="2023-05-03T21:31:23.231Z" level=info msg="sub-process exited" argo=true error="<nil>"
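Most of that output is nvidia-smi noise, so it can help to pull out only the lines the script echoed. The sketch below does this with sed; echoed_inputs is a hypothetical helper, and the sample lines are copied from the log output above rather than fetched from a cluster.

```shell
# Hypothetical helper: read `argo logs` output on stdin and print one
# "pod -> message" line for every "Input was:" entry.
echoed_inputs() {
  sed -n 's/^\([^:]*\): Input was: \(.*\)$/\1 -> \2/p'
}

# Sample lines copied from the log output above (not a live cluster response).
sample_logs='gpu-sayn5p6w-gpu-echo-1273146457: Input was: Is
gpu-sayn5p6w-gpu-echo-1273146457: time="2023-05-03T21:30:57.804Z" level=info msg="sub-process exited" argo=true error="<nil>"
gpu-sayn5p6w-gpu-echo-3614462693: Input was: Awesome'

printf '%s\n' "$sample_logs" | echoed_inputs

# Against a real cluster, pipe the live logs instead:
#   argo logs gpu-sayn5p6w | echoed_inputs
```

Each Pod name maps to exactly one array element, which makes it easy to confirm that every message was processed.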
Watch the submission
Use the --watch option to observe the workflow in progress.
$ argo submit workflow.yaml --watch
If the --watch option wasn't used when submitting the workflow, use the watch command with the workflow name to observe it.
$ argo watch gpu-sayn5p6w
In either case, the output looks the same.
Name:                gpu-sayn5p6w
Namespace:           tenant-d0b59f-dfinst
ServiceAccount:      unset (will run with the default ServiceAccount)
Status:              Running
Conditions:
 PodRunning          False
Created:             Wed May 03 17:30:50 -0400 (26 seconds ago)
Started:             Wed May 03 17:30:50 -0400 (26 seconds ago)
Duration:            26 seconds
Progress:            1/3
ResourcesDuration:   3s*(100Mi memory),1s*(1 nvidia.com/gpu),3s*(1 cpu)
Parameters:
  messages:          ["Argo", "Is", "Awesome"]

STEP                      TEMPLATE  PODNAME                           DURATION  MESSAGE
 ● gpu-sayn5p6w           main
 └─┬─◷ echo(0:Argo)(0)    gpu-echo  gpu-sayn5p6w-gpu-echo-2418828045  26s       PodInitializing
   ├─✔ echo(1:Is)(0)      gpu-echo  gpu-sayn5p6w-gpu-echo-1273146457  6s
   └─◷ echo(2:Awesome)(0) gpu-echo  gpu-sayn5p6w-gpu-echo-3614462693  25s       PodInitializing
Use a specific ServiceAccount
In the previous example, notice that the third line says:
ServiceAccount: unset (will run with the default ServiceAccount)
If no specific Kubernetes ServiceAccount is declared, the workflow uses the default one for the namespace. To use a specific ServiceAccount, create one and then pass it with the --serviceaccount option, as described in the steps below. The security best practices guide has more details.
First, define a ServiceAccount and assign it some defined Roles. Then, define a RoleBinding to bind them together.
This can be done in a single YAML file. Create my-example-sa.yaml and paste the contents below.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-example
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-example-role
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - 'patch'
  - apiGroups:
      - serving.kubeflow.org
    resources:
      - inferenceservices
    verbs:
      - '*'
  - apiGroups:
      - serving.knative.dev
    resources:
      - services
      - revisions
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-example-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-example-role
subjects:
  - kind: ServiceAccount
    name: my-example
There are three sections in this file, separated with ---.
- First, it creates a new ServiceAccount named my-example.
- Next, it creates a Role named my-example-role with several permissions.
- Finally, it creates a RoleBinding named my-example-rolebinding, which binds the ServiceAccount to the Role.
Create the ServiceAccount
Apply the YAML to the cluster with kubectl.
$ kubectl apply -f my-example-sa.yaml
The output looks like this:
serviceaccount/my-example created
role.rbac.authorization.k8s.io/my-example-role created
rolebinding.rbac.authorization.k8s.io/my-example-rolebinding created
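Before submitting a workflow, it can be worth checking that the binding grants what the Role declares. The sketch below builds the ServiceAccount's full subject name and asks kubectl whether it may patch Pods. The helper sa_subject is hypothetical, the namespace tenant-96362f-dev is taken from the sample output in this guide and should be replaced with your own, and the check assumes your user is allowed to impersonate ServiceAccounts.

```shell
# Hypothetical helper: build the subject name Kubernetes uses for a
# ServiceAccount, in the form system:serviceaccount:<namespace>:<name>.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Example namespace from this guide's sample output; replace with your own.
namespace="tenant-96362f-dev"
subject=$(sa_subject "$namespace" "my-example")
printf '%s\n' "$subject"

# Ask the API server whether that ServiceAccount may patch Pods.
if command -v kubectl >/dev/null 2>&1; then
  kubectl auth can-i patch pods --as="$subject" -n "$namespace" \
    || echo "permission denied or check failed"
else
  echo "kubectl not found; skipping permission check"
fi
```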
Use the ServiceAccount
Now, use the new ServiceAccount with Argo, like this:
$ argo submit workflow.yaml --serviceaccount my-example
The output looks like this:
Name:                gpu-say72dm5
Namespace:           tenant-96362f-dev
ServiceAccount:      my-example
Status:              Pending
Created:             Thu May 04 18:24:58 -0400 (now)
Progress:
Parameters:
  messages:          ["Argo", "Is", "Awesome"]
  foo:               bar
Notice that the third line now shows the new ServiceAccount name. Using a specific ServiceAccount with limited permissions is a security best practice.
More information
These are only a few of the most common Argo CLI commands used in CoreWeave's examples and demonstrations. For a full list, please see the Argo documentation or see these Argo Workflows resources: