
Running jobs and management tasks in the Slurm cluster requires connecting to the Slurm login node. You can access the login node through SSH or kubectl exec, depending on your directory service configuration. Connecting through SSH requires a directory service pre-configured for SSH access, while kubectl exec does not. For information about initial setup of Slurm login nodes, see Configure Slurm individual login nodes.

Connect through SSH

Accessing the login node through SSH requires a directory service with users configured for SSH access.
First, use the kubectl get svc slurm-login command to identify the login service’s IP address or DNS record. The EXTERNAL-IP field in the command output contains the relevant IP address. In the following example, the target IP address is 203.0.113.100:
Obtain the External IP address
kubectl get svc slurm-login
You should see output similar to the following:
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)   AGE
slurm-login   LoadBalancer   192.0.2.100      203.0.113.100    22/TCP    2d21h
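If you need the address in a script, one approach (a sketch, assuming the service name slurm-login and the LoadBalancer output shown above) is to pull the EXTERNAL-IP column directly:

```shell
# A hedged helper for scripting. With kubectl, jsonpath can read the
# address straight from the Service status (assumes a LoadBalancer
# Service named slurm-login):
#   kubectl get svc slurm-login -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
#
# The same column can also be extracted from plain `kubectl get svc`
# output with awk; sample output is inlined here for illustration.
sample='NAME          TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)   AGE
slurm-login   LoadBalancer   192.0.2.100      203.0.113.100    22/TCP    2d21h'
ip=$(printf '%s\n' "$sample" | awk 'NR==2 {print $4}')
echo "$ip"   # 203.0.113.100
```

In practice, replace the inlined sample with the real command, e.g. `ip=$(kubectl get svc slurm-login | awk 'NR==2 {print $4}')`.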
Then, use SSH to log in with either the IP address or the DNS record created for the node.
Log in with SSH
ssh example-user@203.0.113.100
You should see output similar to the following:
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.13.0-40-generic x86_64)

example-user@slurm-login-0:~$
You are now logged into the Slurm login node and can run Slurm commands.
SSH is the preferred method of access for Slurm login nodes. However, we do not recommend directly accessing Slurm compute nodes through SSH to run tasks. Bypassing Slurm can interfere with currently running jobs and may cause nodes to drain unintentionally, leading to temporary loss of resources. SSH to Slurm compute nodes should only be used for debugging existing jobs on the nodes.

Connect through port forwarding

If there is no public IP address allocated for the node, first port-forward the service with the kubectl port-forward command, then log in through SSH using the port-forwarded address. Each login pod has an associated headless service, allowing users to refer to the pod by name without specifying a Fully Qualified Domain Name (FQDN). To access an individual login pod with port-forwarding, use the kubectl port-forward and ssh commands, as demonstrated below:
Log in with port-forwarding
kubectl port-forward svc/slurm-login-slurmuser1 10022:22
ssh example-user@localhost -p 10022
The port-forwarding command in this example, kubectl port-forward svc/slurm-login-slurmuser1 10022:22, works as follows:
  • The kubectl port-forward command creates a port-forward.
  • svc/ specifies that the targeted resource is a Service.
  • slurm-login-slurmuser1 is the exact name of the targeted Kubernetes Service. Replace this value with the name used within your namespace.
  • 10022:22 defines the port mapping. In this case, it forwards traffic from local port 10022 to port 22 on the target Service.
The SSH command, ssh example-user@localhost -p 10022, then connects to the local port 10022. Due to the port-forwarding performed in the prior command, this traffic is sent to port 22 of the specified Kubernetes Service. You are now logged into the Slurm login node and can run Slurm commands.
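For repeated sessions, the port-forwarded endpoint can be captured in an SSH client configuration entry. The host alias slurm-login-local below is hypothetical, and the entry assumes the kubectl port-forward command above is running in another terminal:

```
# ~/.ssh/config — hypothetical alias for the port-forwarded login node
Host slurm-login-local
    HostName localhost
    Port 10022
    User example-user
```

With this entry in place, running ssh slurm-login-local connects through the forwarded port without retyping the user, host, and port each time.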

Run Slurm commands

After logging in, you will have access to all normal Slurm operations to submit jobs or manage the cluster. SchedMD provides extensive documentation for Slurm commands, along with some handy printable cheat sheets. To verify that the cluster is working, run a simple job. For example, print the hostname on 6 nodes, as shown below:
root@slurm-login-0:~# srun -N 6 hostname
slurm-rtx4000-3
slurm-rtx4000-1
slurm-rtx4000-0
slurm-cpu-epyc-0
slurm-cpu-epyc-1
slurm-rtx4000-2
If you run into errors such as “Invalid partition name specified” or “Invalid account or account/partition combination specified”, it is likely that you have not been added as a Slurm user. To add yourself, follow the steps below:
Add yourself as a Slurm user
sudo su
sacctmgr create user -i account=root adminlevel=admin name=YOUR_USERNAME
exit
If your Slurm cluster uses accounts other than root, run the command above for each account you need to be added to.
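Beyond interactive srun jobs, work is typically submitted as batch scripts with sbatch. The script below is a minimal sketch; the job name, node count, and output path are placeholder assumptions to adapt for your cluster:

```bash
#!/bin/bash
# minimal-job.sbatch — hypothetical minimal batch script
#SBATCH --job-name=hello        # placeholder job name
#SBATCH --nodes=2               # placeholder node count
#SBATCH --output=hello-%j.out   # %j expands to the Slurm job ID

# Run one task per allocated node, printing each node's hostname
srun hostname
```

Submit it with sbatch minimal-job.sbatch, then check the generated hello-&lt;jobid&gt;.out file for the results.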

Troubleshooting

For troubleshooting purposes in cases where SSH is not possible, kubectl exec can be used to access the Slurm login node as root. This method is useful for debugging and maintenance tasks.
Access the Slurm login node with kubectl exec
kubectl exec -it slurm-login-0 -c sshd -- bash
root@slurm-login-0:/tmp#
Last modified on April 20, 2026