Slurm job states track the lifecycle of a specific workload that a user submits to the cluster. This page explains how to monitor those job states so that you can confirm a workload is progressing as expected, diagnose why a job is waiting, or review the resource usage of a job that’s already finished.

Slurm job states are distinct from Slurm node states, which describe the availability and health of the hardware that supports the Slurm cluster.

CoreWeave Grafana includes dashboards for tracking metrics related to job performance and resource usage within your SUNK cluster. Slurm also provides commands for tracking job states and steps. The following table describes when to use each one:
| Command | Data type | Purpose |
| --- | --- | --- |
| squeue | Real-time | Overview of jobs in queue or currently running |
| scontrol | Real-time | Detailed, live metadata about a specific job |
| sacct | Historic | Accounting and history; resource usage of completed jobs |
Run all Slurm commands, including squeue, scontrol, and sacct, from within the Slurm login pod shell.

View a summary of current jobs with squeue

Use the squeue command to view real-time status information about Slurm jobs and job steps. squeue provides an overview of jobs currently running or in queue. For more detailed real-time information about a particular job, use scontrol. Run the squeue command by itself to display the status of every active job in the system:
squeue
To filter for specific information, add flags to the squeue command. The following examples show common filters. To display details about a specific job, use the -j flag:
squeue -j [JOB-ID]
To view information about jobs in a specific state, use squeue with the -t flag. Enter one or more Slurm job state names or codes as a comma-separated list:
squeue -t [STATE]
To view jobs started by a specific user, use the -u flag:
squeue -u [USERNAME]
By default, squeue lists jobs only and leaves out job steps to keep the output concise. To display job steps with squeue, use the -s flag:
squeue -s
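The filters above can be combined in a single call. The following is a minimal sketch rather than a prescribed workflow: the function name `my_jobs` is hypothetical, and the `-o` format string is only one possible layout (`%i` job ID, `%P` partition, `%j` job name, `%T` state, `%M` time used, `%R` reason or node list).

```shell
# my_jobs: hypothetical helper that lists one user's jobs in the given states.
# Usage: my_jobs [USERNAME] [STATE-LIST]
my_jobs() {
  # Default to the current user and to pending or running jobs (assumed defaults).
  local user="${1:-$(id -un)}"
  local states="${2:-PENDING,RUNNING}"
  # One squeue call with the user, state, and output-format filters combined.
  squeue -u "$user" -t "$states" -o "%.10i %.9P %.20j %.8T %.10M %R"
}
```

For example, `my_jobs` with no arguments shows your own pending and running jobs, and `my_jobs [USERNAME] PENDING` narrows the output to another user's pending jobs.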

Determine the reason a job is PENDING

Submitting a Slurm job is a request for resource allocation, not an instantaneous execution. Until the requested resources are available, the job remains in a PENDING state. This is expected behavior and is part of the Slurm job lifecycle. The squeue output includes a NODELIST(REASON) column that shows why a pending job is waiting. In a healthy Slurm queue, these reasons are common:
| Listed reason | Meaning |
| --- | --- |
| Resources | The cluster doesn’t currently have the resources available to execute the job. |
| Priority | Other jobs with a higher priority are ahead of this job in the queue. This job is scheduled after them, once resources are available. |
| Dependency | The job was submitted with the --dependency flag, and the job it depends on hasn’t yet reached the required state. |
| JobArrayTaskLimit | The job array has reached its limit on the number of simultaneously running tasks. |
Running squeue sends a remote procedure call to slurmctld. To avoid overloading the Slurm Controller and impacting performance, limit squeue calls to the minimum necessary.
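One way to get an overview of pending reasons without querying the controller repeatedly is to summarize them from a single squeue call. This is a sketch under assumptions: the function name `pending_reasons` is hypothetical, and it relies on the `-h` (no header) flag and the `%r` (reason) format field of squeue.

```shell
# pending_reasons: hypothetical helper that counts pending jobs per reason.
pending_reasons() {
  # -h drops the header, -t PENDING keeps only pending jobs, and
  # -o "%r" prints just the reason, so a single squeue call is enough.
  # sort | uniq -c groups identical reasons; sort -rn puts the most common first.
  squeue -h -t PENDING -o "%r" | sort | uniq -c | sort -rn
}
```

Each output line is a count followed by a reason. If you need the summary more than once, consider saving the output to a file instead of calling squeue again.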

View detailed information about a specific Slurm job with scontrol

The scontrol command provides a more detailed view of a particular job than the summary offered by squeue. Like squeue, scontrol displays live, real-time data, and it can also display job steps. To view detailed information about a specific job, use scontrol show job with the relevant [JOB-ID]:
scontrol show job [JOB-ID]
To view granular details about a specific job step, use scontrol show step with the relevant [JOB-ID] and [STEP-ID]:
scontrol show step [JOB-ID].[STEP-ID]
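The scontrol output is a long list of Key=Value pairs, so it can help to pull out only the fields you care about. The following sketch assumes the field names JobId, JobState, Reason, RunTime, TimeLimit, and NodeList appear in the output of your Slurm version; the function name `job_fields` is hypothetical.

```shell
# job_fields: hypothetical helper that prints selected Key=Value pairs for a job.
# Usage: job_fields JOB-ID
job_fields() {
  # -o prints the whole record on one line; tr then splits it into one
  # Key=Value pair per line so grep can select the fields of interest.
  # Caveat: values that contain spaces (such as some job names) are split too.
  scontrol -o show job "$1" | tr ' ' '\n' |
    grep -E '^(JobId|JobState|Reason|RunTime|TimeLimit|NodeList)='
}
```

Adjust the field list in the grep pattern to match what you want to inspect.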

View information about completed Slurm jobs with sacct

To view information about completed jobs, including job steps and resource usage, use the sacct command. Unlike squeue and scontrol, sacct can display data about jobs that are no longer running or in queue. The output of sacct displays job steps by default. For a comprehensive view of job steps for a given job, use the sacct command with the -j flag:
sacct -j [JOB-ID]
Because sacct lists job steps by default, no extra flag is needed to see them. To hide the steps and show only the job allocation, add the -X (--allocations) flag to the preceding command:
sacct -j [JOB-ID] -X
Use the --format flag with sacct to choose which fields appear in the output:
sacct --format=JobID,JobName,State,ExitCode
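To review resource usage for a finished job, the field list can be extended and the output made easier to post-process. The sketch below is illustrative only: the function names `job_usage` and `non_completed_steps` are hypothetical, and the chosen fields (Elapsed and MaxRSS in addition to the ones above) are an assumption about what you want to inspect.

```shell
# job_usage: hypothetical helper that prints per-step state and resource usage.
# Usage: job_usage JOB-ID
job_usage() {
  # -P (--parsable2) emits pipe-delimited rows without a trailing delimiter,
  # which is easier to post-process than the default fixed-width columns.
  sacct -j "$1" -P --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
}

# non_completed_steps: reads job_usage output on stdin, skips the header row,
# and keeps rows whose State (third field) is anything other than COMPLETED.
non_completed_steps() {
  awk -F'|' 'NR > 1 && $3 != "COMPLETED"'
}
```

For example, `job_usage [JOB-ID] | non_completed_steps` lists only the steps that did not finish successfully, including any that are still running.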

Slurm job state codes

Slurm reports each job state as a full name or a shorthand code in the ST or STATE column of squeue output and the State column of sacct output. Use this reference to interpret the states that appear in the output of the preceding commands.
| Job state | Shorthand | Meaning |
| --- | --- | --- |
| PENDING | PD | The job is waiting for resource allocation. Use squeue to view the reason the job is in this state. |
| RUNNING | R | The job has an allocation of nodes and is currently executing job steps. |
| CONFIGURING | CF | Resources have been allocated, but the nodes are still being configured. |
| COMPLETING | CG | The job is finishing. Cleanup scripts are executing and nodes are returning to the nodepool. |
| COMPLETED | CD | The job finished successfully with an exit code of 0. |
| FAILED | F | The job terminated with a failure condition, or non-zero exit code. |
| TIMEOUT | TO | The job terminated after reaching its requested time limit. |
| NODE_FAIL | NF | The job terminated after one or more of the nodes running it crashed or became unresponsive. |
| PREEMPTED | PR | The job was removed from its allocated nodes to make room for a higher-priority job. |
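When scripting around squeue output, it can be handy to expand the shorthand codes into full state names. The following sketch encodes only the states listed in this reference; the function name `state_name` is hypothetical, and any other code is reported as unknown.

```shell
# state_name: hypothetical helper that expands a shorthand code from the
# reference above into the full job state name.
# Usage: state_name CODE
state_name() {
  case "$1" in
    PD) echo PENDING ;;
    R)  echo RUNNING ;;
    CF) echo CONFIGURING ;;
    CG) echo COMPLETING ;;
    CD) echo COMPLETED ;;
    F)  echo FAILED ;;
    TO) echo TIMEOUT ;;
    NF) echo NODE_FAIL ;;
    PR) echo PREEMPTED ;;
    # Codes not covered by this reference return a non-zero status.
    *)  echo "UNKNOWN($1)"; return 1 ;;
  esac
}
```

For example, `state_name CG` prints `COMPLETING`.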
Last modified on May 27, 2026