Using task plugins, you can set Slurm parameters to bind a task to specific subsets of resources on a node, like CPU cores or GPUs. You can optimize the performance of a task in Slurm by selecting resources within a common Non-Uniform Memory Access (NUMA) node. Resources access memory within their own NUMA node much more quickly than memory in a separate NUMA node, so keeping a task's CPU cores and GPUs on the same NUMA node can reduce data transfer latency and improve job performance.
This guide explains how to manage resource binding in SUNK with task plugins. It’s intended for cluster administrators who configure Slurm and for users who submit jobs that benefit from CPU or GPU affinity. The following sections describe how to enable resource binding, configure the task cgroup plugin, and bind tasks to GPUs and CPU cores.
Enable resource binding
To enable resource binding, modify the TaskPlugin variable in the Slurm configuration section of the SUNK Helm chart.
In the slurmConfig section of the Slurm values.yaml file, set the TaskPlugin variable to task/affinity,task/cgroup:
slurmConfig:
  slurmCtld:
    TaskPlugin: "task/affinity,task/cgroup"
This enables the task/affinity and task/cgroup plugins, which work together to optimize resource allocations in the SUNK cluster. The task/affinity plugin controls how processes bind to CPU resources on a Compute node. The task/cgroup plugin uses the cgroup filesystem and its controllers to enforce the resource limits and binding policies specified by Slurm.
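After the chart change is deployed, it can be worth confirming that the running controller reports both plugins. The following is a minimal sketch, assuming the Slurm client tools are available where you run it. `has_binding_plugins` is a hypothetical helper name used only for illustration; `scontrol show config` is the standard Slurm command for printing the active configuration.

```shell
#!/usr/bin/env bash
# Return success only if a TaskPlugin value lists both binding plugins.
# has_binding_plugins is a hypothetical helper name, not a Slurm command.
has_binding_plugins() {
  local value="$1"
  case ",$value," in *",task/affinity,"*) ;; *) return 1 ;; esac
  case ",$value," in *",task/cgroup,"*) ;; *) return 1 ;; esac
  return 0
}

# Only query the controller when the Slurm client tools are installed.
if command -v scontrol >/dev/null 2>&1; then
  # "scontrol show config" prints lines such as
  # "TaskPlugin = task/affinity,task/cgroup"; keep only the value.
  active=$(scontrol show config |
    awk -F' *= *' '$1 == "TaskPlugin" {gsub(/[ \t]+$/, "", $2); print $2}')
  if has_binding_plugins "$active"; then
    echo "resource binding plugins are active: $active"
  else
    echo "TaskPlugin is '$active'; expected task/affinity and task/cgroup" >&2
  fi
fi
```

The check accepts the plugins in either order, because the value is a comma-separated list.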
With resource binding enabled, the next step is to configure how Slurm enforces those bindings. SUNK supports Linux cgroups through the cgroup.conf value, slurmConfig.cgroupConfig, which uses kernel cgroups to enforce CPU, GPU, and memory constraints on each task.
To use Linux cgroups in SUNK, do the following:
- Add the task/cgroup value to the TaskPlugin variable, as shown in the Enable resource binding section.
- In the slurmConfig section of the Slurm values.yaml, set the procTrackType variable to proctrack/cgroup. If you don’t set this parameter correctly, Slurm doesn’t apply your Linux cgroups settings.
This enables cgroups with the following settings:
slurmConfig.cgroupConfig: |
  CgroupPlugin=autodetect
  IgnoreSystemd=yes
  ConstrainCores=yes
  ConstrainDevices=yes
  ConstrainRAMSpace=yes
The Constrain settings enforce binding and limits for different resources, as follows:
- ConstrainCores=yes enforces CPU binding.
- ConstrainDevices=yes enforces limits on GPU devices.
- ConstrainRAMSpace=yes enforces memory limits.
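One way to see the CPU constraint from inside a running task is to read the CPU list that the Linux kernel reports for the process. The sketch below is an illustration under stated assumptions, not part of SUNK: `allowed_cpus` is a hypothetical helper name, and it relies on the `Cpus_allowed_list` field of `/proc/<pid>/status`, which is available on Linux only.

```shell
#!/usr/bin/env bash
# Print the CPU IDs a process is allowed to run on.
# allowed_cpus is a hypothetical helper name. It reads the Cpus_allowed_list
# field from a /proc status file (default: the calling process).
allowed_cpus() {
  local status_file="${1:-/proc/self/status}"
  awk '/^Cpus_allowed_list:/ {print $2}' "$status_file"
}

# Report the list only where /proc is available.
if [ -r /proc/self/status ]; then
  echo "allowed CPUs: $(allowed_cpus)"
fi
```

Running this under `srun` with a small `--cpus-per-task` value should print a shorter CPU list than running it directly on the node, if core constraints are being applied.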
Bind tasks to GPUs
Once the task plugins and cgroup configuration are in place, you can control how Slurm binds each task to GPUs. To do so, use the --gpu-bind parameter in your job script’s #SBATCH directives:
#SBATCH --gpu-bind=single:1,verbose
To print information about which GPU resources each task binds to, add the verbose option to your other binding options, separated by a comma. This can be helpful when debugging or checking your binding strategy.
Performance optimization
To optimize performance, use the --gpu-bind=single:1 option when starting your Slurm job. This ensures that Slurm assigns each task on a node a single GPU, and that the CPU cores and GPU are on the same NUMA node. Matching CPU and GPU NUMA affinities is important for performance, so use the appropriate parameters when launching tasks with Slurm. If you don’t use the --gpu-bind parameter, Slurm could assign your task a GPU with a different NUMA affinity than the assigned CPU cores, which could lead to suboptimal performance.
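As an illustration, the following job script sketch requests two GPUs on one node, runs two tasks, and prints what each task can see. The job name, task counts, and the `report_binding` helper are hypothetical choices made for this example. It also assumes that steps launched with `srun` inside the job inherit the `--gpu-bind` setting from the `#SBATCH` directive, and that `CUDA_VISIBLE_DEVICES` reflects the GPUs bound to each task. The indices it shows may be renumbered when device constraints are active.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=gpu-bind-check   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=4
#SBATCH --gpu-bind=single:1,verbose

# Print one line per task with the GPUs and CPUs that task can use.
# report_binding is a hypothetical helper name for this sketch.
report_binding() {
  local cpus
  # Cpus_allowed_list is Linux-specific; fall back to "unknown" elsewhere.
  cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null) || true
  echo "task=${SLURM_PROCID:-?} gpus=${CUDA_VISIBLE_DEVICES:-none} cpus=${cpus:-unknown}"
}

# Launch the tasks only when running inside a Slurm job.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
  export -f report_binding
  srun bash -c report_binding
fi
```

To compare the reported CPU ranges with the node's layout, `nvidia-smi topo -m` prints the CPU and NUMA affinity of each GPU.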
The --gpu-bind parameter supports multiple options, including:
| Option | Meaning |
|---|---|
| none | No GPU binding. Slurm doesn’t enforce any specific binding between tasks and GPUs. This option may be suitable if your application handles GPU selection internally. |
| single:<count> | Binds each task to a single GPU, and attempts to choose a GPU on the same NUMA node as the CPU cores allocated to that task. In the Slurm documentation, <count> is the number of tasks that share each GPU, so single:1 gives each task its own GPU. |
| closest | Attempts to bind a task to the GPUs “closest” to the CPU cores the task runs on, based on the system topology. This setting may assign multiple GPUs to a task if they share the same NUMA node, regardless of what you’ve specified in the --gpus-per-task setting. |
For a complete list of available --gpu-bind options, see SchedMD’s Slurm documentation.
Bind tasks to CPU cores
In addition to GPU binding, you can pin tasks to specific CPU cores to improve cache locality and reduce contention. Use the --cpu-bind parameter in your job script’s #SBATCH directives to control which CPU cores your tasks bind to. For example:
#SBATCH --cpu-bind=map_cpu:1,verbose
To print information about which CPU resources each task binds to, add the verbose option to your other binding options, separated by a comma. This can be helpful when debugging or checking your binding strategy.
The --cpu-bind parameter supports multiple options, including:
| Option | Meaning |
|---|---|
| none or no | No CPU binding. The operating system schedules tasks on any available CPU resources. This can lead to suboptimal resource usage, and isn’t generally recommended when performance is critical. |
| map_cpu:<list> | Lets you provide a comma-separated list of CPU IDs to specify the exact CPUs on which to run your tasks. Slurm assigns the IDs to tasks in order, one CPU ID per task. |
| mask_cpu:<list> | Functions similarly to map_cpu, but takes a comma-separated list of hexadecimal CPU masks, one per task. Because a mask can select several CPUs, this gives more granular control. |
For a full list of available --cpu-bind options, see SchedMD’s Slurm documentation.
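Writing hexadecimal masks for mask_cpu by hand is error-prone, so a small helper can build them from CPU IDs. This is a minimal sketch: `cpus_to_mask` is a hypothetical helper name, it doesn't validate its input, and it relies on 64-bit shell arithmetic, so it only handles CPU IDs below 63.

```shell
#!/usr/bin/env bash
# Build a hexadecimal CPU mask from a list of CPU IDs.
# cpus_to_mask is a hypothetical helper name; it sets bit N for each CPU ID N.
# Shell arithmetic is 64-bit signed, so this only handles CPU IDs below 63.
cpus_to_mask() {
  local mask=0 cpu
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '0x%x\n' "$mask"
}

# Example: task 0 on CPUs 0-1, task 1 on CPUs 2-3.
echo "--cpu-bind=mask_cpu:$(cpus_to_mask 0 1),$(cpus_to_mask 2 3)"
# prints: --cpu-bind=mask_cpu:0x3,0xc
```

Each mask in the list applies to one task, in task order. The CPU IDs in this example are placeholders; choose IDs that match your node's topology.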