
# Manage resource binding with task plugins

> Optimize performance by binding tasks to common resources within a node

Using task plugins, you can set Slurm parameters to bind a task to specific subsets of resources on a node, like CPU cores or GPUs. You can optimize the performance of a task in Slurm by selecting resources within a common Non-Uniform Memory Access (NUMA) node. A CPU core or GPU reaches memory in its own NUMA node much more quickly than memory in a separate NUMA node, so keeping a task's resources within one NUMA node reduces data transfer latency and can improve job performance.

This guide explains how to manage resource binding in SUNK with task plugins. It's intended for cluster administrators who configure Slurm and for users who submit jobs that benefit from CPU or GPU affinity. The following sections describe how to enable resource binding, configure the task cgroup plugin, and bind tasks to GPUs and CPU cores.

## Enable resource binding

To enable resource binding, modify the `TaskPlugin` variable in the Slurm configuration section of the SUNK Helm chart.

In the `slurmConfig` section of the Slurm `values.yaml` file, set the `TaskPlugin` variable to `task/affinity,task/cgroup`:

```yaml theme={"system"}
slurmConfig:
  slurmCtld:
    TaskPlugin: "task/affinity,task/cgroup"
```

This enables the `task/affinity` and `task/cgroup` plugins, which work together to optimize resource allocations in the SUNK cluster. The `task/affinity` plugin controls how processes bind to CPU resources on a Compute node. The `task/cgroup` plugin uses the cgroup filesystem and its controllers to enforce the resource limits and binding policies specified by Slurm.
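
One way to confirm that the running cluster picked up both plugins is to inspect the output of `scontrol show config`. The following is a minimal sketch, assuming the Slurm client tools are available where you run it. The function names `slurm_config_value` and `has_binding_plugins` are hypothetical helpers for illustration, not part of Slurm or SUNK.

```bash
#!/usr/bin/env bash
# Print the value of one key from `scontrol show config` output read on stdin.
# Lines in that output look like "TaskPlugin    = task/affinity,task/cgroup".
slurm_config_value() {
  local key="$1"
  awk -F'=' -v key="$key" '
    {
      name = $1
      gsub(/[[:space:]]/, "", name)
      if (name == key) {
        value = substr($0, index($0, "=") + 1)
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", value)
        print value
        exit
      }
    }'
}

# Succeed only if both task plugins are listed, in either order.
has_binding_plugins() {
  local plugins
  plugins="$(slurm_config_value TaskPlugin)"
  [[ ",${plugins}," == *",task/affinity,"* && ",${plugins}," == *",task/cgroup,"* ]]
}

# Usage on a node with the Slurm client tools installed:
#   scontrol show config | has_binding_plugins && echo "binding plugins enabled"
```

Because the helpers read from standard input, you can also run them against a saved copy of the configuration output.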

## Configure the task cgroup plugin

With resource binding enabled, the next step is to configure how Slurm enforces those bindings. SUNK exposes the contents of Slurm's `cgroup.conf` file through the [`slurmConfig.cgroupConfig`](/products/sunk/reference/slurm-parameters) value. With this configuration, Slurm uses Linux kernel cgroups to enforce CPU, GPU, and memory constraints on each task.

To use Linux cgroups in SUNK, do the following:

1. Add the `task/cgroup` value to the `TaskPlugin` variable, as shown in the [Enable resource binding](#enable-resource-binding) section.
2. In the `slurmConfig` section of the Slurm `values.yaml`, set the `procTrackType` variable to `proctrack/cgroup`. If you don't set this parameter correctly, Slurm doesn't apply your Linux cgroups settings.

This enables cgroups with the following settings:

```yaml theme={"system"}
slurmConfig.cgroupConfig: |
    CgroupPlugin=autodetect
    IgnoreSystemd=yes
    ConstrainCores=yes
    ConstrainDevices=yes
    ConstrainRAMSpace=yes
```

The `Constrain` settings enforce binding and limits for different resources, as follows:

* `ConstrainCores=yes` enforces CPU binding.
* `ConstrainDevices=yes` restricts each job to the devices allocated to it, such as GPUs.
* `ConstrainRAMSpace=yes` enforces memory limits.
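
To see the effect of these constraints, you can print what a task is actually allowed to use. The following is a minimal sketch, assuming a Linux node where `/proc/self/status` is readable and where Slurm sets `CUDA_VISIBLE_DEVICES` for GPU allocations. The function name `show_binding` is a hypothetical helper, not a Slurm command.

```bash
#!/usr/bin/env bash
# Print the CPU and GPU visibility of the current process on one line.
# Under Slurm, SLURM_PROCID is the task rank; outside Slurm it prints "n/a".
show_binding() {
  local cpus="unknown"
  if [[ -r /proc/self/status ]]; then
    # awk is a child process, so it inherits the CPU affinity of this shell.
    cpus="$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)"
  fi
  echo "task=${SLURM_PROCID:-n/a} host=${HOSTNAME:-unknown} cpus=${cpus} gpus=${CUDA_VISIBLE_DEVICES:-unset}"
}

show_binding
```

If you save this as a script, for example under the hypothetical name `show_binding.sh`, launching it with something like `srun --ntasks=2 --cpus-per-task=2 bash show_binding.sh` prints one line per task, so you can compare the CPU lists between tasks.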

## Bind tasks to GPUs

Once the task plugins and cgroup configuration are in place, you can control how Slurm assigns tasks to GPUs with the `--gpu-bind` parameter in your job script's `#SBATCH` directives:

```bash theme={"system"}
#SBATCH --gpu-bind=verbose,single:1
```

To print information about which GPU resources each task binds to, put the `verbose` option before the binding type, separated by a comma, as in Slurm's documented `--gpu-bind=[verbose,]<type>` syntax. This can be helpful when debugging or checking your binding strategy.

<Tip>
  **Performance optimization**

  To optimize performance, use the `--gpu-bind=single:1` option when starting your Slurm job. With this option, Slurm binds each task on a node to a single GPU and prefers a GPU on the same NUMA node as the task's CPU cores. If you don't use the `--gpu-bind` parameter, Slurm could assign your task a GPU with a different NUMA affinity than the assigned CPU cores, which could lead to suboptimal performance.
</Tip>
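
To check which NUMA node a GPU belongs to and which CPU cores share that node, you can read the Linux sysfs tree on a Compute node. The following is a minimal sketch, assuming the standard sysfs paths `/sys/bus/pci/devices/<address>/numa_node` and `/sys/devices/system/node/node<N>/cpulist`. The function names and the `SYSFS_ROOT` override are hypothetical and exist only for illustration and testing.

```bash
#!/usr/bin/env bash
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the NUMA node of a PCI device, for example a GPU at 0000:3b:00.0.
# The kernel reports -1 when the platform exposes no NUMA affinity.
pci_numa_node() {
  local file="${SYSFS_ROOT}/bus/pci/devices/$1/numa_node"
  [[ -r "$file" ]] && cat "$file"
}

# Print the CPU list of a NUMA node, for example "0-31,64-95".
numa_node_cpus() {
  local file="${SYSFS_ROOT}/devices/system/node/node$1/cpulist"
  [[ -r "$file" ]] && cat "$file"
}

# Print "<pci address> node=<n> cpus=<list>" for one device.
describe_device() {
  local node
  node="$(pci_numa_node "$1")" || { echo "$1 not found" >&2; return 1; }
  echo "$1 node=${node} cpus=$(numa_node_cpus "$node" || echo unknown)"
}

# Usage, with a PCI address taken from `lspci -D`:
#   describe_device 0000:3b:00.0
```

Comparing this output with the CPU list a task reports under `verbose` binding shows whether the task's cores and its GPU share a NUMA node.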

The `--gpu-bind` parameter supports multiple options, including:

| Option           | Meaning                                                                                                                                                                                                                                                                |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `none`           | No GPU binding. Slurm doesn't enforce any specific binding between tasks and GPUs. This option may be suitable if your application handles GPU selection internally.                                                                                                   |
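| `single:<count>` | Binds each task to exactly one GPU, preferring a GPU on the same NUMA node as the CPU cores allocated to that task. `<count>` sets how many tasks share each GPU: the first `<count>` tasks bind to the first available GPU, the next `<count>` tasks bind to the second, and so on. |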
| `closest`        | Attempts to bind a task to the GPUs "closest" to the CPU cores the task runs on, based on the system topology. This setting may assign multiple GPUs to a task if they share the same NUMA node, regardless of what you've specified in the `--gpus-per-task` setting. |

For a complete list of available `--gpu-bind` options, see [SchedMD's Slurm documentation](https://slurm.schedmd.com/srun.html#OPT_gpu-bind).

## Bind tasks to CPU cores

In addition to GPU binding, you can pin tasks to specific CPU cores to improve cache locality and reduce contention. Use the `--cpu-bind` parameter in your job script's `#SBATCH` directives to control which CPU cores your tasks bind to. For example:

```bash theme={"system"}
#SBATCH --cpu-bind=verbose,map_cpu:1
```

To print information about which CPU resources each task binds to, put the `verbose` option before the binding type, separated by a comma, as in Slurm's documented `--cpu-bind=[{quiet|verbose},]<type>` syntax. This can be helpful when debugging or checking your binding strategy.

The `--cpu-bind` parameter supports multiple options, including:

| Option            | Meaning                                                                                                                                                                                        |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `none` or `no`    | No CPU binding. The operating system schedules tasks on any available CPU resources. This can lead to suboptimal resource usage, and isn't generally recommended when performance is critical. |
| `map_cpu:<list>`  | Binds tasks to the CPU IDs in a comma-separated list, one ID per task in task order, so you can specify the exact CPU on which each task runs. |
| `mask_cpu:<list>` | Functions similarly to `map_cpu`, but takes a comma-separated list of hexadecimal CPU masks, one per task, so a single task can bind to more than one CPU. |

For a full list of available `--cpu-bind` options, see [SchedMD's Slurm documentation](https://slurm.schedmd.com/srun.html#OPT_cpu-bind).
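
Writing hexadecimal masks for `mask_cpu` by hand is error-prone, because bit N of a mask selects CPU N. The following is a minimal sketch of a helper that builds one mask from a list of CPU IDs. The function name `cpus_to_mask` is hypothetical, and because it relies on 64-bit Bash arithmetic it only handles CPU IDs 0 through 62.

```bash
#!/usr/bin/env bash
# Convert CPU IDs into one hexadecimal mask, for example "0 1 2 3" -> 0xf.
# Bash arithmetic is 64-bit signed, so this sketch accepts CPU IDs 0-62 only.
cpus_to_mask() {
  local mask=0 cpu
  for cpu in "$@"; do
    if ! [[ "$cpu" =~ ^[0-9]+$ ]]; then
      echo "not a CPU ID: ${cpu}" >&2
      return 1
    fi
    cpu=$((10#$cpu))  # force base 10 so "08" is not parsed as octal
    if (( cpu > 62 )); then
      echo "unsupported CPU ID: ${cpu}" >&2
      return 1
    fi
    (( mask |= 1 << cpu ))
  done
  printf '0x%x\n' "$mask"
}

# Task 0 on CPUs 0-3 and task 1 on CPUs 4-7:
echo "--cpu-bind=mask_cpu:$(cpus_to_mask 0 1 2 3),$(cpus_to_mask 4 5 6 7)"
# prints --cpu-bind=mask_cpu:0xf,0xf0
```

On nodes with more than 63 CPUs, a tool with arbitrary-precision integers would be a better fit than this sketch.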
