This page explains how to use the cache-dropper sidecar to flush the Linux page cache between exclusive Slurm jobs on a SUNK compute node. It’s intended for cluster administrators and job authors who need to mitigate memory fragmentation in CPU-intensive training workloads.

Memory fragmentation can cause Out of Memory (OOM) errors and slowdowns in CPU-intensive training jobs. Dropping the page cache between exclusive jobs on a node can improve performance by reducing the memory access latency that fragmentation causes.

SUNK includes a sidecar container, cache-dropper, to handle page cache flushes. This container runs in privileged mode, which lets it drop the cache without requiring the main slurmd container to run as privileged. The sidecar checks for a specific trigger file and, when that file appears, writes to the drop_caches sysctl file to free both the page cache and reclaimable slab objects. Dropping the cache is non-destructive, but it can incur additional CPU and I/O overhead as dropped objects are recreated.
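The cache-dropper source is not shown on this page. As a rough illustration only, the check-and-drop behavior described above could be sketched in shell as follows. The function names, the polling interval, the sync call, and the removal of the trigger file after a drop are all assumptions made for this sketch. The /proc/sys/vm/drop_caches location and the value 3 (page cache plus reclaimable slab objects) are standard Linux, and the trigger path is the one this page documents.

```shell
#!/bin/sh
# Hypothetical sketch of the sidecar's behavior; NOT the actual cache-dropper code.
TRIGGER="${TRIGGER:-/run/enroot/drop_caches}"    # trigger file documented on this page
SYSCTL="${SYSCTL:-/proc/sys/vm/drop_caches}"     # standard Linux sysctl file
INTERVAL="${INTERVAL:-5}"                        # assumed polling interval, in seconds

# Perform one check: drop the cache only if the trigger file exists.
check_and_drop() {
    [ -e "$TRIGGER" ] || return 1    # no trigger file, nothing to do
    sync                             # assumption: flush dirty pages so more cache can be freed
    # Writing 3 frees the page cache and reclaimable slab objects (requires root).
    # Assumption: the trigger is removed afterwards so each touch causes one drop.
    echo 3 > "$SYSCTL" && rm -f "$TRIGGER"
}

# Repeat the check indefinitely, sleeping between attempts.
poll_forever() {
    while true; do
        check_and_drop || true
        sleep "$INTERVAL"
    done
}
```

Calling poll_forever from a privileged shell would reproduce the loop. Keeping the single check in its own function makes it possible to exercise the logic against temporary files without touching the real sysctl.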

Enable the cache-dropper sidecar in the Slurm chart

To enable the cache-dropper sidecar, set .compute.cacheDropper.enabled to true in the Slurm values.yaml file. When enabled, cache-dropper runs in every compute pod. It periodically checks whether the /run/enroot/drop_caches file exists and, if it does, performs the cache drop.
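In values.yaml, that setting would look like the fragment below. Only the compute.cacheDropper.enabled key comes from this page; the rest of your values file is assumed to stay as it is.

```yaml
# Fragment of the Slurm chart's values.yaml
compute:
  cacheDropper:
    enabled: true   # adds the cache-dropper sidecar to every compute pod
```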

Drop the page cache with a Slurm job

After the sidecar is enabled, individual Slurm jobs trigger a cache drop by creating the trigger file the sidecar watches for. To use cache-dropper, add the following touch command to a Slurm job script:
touch /run/enroot/drop_caches
The sidecar drops the cache each time a job runs this command, so jobs that include it release the page cache before subsequent work runs on the node.
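As a minimal sketch of how the touch command might fit into a batch script, the example below wraps it in a small helper that reports a failure if the trigger file cannot be created. The job name, the placeholder workload, the request_cache_drop helper, and the TRIGGER_FILE override are hypothetical and exist only for illustration. Placing the request after the workload is also an assumption; it could equally run at the start of the job.

```shell
#!/bin/bash
#SBATCH --job-name=cache-drop-example
#SBATCH --nodes=1
#SBATCH --exclusive

# Trigger file watched by cache-dropper; the override is only for dry runs.
TRIGGER_FILE="${TRIGGER_FILE:-/run/enroot/drop_caches}"

# Ask the sidecar to drop the page cache by creating the trigger file.
request_cache_drop() {
    if touch "$TRIGGER_FILE"; then
        echo "Requested page cache drop via $TRIGGER_FILE"
    else
        echo "Could not create $TRIGGER_FILE; is cache-dropper enabled?" >&2
        return 1
    fi
}

# Run the job body only inside a Slurm allocation (SLURM_JOB_ID is set by Slurm).
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun hostname          # placeholder for the real workload
    request_cache_drop     # signal the sidecar once the work is done
fi
```

The SLURM_JOB_ID guard lets the script be sourced or checked outside a job without touching the node, which is a convenience of this sketch rather than a requirement of cache-dropper.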
Last modified on May 27, 2026