SUNK can run custom scripts on Compute and Login nodes using s6-rc, a service manager for s6-based systems. This guide explains how to set up and run two types of scripts: longrun scripts for continuous processes, and oneshot scripts for tasks that run once and exit. It is intended for cluster administrators who need to automate custom scripts on Compute and Login nodes in a SUNK cluster. Whether you need to install packages or keep services running, this method provides a straightforward way to manage those processes.

Define scripts in values.yaml

Define scripts in the appropriate section of the Slurm chart’s values.yaml file: Compute node scripts go in the compute.s6 section, and Login node scripts go in the login.s6 section. Each script needs a name (its key under s6), a type, and the script itself. The following example defines two scripts for Compute nodes:
compute:
  s6:
    packages:
      type: oneshot
      script: |
        #!/usr/bin/env bash
        # Exit on the first failed command so the oneshot is reported as failed.
        set -euo pipefail
        apt-get update
        apt-get install -y nginx
    nginx:
      type: longrun
      timeoutUp: 30000 # 30 seconds
      timeoutDown: 0
      script: |
        #!/usr/bin/env bash
        # exec replaces the shell, so nginx itself is the supervised process.
        exec nginx -g "daemon off;"
The preceding example includes two scripts:
  • packages: A oneshot script that installs nginx using the package manager.
  • nginx: A longrun script that starts the nginx process and keeps it running.
The nginx script has a timeoutUp of 30000 milliseconds, so it has up to 30 seconds to start successfully. Its timeoutDown of 0 means it has no stop timeout.

Define and schedule different script types

The following sections explain how to choose the right script type for your task and how to avoid scheduling conflicts that can occur when scripts extend node startup time.

Determine the appropriate script type

Decide whether the script is a longrun or a oneshot based on its purpose:
  • Use longrun for scripts that should run continuously, like a web server.
  • Use oneshot for scripts that run once to perform a setup task, like installing software.
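To illustrate the difference, here is a minimal sketch for Login nodes in the login.s6 section. The script names set-motd and heartbeat, and what they do, are hypothetical. The key point is that a oneshot script does its work and exits, while a longrun script must keep its process in the foreground. This is why the nginx example runs with "daemon off;". If a longrun process exits, s6 treats the service as down and restarts it.

```yaml
login:
  s6:
    # Hypothetical oneshot: runs once at startup, then exits.
    set-motd:
      type: oneshot
      script: |
        #!/usr/bin/env bash
        set -euo pipefail
        echo "This Login node is managed by SUNK." > /etc/motd
    # Hypothetical longrun: never exits on its own, so s6 keeps it supervised.
    heartbeat:
      type: longrun
      script: |
        #!/usr/bin/env bash
        while true; do
          echo "heartbeat: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
          sleep 60
        done
```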

Avoid scheduling conflicts

If a oneshot script installs many packages or otherwise extends node startup time, you must account for this by adjusting the orphanedPodDelay parameter in the syncer configuration section of the Slurm chart’s values.yaml file. The full path for this parameter is syncer.config.syncer.orphanedPodDelay, and its default value is 120s (120 seconds). If your oneshot scripts can take longer than this to complete, increase orphanedPodDelay to avoid scheduling conflicts.
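The following fragment sketches where the parameter sits, following the syncer.config.syncer.orphanedPodDelay path. The 300s value is illustrative only, and it assumes the parameter accepts the same duration-string format as the documented 120s default.

```yaml
syncer:
  config:
    syncer:
      # Default is 120s. 300s is an illustrative value for oneshot
      # scripts that can take up to several minutes to finish.
      orphanedPodDelay: 300s
```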

Set timeouts for scripts

Timeouts prevent scripts from hanging indefinitely and help keep nodes healthy. The timeoutUp parameter sets the time allowed for a script to start, and timeoutDown sets the time allowed for it to stop. Both parameters are optional and are specified in milliseconds. Their default value is 0, which means the script never times out.
Without a configured timeout, an s6 script can hang indefinitely and leave the Slurm compute Pod in a Not Ready state.
  • For oneshot scripts, only timeoutUp applies; it sets the maximum time the script has to complete.
  • For longrun scripts, both timeoutUp and timeoutDown control how long the process has to start and stop.
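As a sketch of how these rules apply to the earlier example, the fragment below gives the oneshot only a timeoutUp and gives the longrun both timeouts. The specific values (5 minutes, 30 seconds, and 10 seconds) are illustrative assumptions, not recommendations.

```yaml
compute:
  s6:
    packages:
      type: oneshot
      timeoutUp: 300000   # 5 minutes to complete; timeoutDown is not used for oneshot
      script: |
        #!/usr/bin/env bash
        set -euo pipefail
        apt-get update
        apt-get install -y nginx
    nginx:
      type: longrun
      timeoutUp: 30000    # 30 seconds to start
      timeoutDown: 10000  # 10 seconds to stop
      script: |
        #!/usr/bin/env bash
        exec nginx -g "daemon off;"
```

A oneshot timeoutUp longer than the default orphanedPodDelay of 120 seconds means the script may legitimately run past that delay. In that case, raise orphanedPodDelay as well, as described in Avoid scheduling conflicts.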

Define behavior for failed scripts

When a user-defined script fails, the container can continue running silently, continue running and report an error, or stop. In SUNK, containers stop on script failure by default. You can control this behavior with the S6_BEHAVIOUR_IF_STAGE2_FAILS parameter in the appropriate env section of the values.yaml file.
  • For Login nodes, change the value of the parameter in the login.env section of the values.yaml file.
  • For Compute nodes, change the value of the parameter in the compute.nodes.[TARGET-NODE].env section of each node definition you want to change.
The S6_BEHAVIOUR_IF_STAGE2_FAILS parameter can contain the following values:
Value  Behavior
0      If a script fails, the container continues to run silently, without providing an error message.
1      If a script fails, the container continues to run, but provides an error message warning about the failure.
2      If a script fails, the container stops running. This is the default setting on SUNK.
For more information about the values for S6_BEHAVIOUR_IF_STAGE2_FAILS, see the s6-overlay customization options.
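The following fragment is a minimal sketch that sets the parameter to 1 (keep running, but report the failure) for both node types. It assumes each env section takes a list of Kubernetes-style name/value entries, so check the chart’s default values.yaml for the exact shape. The node definition name example-node is hypothetical. The value is quoted because Kubernetes environment variable values must be strings.

```yaml
login:
  env:
    # Keep the Login container running, but warn when a script fails.
    - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
      value: "1"
compute:
  nodes:
    example-node:   # hypothetical node definition name
      env:
        - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
          value: "1"
```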
Last modified on May 27, 2026