Run custom scripts with s6
Run user-defined scripts on SUNK
SUNK can run custom scripts on Compute and Login nodes using s6-rc, the service manager for s6-based systems. This guide explains how to set up and run two types of scripts: longrun, for processes that run continuously, and oneshot, for tasks that run once and exit.
With this setup, you can automate the execution of custom scripts on the Compute and Login nodes of a SUNK cluster, for example to install required packages at startup or to keep essential services running.
Define scripts in values.yaml
Define scripts in the appropriate sections of the Slurm chart's values.yaml file: Compute node scripts go in the compute.s6 section, and Login node scripts go in the login.s6 section. Each script entry is keyed by the script's name and specifies a type and the script itself. The following example defines two scripts for Compute nodes:
```yaml
compute:
  s6:
    packages:
      type: oneshot
      script: |
        #!/usr/bin/env bash
        apt -y update
        apt -y install nginx
    nginx:
      type: longrun
      timeoutUp: 30000 # 30 seconds
      timeoutDown: 0
      script: |
        #!/usr/bin/env bash
        nginx -g "daemon off;"
```
The above example includes two scripts:
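- packages: This oneshot script installs nginx using the package manager.
- nginx: This longrun script starts the nginx process and keeps it running. The -g "daemon off;" option keeps nginx in the foreground, which s6 requires in order to supervise the process.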
The nginx script is assigned a timeoutUp of 30000 milliseconds, which means it has up to 30 seconds to start successfully.
Define and schedule different script types
Determine the appropriate script type
Decide whether the script is a longrun or a oneshot based on its purpose:
- Use longrun for scripts that should run continuously, like a web server.
- Use oneshot for scripts that run once to perform a setup task, like installing software.
Avoid scheduling conflicts
If your oneshot scripts install many packages or otherwise extend startup time, account for this with the orphanedPodDelay parameter in the syncer configuration section of the Slurm chart's values.yaml file.
The full path for this parameter is syncer.config.syncer.orphanedPodDelay. Its default value is 120s (120 seconds).
If your oneshot scripts take longer to run than the orphanedPodDelay value, increase the value to avoid scheduling conflicts.
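For example, if your oneshot scripts take around four minutes to finish, a values.yaml fragment like the following raises the delay above that. This is a minimal sketch: the 300s value is illustrative, and the key nesting simply follows the syncer.config.syncer.orphanedPodDelay path, using the same duration format as the 120s default.

```yaml
syncer:
  config:
    syncer:
      # Default is 120s. Illustrative value: raised so that slow
      # oneshot scripts can finish before this delay expires.
      orphanedPodDelay: 300s
```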
Set timeouts for scripts
For finer control, you can set timeouts for your scripts. The timeoutUp parameter sets the time allowed for the script to start, and the timeoutDown parameter sets the time allowed for it to stop. Both parameters are optional and measured in milliseconds. Both default to 0, which means the script never times out.
- For oneshot scripts, only timeoutUp is relevant, because it sets the maximum time the script has to complete.
- For longrun scripts, timeoutUp and timeoutDown control how long the process has to start and stop, respectively.
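As a sketch, a oneshot script that should be allowed at most five minutes to complete could set timeoutUp to 300000 milliseconds. The script name tools and the package it installs are hypothetical. Because a script like this may run longer than the 120s default for orphanedPodDelay, you would also raise that value, following the guidance in Avoid scheduling conflicts.

```yaml
compute:
  s6:
    tools: # hypothetical script name
      type: oneshot
      timeoutUp: 300000 # 300,000 ms = 5 minutes to complete
      script: |
        #!/usr/bin/env bash
        # Hypothetical setup task: install an extra tool once at startup.
        apt -y update
        apt -y install htop
```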
Define behavior for failed scripts
When a user-defined script fails, the container can keep running silently, keep running and report an error, or stop completely. By default, SUNK stops the container when a script fails.
You can control this behavior with the S6_BEHAVIOUR_IF_STAGE2_FAILS environment variable, set in the appropriate env section of the values.yaml file:
- For Login nodes, set S6_BEHAVIOUR_IF_STAGE2_FAILS in the login.env section of the values.yaml file.
- For Compute nodes, set S6_BEHAVIOUR_IF_STAGE2_FAILS in the compute.nodes.<target node>.env section of the values.yaml file for each target node.
S6_BEHAVIOUR_IF_STAGE2_FAILS accepts the following values:
| Value | Behavior |
|---|---|
| 0 | If a script fails, the container continues to run silently, without reporting an error. |
| 1 | If a script fails, the container continues to run, but prints an error message warning about the failure. |
| 2 | If a script fails, the container stops running completely. This is the default setting on SUNK. |
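As a sketch, the following values.yaml fragment sets S6_BEHAVIOUR_IF_STAGE2_FAILS to 1 for Login nodes and for one Compute node definition, so containers keep running but report script failures. It assumes the env sections take a standard Kubernetes list of name/value entries; check your chart's values.yaml for the exact format. The node definition name my-compute-nodes is hypothetical and stands in for your own target node.

```yaml
login:
  env:
    # Keep Login containers running on script failure, but log a warning.
    - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
      value: "1" # quoted: Kubernetes env values must be strings

compute:
  nodes:
    my-compute-nodes: # hypothetical node definition name
      env:
        # Same behavior for this Compute node definition.
        - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
          value: "1"
```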
For more information on the values for S6_BEHAVIOUR_IF_STAGE2_FAILS, see the s6-overlay customization options.