Run custom scripts with s6

Run user-defined scripts on SUNK

SUNK can run custom scripts on Compute and Login nodes with s6 using s6-rc, a service manager for s6-based systems. This guide explains how to set up and run two types of scripts: longrun for continuous processes and oneshot for tasks that execute and terminate.

With this setup, you can automate custom scripts on Compute and Login nodes in the SUNK cluster, whether you're installing required packages or keeping essential services running.

Define scripts in values.yaml

Define scripts in the appropriate sections of the Slurm chart's values.yaml file. Define Compute node scripts in the compute.s6 section, and Login node scripts in the login.s6 section. Each script needs a name, a type, and the script body. The following example defines two scripts for Compute nodes:

Example
compute:
  s6:
    packages:
      type: oneshot
      script: |
        #!/usr/bin/env bash
        apt -y update
        apt -y install nginx
    nginx:
      type: longrun
      timeoutUp: 30000 # 30 seconds
      timeoutDown: 0
      script: |
        #!/usr/bin/env bash
        nginx -g "daemon off;"

The above example includes two scripts:

  • packages: A oneshot script that installs nginx using the package manager.
  • nginx: A longrun script that starts the nginx process and keeps it running.

The nginx script is assigned a timeoutUp of 30000 milliseconds, which means it has up to 30 seconds to start successfully.
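
Login node scripts follow the same pattern under login.s6. The following is a minimal sketch, assuming the login.s6 section accepts the same fields as compute.s6. The script name motd and its contents are hypothetical and used only for illustration:

```yaml
login:
  s6:
    motd:                # hypothetical script name
      type: oneshot
      script: |
        #!/usr/bin/env bash
        # Illustrative task only: write a login banner once at startup.
        echo "Welcome to the cluster" > /etc/motd
```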

Define and schedule different script types

Determine the appropriate script type

Decide whether the script is a longrun or a oneshot based on its purpose:

  • Use longrun for scripts that should run continuously, like a web server.
  • Use oneshot for scripts that run once to perform a setup task, like installing software.

Avoid scheduling conflicts

If your oneshot script installs many packages or performs other tasks that extend startup time, account for this by adjusting the orphanedPodDelay parameter in the syncer configuration section of the Slurm chart's values.yaml file.

The full path for this parameter is syncer.config.syncer.orphanedPodDelay. The default value is 120s (120 seconds).

If a oneshot script takes longer to run than the value set in orphanedPodDelay, increase the value to avoid scheduling conflicts.
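
As a sketch, a oneshot script that needs roughly four minutes could be accommodated by raising the delay as follows. The 300s value is an illustrative assumption, not a recommendation. The key nesting follows the syncer.config.syncer.orphanedPodDelay path:

```yaml
syncer:
  config:
    syncer:
      orphanedPodDelay: 300s  # illustrative value; the default is 120s
```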

Set timeouts for scripts

For finer control, you can set timeouts for your scripts. The timeoutUp parameter sets how long the script has to start, and the timeoutDown parameter sets how long it has to stop. Both parameters are optional and specified in milliseconds. Both default to 0, which means the script never times out.

  • For oneshot scripts, only timeoutUp is relevant. It sets the maximum time the script has to complete.
  • For longrun scripts, both timeoutUp and timeoutDown control how long the process has to start and stop.
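
The following sketch shows how both parameters might be combined. It reuses the packages and nginx scripts from the earlier example. The 300000 and 10000 values are illustrative assumptions:

```yaml
compute:
  s6:
    packages:
      type: oneshot
      timeoutUp: 300000   # illustrative: up to 5 minutes to complete
      script: |
        #!/usr/bin/env bash
        apt -y update
        apt -y install nginx
    nginx:
      type: longrun
      timeoutUp: 30000    # up to 30 seconds to start
      timeoutDown: 10000  # illustrative: up to 10 seconds to stop
      script: |
        #!/usr/bin/env bash
        nginx -g "daemon off;"
```

Because a 300000 millisecond timeoutUp exceeds the default orphanedPodDelay of 120s, a configuration like this would likely also need a larger orphanedPodDelay value.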

Define behavior for failed scripts

When a user-defined script fails, the container can continue running silently, continue running with an error message, or stop completely. By default, SUNK stops the container when a script fails.

You can control this behavior with the S6_BEHAVIOUR_IF_STAGE2_FAILS parameter in the appropriate env section of the values.yaml file.

  • For Login nodes, change the value of the parameter in the login.env section of the values.yaml file.
  • For Compute nodes, change the value of the parameter in the compute.nodes.<target node>.env section of the values.yaml file.

The S6_BEHAVIOUR_IF_STAGE2_FAILS parameter can contain the following values:

Value   Behavior
0       If a script fails, the container will continue to run silently, without providing an error message.
1       If a script fails, the container will continue to run, but will provide an error message warning about the failure.
2       If a script fails, the container will stop running completely. This is the default setting on SUNK.
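
As a rough sketch, the parameter might be set to 1 as shown below. This assumes the env sections accept Kubernetes-style name and value entries, which this guide does not confirm, so check the chart's values.yaml for the actual shape. The node name example-node is hypothetical and stands in for <target node>:

```yaml
login:
  env:
    # Assumed shape: a list of name/value entries.
    - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
      value: "1"   # keep running, but report the failure

compute:
  nodes:
    example-node:  # hypothetical; replace with your target node
      env:
        - name: S6_BEHAVIOUR_IF_STAGE2_FAILS
          value: "1"
```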

For more information on the values for S6_BEHAVIOUR_IF_STAGE2_FAILS, see the s6-overlay customization options.