Use sidecar containers with SUNK

Add sidecar containers to Slurm compute and login node pods

Slurm compute and login node pods can be extended with sidecar containers. This section illustrates how this can be done with a few concrete examples.

How to add a sidecar

You can add sidecars to both login and compute node pods with the containers value of the Slurm Helm chart. The chart also exposes additional fields, such as volumes and dnsConfig, for further customization of the sidecars if required. The exact location of these settings differs between login and compute pods, as detailed in the examples below.
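
For reference, each entry under containers is a standard Kubernetes container spec. The following minimal sketch uses a placeholder name and an arbitrary image purely for illustration:

containers:
  - name: my-sidecar # placeholder name
    image: nginx:stable # any container image
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
    volumeMounts:
      - name: sidecar-data # must match a volume defined below
        mountPath: /data
volumes:
  - name: sidecar-data
    emptyDir: {}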

Add a sidecar to a login pod

For login pods, add sidecars under login.containers.

The login pod configuration offers the following options:

login:
  enabled: true
  ...
  containers: [] # Add sidecar containers here
  volumes: [] # Define volumes needed for the sidecars
  ... # Additional fields for configuring sidecars may be available

Add a sidecar to a compute pod

To add sidecars to a compute pod, you must apply the configuration at the node level.

Add sidecars under compute.nodes.<nodeType>.containers, where <nodeType> represents a custom name assigned to a specific compute type.

compute:
  nodes:
    simple-cpu:
      enabled: true
      replicas: 2 # Adjust to desired amount or scale manually after deploy
      ...
      containers: [] # Add sidecar containers here
      volumes: [] # Define volumes needed for the sidecars
      dnsPolicy: ...
      dnsConfig: ...
      ... # Other fields may exist to help with sidecar configurations

Additional fields are available for configuration. See the following examples of how sidecars can be introduced.

Sidecar example: Knot Resolver

Knot Resolver (KR) is a full caching DNS resolver implementation that can be used either as a proxy in front of CoreDNS or as an ad-hoc replacement for it. This example shows how to run it as a proxy in a sidecar, which can improve DNS performance in jobs that involve web scraping. The process involves adding a sidecar container and a corresponding ConfigMap, and then deploying both to the Kubernetes namespace.

  1. To add the KR sidecar container, adjust the containers, volumes, dnsPolicy, and dnsConfig fields as shown below:

    compute:
      ...
      nodes:
        <nodeType>: # the custom name of the compute node type
          ...
          containers:
            - name: kresd
              image: cznic/knot-resolver:v5.5.3
              command: ["/usr/sbin/kresd", "-c", "/opt/kresd/kresd.conf", "-n"]
              resources:
                limits:
                  memory: 64Gi
                requests:
                  cpu: 1
                  memory: 1Gi
              volumeMounts:
                - name: knot-resolver-conf
                  mountPath: /opt/kresd
                  readOnly: true
                - name: knot-cache
                  mountPath: /var/cache/knot-resolver
          volumes:
            - name: knot-resolver-conf
              configMap:
                name: slurm-knot-resolver-conf
            - name: knot-cache
              emptyDir:
                medium: Memory
          dnsPolicy: "None"
          dnsConfig:
            nameservers:
              - 127.0.0.1 # kresd runs on this address on the node
  2. Create the corresponding ConfigMap YAML file knot-resolver-configmap.yaml with the following contents:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: slurm-knot-resolver-conf
    data:
      kresd.conf: |
        -- Network interface configuration
        net.listen('127.0.0.1', 53, { kind = 'dns' })
        net.listen('127.0.0.1', 853, { kind = 'tls' })
        net.listen('127.0.0.1', 443, { kind = 'doh2' })
        net.listen(net.lo, 8053, { kind = 'webmgmt' })
        modules = {
          'http',
        }
        -- Refer to the manual for the optimal cache size
        cache.size = 8 * GB
        internalDomains = policy.todnames({'cluster.local'}) -- define additional internal networks here
        policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), internalDomains)) -- let CoreDNS deal with the internal cluster
        policy.add(policy.suffix(policy.STUB({'10.96.0.10'}), internalDomains)) -- forward internal traffic to K8s CoreDNS (the default address is 10.96.0.10)
        policy.add(policy.all(policy.FORWARD({'1.1.1.1', '4.4.4.4', '8.8.8.8'})))

    Further examples of kresd.conf configuration can be found in the Knot Resolver GitHub repository.

  3. Deploy the ConfigMap into the slurm namespace before deploying changes for the sidecars, as in the following example:

    $
    kubectl apply -f knot-resolver-configmap.yaml -n slurm

    If a ConfigMap with the same name already exists in the specified Kubernetes namespace, the apply command updates it with the new configuration. If it does not exist, apply creates it.

    The -f flag specifies the manifest file to apply; in this example, knot-resolver-configmap.yaml.

    The -n flag specifies the Kubernetes namespace in which the ConfigMap is created or updated.
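
    To confirm that the ConfigMap exists before deploying the sidecar changes, you can list it with a standard kubectl query:

    $
    kubectl get configmap slurm-knot-resolver-conf -n slurm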

Testing the Knot Resolver sidecar

To test whether the DNS server is up and running, run the dig command inside a Slurm compute node.

First, open a shell in the slurmd container of a worker node. In this example, the worker node is slurm-cpu-epyc-000-002.

Update the local package list and install the dnsutils package, which includes the dig command:

root@slurm-cpu-epyc-000-002:/#
apt update && apt install -y dnsutils

To check if the DNS resolver is functioning correctly, use the dig command:

root@slurm-cpu-epyc-000-002:/#
dig @127.0.0.1 example.com

If kresd is working, the result will resemble the following:

Example dig response
; <<>> DiG 9.16.50-Debian <<>> @127.0.0.1 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28153
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 942 IN A 93.184.215.14
;; Query time: 2019 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Oct 17 09:23:40 UTC 2024
;; MSG SIZE rcvd: 56
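
You can also verify that names inside the cluster still resolve through the CoreDNS stub path configured above. The fully qualified service name below is only an illustration; substitute a service that exists in your cluster:

root@slurm-cpu-epyc-000-002:/#
dig @127.0.0.1 slurm-controller.slurm.svc.cluster.local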

To check that Slurm communications are still working between the controller and compute node, run scontrol ping:

root@slurm-cpu-epyc-000-002:/#
scontrol ping
Slurmctld(primary) at slurm-controller is UP

Sidecar example: Tailscale

Tailscale is a VPN service that makes your devices and applications accessible anywhere in the world. In this example, we attach the Slurm login node to a Tailscale network with a userspace-networking sidecar, so we can connect to the login node from any machine on that network.

Tip

This example is also available in Tailscale's documentation. Cloning the Tailscale GitHub repository will provide most of the necessary YAML manifests for this example.

  1. Log in to Tailscale's admin console to create a reusable, ephemeral auth key for the machine. You will use this auth key to authenticate the login node against the Tailscale network.

  2. Create a secret for the TS_AUTHKEY via a YAML manifest, e.g. ts-secret.yaml:

    apiVersion: v1
    kind: Secret
    metadata:
      name: tailscale-auth
    stringData:
      TS_AUTHKEY: tskey-0123456789abcdef
  3. Add the secret to the slurm namespace with the following command:

    $
    kubectl apply -f ts-secret.yaml -n slurm
  4. Configure the RBAC surrounding the secret so the sidecar can obtain it. This requires you to edit three files: rolebinding.yaml, role.yaml, and sa.yaml. If you have cloned the Tailscale repository, you can find these files under tailscale/docs/k8s. Adjust the values in the files to match the configuration shown below:

    sa.yaml:

    sa.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: tailscale

    role.yaml:

    role.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: tailscale
    rules:
      - apiGroups: [""] # "" indicates the core API group
        resources: ["secrets"]
        # Create cannot be restricted to a resource name.
        verbs: ["create"]
      - apiGroups: [""] # "" indicates the core API group
        resourceNames: ["tailscale-auth"]
        resources: ["secrets"]
        verbs: ["get", "update", "patch"]

    rolebinding.yaml:

    rolebinding.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: tailscale
    subjects:
      - kind: ServiceAccount
        name: tailscale
    roleRef:
      kind: Role
      name: tailscale
      apiGroup: rbac.authorization.k8s.io
  5. Deploy these manifests into the slurm namespace by running the following command:

    $
    make rbac | kubectl apply -f- -n slurm
  6. Edit the login node's containers field to include the Tailscale sidecar. Set the serviceAccountName to tailscale, and automountServiceAccountToken to true.

    The following example shows these edits in the charts/slurm/values.yaml manifest:

    charts/slurm/values.yaml
    login:
      enabled: true
      ...
      serviceAccountName: tailscale
      automountServiceAccountToken: true
      containers:
        - name: nginx # for testing
          image: nginx
        - name: ts-sidecar
          imagePullPolicy: Always
          image: "ghcr.io/tailscale/tailscale:latest"
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
          env:
            # Store the state in a k8s secret
            - name: TS_KUBE_SECRET
              value: tailscale-auth
            - name: TS_USERSPACE
              value: "true"
            - name: TS_AUTHKEY
              valueFrom:
                secretKeyRef:
                  name: tailscale-auth
                  key: TS_AUTHKEY
                  optional: true
  7. Deploy the sidecar by applying the updated Helm values, as shown in the example after this list.
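
How you roll out this change depends on how the chart was installed. If you manage the Slurm chart with Helm directly, an upgrade along the following lines applies the new values; the release name slurm is an assumption, so substitute your own release and chart path:

$
helm upgrade slurm charts/slurm -n slurm -f charts/slurm/values.yaml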

Testing the Tailscale sidecar

We can check from a machine connected to the Tailscale network that the Slurm login node is now connected and present:

my_machine@abcdef:~/$
tailscale status
100.103.123.42 my_machine Myself123@ linux -
100.95.67.54 slurm-login-0 Myself123@ linux idle, tx 1540 rx 2908
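
If the login node appears in the list, you can also confirm connectivity to it with tailscale ping (the hostname matches the status output above):

my_machine@abcdef:~/$
tailscale ping slurm-login-0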

Since we've added nginx as a test container, we can also run the following from another machine connected to the same Tailscale network:

Test with curl
my_machine@abcdef:~/$
curl http://slurm-login-0
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>

If you have SSH configured, you can test it with the following command:

Example
my_machine@abcdef:~/$ ssh user1@slurm-login-0 -t exec bash -l
user1@slurm-login-0:~$