Use sidecar containers with SUNK
Add sidecar containers to Slurm compute and login node pods
Slurm compute and login node pods can be extended with sidecar containers. This section illustrates how this can be done with a few concrete examples.
How to add a sidecar
You can add sidecars to both login and compute node pods with the `containers` value of the Slurm Helm chart. The chart also provides additional fields, such as `volumes` and `dnsConfig`, to allow further customization of the sidecars if required. The exact location of these configuration changes differs between login and compute pods, as detailed in the examples below.
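Each entry in `containers` is a standard Kubernetes container spec. The following is a minimal sketch of what a sidecar entry can look like; the name, image, and command are placeholders, not part of the chart's defaults:

```yaml
# A minimal sketch of a sidecar entry; name, image, and command are placeholders
containers:
  - name: my-sidecar
    image: busybox:1.36
    command: ["sh", "-c", "sleep infinity"] # keep the container running
```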
Add a sidecar to a login pod
For login pods, add sidecars under `login.containers`.
The login pod configuration offers the following options:
```yaml
login:
  enabled: true
  ...
  containers: [] # Add sidecar containers here
  volumes: [] # Define volumes needed for the sidecars
  ... # Additional fields for configuring sidecars may be available
```
Add a sidecar to a compute pod
To add sidecars to a compute pod, you must apply the configuration at the node level.
Add sidecars under `compute.nodes.<nodeType>.containers`, where `<nodeType>` represents a custom name assigned to a specific compute type.
```yaml
compute:
  nodes:
    simple-cpu:
      enabled: true
      replicas: 2 # Adjust to desired amount or scale manually after deploy
      ...
      containers: [] # Add sidecar containers here
      volumes: [] # Define volumes needed for the sidecars
      dnsPolicy: ...
      dnsConfig: ...
      ... # Other fields may exist to help with sidecar configurations
```
Additional fields are available for configuration. The following examples show how sidecars can be introduced.
Sidecar example: Knot Resolver
Knot Resolver (KR) is a full caching DNS resolver implementation that can be used either as a proxy to CoreDNS or as an ad-hoc replacement for it. This example shows how to run it as a proxy in a sidecar, which can improve DNS performance in jobs that involve web scraping. The process involves adding a sidecar container and a corresponding ConfigMap, then deploying both to the Kubernetes namespace.
1. To add the KR sidecar container, adjust the `containers`, `volumes`, `dnsPolicy`, and `dnsConfig` fields as shown below:

   ```yaml
   compute:
     ...
     nodes:
       ...
       containers:
         - name: kresd
           image: cznic/knot-resolver:v5.5.3
           command: ["/usr/sbin/kresd", "-c", "/opt/kresd/kresd.conf", "-n"]
           resources:
             limits:
               memory: 64Gi
             requests:
               cpu: 1
               memory: 1Gi
           volumeMounts:
             - name: knot-resolver-conf
               mountPath: /opt/kresd
               readOnly: true
             - name: knot-cache
               mountPath: /var/cache/knot-resolver
       volumes:
         - name: knot-resolver-conf
           configMap:
             name: slurm-knot-resolver-conf
         - name: knot-cache
           emptyDir:
             medium: Memory
       dnsPolicy: "None"
       dnsConfig:
         nameservers:
           - 127.0.0.1 # kresd runs on this address on the node
   ```
2. Create the corresponding ConfigMap YAML file `knot-resolver-configmap.yaml` with the following contents:

   ```yaml
   kind: ConfigMap
   apiVersion: v1
   metadata:
     name: slurm-knot-resolver-conf
   data:
     kresd.conf: |
       -- Network interface configuration
       net.listen('127.0.0.1', 53, { kind = 'dns' })
       net.listen('127.0.0.1', 853, { kind = 'tls' })
       net.listen('127.0.0.1', 443, { kind = 'doh2' })
       net.listen(net.lo, 8053, { kind = 'webmgmt' })

       modules = {
         'http',
       }

       -- Refer to manual for optimal cache size
       cache.size = 8 * GB

       -- define additional internal networks here
       internalDomains = policy.todnames({'cluster.local'})
       -- let CoreDNS deal with the internal cluster
       policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), internalDomains))
       -- forward internal traffic to K8s CoreDNS (the default set address is 10.96.0.10)
       policy.add(policy.suffix(policy.STUB({'10.96.0.10'}), internalDomains))
       policy.add(policy.all(policy.FORWARD({'1.1.1.1', '4.4.4.4', '8.8.8.8'})))
   ```

   Further examples of `kresd.conf` configuration can be found in the Knot Resolver GitHub repository.
3. Deploy the ConfigMap into the `slurm` namespace before deploying changes for the sidecars, as in the following example:

   ```bash
   $ kubectl apply -f knot-resolver-configmap.yaml -n slurm
   ```

   If a ConfigMap with the same name already exists in the specified Kubernetes namespace, the `apply` command updates it with the new configuration; if no such ConfigMap exists, `apply` creates one. The `-f` flag specifies the manifest file to apply, in this example `knot-resolver-configmap.yaml`, and the `-n` flag specifies the Kubernetes namespace in which to create or update the ConfigMap. You can verify the result as shown below.
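To confirm the ConfigMap is present in the namespace, a quick sanity check (not part of the original steps) is to list it with `kubectl get`:

```bash
# Optional check: confirm the ConfigMap exists in the slurm namespace
$ kubectl get configmap slurm-knot-resolver-conf -n slurm
```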
Testing the Knot Resolver sidecar
To test if the DNS server is up and running, run the `dig` command within a Slurm compute node.
First, open a shell in a worker node in the `slurmd` container. In this example, the worker node is `slurm-cpu-epyc-000-002`.
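For example, assuming the compute pod carries the same name as the worker node and runs in the `slurm` namespace, a shell can be opened with `kubectl exec`:

```bash
# Assumes the pod name matches the worker node name; adjust to your deployment
$ kubectl exec -it slurm-cpu-epyc-000-002 -c slurmd -n slurm -- bash
```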
To update the local package list and install the `dnsutils` package, which includes the `dig` command, run the following command:

```bash
root@slurm-cpu-epyc-000-002:/# apt update && apt install -y dnsutils
```
To check if the DNS resolver is functioning correctly, use the `dig` command:

```bash
root@slurm-cpu-epyc-000-002:/# dig @127.0.0.1 example.com
```
If `kresd` is working, the result will resemble the following:

```
; <<>> DiG 9.16.50-Debian <<>> @127.0.0.1 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28153
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232

;; QUESTION SECTION:
;example.com.			IN	A

;; ANSWER SECTION:
example.com.		942	IN	A	93.184.215.14

;; Query time: 2019 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Oct 17 09:23:40 UTC 2024
;; MSG SIZE  rcvd: 56
```
To check that Slurm communications are still working between the controller and compute node, run `scontrol ping`:

```bash
root@slurm-cpu-epyc-000-002:/# scontrol ping
Slurmctld(primary) at slurm-controller is UP
```
Sidecar example: Tailscale
Tailscale is a VPN service that makes your devices and applications accessible anywhere in the world. In this example, we attach our Slurm login node to our Tailscale network via a userspace sidecar, allowing us to connect to it from any machine on the same network.
This example is also available in Tailscale's documentation. Cloning the Tailscale GitHub repository provides most of the YAML manifests needed for this example.
1. Log in to Tailscale's admin console to create a reusable, ephemeral auth key for the machine. You will use this auth key to authenticate the login node against the Tailscale network.
2. Create a secret for the `TS_AUTHKEY` via a YAML manifest, e.g. `ts-secret.yaml`:

   ```yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: tailscale-auth
   stringData:
     TS_AUTHKEY: tskey-0123456789abcdef
   ```
3. Add the secret to the `slurm` namespace with the following command:

   ```bash
   $ kubectl apply -f ts-secret.yaml -n slurm
   ```
4. Configure the RBAC surrounding the secret to allow the sidecar to obtain it. This requires you to edit three files: `rolebinding.yaml`, `role.yaml`, and `sa.yaml`. If you have cloned the Tailscale repository, you can find these files under `tailscale/docs/k8s`. Adjust the values in the files to match the configuration shown below.

   `sa.yaml`:

   ```yaml
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: tailscale
   ```

   `role.yaml`:

   ```yaml
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: tailscale
   rules:
     - apiGroups: [""] # "" indicates the core API group
       resources: ["secrets"]
       # Create can not be restricted to a resource name.
       verbs: ["create"]
     - apiGroups: [""] # "" indicates the core API group
       resourceNames: ["tailscale-auth"]
       resources: ["secrets"]
       verbs: ["get", "update", "patch"]
   ```

   `rolebinding.yaml`:

   ```yaml
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: tailscale
   subjects:
     - kind: ServiceAccount
       name: tailscale
   roleRef:
     kind: Role
     name: tailscale
     apiGroup: rbac.authorization.k8s.io
   ```
5. Deploy these manifests into the `slurm` namespace by running the following command:

   ```bash
   $ make rbac | kubectl apply -f- -n slurm
   ```
6. Edit the login node's `containers` field to include the Tailscale sidecar. Set the `serviceAccountName` to `tailscale`, and `automountServiceAccountToken` to `true`. The following example shows these edits in the `charts/slurm/values.yaml` manifest:

   ```yaml
   login:
     enabled: true
     ...
     serviceAccountName: tailscale
     automountServiceAccountToken: true
     containers:
       - name: nginx # for testing
         image: nginx
       - name: ts-sidecar
         imagePullPolicy: Always
         image: "ghcr.io/tailscale/tailscale:latest"
         securityContext:
           runAsUser: 1000
           runAsGroup: 1000
         env:
           # Store the state in a k8s secret
           - name: TS_KUBE_SECRET
             value: tailscale-auth
           - name: TS_USERSPACE
             value: "true"
           - name: TS_AUTHKEY
             valueFrom:
               secretKeyRef:
                 name: tailscale-auth
                 key: TS_AUTHKEY
                 optional: true
   ```
7. Deploy the sidecar, for example with a Helm upgrade as sketched after this list.
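A minimal sketch of the deploy step, assuming a Helm release named `slurm` and a local checkout of the chart; substitute your actual release name, chart location, and values file:

```bash
# Sketch only: release name "slurm", chart path, and values file are assumptions
$ helm upgrade slurm ./charts/slurm -f charts/slurm/values.yaml -n slurm
```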
Testing the Tailscale sidecar
We can check from a machine connected to the Tailscale network that the Slurm login node is now connected and present:

```
my_machine@abcdef:~/$ tailscale status
100.103.123.42  my_machine     Myself123@  linux  -
100.95.67.54    slurm-login-0  Myself123@  linux  idle, tx 1540 rx 2908
```
Since we've added nginx as a test container, we can also run the following from another machine connected to the same Tailscale network:

```
my_machine@abcdef:~/$ curl http://slurm-login-0
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
If you have `ssh` configured, you can test that with the following command:

```
my_machine@abcdef:~/$ ssh user1@slurm-login-0 -t exec bash -l
user1@slurm-login-0:~$
```