Enable Docker-in-Docker
To run Docker inside SUNK compute nodes, you must disable AppArmor and run privileged Pods. To do so, edit the values.yaml file of the Slurm Helm chart. The following changes install the Docker daemon on each compute node, mount persistent storage for Docker, and relax the security context so the daemon can operate.
In the compute section of the Slurm Helm chart:
- Create an s6 block to install and enable Docker.
- Add a volume mount for Docker using the volumeMount and volumes blocks.
- Under pyxis, apply the following settings:
  - Set pyxis.enabled to true.
  - Set pyxis.appArmorProfile to null to disable the AppArmor profile.
  - Set pyxis.podSecurityContext to null to remove the pod security context.
- Under securityContext, set securityContext.privileged to true. The result should resemble the following:
- Verify that the AppArmor profile is correctly disabled. Replace [SLURMD-NODE-NAME] with the name of your slurmd Pod. This should return no output. If it outputs LocalhostProfile: profiles/enroot, then the configuration isn’t applied correctly.
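
The pyxis and securityContext settings from the steps above can be sketched as a values.yaml fragment. This is a minimal illustration, not the chart's reference configuration: it assumes both keys sit directly under compute, which may differ between chart versions. The s6 block and the Docker volume mount are left out because their contents depend on your image and storage.

```yaml
# Sketch only: key nesting is inferred from the steps above and is an
# assumption; check it against the values.yaml shipped with your chart.
# The s6 block and the Docker volume mount are intentionally omitted.
compute:
  pyxis:
    enabled: true
    appArmorProfile: null      # disables the AppArmor profile
    podSecurityContext: null   # removes the pod security context
  securityContext:
    privileged: true           # runs the compute Pods in privileged mode
```

Setting a key to null in a values file overrides the chart default with an empty value, which is why null is used here rather than deleting the key.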
Grant users Docker access
With the Helm chart changes applied, Docker is available on the compute nodes, but only root can use it by default. After you disable AppArmor and enable privileged Pods, grant non-root users the permissions they need to run Docker commands.
If not running as root, users must belong to the sudoers group. If the SUNK cluster manages users with SUNK User Provisioning (SUP), set nsscache.sudoGroups to the groups that should have sudo privileges.
SUP-based clusters
For SUP-based clusters, configure nsscache: to provision access through SCIM, and list the sudo-enabled groups under sudoGroups:. Group names must exactly match the group names in CoreWeave IAM or your upstream IdP. Refer to the SUP documentation on creating user groups.
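
As a rough sketch of the sudoGroups setting described above, the fragment below lists two sudo-enabled groups. The group names are hypothetical placeholders, and the position of nsscache within values.yaml is an assumption. The SCIM-related nsscache settings are not shown.

```yaml
# Sketch only: hpc-admins and ml-research are hypothetical group names.
# Replace them with names that exactly match CoreWeave IAM or your IdP.
nsscache:
  sudoGroups:
    - hpc-admins
    - ml-research
```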
SSSD-based clusters
For SSSD-based clusters, set the sudoGroups: configuration in directoryService: to a list of Unix groups from all directories with sudo privileges.
- The group names are not fully qualified for the default directory. For example, the default directory’s group name should be group1 instead of group1@example.com.
- The group names are fully qualified for any additional directories.
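
The two naming rules above might look like this in values.yaml. This is a sketch under assumptions: group1 comes from the example above, while group2 and the other.example.com domain are hypothetical, and the position of directoryService within the file may differ in your chart.

```yaml
# Sketch only: group2 and other.example.com are hypothetical.
directoryService:
  sudoGroups:
    - group1                       # default directory: short name
    - "group2@other.example.com"   # additional directory: fully qualified
```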
Known security risks
Enabling Docker-in-Docker requires loosening several of SUNK’s default isolation guarantees. Before you apply the preceding configuration, review the risks in this section so you can decide whether the trade-offs are acceptable for your environment.
Kubernetes provides a list of pod security standards. The following sections outline the subset most relevant to SUNK, but consider all of the standards listed in the Kubernetes documentation when assessing risk.
The setup described in this guide:
- Runs Docker inside a Pod.
- Requires privileged mode.
- Disables AppArmor protections.
This setup grants full access to the /proc and /sys directories, and allows kernel-level operations, including mounting, pivoting root, and process tracing.
Item at risk: cgroups
cgroups handle management and monitoring of resources used by Slurm jobs. In privileged mode, users can bypass cgroup restrictions, reconfigure the resource limits set by Slurm, and move processes outside of their assigned cgroups. This is critical to understand when providing support for multi-tenancy in a single CKS cluster. The same guardrails you depend on for isolation do not apply here.
Item at risk: System and security
Disabling AppArmor on a privileged Pod removes additional security features, including protections against kernel exploits. This allows unrestricted access to the /proc and /sys directories. Without AppArmor protections in place, users can gain access to and tamper with other containers in the Pod, or access other devices outside the intended allocation.