Schedule Kubernetes Pods with Slurm
This conceptual explanation is for customers who want to schedule Kubernetes Pods with the Slurm scheduler and understand the benefits of doing so.
Kubernetes Nodes are capitalized as proper nouns in this documentation, while Slurm nodes are lowercase.
Overview
In SUNK, each Slurm node is a Kubernetes Pod running a slurmd container. This design enables dynamic provisioning and scaling of Slurm nodes using native Kubernetes mechanisms. Note that Slurm nodes (Pods) are distinct from Kubernetes Nodes, which are the underlying worker machines that host these Pods.
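To make the node-as-Pod mapping concrete, here is a deliberately minimal conceptual sketch of such a Pod. SUNK provisions these Pods itself, so you never write this manifest by hand; the name, label, image, and resource values below are illustrative assumptions, not the manifests SUNK actually generates.

```yaml
# Conceptual sketch only: SUNK creates these Pods automatically.
# Every name, label, and image here is an illustrative assumption.
apiVersion: v1
kind: Pod
metadata:
  name: slurm-node-0            # one Pod per Slurm node
  labels:
    app: slurmd                 # hypothetical label
spec:
  containers:
    - name: slurmd              # runs the Slurm worker daemon
      image: example.com/slurmd:latest   # placeholder image
      resources:
        requests:
          cpu: "32"
          memory: 128Gi         # see Limitations below for sizing guidance
```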
In a typical research cluster, jobs are submitted and monitored with familiar Slurm commands like srun and sinfo. However, some workloads, such as real-time inference, are better suited to run as standalone Kubernetes Pods. Managing separate Kubernetes and Slurm clusters is cumbersome, and dynamically moving Nodes between them is often impractical. To address this, SUNK includes the SUNK Scheduler, a custom Kubernetes scheduler that allows users to schedule both Slurm jobs and Kubernetes Pods in the same cluster, so that training and inference workloads, for example, can coexist seamlessly.
As a prerequisite for scheduling Kubernetes Pods with Slurm, the cluster must be deployed with the SUNK Scheduler, which is enabled by default. When a Pod is marked for scheduling via the SUNK Scheduler, the scheduler creates a placeholder job in Slurm that the Slurm scheduler can manage, then binds the Pod to the Node that Slurm selects.
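As a rough illustration, marking a Pod for the SUNK Scheduler amounts to setting the Pod's schedulerName so that it bypasses the default Kubernetes scheduler. This is a minimal sketch: the scheduler name sunk, the image, and the resource values are assumptions for illustration; check your SUNK deployment for the actual scheduler name.

```yaml
# Hypothetical inference Pod handed to the SUNK Scheduler. The
# schedulerName value "sunk" is an assumption; use the name
# configured in your SUNK deployment.
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  schedulerName: sunk           # route scheduling through the SUNK Scheduler
  containers:
    - name: server
      image: example.com/inference:latest   # placeholder image
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"   # extended resources must match requests
```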
The SUNK Scheduler operates as a conventional Kubernetes reconciler, watching Pod objects and events from the Slurm Controller. It ensures that Kubernetes Pods are scheduled on the same Kubernetes Nodes as Slurm nodes; that the state of the Slurm cluster stays synchronized with Kubernetes; and that events in Kubernetes are propagated into Slurm. It applies the same preemption logic in both directions: it can preempt Kubernetes workloads in favor of Slurm jobs running on the same Node, or vice versa.
Limitations
When scheduling Kubernetes Pods on the same Nodes as Slurm jobs, be aware of the following resource-sharing limitations:
- Node sharing: Either a Slurm job or a Kubernetes Pod can run on a given Node, but not both at the same time.
- Memory availability: Avoid setting high memory requests for the slurmd container, as this reduces scheduling flexibility for both Slurm jobs and Kubernetes Pods. Overly high requests limit how many workloads can be placed on a Node, leading to underutilization and longer queue times. More conservative memory requests (e.g., 50% of the Node's memory) allow more efficient use of resources across the cluster, as in the sketch after this list.
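A minimal sketch of the conservative-request idea, assuming a Node with 1 TiB of memory and a Helm-style values file; the key paths are illustrative assumptions, not SUNK's actual chart schema:

```yaml
# Illustrative values-file fragment; the key paths below are
# assumptions, not SUNK's actual Helm chart schema.
slurmd:
  resources:
    requests:
      # On a 1 TiB Node, requesting ~50% leaves schedulable headroom
      # for Kubernetes Pods to land alongside Slurm jobs.
      memory: 512Gi
```

The Kubernetes request only governs how much of the Node the scheduler reserves for the slurmd Pod; a lower request leaves more schedulable capacity on that Node for other Pods.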
Benefits
Kubernetes and Slurm are both powerful workload management and scheduling tools, each with different strengths. Kubernetes excels at container orchestration and microservice management, while Slurm is designed for high-performance computing (HPC) workloads and job scheduling.
By using SUNK to integrate Slurm with Kubernetes, you can:
- Leverage the strengths of both systems. Use Kubernetes for container orchestration and Slurm for HPC job scheduling.
- Gain flexibility. Run Kubernetes and Slurm workloads side by side in the same cluster, such as training with Slurm and serving inference with Kubernetes, while sharing compute resources to maximize efficiency.
- Simplify cluster management. Manage both Kubernetes and Slurm workloads from a single platform.