Run notebooks on SUNK
Learn how to run interactive notebooks on SUNK
Interactive notebooks provide a powerful environment for data exploration, model development, and visualization on SUNK. This guide covers running notebooks on SUNK's Slurm-managed clusters for both interactive development and batch job execution.
Consider using notebooks on SUNK if:
- You need an interactive environment for data exploration or model prototyping.
- You want to iterate quickly on code while leveraging GPU resources.
- You need to visualize results during development before running full batch jobs.
This guide uses marimo as the primary example. marimo notebooks are pure Python scripts that can run both interactively and as batch jobs. The same port forwarding and container techniques also work for Jupyter notebooks.
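For a sense of that duality (notebook.py below is just a placeholder path), the same file can be opened in marimo's browser-based editor or executed top to bottom like any other Python script:

    # Open the notebook interactively in the browser-based editor
    $ marimo edit notebook.py

    # Run the same file non-interactively, as a plain Python script
    $ python notebook.py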
Prerequisites
Before completing the steps in this guide, be sure you have the following:
- Access to a SUNK cluster with permissions to submit jobs.
- Familiarity with Slurm commands, such as srun, sbatch, and squeue. You can verify cluster access with the quick check after this list.
- A shared directory for storing notebooks, such as /mnt/data or your home directory.
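To confirm the first two prerequisites, you can run a couple of read-only Slurm commands from the login node; the output varies by cluster.

    # List the partitions and nodes visible to you
    $ sinfo
    # List your own jobs in the queue (empty output is fine if nothing is running)
    $ squeue -u $USER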
Install marimo
You can install marimo within a container or conda environment. The recommended approach is to use containers for reproducibility.
Using containers
To create a container with marimo installed, pull a base Python container and save it as a squash file.
1. From the login node, pull a Python container and install marimo:

       $ srun --container-image=python:3.11-slim \
           --container-remap-root --container-mounts=/mnt/home:/mnt/home \
           --container-save ${HOME}/marimo.sqsh --pty bash -i

2. Within the container, install marimo:

       $ pip install marimo  # ... and other packages like torch, jax, etc.

3. Exit the container to save it. The container is now available at ${HOME}/marimo.sqsh.
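As an optional sanity check, you can start the saved image again and confirm marimo is installed; the flags simply mirror the steps above.

    # Launch the saved image and print the installed marimo version
    $ srun --container-image=${HOME}/marimo.sqsh \
        --container-mounts=/mnt/home:/mnt/home \
        bash -c "python -m marimo --version"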
For more information about working with containers on SUNK, see the SUNK Training guide.
Using conda
Alternatively, you can create a conda environment with marimo:
$ conda create --name marimo-env python=3.11 pip
$ conda activate marimo-env
$ pip install marimo
Interactive development
For interactive notebook development, you can run marimo in headless mode and connect via port forwarding.
Submit an interactive job
Create a script named run_marimo.sh:
#!/bin/bash
#SBATCH --job-name=marimo
#SBATCH --output=marimo-%j.out
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=4:00:00

# Activate your environment (choose one)
# Option 1: Using conda
# eval "$(conda shell.bash hook)"
# conda activate marimo-env
# Option 2: Using a container (a container-based variant is sketched after the submit step below)

# Start marimo in headless mode
python -m marimo edit /mnt/home/${USER}/notebook.py --headless --host 0.0.0.0 --port 3000
Submit the job:
$ sbatch run_marimo.sh
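If you built the squash file earlier, a container-based variant of the same script (Option 2 above) might look like the following sketch; the srun flags mirror the interactive container example later in this guide.

    #!/bin/bash
    #SBATCH --job-name=marimo
    #SBATCH --output=marimo-%j.out
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16GB
    #SBATCH --time=4:00:00

    # Run marimo inside the saved container image instead of a conda environment
    srun --container-image=${HOME}/marimo.sqsh \
        --container-mounts=/mnt/home:/mnt/home \
        python -m marimo edit /mnt/home/${USER}/notebook.py --headless --host 0.0.0.0 --port 3000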
Connect via port forwarding
1. Find your compute node using squeue:

       $ squeue -u $USER

   You should see output similar to:

       JOBID  PARTITION  NAME    USER       ST  TIME  NODES  NODELIST(REASON)
       1234   h100       marimo  user_name  R   0:30  1      slurm-h100-231-147

2. Establish an SSH tunnel from your local machine to the compute node through the login node:

       $ ssh -L 3000:slurm-h100-231-147:3000 username@login-node

   Replace slurm-h100-231-147 with your actual compute node name.

3. Access the marimo interface at http://localhost:3000 in your web browser. An optional connectivity check follows these steps.
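To confirm the tunnel is live before switching to the browser, you can request the page from your local machine; any HTTP response indicates that both the tunnel and the marimo server are reachable.

    # Request the marimo page headers through the tunnel
    $ curl -sI http://localhost:3000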
Using containers for interactive sessions
To run marimo interactively within a container:
$ srun -p h100 --exclusive \
    --container-image=${HOME}/marimo.sqsh \
    --container-mounts=/mnt/home:/mnt/home \
    --pty bash -c "python -m marimo edit /mnt/home/${USER}/notebook.py --headless --host 0.0.0.0 --port 3000"
Batch job execution
marimo notebooks can run as batch jobs and accept command-line arguments through mo.cli_args().
Create a notebook for batch execution
Create a marimo notebook that accepts command-line arguments:
import marimo

app = marimo.App()

with app.setup:
    import marimo as mo

@app.cell
def _():
    # Access command-line arguments
    args = mo.cli_args()
    learning_rate = float(args.get("learning-rate", 0.01))
    epochs = int(args.get("epochs", 100))
    print(f"Training with learning_rate={learning_rate}, epochs={epochs}")
    # Your training code here
    ...

if __name__ == "__main__":
    app.run()
Submit as a batch job
Create a batch script batch_notebook.sh:
#!/bin/bash
#SBATCH --job-name=marimo-batch
#SBATCH --output=marimo-batch-%j.out
#SBATCH --cpus-per-task=8
#SBATCH --mem=32GB
#SBATCH --time=2:00:00

eval "$(conda shell.bash hook)"
conda activate marimo-env

python /mnt/home/${USER}/notebook.py -- --learning-rate 0.01 --epochs 100
Submit the job:
$ sbatch batch_notebook.sh
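Once submitted, you can watch the job and its output from the login node. The output file name follows the --output pattern in the script, with %j replaced by the job ID (1234 below is a placeholder).

    # Check the job's state in the queue
    $ squeue -u $USER
    # Follow the notebook's printed output as the job runs
    $ tail -f marimo-batch-1234.out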
GPU configuration
To run notebooks with GPU access, add GPU resources to your SBATCH directives:
#!/bin/bash
#SBATCH --job-name=marimo-gpu
#SBATCH --output=marimo-gpu-%j.out
#SBATCH --partition=h100
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64GB
#SBATCH --time=4:00:00

eval "$(conda shell.bash hook)"
conda activate marimo-env

python -m marimo edit /mnt/home/${USER}/gpu_notebook.py --headless --host 0.0.0.0 --port 3000
For multi-GPU workloads:
#SBATCH --gpus-per-task=8
#SBATCH --exclusive
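Before starting the notebook, it can help to confirm the allocation actually exposes the expected GPUs, for example by adding a quick nvidia-smi check near the top of the script (nvidia-smi is assumed to be available on GPU nodes).

    # List the GPUs visible to this job; expect one line per allocated GPU
    nvidia-smi -L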
VS Code integration
You can use notebooks with VS Code tunnels on SUNK for a more integrated development experience. After setting up a VS Code tunnel to your compute node, install the marimo VS Code extension or use Jupyter notebooks directly within VS Code.
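As one possible pattern (assuming the VS Code CLI, code, is installed on the compute node), you could start the tunnel from inside an interactive allocation and then attach to it from your local VS Code or vscode.dev:

    # Request an interactive shell on a compute node
    $ srun -p h100 --gpus-per-task=1 --pty bash -i

    # From the compute node, start a named tunnel and follow the sign-in prompts (the tunnel name is arbitrary)
    $ code tunnel --name sunk-marimo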