# Run SWE-bench in SUNK with Docker

> Run the SWE-bench LLM benchmark on SUNK using Docker-in-Docker with GPU resources

SWE-bench is a benchmark for evaluating large language models on software issues collected from GitHub. SWE-bench uses Docker to build reproducible evaluation environments that can be ported to different platforms.

This guide is for SUNK users who want to evaluate large language models against the SWE-bench suite on GPU-backed nodes. By the end of the guide, you have SWE-bench installed in a Python virtual environment on a SUNK node and a successful benchmark run that produces a JSON report.

This guide explains how to run [SWE-bench](https://www.swebench.com/original.html) on SUNK with the following steps:

1. [Enable support for Docker in SUNK](/products/sunk/run_workloads/docker-in-docker)
2. [Select a node to run the benchmark on](#acquire-gpu-resources)
3. [Install SWE-bench in a Python environment on the selected node and run the benchmark](#clone-swe-bench-and-set-up-python)

## Tested versions

This guide is tested and verified on the following configurations:

* SUNK `cgroup/v1` and `cgroup/v2`
  * v6.9.1
  * v7.1.0
* NVIDIA L40 and H100 GPUs

## Prerequisites

To run SWE-bench on SUNK, you first need to enable Docker support. For instructions, see the [guide on using Docker in SUNK](/products/sunk/run_workloads/docker-in-docker).

Using Docker in SUNK requires enabling privileged Pods and disabling the recommended AppArmor profile. This grants elevated kernel capabilities and weakens isolation guarantees. See the [known security risks](/products/sunk/run_workloads/docker-in-docker#known-security-risks) section for more details. **It is your responsibility to verify that third-party code is safe to execute alongside your other workloads**.
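After Docker support is enabled, a quick pre-flight check can catch problems before the harness starts building images. The following script is a minimal sketch, not part of SUNK or SWE-bench, and the function names `docker_ready`, `cgroup_version`, and `gpu_supported` are hypothetical. Run it in a shell on the node you plan to use (see [Acquire GPU resources](#acquire-gpu-resources)). It assumes that `docker info` succeeds only when the Docker daemon is reachable, that `/sys/fs/cgroup` reports the filesystem type `cgroup2fs` on cgroup v2 and usually `tmpfs` on cgroup v1, and that `nvidia-smi` is on the `PATH` of GPU nodes.

```bash theme={"system"}
#!/usr/bin/env bash
# Hypothetical pre-flight checks for running SWE-bench on a SUNK node.
# None of these functions ship with SUNK or SWE-bench.

# Succeeds if the docker CLI is installed and the Docker daemon responds.
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon not reachable; check Docker-in-Docker support" >&2
    return 1
  fi
}

# Prints v1, v2, or unknown based on the filesystem type at /sys/fs/cgroup
# (assumption: cgroup2fs on cgroup v2, usually tmpfs on cgroup v1).
cgroup_version() {
  case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Fails for GPU model names affected by the H200 limitation in this guide.
gpu_supported() {
  case "$1" in
    *H200*) return 1 ;;
    *)      return 0 ;;
  esac
}

if docker_ready; then
  echo "Docker: ready"
else
  echo "Docker: not ready"
fi
echo "cgroup: $(cgroup_version)"

if command -v nvidia-smi >/dev/null 2>&1; then
  # One GPU model name per line, for example "NVIDIA H100 80GB HBM3".
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    if gpu_supported "$name"; then
      echo "GPU: $name"
    else
      echo "GPU: $name (see Known limitations: H200)" >&2
    fi
  done
fi
```

The checks only report; they never stop the shell, so you can run the script in your interactive session and decide how to proceed.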
To run SWE-bench in SUNK, select a node to run the benchmark on, set up a Python environment on that node, and then install SWE-bench in that environment.

## Acquire GPU resources

SWE-bench needs an interactive shell on a GPU node so the benchmark harness can build Docker images and run evaluations on that node.

First, identify a node or partition on which to run the benchmark. The following examples use an H100 node in the `h100` partition. Choose **one** of the following methods:

### Option 1: `exec` into an existing GPU Pod

List the Pods in your namespace:

```bash theme={"system"}
kubectl get pods -n [NAMESPACE]
```

In this example, the target Pod is named `h100-123-123`. Open an interactive terminal session inside the Pod with `kubectl exec`:

```bash theme={"system"}
kubectl exec -it -n [NAMESPACE] h100-123-123 -- bash
```

### Option 2: Start an interactive job within a Slurm login Pod

Use this option when you don't already have a GPU Pod running and want Slurm to allocate a GPU node for the session. In this example, the Slurm login Pod is `tenant-slurm-login-0`. Open a shell in it:

```bash theme={"system"}
kubectl exec -it -n [NAMESPACE] tenant-slurm-login-0 -- bash
```

Use `srun` to start an interactive session on your chosen partition. In this example, the partition is `h100`:

```bash theme={"system"}
srun --nodes=1 --gres=gpu:1 --partition=h100 --pty bash
```

## Clone SWE-bench and set up Python

With an interactive shell on a GPU node ready, you can install SWE-bench and run the benchmark. The following steps use [`uv`](https://docs.astral.sh/uv/) to create a Python virtual environment. For the equivalent `venv` and `pip` workflow, see this [guide to Python virtual environments](https://python.land/virtual-environments/virtualenv).

1. Install `uv` with `curl`:

   ```bash theme={"system"}
   curl -LsSf https://astral.sh/uv/install.sh | sh
   source $HOME/.local/bin/env
   ```

   The `source` command adds `uv` to your `PATH` for the current shell. If the installer output gives different instructions for updating your `PATH`, follow those instead.

2. Clone the `SWE-bench` repository:

   ```bash theme={"system"}
   git clone https://github.com/SWE-bench/SWE-bench.git
   cd SWE-bench
   ```

3. Create a Python virtual environment:

   ```bash theme={"system"}
   uv venv
   ```

4. Install the current directory, `SWE-bench`, as a Python package in the virtual environment:

   ```bash theme={"system"}
   uv pip install .
   ```

5. Run the benchmark inside the Pod:

   ```bash theme={"system"}
   uv run python -m swebench.harness.run_evaluation \
       --predictions_path gold \
       --max_workers 1 \
       --instance_ids sympy__sympy-20590 \
       --namespace '' \
       --run_id validate-gold
   ```

   In this command, `--predictions_path gold` evaluates the dataset's reference (gold) patches instead of model predictions, `--instance_ids` limits the run to a single task, and `--namespace ''` builds the evaluation images locally instead of pulling prebuilt ones.

   The output is similar to the following:

   ```text theme={"system"}
   Built swebench @ file:///opt/nccl-tests/SWE-bench
   Uninstalled 2 packages in 4ms
   Installed 2 packages in 4ms
   :128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
   Using gold predictions
   README.md: 3.67kB [00:00, 25.5MB/s]
   data/dev-00000-of-00001.parquet: 100%|████████████████████████████████████████████████████| 107k/107k [00:00<00:00, 488kB/s]
   data/test-00000-of-00001.parquet: 100%|███████████████████████████████████████████████| 1.11M/1.11M [00:00<00:00, 13.0MB/s]
   Generating dev split: 100%|█████████████████████████████████████████████████████| 23/23 [00:00<00:00, 5326.84 examples/s]
   Generating test split: 100%|██████████████████████████████████████████████████| 300/300 [00:00<00:00, 25575.54 examples/s]
   Building base image (sweb.base.py.x86_64:latest)
   Base images built successfully.
   Total environment images to build: 1
   All environment images built successfully.
   Running 1 instances...
   Evaluation: 100%|████████████████████████████████████████████████████████| 1/1 [00:56<00:00, 56.05s/it, ✓=1, ✖=0, error=0]All instances run.
   Evaluation: 100%|████████████████████████████████████████████████████████| 1/1 [00:56<00:00, 56.05s/it, ✓=1, ✖=0, error=0]
   Cleaning cached images...
   Removed 0 images.
   Total instances: 1
   Instances submitted: 1
   Instances completed: 1
   Instances incomplete: 0
   Instances resolved: 1
   Instances unresolved: 0
   Instances with empty patches: 0
   Instances with errors: 0
   Unstopped containers: 0
   Unremoved images: 0
   Report written to gold.validate-gold.json
   ```

   A successful run creates a report file named `gold.validate-gold.json` in the working directory.

## Known limitations

### H200 GPU compile error

SWE-bench fails to build its base Docker image on H200 GPU nodes, so the benchmark never starts. The harness terminates with the following error:

```text wrap theme={"system"}
ebench.harness.docker_build.BuildImageError: Error building image sweb.base.py.x86_64:latest: The command '/bin/sh -c apt update && apt install -y wget git build-essential libffi-dev libtiff-dev python3 python3-pip python-is-python3 jq curl locales locales-all tzdata && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 255
Check (logs/build_images/base/sweb.base.py.x86_64__latest/build_image.log) for more information.
```

### Non-InfiniBand node behavior

Enabling privileged Pods on non-InfiniBand nodes may prevent NCCL from using `eth0` correctly. To force NCCL to use `eth0`, set the following environment variables:

```bash theme={"system"}
export NCCL_SOCKET_IFNAME=eth0
export UCX_NET_DEVICES=eth0
export NCCL_COLLNET_ENABLE=0
export NCCL_IB_HCA=eth0
```
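If the same job script runs on both InfiniBand and non-InfiniBand nodes, you might apply the `eth0` variables only when no InfiniBand device is present. The following is a minimal sketch, not a SUNK feature: the function names `has_infiniband` and `apply_eth0_nccl_env` are hypothetical, and the script assumes that InfiniBand HCAs (for example `mlx5_0`) appear as entries under the standard Linux sysfs directory `/sys/class/infiniband` and that the Pod's Ethernet interface is `eth0`. Source the script instead of executing it so the exports persist in your shell.

```bash theme={"system"}
# Hypothetical helper: export the eth0 NCCL settings from this section only
# when the node exposes no InfiniBand devices. Source this file so the
# exports persist in the calling shell.

# Succeeds if at least one InfiniBand device is listed in the given sysfs
# directory. Assumption: IB HCAs appear under /sys/class/infiniband.
has_infiniband() {
  local ib_dir="${1:-/sys/class/infiniband}"
  [ -d "$ib_dir" ] && [ -n "$(ls -A "$ib_dir" 2>/dev/null)" ]
}

# Exports the eth0 settings unless InfiniBand devices are present.
# An optional argument overrides the sysfs directory that is checked.
apply_eth0_nccl_env() {
  if has_infiniband "$@"; then
    echo "InfiniBand devices found; leaving NCCL settings unchanged"
    return 0
  fi
  export NCCL_SOCKET_IFNAME=eth0
  export UCX_NET_DEVICES=eth0
  export NCCL_COLLNET_ENABLE=0
  export NCCL_IB_HCA=eth0
  echo "No InfiniBand devices found; NCCL forced to eth0"
}

apply_eth0_nccl_env
```

Checking sysfs keeps the decision on the node itself, so one script can serve mixed partitions without hard-coding node names.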