- Enable support for Docker in SUNK
- Select a node to run the benchmark on
- Install SWE-bench in a Python environment on the selected node
Tested versions
This guide is tested and verified on the following configurations:
- SUNK, with cgroup/v1 and cgroup/v2:
  - v6.9.1
  - v7.1.0
- NVIDIA L40 and H100 GPUs
Prerequisites
To run SWE-bench on SUNK, you must first enable Docker support. For instructions, see the guide on using Docker in SUNK.
After Docker support is enabled, select a node to run the benchmark on, set up a Python environment on that node, and then install SWE-bench in that environment.
Acquire GPU resources
SWE-bench needs an interactive shell on a GPU node so that the benchmark harness can build Docker images and run evaluations on that node. First, identify a node or partition on which to run the benchmark. The following examples use an H100 node in the h100 partition.
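To see which nodes in a partition are available, you can query Slurm with sinfo. This is a minimal sketch that assumes you already have a shell in a Slurm login Pod (Option 2 describes how to open one), where the Slurm client tools are installed. The PARTITION variable is a convenience introduced here, set to the h100 partition used in these examples.

```shell
# Partition used throughout this guide; replace it with your own.
PARTITION="h100"

# Show the nodes in the partition and their state (idle, mixed, allocated, and so on).
sinfo --partition="$PARTITION"
```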
Choose one of the following methods:
Option 1: Exec into an existing GPU Pod
List the Pods in your namespace with kubectl get pods, and note the name of a GPU Pod. In this example, the GPU Pod is h100-123-123.
Open an interactive terminal session inside the Pod with kubectl exec:
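A sketch of both commands follows. It assumes your kubeconfig already points at the cluster and namespace where the SUNK Pods run, and that the Pod image includes Bash. POD_NAME is a placeholder variable set to the example Pod name; replace it with a GPU Pod from your own kubectl get pods output.

```shell
# List the Pods in the current namespace (add --namespace <name> to look elsewhere).
kubectl get pods

# Example GPU Pod name from this guide; replace it with one from your own listing.
POD_NAME="h100-123-123"

# Open an interactive Bash session inside the GPU Pod.
kubectl exec -it "$POD_NAME" -- /bin/bash
```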
Option 2: Start an interactive job within a Slurm login Pod
Use this option when you don’t already have a GPU Pod running and want Slurm to allocate one for the session. First, open a shell in the Slurm login Pod. In this example, the Slurm login Pod is tenant-slurm-login-0:
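For example, assuming kubectl access to the namespace that contains the login Pod, the following opens a shell in it. LOGIN_POD is a placeholder variable set to the example name from this guide.

```shell
# Example Slurm login Pod name from this guide; replace it with your own.
LOGIN_POD="tenant-slurm-login-0"

# Open an interactive Bash session in the Slurm login Pod.
kubectl exec -it "$LOGIN_POD" -- /bin/bash
```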
Then use srun to start an interactive session on your chosen partition. In this example, the partition is h100:
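A minimal sketch, run from the shell in the login Pod. --partition and --pty are standard srun options; this request does not ask for a specific number of GPUs, so add whatever resource options your cluster requires. PARTITION is a placeholder variable set to the example partition.

```shell
# Partition used throughout this guide; replace it with your own.
PARTITION="h100"

# Request an interactive job on the partition and attach a pseudo-terminal running Bash.
srun --partition="$PARTITION" --pty bash -i
```

When the job starts, the prompt changes to a shell on the allocated node, and the job ends when you exit that shell.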
Clone SWE-bench and set up Python
With an interactive shell on a GPU node, you can install SWE-bench and run the benchmark. Start by cloning SWE-bench and setting up the Python environment. The following steps use uv to create a Python virtual environment. For the equivalent process with venv and pip, see the Python documentation.
- Install uv with curl. Follow the instructions in the installer output to source the script that adds uv to your PATH.
- Clone the SWE-bench repository and change into its directory.
- Create and activate a Python virtual environment.
- Install the current directory, SWE-bench, as a Python package in the virtual environment.
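The steps above might look like the following on the node. This is a sketch that assumes the standalone uv installer from astral.sh, the repository at github.com/SWE-bench/SWE-bench, and uv's default virtual environment directory, .venv; check each against the current uv and SWE-bench documentation. SWE_BENCH_REPO is a placeholder variable introduced here.

```shell
# 1. Install uv with the official standalone installer.
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add uv to PATH by running the `source` command that the installer prints.
# Recent installers suggest the following; use the exact command from your output.
source "$HOME/.local/bin/env"

# 2. Clone SWE-bench and change into the repository.
SWE_BENCH_REPO="https://github.com/SWE-bench/SWE-bench.git"
git clone "$SWE_BENCH_REPO"
cd SWE-bench

# 3. Create a virtual environment (uv uses .venv by default) and activate it.
uv venv
source .venv/bin/activate

# 4. Install the current directory as an editable package in the environment.
uv pip install -e .
```

The editable install (-e) mirrors the pip install -e . approach in the SWE-bench README, so local changes to the repository take effect without reinstalling.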
Execute the benchmark inside the Pod:
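A sketch of the run, assuming you are in the SWE-bench directory with the virtual environment from the previous steps, and assuming the installation-check command from the SWE-bench README. It evaluates the gold (reference) patch for a single instance, sympy__sympy-20590, with one worker. The harness builds the Docker images it needs, so the first run can take a while. RUN_ID is a placeholder variable introduced here.

```shell
# Activate the virtual environment if it is not already active.
source .venv/bin/activate

# Run ID used in this sketch; the report file name includes it.
RUN_ID="validate-gold"

# Evaluate the gold patch for one instance with a single worker.
python -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 1 \
    --instance_ids sympy__sympy-20590 \
    --run_id "$RUN_ID"
```

With these arguments, the report name combines the predictions label (gold) and the run ID (validate-gold), which is consistent with the gold.validate-gold.json report file.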
A successful run creates a report file named gold.validate-gold.json in the working directory.
Known limitations
H200 GPU compile error
SWE-bench does not compile on an H200 GPU, and the benchmark terminates with a compile error.
Non-InfiniBand node behavior
Enabling privileged Pods on non-InfiniBand nodes may cause NCCL to fail to use eth0 correctly. To force NCCL to use eth0, set the following environment variables:
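For example, export them in the shell where you run the benchmark. NCCL_SOCKET_IFNAME is the NCCL variable that selects the network interface for socket communication. Setting NCCL_IB_DISABLE=1 as well is an assumption for nodes without InfiniBand, not a setting this guide specifies.

```shell
# Make NCCL use eth0 for socket communication instead of auto-selecting an interface.
export NCCL_SOCKET_IFNAME=eth0

# Assumption: also disable the InfiniBand transport, since these nodes have no InfiniBand.
export NCCL_IB_DISABLE=1
```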