Profile Python applications on SUNK
This guide covers how to profile Python applications running in Kubernetes using py-spy and Linux perf tools. It assumes you have a working SUNK deployment and a Python application Pod.
Prerequisites
To profile a Python application in Kubernetes, you need a working SUNK deployment and a Python application Pod. Then, you'll configure the application Pod and the Nodes to allow profiling. You'll need to:
- Add the `SYS_PTRACE` capability to the `securityContext` section of your Pod's `spec`
- Set the `kernel.yama.ptrace_scope=0` and `kernel.perf_event_paranoid=-1` kernel parameters on the Nodes
1. Add the SYS_PTRACE capability
Your Python application Pod needs the `SYS_PTRACE` capability to allow profilers to attach. Add the following to the `securityContext` of the container you want to profile in your Pod's `spec`:

```yaml
spec:
  containers:
    - name: python-container-name
      securityContext:
        capabilities:
          add:
            - SYS_PTRACE
```
2. Set the kernel parameters
The Kubernetes Nodes need to have the `kernel.yama.ptrace_scope=0` and `kernel.perf_event_paranoid=-1` kernel parameters set:
- The `kernel.yama.ptrace_scope=0` kernel parameter allows processes to attach to and read memory from other processes
- The `kernel.perf_event_paranoid=-1` kernel parameter allows unprivileged access to performance monitoring
To set these parameters, create a DaemonSet that runs a privileged container on each Node.
1. Create a file called `py-perf-ds.yaml` with the following content:

   ```yaml
   apiVersion: apps/v1
   kind: DaemonSet
   metadata:
     name: perf-debug
     namespace: tenant-slurm
   spec:
     selector:
       matchLabels:
         name: perf-debug
     template:
       metadata:
         labels:
           name: perf-debug
       spec:
         tolerations:
           - key: "sunk.coreweave.com/lock"
             operator: "Exists"
           - key: "sunk.coreweave.com/node"
             operator: "Exists"
         containers:
           - name: perf-debug
             image: busybox
             # Set the kernel parameters, then sleep to keep the container running.
             command:
               - /bin/sh
               - -c
               - >
                 sysctl -w kernel.perf_event_paranoid=-1 &&
                 sysctl -w kernel.yama.ptrace_scope=0 &&
                 sleep infinity
             securityContext:
               privileged: true
   ```

   This DaemonSet sets the required kernel parameters on every Node:
   - It uses `sleep infinity` to keep the container running so that the kernel parameters persist while the container is running.
   - The `securityContext` section runs the container in privileged mode to allow it to set the kernel parameters.
2. Deploy the DaemonSet by applying the file you created:

   ```bash
   $ kubectl apply -f py-perf-ds.yaml
   ```

3. Verify the DaemonSet is running on all Nodes.

   ```bash
   $ kubectl get ds perf-debug -n tenant-slurm
   ```

4. Verify the Pods are running.

   ```bash
   $ kubectl get pods -n tenant-slurm -l name=perf-debug
   ```
5. Verify the kernel parameters are set on one of the Pods.

   ```bash
   $ kubectl exec -n tenant-slurm MY-POD-NAME -- \
       sysctl kernel.perf_event_paranoid kernel.yama.ptrace_scope
   ```

   A successful output should show the kernel parameters are set to -1 and 0 respectively.

   ```
   kernel.perf_event_paranoid = -1
   kernel.yama.ptrace_scope = 0
   ```
Use py-spy
py-spy is a sampling profiler specifically designed for Python. It shows Python-level stack traces with function names, file paths, and line numbers.
Install py-spy
To install py-spy, start a debug container in the target Pod, then install py-spy in the debug container.
1. First, get the name of the target Pod.

   ```bash
   $ PYTHON_POD=$(kubectl get pod -n tenant-slurm -l app=your-app -o jsonpath='{.items[0].metadata.name}')
   ```

2. Start a debug container in the target Pod.

   ```bash
   $ kubectl debug $PYTHON_POD -n tenant-slurm \
       --target=python-container-name \
       --image=python:3.12-slim \
       --profile=general \
       -it -- bash
   ```

3. Inside the debug container, install `py-spy`.

   ```bash
   $ pip install py-spy
   ```
Show live top view in py-spy
To show a live top view of the profiling data, updated continuously, run the following command:
$py-spy top --pid 1
A successful output should show the real-time profiling data.
```
Collecting samples from 'python /app.py' (pid: 1)
Total Samples 1000
GIL: 100%, Active: 100%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 45.00%  45.00%    4.50s      4.50s  compute_hash (app.py:7)
 30.00%  75.00%    3.00s      7.50s  process_data (app.py:11)
 15.00%  15.00%    1.50s      1.50s  dumps (json/__init__.py:231)
  5.00%  95.00%    0.50s      9.50s  main (app.py:18)
  5.00%   5.00%    0.50s      0.50s  sleep (time.py:123)
```
The output shows the following:
- `%Own`: Percentage of time spent in this function itself
- `%Total`: Percentage of time spent in this function plus the functions it calls
- `OwnTime`: Total time spent in this function itself
- `TotalTime`: Total time spent in this function plus the functions it calls
- `GIL`: Percentage of time holding the Global Interpreter Lock
- `Function (filename:line)`: Function name, filename, and line number
Record to SVG flamegraph
To record profiling data and generate an SVG flamegraph:
1. Run the following command:

   ```bash
   $ py-spy record --pid 1 --duration 30 -o /tmp/profile.svg
   ```

   A successful output should show the profiling data has been written to the `/tmp/profile.svg` file.

   ```
   py-spy> Sampling process 100 times a second for 30 seconds. Press Control-C to exit early.
   py-spy> Wrote flamegraph data to '/tmp/profile.svg'. Samples: 3000
   ```
2. Open a new terminal on your local machine and copy the SVG file to it.

   ```bash
   $ kubectl cp tenant-slurm/$PYTHON_POD:/tmp/profile.svg ./profile.svg -c python-container-name
   ```

3. Open `profile.svg` in a browser to see the flamegraph.
Record to Speedscope format
Speedscope is a web-based viewer for performance profiles. To record profiling data and generate a Speedscope JSON file:
1. Run the following command:

   ```bash
   $ py-spy record --pid 1 --duration 30 --format speedscope -o /tmp/profile.speedscope.json
   ```

2. Open a new terminal on your local machine and copy the JSON file to it.

   ```bash
   $ kubectl cp tenant-slurm/$PYTHON_POD:/tmp/profile.speedscope.json ./profile.speedscope.json -c python-container-name
   ```

3. Upload the JSON file to Speedscope for analysis, as shown in the example after this list.
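For example, you can drag the file into https://www.speedscope.app in a browser, or view it locally with the speedscope CLI. The following is a minimal sketch that assumes Node.js and npm are available on your local machine:

```bash
# Install the speedscope viewer (one-time setup)
$ npm install -g speedscope
# Open the profile in your default browser
$ speedscope ./profile.speedscope.json
```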
Show thread activity
To show what each thread is doing, run the following command:
$py-spy dump --pid 1
A successful output should show the thread activity.
```
Process 1: python /app.py
Thread 1 (active): "MainThread"
    compute_hash (app.py:7)
    process_data (app.py:11)
    main (app.py:18)
    <module> (app.py:28)
```
Only show threads holding GIL
The GIL (Global Interpreter Lock) is a mutex that protects the Python interpreter from concurrent execution. It's used to ensure that only one thread can execute Python code at a time. Monitoring the GIL can help you understand Python CPU usage (ignoring I/O wait).
To show only the threads holding the GIL, run the following command:
$py-spy top --pid 1 --gil
py-spy options
py-spy has several options that control how profiling data is collected.
rate option
The rate option can be used to set the sampling rate. The default is 100 Hz. To sample at a higher rate, such as 500 Hz, run the following command:
$py-spy record --pid 1 --rate 500 -o profile.svg
native option
The native option can be used to show native (C/C++) extensions. To show native extensions, run the following command:
$py-spy record --pid 1 --native -o profile.svg
idle option
The idle option can be used to show idle threads. To show idle threads, run the following command:
$py-spy record --pid 1 --idle -o profile.svg
nonblocking option
The nonblocking option can be used to run in non-blocking mode, which doesn't pause the target process. To run in non-blocking mode, run the following command:
$py-spy record --pid 1 --nonblocking -o profile.svg
Use perf
Linux perf is a powerful performance analysis tool that shows system-level and native code performance.
Install perf
To install perf, start a debug container in the target Pod, then install perf in the debug container.
1. First, get the name of the target Pod.

   ```bash
   $ PYTHON_POD=$(kubectl get pod -n tenant-slurm -l app=your-app -o jsonpath='{.items[0].metadata.name}')
   ```

2. Start a debug container with Ubuntu. The Ubuntu distribution has `perf` tools available.

   ```bash
   $ kubectl debug $PYTHON_POD -n tenant-slurm \
       --target=python-container-name \
       --image=ubuntu:22.04 \
       --profile=general \
       -it -- bash
   ```

3. Inside the debug container, install `perf`.

   ```bash
   $ apt-get update && apt-get install -y linux-tools-generic
   ```

4. Locate the `perf` binary. The location varies by kernel version.

   ```bash
   $ PERF=$(find /usr/lib/linux-tools -name perf | head -1)
   ```
Show live top view in perf
To show real-time usage by function, run the following command:
$$PERF top -p 1
A successful output should show the real-time usage by function.
```
Samples: 8K of event 'cycles', 4000 Hz, Event count (approx.): 2841251931 lost: 0/0 drop: 0/0
Overhead  Shared Object   Symbol
  12.50%  python3.12      [.] _PyEval_EvalFrameDefault
   8.30%  python3.12      [.] PyObject_GetAttr
   6.20%  [kernel]        [k] copy_user_enhanced_fast_string
   5.10%  python3.12      [.] _PyObject_GenericGetAttrWithDict
   4.80%  _hashlib.so     [.] EVP_MD_CTX_copy_ex
   3.90%  python3.12      [.] PyDict_GetItem
   3.20%  libc.so.6       [.] __memcpy_avx_unaligned
   2.80%  python3.12      [.] PyUnicode_AsUTF8AndSize
```
The output shows the following:
- `Overhead`: CPU time percentage
- `Shared Object`: The library/binary (python3.12, kernel, libc)
- `Symbol`: Function name
- `[.]`: User-space function
- `[k]`: Kernel function
Record and generate a report
To record for a specific duration, then generate a report, use the record command, then the report command.
This records at 99 Hz for 30 seconds with call graphs.
$$PERF record -F 99 -p 1 -g -- sleep 30
Use the report command to view the report.
$$PERF report
The report command shows output similar to the following:
```
# Samples: 2K of event 'cycles'
# Event count (approx.): 1984327896
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  .................................
#
    15.23%  python   python3.12     [.] _PyEval_EvalFrameDefault
            |
            ---_PyEval_EvalFrameDefault
               |--45.00%--compute_hash
               |--30.00%--process_data
               |--15.00%--json_dumps

     8.91%  python   _hashlib.so    [.] EVP_DigestUpdate
            |
            ---EVP_DigestUpdate
               HASH_Update
               _hashlib_openssl_sha256_update
```
Use arrow keys to navigate the report and press Enter to expand call chains.
Generate a text report
To generate a text report to save or share results, use the report command with the --stdio option.
$$PERF report --stdio > perf_report.txt
View detailed statistics
To view detailed statistics, use the stat command. This shows statistics for the process with the given PID for the specified duration.
$$PERF stat -p 1 -- sleep 10
A successful output should show the statistics.
```
 Performance counter stats for process id '1':

         10,234.56 msec task-clock                #    1.023 CPUs utilized
             1,234      context-switches          #  120.567 /sec
                45      cpu-migrations            #    4.398 /sec
               123      page-faults               #   12.024 /sec
    38,456,789,012      cycles                    #    3.758 GHz
    24,123,456,789      instructions              #    0.63  insn per cycle
     5,678,901,234      branches                  #  554.932 M/sec
        12,345,678      branch-misses             #    0.22% of all branches

      10.003456789 seconds time elapsed
```
Generate flamegraph data
To generate flamegraph data, use the record command, then the script command. This will output the data as a script that can be used with the FlameGraph tool.
1. Record the data. This records at 99 Hz for 30 seconds with call graphs.

   ```bash
   $ $PERF record -F 99 -p 1 -g -- sleep 30
   ```

2. Use the `script` command to output the data as a script.

   ```bash
   # Output as script format
   $ $PERF script > perf.data.script
   ```

3. Copy the script file to your local machine.

   ```bash
   $ kubectl cp tenant-slurm/$PYTHON_POD:/path/to/perf.data.script ./perf.data.script -c python-container-name
   ```

4. Use the FlameGraph tool to generate a flamegraph from the script file, as shown in the example after this list.
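If you don't already have the FlameGraph scripts, the following sketch clones Brendan Gregg's FlameGraph repository on your local machine and converts the copied `perf.data.script` into an SVG. The output filename `perf_flamegraph.svg` is an arbitrary choice:

```bash
# Get the FlameGraph scripts (one-time setup on your local machine)
$ git clone https://github.com/brendangregg/FlameGraph
# Collapse the perf script output into folded stacks, then render the flamegraph SVG
$ ./FlameGraph/stackcollapse-perf.pl perf.data.script | ./FlameGraph/flamegraph.pl > perf_flamegraph.svg
```

Open `perf_flamegraph.svg` in a browser, just like the py-spy flamegraph.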
perf options
perf has several options that control how profiling data is collected.
-F option
The -F option can be used to set the sampling frequency; the examples in this guide record at 99 Hz. To sample at a higher rate, such as 999 Hz, run the following command:
$$PERF record -F 999 -p 1 -g -- sleep 30
-a option
The -a option can be used to record all CPUs.
$$PERF record -F 99 -a -g -- sleep 30
-e option
The -e option can be used to record specific events.
$$PERF record -e cycles,instructions -p 1 -g -- sleep 30
--call-graph option
The --call-graph option can be used to record call graphs with DWARF unwinding, which is more accurate but incurs more overhead.
$$PERF record -F 99 -p 1 --call-graph dwarf -- sleep 30
Comparison: py-spy vs perf
py-spy and perf are two different tools for profiling Python applications in Kubernetes. They have different strengths and weaknesses.
| Feature | py-spy | perf |
|---|---|---|
| Focus | Python code only | All code (Python, C, kernel) |
| Output | Function names, file:line | Native symbols, may need debug symbols |
| Ease of use | Very easy, Python-specific | More complex, general purpose |
| Overhead | Very low (~1-2%) | Low (~1-5%) |
| Best for | Python performance issues | System/native code issues, CPU/cache analysis |
| GIL detection | Yes, built-in | No |
| Multi-threaded | Shows Python threads clearly | Shows all threads |
| Setup | Just install py-spy | Need kernel tools, may need debug symbols |
| Output formats | SVG, speedscope, text | Text report, script for flamegraphs |
Use py-spy when you need to profile pure Python code. It provides a quick, easy-to-read view of the Python code. It has very low overhead, so it's suitable to use in production. It's ideal when you need to see Python function names and line numbers, or want to understand GIL contention.
Use perf when you need to profile system-level performance. It provides a detailed view of the C extensions and kernel code, including CPU cache, branch prediction, and hardware counter data. It's ideal when you suspect issues in native libraries (numpy, pandas C code, etc.), or need to correlate Python and kernel activity.
Use both tools when you need to profile complex performance issues and get a complete picture of the performance. This is particularly useful if your code uses significant C extensions.
Troubleshooting: "Process not found"
Both py-spy and perf can encounter a "Process not found" error when trying to profile a Python application.
Both tools need to attach to the target process. If the target process is not running, or the PID is incorrect, the tools will fail with a "Process not found" error.
To fix this, you need to verify the target process is running and the PID is correct.
- Verify the target process is running by running the following command: `ps aux | grep python`
- Check you're in the right container and namespace by running the following command: `kubectl get pods -n tenant-slurm`
- If using `kubectl debug`, ensure `--target` matches your container name. See the example after this list.
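For example, one way to list the container names in the target Pod so you can pick the right `--target` value (assuming `$PYTHON_POD` is set as in the earlier steps):

```bash
# Print the names of all containers defined in the Pod spec
$ kubectl get pod $PYTHON_POD -n tenant-slurm -o jsonpath='{.spec.containers[*].name}'
```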
Troubleshooting py-spy
"Permission Denied" error
You may encounter a "Permission Denied" error when trying to profile a Python application with py-spy. This is because the target Pod does not have the SYS_PTRACE capability.
For example:
Error: Permission Denied: Try running again with elevated permissions
To fix this, you need to verify that the DaemonSet is running and the kernel parameters are set correctly.
1. Verify the DaemonSet is running with the following command:

   ```bash
   $ kubectl get pods -n tenant-slurm -l name=perf-debug
   ```

2. Check the kernel parameters on one of the perf-debug Pods:

   ```bash
   $ kubectl exec -n tenant-slurm MY-POD-NAME -- \
       sysctl kernel.yama.ptrace_scope
   ```

   A successful output should show the kernel parameter is set to `0`.

   ```
   kernel.yama.ptrace_scope = 0
   ```

3. Ensure the target Pod has the `SYS_PTRACE` capability. See the example check after this list.

4. Use `--profile=general` when running `kubectl debug`.
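As a quick check for the capability, you can inspect the container `securityContext` in the application Pod. This is a sketch that assumes `$PYTHON_POD` is set as in the installation steps:

```bash
# Show the capabilities added to each container in the Pod; expect SYS_PTRACE in the list
$ kubectl get pod $PYTHON_POD -n tenant-slurm \
    -o jsonpath='{.spec.containers[*].securityContext.capabilities.add}'
```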
"Failed to find python version" error
You may encounter a "Failed to find python version" error when trying to profile a Python application with py-spy. This is because the target Pod is not a Python process, or the Python process is not the main process.
For example:
Error: Failed to find python version from target process
To fix this, you need to find the correct Python PID and use it. First, find the correct Python PID by running the following command:
$ps aux | grep python
Then, use that PID to profile the application by using the --pid option.
$py-spy top --pid MY-ACTUAL-PID
Alternatively, make sure Python is PID 1 by using `exec python app.py` in your container command, as in the sketch below.
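For example, a minimal entrypoint script (a sketch; the script name and `app.py` path are assumptions about your image) that uses `exec` so the shell is replaced and Python runs as PID 1:

```bash
#!/bin/sh
# entrypoint.sh: exec replaces this shell process, so Python inherits PID 1
exec python app.py
```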
Troubleshooting perf
"failed with EPERM" error
You may encounter a "failed with EPERM" error when trying to profile a Python application with perf.
Error: perf_event_open(...) failed with EPERM
To fix this, you need to verify that the DaemonSet is running with the following command:
$kubectl get pods -n tenant-slurm -l name=perf-debug
Next, verify the kernel parameters are set correctly on one of the perf-debug Pods. Run the following command:

```bash
$ kubectl exec -n tenant-slurm MY-POD-NAME -- \
    sysctl kernel.perf_event_paranoid
```
The output should show kernel.perf_event_paranoid is set to -1.
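For example:

```
kernel.perf_event_paranoid = -1
```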
No symbols found
If perf shows hex addresses instead of function names, it means the debug symbols are not installed.
To fix this, you need to install the debug symbols for Python.
$apt-get install python3-dbg
Use the --call-graph dwarf option for better stack traces.
$$PERF record --call-graph dwarf -p 1 -- sleep 30
Tips for successful profiling
- Start with py-spy. It's easier to use than `perf` and usually sufficient for Python issues.
- Use low sampling rates in production. 99-100 Hz is usually enough resolution for most applications.
- Record for at least 30 seconds. This gives representative samples.
- Profile during representative load. Idle or startup profiles aren't useful.
- Copy profiles out immediately. Debug containers are ephemeral and will be deleted after the profiling session.
- Don't leave the DaemonSet running in production long-term. It uses privileged access and it's a security risk to leave it running after the profiling session.
- Use flamegraphs. They're much easier to understand than text output. Flamegraphs show the most time-consuming functions and their callers, making it easy to identify bottlenecks.
- Compare before and after. Profile before optimizing to establish a baseline. This helps you understand the impact of your optimizations. See the sketch after this list.
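As a minimal sketch of a before-and-after comparison (the durations and output filenames are arbitrary choices):

```bash
# Baseline profile before the optimization
$ py-spy record --pid 1 --duration 60 -o /tmp/profile-before.svg
# After deploying the change, capture a second profile under similar load
$ py-spy record --pid 1 --duration 60 -o /tmp/profile-after.svg
```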
Typical workflow
Here's a typical workflow for profiling a Python application in Kubernetes. This workflow assumes you have a working SUNK deployment and a Python application Pod.
1. First, deploy the DaemonSet to set the kernel parameters on all Nodes. See the example DaemonSet file in the Set the kernel parameters step, then apply it:

   ```bash
   $ kubectl apply -f py-perf-ds.yaml
   ```

2. Start a profiling session.

   ```bash
   $ PYTHON_POD=$(kubectl get pod -n tenant-slurm -l app=my-app -o jsonpath='{.items[0].metadata.name}')
   $ kubectl debug $PYTHON_POD -n tenant-slurm \
       --target=app-container \
       --image=python:3.12-slim \
       --profile=general \
       -it -- bash
   ```

3. Inside the debug container, install `py-spy` and check the top view.

   ```bash
   $ pip install py-spy
   $ py-spy top --pid 1
   ```

4. Record a flamegraph for analysis.

   ```bash
   $ py-spy record --pid 1 --duration 60 -o /tmp/profile.svg
   ```

5. From another terminal, copy the flamegraph to your local machine. This is important because the debug container will be deleted after the profiling session.

   ```bash
   $ kubectl cp tenant-slurm/$PYTHON_POD:/tmp/profile.svg ./profile.svg -c app-container
   ```

6. If needed, dive deeper with `perf`.

   ```bash
   $ apt-get update && apt-get install -y linux-tools-generic
   $ PERF=$(find /usr/lib/linux-tools -name perf | head -1)
   $ $PERF record -F 99 -p 1 -g -- sleep 60
   $ $PERF report
   ```

7. Clean up the DaemonSet when done.

   ```bash
   $ kubectl delete -f py-perf-ds.yaml
   ```