Dumping Multi-Threaded Python Stacks

November 1, 2017

Today I released version 1.6.0 of Pyflame, which implements a new feature I've wanted to add for a while: the ability to dump the stacks of all running threads. This functionality is roughly equivalent to thread apply all bt with GDB, except the stack traces you get are Python stacks, not C stacks. This is not just useful for multi-threaded programs: it's also incredibly useful for debugging single-threaded programs.

Here's what it looks like in action. In this example, I'm asking Pyflame to dump the stacks for all threads belonging to process 9558:

$ pyflame -d -p 9558
140302210512640:
tests/dijkstra.py:generate:58
tests/dijkstra.py:run:154
tests/dijkstra.py:run_times:169
/usr/lib64/python3.6/threading.py:run:864
/usr/lib64/python3.6/threading.py:_bootstrap_inner:916
/usr/lib64/python3.6/threading.py:_bootstrap:884

140302218905344*:
/usr/lib64/python3.6/random.py:_randbelow:235
/usr/lib64/python3.6/random.py:randrange:196
/usr/lib64/python3.6/random.py:randint:220
tests/dijkstra.py:random_unfilled:80
tests/dijkstra.py:generate:52
tests/dijkstra.py:run:154
tests/dijkstra.py:run_times:169
/usr/lib64/python3.6/threading.py:run:864
/usr/lib64/python3.6/threading.py:_bootstrap_inner:916
/usr/lib64/python3.6/threading.py:_bootstrap:884

140302382565184:
/usr/lib64/python3.6/threading.py:_wait_for_tstate_lock:1072
/usr/lib64/python3.6/threading.py:join:1056
tests/dijkstra.py:main:200
tests/dijkstra.py:<module>:204

As usual, the program being traced has no explicit Pyflame instrumentation---it's just a regular Python program. I ran it with three threads total: two worker threads, plus the main thread. The currently executing thread is marked in the trace output with an asterisk, and is thread 140302218905344 in this example. The thread id listed in the dump is the same id returned by _thread.get_ident(). On Linux systems this will be the same as the integer returned by pthread_self(), as CPython threads are implemented using pthreads.

For really gnarly threading issues you may need to use to a "real" debugger like Pyrasite or GDB. But in my experience, getting a stack trace from all running threads is suitable for debugging many (if not most) real world deadlock issues, including those that have nothing to do with threading at all. For instance, let's say you have a Python process that is stuck in an unresponsive state. You want to know why it's stuck. It could be some kind of threading/deadlock issue, but it could also be something as simple as an unresponsive I/O request that is blocking the entire process. In both cases, knowing where in the code the process is stuck is often sufficient to debug the problem. This is true whether the process is stuck due to locks being acquired in the wrong order, or because a blocking HTTP request is missing a timeout.

Adding this feature was really straightforward, thanks to the work of Max Bolingbroke and James Pickering, who contributed multi-threading support to Pyflame earlier this year. If you find this feature useful, or have any feedback, feel free to shoot me an email or leave an issue on GitHub.