Pyflame Dual Interpreter Mode

October 30, 2016

I recently implemented "dual interpreter mode" for Pyflame, which allows Pyflame to be compiled to target both Python 2 and Python 3 at the same time, in the same executable. This is extremely unusual, and Pyflame is the only Python project implemented in C/C++ that I am aware of that has this feature. In this post I'll explain how I implemented this feature for Pyflame.

The Problem

In order for Pyflame to work, it has to know a lot of details about the internals of the Python interpreter. The most important thing it must know is the struct offsets for various fields in Python objects. As an example, Pyflame needs to know the offsets for where to find things like a frame's pointer to a "code" object. If Pyflame thinks that the code object is at offset 48, but it's actually at offset 56, then Pyflame will read the wrong field and get garbage when trying to decode the stack.
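To make the offset problem concrete, here is a minimal sketch. FakeFrame, ReadField, and DecodeCodePointer are hypothetical names invented for this illustration, and the layout is made up so that the numbers match the 48 vs 56 example above; the real layout comes from Python.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an interpreter frame struct. The field names and
// sizes are illustrative only and do not mirror the real CPython struct.
struct FakeFrame {
  std::uint64_t header[6];  // 48 bytes of earlier fields
  std::uint64_t back;       // lands at offset 48
  std::uint64_t code;       // lands at offset 56
};

// Read one pointer-sized field at a byte offset from a raw memory image,
// the way a profiler reads bytes copied out of another process.
std::uint64_t ReadField(const unsigned char* raw, std::size_t size,
                        std::size_t offset) {
  assert(offset + sizeof(std::uint64_t) <= size);
  std::uint64_t value;
  std::memcpy(&value, raw + offset, sizeof(value));
  return value;
}

// Build a byte image of a frame and decode the "code" pointer using whatever
// offset the caller believes is correct.
std::uint64_t DecodeCodePointer(std::size_t assumed_offset) {
  FakeFrame frame{};
  frame.back = 0x1111;
  frame.code = 0x2222;
  unsigned char raw[sizeof(FakeFrame)];
  std::memcpy(raw, &frame, sizeof(frame));
  return ReadField(raw, sizeof(raw), assumed_offset);
}
```

Calling DecodeCodePointer(offsetof(FakeFrame, code)) yields the code pointer, while DecodeCodePointer(48) silently returns the neighboring field instead. No error is raised; the value is simply wrong.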

Fortunately you can get all of these offsets from Python.h, and this is exactly what Pyflame does, and has always done. Unfortunately, these struct offsets differ between Python 2 and Python 3. This means that when you compile Pyflame you either give it the Python.h for Python 2, or the Python.h for Python 3, and the resulting Pyflame executable can only profile that version of Python.

There's another complication, which is that Python 2 and Python 3 declare many of the same symbols. This means that even aside from this struct offset issue, normally you wouldn't be able to compile an executable that links against, say, libpython2.7.so and libpython3.5m.so at the same time.

However, Pyflame isn't a normal C++ program. It actually only uses the Python headers to get struct offsets, and does not link against libpython. So in principle you could come up with a way to build a "dual" Pyflame executable that can profile both Python 2 and Python 3 processes.

While it's an interesting thought experiment to think about how to build a Pyflame that can support both Python interpreter versions at once, it's not really that useful. Most people are either using Python 2 or Python 3, so just supporting one at compile time is not a big deal. People who need both Python versions can just compile two versions. So I had created an issue to remind myself to look into this, but I had considered it very low priority.

This changed recently when I decided to try to get Pyflame into Fedora, and it occurred to me that if I actually did this crazy dual-interpreter mode it would make my packaging life a lot easier. Instead of maintaining python2-pyflame and python3-pyflame, I'd be able to just add a single package. And since there's no linking dependency, I can support both Python interpreters essentially for free. So off I went.

The Solution

There are two parts to solving this. The first is how the code is refactored to support two Python releases with minimal code duplication. The second part is how the build system (autoconf/automake) needed to be changed.

If you'd like to follow along with the changes, please see PR 42.

Code Changes

The code for Python 2 and Python 3 is 95% the same in my estimation. The struct offsets in Python 2 and Python 3 do differ, but other than that the only material change is how strings work in both releases, which is easy to work around with preprocessor macros, and which I had already done.

The solution I came up with here is to define a file called frob.cc which implements all of the Python internals logic. This file includes Python.h as usual. It has the following compile-time logic:

There's some logic for switching the string implementation depending on the Python interpreter
If this file is being built for Python 2 it uses the namespace pyflame::py2; if it is being built for Python 3 it uses the namespace pyflame::py3

There are two stub files that include frob.cc: frob2.cc includes it configured to build for Python 2, and frob3.cc includes it configured to build for Python 3. The file frob.cc itself is never built into an object; only frob{2,3}.cc are actually compiled and linked.
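The effect of compiling one shared body twice into two namespaces can be sketched in a single translation unit. This is not the actual Pyflame code: the real arrangement uses separate stub files that #include frob.cc, whereas this sketch swaps in a macro so it fits in one file. The macro name DEFINE_FROB, the functions Version and CodeOffset, and the offset values are all assumptions made up for illustration.

```cpp
#include <cstddef>

// Shared body, standing in for frob.cc. Each expansion plays the role of one
// stub file including the shared source with a different configuration.
#define DEFINE_FROB(NS, VERSION, CODE_OFFSET)             \
  namespace pyflame {                                     \
  namespace NS {                                          \
  inline int Version() { return VERSION; }                \
  inline std::size_t CodeOffset() { return CODE_OFFSET; } \
  }                                                       \
  }

// Roughly what frob2.cc would contribute (offset value is made up).
DEFINE_FROB(py2, 2, 48)

// Roughly what frob3.cc would contribute (offset value is made up).
DEFINE_FROB(py3, 3, 56)

#undef DEFINE_FROB
```

Because the two expansions live in different namespaces, pyflame::py2::CodeOffset() and pyflame::py3::CodeOffset() coexist in one executable without symbol clashes, even though they come from identical source text.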

There's another set of files called pyfrob.{cc,h} that have the following logic:

The runtime detection for Python 2 vs Python 3
The runtime logic for invoking routines from the right namespace based on the interpreter

The way I implemented this, Pyflame will do all of this runtime logic just once when Pyflame starts up. Then while it's running it will invoke the interpreter-specific bits using function pointers. This is a pretty small optimization, but avoids additional runtime branching.
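A minimal sketch of the resolve-once, call-through-a-pointer idea follows. The class name PyFrob, the enum PyVersion, and the DescribeFrame functions are hypothetical and chosen only for this example. How the interpreter version is actually detected is out of scope here, so the version is simply passed in.

```cpp
#include <cassert>
#include <string>

namespace pyflame {

// Stand-ins for the per-interpreter implementations.
namespace py2 {
inline std::string DescribeFrame() { return "py2 frame"; }
}  // namespace py2
namespace py3 {
inline std::string DescribeFrame() { return "py3 frame"; }
}  // namespace py3

enum class PyVersion { Py2, Py3 };

class PyFrob {
 public:
  // Branch on the version exactly once, at construction time.
  explicit PyFrob(PyVersion version) {
    switch (version) {
      case PyVersion::Py2:
        describe_ = &py2::DescribeFrame;
        break;
      case PyVersion::Py3:
        describe_ = &py3::DescribeFrame;
        break;
    }
    assert(describe_ != nullptr);
  }

  // Later calls go straight through the stored pointer with no version check.
  std::string DescribeFrame() const { return describe_(); }

 private:
  std::string (*describe_)() = nullptr;
};

}  // namespace pyflame
```

With this shape, code that samples the target process only ever calls PyFrob::DescribeFrame and never needs to know which interpreter it is talking to.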

Most of the work here was actually refactoring the existing code to be consolidated into fewer files, and the new logic for detecting the Python version. I ended up touching most of the Pyflame codebase to get this to work. The preprocessor macros are pretty hairy in my opinion, but ended up working out fine.

Automake Changes

There's a lot of compilation logic that needs to change for this to work:

The autoconf configure script needs to be able to detect multiple Python versions
There are preprocessor defines that need to be propagated into config.h
Automake needs to know how to compile frob{2,3}.cc with different flags
The linking step needs to know what frob objects to link in

The hardest part of this was figuring out how to compile frob2.cc and frob3.cc with different include paths. I found an automake documentation page called Per-Object Flags Emulation, which is short, but does cover how to do this. I actually ended up bringing in libtool (which is a compile-time dependency only) since it provides some convenience methods.

I also had to change a lot of logic in my configure.ac so it would know how to pick between the two. The current solution detects what Python releases are on the system, and enables all of the supported ones. I'm not super happy with this: in the future I'll probably revisit the code to allow building against just one release or another.

Next Steps

Once I get my PR reviewed and landed I'm going to tag a major new version of Pyflame, and then try to base my Fedora package submission on that. The next major feature I'm adding to Pyflame will be from issue 13. This issue describes rewriting the code to not use _PyThreadState_Current, and instead find the global "interpreter" list and use that to find the threads. This will let me get stack traces from idle threads, which has a ton of really interesting use cases.