I recently implemented "dual interpreter mode" for Pyflame, which allows Pyflame to be compiled to target both Python 2 and Python 3 at the same time, in the same executable. This is extremely unusual, and Pyflame is the only Python project implemented in C/C++ that I am aware of that has this feature. In this post I'll explain how I implemented this feature for Pyflame.
The Problem
In order for Pyflame to work, it has to know a lot of details about the internals of the Python interpreter. The most important thing it must know is the struct offsets for various fields in Python objects. As an example, Pyflame needs to know the offsets for where to find things like a frame's pointer to a "code" object. If Pyflame thinks that the code object is at offset 48, but it's actually at offset 56, then Pyflame will read garbage when trying to decode the stack.
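To make the offset problem concrete, here is a minimal sketch. The two structs are hypothetical stand-ins, not the real CPython frame layouts, and the helper names (ReadField, CodePointer) are invented for illustration. It assumes the profiler has already copied the raw bytes of a frame out of the target process and now needs to index into them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layouts, NOT the real CPython structs. They only illustrate
// that the same logical field can sit at different offsets in two releases.
struct FrameA {
  std::uintptr_t refcnt;
  std::uintptr_t type;
  std::uintptr_t f_back;
  std::uintptr_t f_code;
};

struct FrameB {
  std::uintptr_t refcnt;
  std::uintptr_t type;
  std::uintptr_t extra;  // one added field shifts everything after it
  std::uintptr_t f_back;
  std::uintptr_t f_code;
};

// Read a pointer-sized field out of raw bytes copied from another process.
inline std::uintptr_t ReadField(const unsigned char* raw, std::size_t offset) {
  assert(raw != nullptr);
  std::uintptr_t value;
  std::memcpy(&value, raw + offset, sizeof(value));
  return value;
}

// Decode the "code" pointer from a raw frame image, using the offset that
// the given layout assigns to f_code.
template <typename Layout>
std::uintptr_t CodePointer(const unsigned char* raw) {
  return ReadField(raw, offsetof(Layout, f_code));
}
```

If the raw bytes really came from a FrameB but are decoded with CodePointer<FrameA>, the result is the f_back value rather than f_code. Nothing crashes; the profiler just follows the wrong pointer.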
Fortunately you can get all of these offsets from Python.h, and this is exactly what Pyflame does, and has always done. Unfortunately, these struct offsets differ between Python 2 and Python 3. This means that when you compile Pyflame you either give it the Python.h for Python 2, or the Python.h for Python 3, and the resulting Pyflame executable can only profile that version of Python.
There's another complication, which is that Python 2 and Python 3 declare many of the same symbols. This means that even aside from this struct offset issue, normally you wouldn't be able to compile an executable that links against say libpython2.7.so and libpython3.5m.so at the same time.
However, Pyflame isn't a normal C++ program. It actually only uses the Python headers to get struct offsets, and does not link against libpython. So in principle you could come up with a way to build a "dual" Pyflame executable that can profile both Python 2 and Python 3 processes.
While it's an interesting thought experiment to think about how to build a Pyflame that can support both Python interpreter versions at once, it's not really that useful. Most people are either using Python 2 or Python 3, so just supporting one at compile time is not a big deal. People who need both Python versions can just compile two versions. So I had created an issue to remind myself to look into this, but I had considered it very low priority.
This changed recently when I decided to try to get Pyflame into Fedora, and it occurred to me that if I actually did this crazy dual-interpreter mode it would make my packaging life a lot easier. Instead of maintaining python2-pyflame and python3-pyflame, I'd be able to just add a single package. And since there's no linking dependency, I can support both Python interpreters essentially for free. So off I went.
The Solution
There are two parts to solving this. The first is how the code is refactored to support two Python releases with minimal code duplication. The second is how the build system (autoconf/automake) needed to be changed.
If you'd like to follow along with the changes, please see PR 42.
Code Changes
The code for Python 2 and Python 3 is 95% the same in my estimation. The struct offsets in Python 2 and Python 3 do differ, but other than that the only material change is how strings work in both releases, which is easy to work around with preprocessor macros, and which I had already done.
The solution I came up with here is to define a file called frob.cc which implements all of the Python internals logic. This file includes Python.h as usual. It has the following compile-time logic:
- There's some logic for switching the string implementation depending on the Python interpreter
- If this file is being built for Python 2 it uses the namespace pyflame::py2; if it is being built for Python 3 it uses the namespace pyflame::py3
There are two stub files that include frob.cc: frob2.cc includes it in a way so it's configured to build for Python 2, frob3.cc includes it in a way so it's configured to build for Python 3. The file frob.cc itself is never built into an object, only frob{2,3}.cc are actually compiled and linked.
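The shape of this can be sketched in a single file. This is an emulation under stated assumptions, not Pyflame's actual source: the shared body is written as a macro (PYFLAME_FROB_BODY, a made-up name) so that it can be expanded twice in one translation unit, whereas the real layout uses a shared file included by two separately compiled stubs. The fake_py2_headers and fake_py3_headers namespaces are hypothetical stand-ins for what each interpreter's Python.h would declare.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the declarations each Python.h would provide.
namespace fake_py2_headers {
struct Frame { void* f_back; void* f_code; };
}  // namespace fake_py2_headers

namespace fake_py3_headers {
struct Frame { void* f_back; void* f_extra; void* f_code; };
}  // namespace fake_py3_headers

// Stands in for the shared implementation file. The same source text is
// compiled once per interpreter, each time into its own namespace and
// against its own struct layout.
#define PYFLAME_FROB_BODY(NS, HEADERS)                              \
  namespace pyflame {                                               \
  namespace NS {                                                    \
  inline std::size_t CodeOffset() {                                 \
    return offsetof(HEADERS::Frame, f_code);                        \
  }                                                                 \
  inline const void* CodeOf(const void* frame) {                    \
    assert(frame != nullptr);                                       \
    return static_cast<const HEADERS::Frame*>(frame)->f_code;       \
  }                                                                 \
  }                                                                 \
  }

// Stands in for the Python 2 stub: build the shared body as pyflame::py2.
PYFLAME_FROB_BODY(py2, fake_py2_headers)

// Stands in for the Python 3 stub: build the shared body as pyflame::py3.
PYFLAME_FROB_BODY(py3, fake_py3_headers)
```

Because the two expansions live in different namespaces, pyflame::py2::CodeOf and pyflame::py3::CodeOf can coexist in one executable even though they come from identical source text.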
There's another set of files called pyfrob.{cc,h} that have the following logic:
- The runtime detection for Python 2 vs Python 3
- The runtime logic for invoking routines from the right namespace based on the interpreter
The way I implemented this, Pyflame does all of this runtime logic just once, when it starts up. Then while it's running it invokes the interpreter-specific bits using function pointers. This is a pretty small optimization, but it avoids additional runtime branching.
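A minimal sketch of that dispatch pattern follows. All names here (Dispatcher, Select, DetectVersion, StackTrace) are hypothetical, and the detection rule, which guesses from the executable path, is only a placeholder assumption; this post does not describe how Pyflame actually detects the interpreter version.

```cpp
#include <cassert>
#include <string>

namespace pyflame {

// Stand-ins for the per-interpreter implementations.
namespace py2 {
inline std::string StackTrace(int pid) { return "py2:" + std::to_string(pid); }
}  // namespace py2
namespace py3 {
inline std::string StackTrace(int pid) { return "py3:" + std::to_string(pid); }
}  // namespace py3

enum class PyVersion { kUnknown, kPy2, kPy3 };

// Placeholder detection (an assumption): guess from the executable path.
inline PyVersion DetectVersion(const std::string& exe_path) {
  if (exe_path.find("python3") != std::string::npos) return PyVersion::kPy3;
  if (exe_path.find("python2") != std::string::npos) return PyVersion::kPy2;
  return PyVersion::kUnknown;
}

class Dispatcher {
 public:
  // Choose the implementation once. Returns false for unsupported versions.
  bool Select(PyVersion version) {
    switch (version) {
      case PyVersion::kPy2: stack_trace_ = py2::StackTrace; return true;
      case PyVersion::kPy3: stack_trace_ = py3::StackTrace; return true;
      default: stack_trace_ = nullptr; return false;
    }
  }

  // Hot path: an indirect call, with no per-call version check.
  std::string StackTrace(int pid) const {
    assert(stack_trace_ != nullptr);
    return stack_trace_(pid);
  }

 private:
  std::string (*stack_trace_)(int) = nullptr;
};

}  // namespace pyflame
```

The version check happens only inside Select. Every later call to StackTrace goes straight through the stored pointer.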
Most of the work here was actually refactoring the existing code to be consolidated into fewer files, and the new logic for detecting the Python version. I ended up touching most of the Pyflame codebase to get this to work. The preprocessor macros are pretty hairy in my opinion, but ended up working out fine.
Automake Changes
There's a lot of compilation logic that needs to change for this to work:
- The autoconf configure script needs to be able to detect multiple Python versions
- There are preprocessor defines that need to be propagated into config.h
- Automake needs to know how to compile frob{2,3}.cc with different flags
- The linking step needs to know which frob objects to link in
The hardest part of this was figuring out how to compile frob2.cc and frob3.cc with different include paths. I found an automake documentation page called Per-Object Flags Emulation, which is short but does cover how to do this. I actually ended up bringing in libtool (which is a compile-time dependency only) since it provides some convenience methods.
I also had to change a lot of logic in my configure.ac so it would know how to pick between the two. The current solution detects what Python releases are on the system, and enables all of the supported ones. I'm not super happy with this: in the future I'll probably revisit the code to allow building against just one release or another.
Next Steps
Once I get my PR reviewed and landed I'm going to tag a major new version of Pyflame, and then try to base my Fedora package submission on that. The next major feature I'm adding to Pyflame will be from issue 13. This issue describes rewriting the code to not use _PyThreadState_Current, and instead find the global "interpreter" list and use that to find the threads. This will let me get stack traces from idle threads which has a ton of really interesting use cases.