An Unexpected Python ABI Change

Earlier this year, a Pyflame user filed a GitHub issue reporting that Pyflame didn't work reliably with Python 3.6. Sometimes Pyflame would just print garbage output (e.g. bogus line numbers), and other times it would actually crash. Although multiple users confirmed the problem, I didn't have any issues at all with my local Python 3.6 installation. In July I spent some time working with Justin Phillips to identify the root cause of the issue; what we found was an interesting (and unexpected) consequence of a subtle change to Python's internal ABI.

PEP 523

In the original bug report, one of the Pyflame users suggested that the bug was related to PEP 523, which changes the internal implementation of Python "code" objects. The CPython interpreter uses PyCodeObject structs as containers that hold Python bytecode and bytecode metadata. Pyflame needs to analyze these code objects to get stack traces. In particular, code objects contain a table mapping bytecode offsets to line numbers, and Pyflame reads these tables to compute line numbers in stack traces.

When I dug in, I was skeptical that this was really the issue. PEP 523 says "One field is to be added to the PyCodeObject struct", and the PEP then shows the following change:

typedef struct {
   ...
   void *co_extra;  /* "Scratch space" for the code object. */
} PyCodeObject;

Adding a field to the end of a struct like this should be perfectly safe, since Pyflame can safely ignore the co_extra field at the end of the struct. These code objects aren't embedded in other structs, so the memory layout of everything should be unaffected by this change. Since I was confident that this PEP was a red herring, I ignored it, and embarked on a long yak shaving exercise to figure out what the real issue was.

Python Build Modes

As I mentioned earlier, I wasn't able to reproduce the issue locally. But I was using Fedora, and the bug reports were coming in from Ubuntu users. Naturally, I suspected the issue was related to the Debian packaging of Python. Pyflame already has a bunch of complicated logic to deal with the differences between how Python is compiled on Debian and Fedora, and this seemed a likely culprit to me.

When you compile Python from source, there's a flag you can pass to the configure script called --enable-shared. The help documentation indicates that this controls whether or not a shared library (libpython) should be built. Actually, this option is a little more subtle. If you build Python with --enable-shared you get a small (about 8 KB) executable that links against libpython. The entry point for the executable is a small stub that calls Py_Main, which is defined in libpython and contains the actual logic for bootstrapping the interpreter. By contrast, if you build without --enable-shared, you get a relatively large executable (about 4 MB) that has the Python implementation statically linked in. Most users will never know about this distinction, but it's very important for Pyflame, since this information is essential to find the thread state information in memory (which is the starting point for extracting stack information). Fedora builds Python using the --enable-shared flag, and Debian/Ubuntu do not use this flag.

This turned out to be a dead end. After spending many hours exploring this hypothesis, I found out that the bug wasn't actually related to the --enable-shared flag: instead, the issue only manifested on systems that had both Python 3.5 and Python 3.6 installed. The real reason that people were only seeing the issue on Ubuntu was more subtle. At the time there wasn't a stable release of Ubuntu that included Python 3.6. Instead, users were installing Python 3.6 from a PPA. If you install Python 3.6 from a PPA, it will be co-installed alongside the system Python 3.5 installation. Thus Ubuntu was really a proxy for users who had multiple Python 3.x releases installed.

With this information, I had a pretty good idea of what the problem was: the issue was likely the result of the header files changing between Python 3.5 and 3.6. This would implicate PEP 523 as the true cause. This is what everyone had been telling me all along; I think the lesson here is not to overthink things early on.

Digging In Deeper

Since I had a hunch that the problem really was related to PEP 523, I guessed that the offending header file was code.h. Sure enough, when I diffed code.h from Python 3.5 and 3.6, I saw the following:

 typedef struct {
     PyObject_HEAD
@@ -15,26 +25,29 @@
     int co_nlocals;		/* #local variables */
     int co_stacksize;		/* #entries needed for evaluation stack */
     int co_flags;		/* CO_..., see below */
+    int co_firstlineno;   /* first source line number */
     PyObject *co_code;		/* instruction opcodes */
     PyObject *co_consts;	/* list (constants used) */
     PyObject *co_names;		/* list of strings (names used) */
     PyObject *co_varnames;	/* tuple of strings (local variable names) */
     PyObject *co_freevars;	/* tuple of strings (free variable names) */
     PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
-    /* The rest aren't used in either hash or comparisons, except for
-       co_name (used in both) and co_firstlineno (used only in
-       comparisons).  This is done to preserve the name and line number
+    /* The rest aren't used in either hash or comparisons, except for co_name,
+       used in both. This is done to preserve the name and line number
        for tracebacks and debuggers; otherwise, constant de-duplication
        would collapse identical functions/lambdas defined on different lines.
     */
     unsigned char *co_cell2arg; /* Maps cell vars which are arguments. */
     PyObject *co_filename;	/* unicode (where it was loaded from) */
     PyObject *co_name;		/* unicode (name, for reference) */
-    int co_firstlineno;		/* first source line number */
     PyObject *co_lnotab;	/* string (encoding addr<->lineno mapping) See
                   Objects/lnotab_notes.txt for details. */
     void *co_zombieframe;     /* for optimization only (see frameobject.c) */
     PyObject *co_weakreflist;   /* to support weakrefs to code objects */
+    /* Scratch space for extra data relating to the code object.
+       Type is a void* to keep the format private in codeobject.c to force
+       people to go through the proper APIs. */
+    void *co_extra;
 } PyCodeObject;

As expected, there's a new field at the end of the struct. But there's also another change: the offset of the co_firstlineno field has changed! This field gives the line number for the first bytecode instruction in the code object, and is read by Pyflame when extracting stack traces.

The actual error message people were reporting was:

Failed to PTRACE_PEEKDATA at 0x10: Input/output error

This is saying that Pyflame failed to read memory address 0x10 in the traced process. It's essentially a ptrace version of a null pointer dereference: Pyflame has a null pointer to a struct, and is trying to access a field in the struct 16 bytes (0x10) from the start of the struct. The reason this happens is that Pyflame doesn't just access the co_firstlineno field: it also has to access the co_filename and co_name fields from the PyCodeObject. The rearrangement of co_firstlineno moves these fields around, meaning that if Pyflame is using the Python 3.5 offset for a Python 3.6 process, it will read some bogus value from memory. In this case the bogus value happens to be zero, which is what makes this similar to a null pointer.

You might wonder how this mix-up can happen in the first place, since the compiler and linker should both use the same version of Python. Pyflame is written in an unusual way: it includes the Python header files, but it doesn't actually link against libpython. Therefore there are no safety checks in place, and things can go off the rails in cases like this one.

Autoconf and pkg-config

The issue is partially the result of how the build system for Pyflame was configured. When I started the project, I decided to use GNU autotools (i.e. autoconf and automake), since it's a de facto C build standard on Linux, and makes .deb and .rpm packaging simple. I also used pkg-config, since it simplifies writing autoconf scripts. There's a notoriously harmful autoconf macro called PKG_CHECK_MODULES that provides a simple (but easy to misuse) way of gluing the two systems together. The Python 3 logic in my configure.ac file looked like this:

PKG_CHECK_MODULES([PY3], [python3], [enable_py3="yes"], [AC_MSG_WARN([Building without Python 3 support])])
AM_CONDITIONAL([ENABLE_PY3], [test x"$enable_py3" = xyes])
AM_COND_IF([ENABLE_PY3], [AC_DEFINE([ENABLE_PY3], [1], [Python3 is enabled])])

This essentially asks pkg-config which CFLAGS are needed when building code against python3. This is just fine if you only have one Python 3 installed on a system, since in that case python3 is unambiguous. If you have two versions of Python 3 installed, pkg-config will just pick one of them---in this case Python 3.5. Whoops.

Conclusion

Justin and I worked out some logic in the autoconf/automake code to handle different Python 3 releases. We also added some runtime logic to Pyflame to detect which Python 3.x version it's profiling, and to switch the internal stack decoding implementation accordingly. Most of that work is in commit d9bda7.

Python code objects don't necessarily need to have a stable ABI, because they normally only exist in the memory of Python processes. They aren't persisted to disk or serialized over network sockets, so it should be "safe" to change the layout (and hence the ABI) of these structs. Pyflame is an unusual application though: it needs to know the details of the memory layout of the process it's tracing, and thus it's sensitive to "safe" ABI changes like this one.

An alternative way to write Pyflame would be to have the build system produce multiple executables, each linked against a different Python version. This is how most C extensions are packaged on Linux. For instance, if you install NumPy on a Debian system, you choose between python-numpy (built for Python 2.x) and python3-numpy (built for the system Python 3.x). The downside of this approach is that you need a Pyflame executable (e.g. /usr/bin/pyflame2., /usr/bin/pyflame3.5, etc.) per version of Python you have on the system. Originally I thought building multiple executables was less elegant, but now I'm not so sure. Building different executables would simplify a lot of logic, and would completely insulate against these kinds of ABI changes. On the other hand, the fact that a single Pyflame executable can profile multiple Python versions is kind of interesting. I'll be paying closer attention to these ABI changes in future Python versions, and will make a decision on how to proceed if these ABI changes get more frequent or more complicated.