Earlier this year, a Pyflame user filed a GitHub issue reporting that Pyflame didn't work reliably with Python 3.6. Sometimes Pyflame would just print garbage output (e.g. bogus line numbers), and other times it would actually crash. Although multiple users confirmed the problem, I didn't have any issues at all with my local Python 3.6 installation. In July I spent some time working with Justin Phillips to identify the root cause of the issue; what we found was an interesting (and unexpected) consequence of a subtle change to Python's internal ABI.
PEP 523
In the original bug report, one of the Pyflame users suggested that the bug was
related to PEP 523, which changes
the internal implementation of Python "code" objects. The CPython interpreter
uses PyCodeObject
structs as containers that hold Python bytecode and bytecode
metadata. Pyflame needs to analyze these code objects to get stack traces. In
particular, code objects contain a table mapping bytecode offsets to line
numbers, and Pyflame reads these tables to compute line numbers in stack traces.
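To make the line-number lookup concrete, here is a minimal sketch (this is not Pyflame's actual code) of the classic co_lnotab scheme described in Objects/lnotab_notes.txt, assuming the raw table bytes have already been copied out of the traced process:
#include <stddef.h>

/* Minimal sketch: map a bytecode offset to a line number. co_lnotab is a
   sequence of (addr_delta, line_delta) byte pairs, applied cumulatively
   starting from co_firstlineno. (In 3.6+ the line delta is signed.) */
static int addr_to_line(const unsigned char *lnotab, size_t len,
                        int firstlineno, int target_addr) {
    int line = firstlineno;
    int addr = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        addr += lnotab[i];
        if (addr > target_addr)
            break;
        line += (signed char)lnotab[i + 1];
    }
    return line;
}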
When I dug in, I was skeptical that this was really the issue. PEP 523 says
"One field is to be added to the PyCodeObject
struct", and the PEP then shows
the following change:
typedef struct {
...
void *co_extra; /* "Scratch space" for the code object. */
} PyCodeObject;
Adding a field to the end of a struct like this should be perfectly safe, since
Pyflame can safely ignore the co_extra
field at the end of the struct. These
code objects aren't embedded in other structs, so the memory layout of
everything should be unaffected by this change. Since I was confident that this
PEP was a red herring, I ignored it, and embarked on a long yak shaving exercise
to figure out what the real issue was.
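A quick way to convince yourself of this is to compare field offsets with offsetof. The structs below are hypothetical stand-ins rather than the real PyCodeObject, but they show why appending a field is harmless:
#include <stddef.h>
#include <stdio.h>

/* Hypothetical before/after layouts: appending a field leaves the offsets
   of all existing fields untouched. */
struct code_old { int co_flags; void *co_name; int co_firstlineno; };
struct code_new { int co_flags; void *co_name; int co_firstlineno; void *co_extra; };

int main(void) {
    printf("co_firstlineno offset: old=%zu new=%zu\n",
           offsetof(struct code_old, co_firstlineno),
           offsetof(struct code_new, co_firstlineno));
    return 0;
}
Both offsets come out the same, because appending co_extra only grows the struct at the end.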
Python Build Modes
As I mentioned earlier, I wasn't able to reproduce the issue locally. But I was using Fedora, and the bug reports were coming in from Ubuntu users. Naturally, I suspected the issue was related to the Debian packaging of Python. Pyflame already has a bunch of complicated logic to deal with the difference between how Python is compiled on Debian and Fedora, and this seemed a likely culprit to me.
When you compile Python from source, there's a flag you can pass to the configure script called --enable-shared. The help documentation indicates that this controls whether or not a shared library (libpython) should be built.
Actually, this option is a little more subtle. If you build Python with
--enable-shared
you get a small (about 8 KB) executable that links
against libpython. The entry point for the executable is a small stub that calls Py_Main, which is defined in libpython and contains the actual logic for bootstrapping the interpreter. By contrast, if you build without --enable-shared, you get a relatively large executable (about 4 MB) that has the Python implementation
statically linked in. Most users will never know about this distinction, but it's
very important for Pyflame, since this information is essential to find the
thread state information in memory (which is the starting point for extracting
stack information). Fedora builds Python using the --enable-shared
flag, and
Debian/Ubuntu do not use this flag.
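Pyflame therefore has to work out, from outside the process, which build mode it is dealing with. One way to do that (a sketch, not Pyflame's exact implementation) is to look for a mapped libpython in /proc/<pid>/maps:
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Sketch: guess whether the traced process was built with --enable-shared
   by looking for a mapped libpython in its memory map. */
static bool uses_libpython(pid_t pid) {
    char path[64], line[1024];
    bool found = false;
    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "libpython") != NULL) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}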
This turned out to be a dead end. After spending many hours exploring this
hypothesis, I found out that the bug wasn't actually related to the
--enable-shared
flag: instead, the issue only manifested on systems that had
both Python 3.5 and Python 3.6 installed. The real reason that people were only
seeing the issue on Ubuntu was more subtle. At the time there wasn't a stable
release of Ubuntu that included Python 3.6. Instead, users were installing
Python 3.6 from a PPA. If you install Python 3.6 from a PPA, it will be
co-installed alongside the system Python 3.5 installation. Thus Ubuntu was
really a proxy for users who had multiple Python 3.x releases installed.
With this information, I had a pretty good idea of what the problem was: the issue was likely the result of the header files changing between Python 3.5 and 3.6. This would implicate PEP 523 as the true cause. This is what everyone had been telling me all along; I think the lesson here is not to overthink things early on.
Digging In Deeper
Since I had a hunch that the problem really was related to PEP 523, I guessed
that the offending header file was code.h. Sure enough, when I diffed code.h from Python 3.5 and 3.6, I saw the following:
typedef struct {
PyObject_HEAD
@@ -15,26 +25,29 @@
int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */
int co_flags; /* CO_..., see below */
+ int co_firstlineno; /* first source line number */
PyObject *co_code; /* instruction opcodes */
PyObject *co_consts; /* list (constants used) */
PyObject *co_names; /* list of strings (names used) */
PyObject *co_varnames; /* tuple of strings (local variable names) */
PyObject *co_freevars; /* tuple of strings (free variable names) */
PyObject *co_cellvars; /* tuple of strings (cell variable names) */
- /* The rest aren't used in either hash or comparisons, except for
- co_name (used in both) and co_firstlineno (used only in
- comparisons). This is done to preserve the name and line number
+ /* The rest aren't used in either hash or comparisons, except for co_name,
+ used in both. This is done to preserve the name and line number
for tracebacks and debuggers; otherwise, constant de-duplication
would collapse identical functions/lambdas defined on different lines.
*/
unsigned char *co_cell2arg; /* Maps cell vars which are arguments. */
PyObject *co_filename; /* unicode (where it was loaded from) */
PyObject *co_name; /* unicode (name, for reference) */
- int co_firstlineno; /* first source line number */
PyObject *co_lnotab; /* string (encoding addr<->lineno mapping) See
Objects/lnotab_notes.txt for details. */
void *co_zombieframe; /* for optimization only (see frameobject.c) */
PyObject *co_weakreflist; /* to support weakrefs to code objects */
+ /* Scratch space for extra data relating to the code object.
+ Type is a void* to keep the format private in codeobject.c to force
+ people to go through the proper APIs. */
+ void *co_extra;
} PyCodeObject;
As we expect, there's a new field at the end of the struct. But there's also
another change: the offset of the co_firstlineno
field has changed! This field
gives the line number for the first bytecode instruction in the code object, and
is read by Pyflame when extracting stack traces.
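The effect of moving a field is easy to demonstrate with offsetof. The structs below are simplified, hypothetical stand-ins that only model the reordering, not the real PyCodeObject layouts:
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins modeling the reordering: moving co_firstlineno
   earlier shifts the offsets of the fields around it. */
struct code35_like { void *co_filename; void *co_name; int co_firstlineno; void *co_lnotab; };
struct code36_like { int co_firstlineno; void *co_filename; void *co_name; void *co_lnotab; };

int main(void) {
    printf("co_filename offset: 3.5-like=%zu 3.6-like=%zu\n",
           offsetof(struct code35_like, co_filename),
           offsetof(struct code36_like, co_filename));
    printf("co_name offset:     3.5-like=%zu 3.6-like=%zu\n",
           offsetof(struct code35_like, co_name),
           offsetof(struct code36_like, co_name));
    return 0;
}
A tracer compiled against one layout but reading memory laid out the other way will pick up whatever happens to live at the stale offsets.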
The actual error message people were reporting was:
Failed to PTRACE_PEEKDATA at 0x10: Input/output error
This is saying that Pyflame failed to read memory address 0x10 in the traced
process. It's essentially a
ptrace version of a null
pointer dereference: Pyflame has a null pointer to a struct, and is trying to
access a field in the struct 16 bytes (0x10) from the start of the struct. The
reason this happens is that Pyflame doesn't just access the co_firstlineno
field: it also has to access the co_filename
and co_name
fields from the
PyCodeObject
as well. The rearrangement of co_firstlineno
moves these fields
around, meaning that if Pyflame is using the Python 3.5 offset for a Python 3.6
process, it will read some bogus value from memory. In this case the bogus value
happens to be zero, which is what makes this similar to a null pointer.
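Here is a hedged sketch of the remote read involved (peek_word is a hypothetical helper, not Pyflame's API):
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: read one word from the traced process's memory.
   PTRACE_PEEKDATA returns -1 with errno set (EIO here) if the address,
   e.g. 0x10, is not mapped in the target. */
static long peek_word(pid_t pid, unsigned long addr, int *err) {
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    *err = errno;
    return word;
}
Reading one of those pointer fields at the stale 3.5 offset returns a bogus value that happens to be zero, and the next peek, 16 bytes into that nonexistent object, is a read of address 0x10, which fails with EIO exactly as reported.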
You might wonder how this mix-up can happen in the first place, since normally the headers you compile against and the libpython you link against come from the same version of Python. Pyflame is written in an unusual way: it includes the Python header files, but it doesn't actually link against libpython. Therefore there are no safety checks in place, and things can go off the rails in cases like this one.
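Because the offsets come straight from the headers at compile time, something like the following (illustrative only, not Pyflame's source) gets frozen into the binary:
#include <Python.h>   /* whichever python3 headers the build found */
#include <stddef.h>

/* The offset below is fixed at compile time from the headers above; nothing
   ever re-checks it against the interpreter actually being traced. */
static const size_t first_lineno_offset = offsetof(PyCodeObject, co_firstlineno);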
Autoconf and pkg-config
The issue is partially the result of how the build system for Pyflame was
configured. When I started the project, I decided to use GNU autotools (i.e.
autoconf and automake), since it's a de facto C build standard on Linux, and
makes .deb
and .rpm
packaging simple. I also used
pkg-config, since it
simplifies writing autoconf scripts. There's a notoriously
harmful
autoconf macro called
PKG_CHECK_MODULES
that provides a simple (but easy to misuse) way of gluing the two systems
together. The Python 3 logic in my configure.ac
file looked like this:
PKG_CHECK_MODULES([PY3], [python3], [enable_py3="yes"], [AC_MSG_WARN([Building without Python 3 support])])
AM_CONDITIONAL([ENABLE_PY3], [test x"$enable_py3" = xyes])
AM_COND_IF([ENABLE_PY3], [AC_DEFINE([ENABLE_PY3], [1], [Python3 is enabled])])
This essentially asks pkg-config which CFLAGS are needed when building code against python3. This is just fine if you only have one Python 3 installed on a system, since in that case python3 is unambiguous. If you have two versions of Python 3 installed, pkg-config will just pick one of them, in this case Python 3.5. Whoops.
Conclusion
Justin and I worked out some logic in the autoconf/automake code to handle different Python 3 releases. We also added some runtime logic to Pyflame to detect which Python 3.x version it's profiling, and switch the internal stack decoding implementation as a result. Most of that work is in commit d9bda7.
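The shape of that runtime switch looks roughly like the sketch below; the offsets_for helper and the trimmed-down struct layouts are hypothetical, standing in for the real per-version definitions:
#include <stddef.h>

/* Hypothetical trimmed-down layouts standing in for the real 3.5 and 3.6
   PyCodeObject definitions (only a few fields are modeled). */
struct py35_code { void *co_filename; void *co_name; int co_firstlineno; void *co_lnotab; };
struct py36_code { int co_firstlineno; void *co_filename; void *co_name; void *co_lnotab; };

struct code_offsets {
    size_t filename;
    size_t name;
    size_t firstlineno;
    size_t lnotab;
};

/* Pick the field offsets to use once the target's Python version is known. */
static struct code_offsets offsets_for(int py3_minor) {
    if (py3_minor >= 6) {
        return (struct code_offsets){
            .filename = offsetof(struct py36_code, co_filename),
            .name = offsetof(struct py36_code, co_name),
            .firstlineno = offsetof(struct py36_code, co_firstlineno),
            .lnotab = offsetof(struct py36_code, co_lnotab),
        };
    }
    return (struct code_offsets){
        .filename = offsetof(struct py35_code, co_filename),
        .name = offsetof(struct py35_code, co_name),
        .firstlineno = offsetof(struct py35_code, co_firstlineno),
        .lnotab = offsetof(struct py35_code, co_lnotab),
    };
}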
Python code objects don't necessarily need to have a stable ABI, because they normally only exist in the memory of Python processes. They aren't persisted to disk or serialized over network sockets, so it should be "safe" to change the layout (and hence the ABI) of these structs. Pyflame is an unusual application though: it needs to know the details of the memory layout of the process it's tracing, and thus it's sensitive to "safe" ABI changes like this one.
An alternative way to write Pyflame would be to have the build system produce
multiple executables, each linked against a different Python version. This is
how most C extensions are packaged on Linux. For instance, if you install
NumPy on a Debian system, you choose between
python-numpy
(built for Python 2.x) or python3-numpy
(built for the system
Python 3.x). The downside of this approach is that you need a Pyflame executable
(e.g. /usr/bin/pyflame2., /usr/bin/pyflame3.5, etc.) per version of Python
you have on the system. Originally I thought building multiple executables was
less elegant, but now I'm not so sure. Building different executables would
simplify a lot of logic, and would completely insulate against these kinds of
ABI changes. On the other hand, the fact that a single Pyflame executable can
profile multiple Python versions is kind of interesting. I'll be paying closer
attention to these ABI changes in future Python versions, and make a decision on
how to proceed if these ABI changes get more frequent or more complicated.