GCC Generates Strange Code

If you've been following my blog/Twitter, you may have picked up on the fact that I'm doing a bunch of disassembly work, particularly regarding the Python interpreter. I found something really interesting while doing this work.

Debian Jessie (as of this writing, the latest stable Debian release) ships with Python 2.7.9 compiled by GCC 4.9.2. Here's the start of the disassembly for PyObject_Malloc():

Dump of assembler code for function PyObject_Malloc:
   0x0000000000499750 <+0>:    test   %rdi,%rdi
   0x0000000000499753 <+3>:    js     0x4182be

This might look complicated if you don't know x86, but it's actually really simple. PyObject_Malloc() accepts a single argument---the number of bytes to allocate. These two instructions are checking if the argument passed in (via the %rdi register) is negative, and if so the code jumps to 0x4182be.

The actual C code in Python that implements this is pretty obfuscated, so I won't list it here (if you're really curious: look in pymem.h), but conceptually the prototype for PyObject_Malloc() looks like this:

void* PyObject_Malloc(size_t nbytes);

The code checks whether the argument you passed to PyObject_Malloc() looks like a negative value (technically, whether nbytes exceeds (size_t)PY_SSIZE_T_MAX, which on a 64-bit build means the top bit is set). If that's the case, PyObject_Malloc() returns NULL.
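To make the "looks like a negative value" point concrete, here is a minimal sketch in C. It is not the CPython source: SKETCH_SSIZE_T_MAX, too_big(), sign_bit_set() and sketch_malloc() are hypothetical names invented for this illustration, and the sketch assumes SKETCH_SSIZE_T_MAX is the largest value of the signed counterpart of size_t (which is what PY_SSIZE_T_MAX is on typical builds). It shows that an unsigned "is it bigger than the signed maximum?" comparison gives the same answer as asking whether the sign bit is set, which is presumably why the compiler can get away with a plain test/js pair.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PY_SSIZE_T_MAX: every bit of a size_t set
 * except the top one. */
#define SKETCH_SSIZE_T_MAX ((size_t)(SIZE_MAX >> 1))

/* The guard as the post describes it: an unsigned comparison. */
static int too_big(size_t nbytes)
{
    return nbytes > SKETCH_SSIZE_T_MAX;
}

/* The same question asked the way `test %rdi,%rdi; js` asks it:
 * is the top (sign) bit of the argument set? */
static int sign_bit_set(size_t nbytes)
{
    return (nbytes >> (sizeof(size_t) * CHAR_BIT - 1)) != 0;
}

/* Hypothetical allocator with the same shape of early exit. */
void *sketch_malloc(size_t nbytes)
{
    if (too_big(nbytes))
        return NULL;        /* the early-exit path */
    return malloc(nbytes);  /* the real allocator does far more here */
}
```

Passing (size_t)-1, which is what a caller gets by accidentally converting -1 to size_t, takes the early exit, while an ordinary small size falls through to the allocation.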

Ok, makes sense. These two lines of assembler check whether the value is negative and, if it is, jump to 0x4182be. Let's look at what's going on at 0x4182be:

(gdb) disas 0x4182be,0x4182be+3
Dump of assembler code from 0x4182be to 0x4182c1:
   0x00000000004182be:    xor    %eax,%eax
   0x00000000004182c0:    retq

Again this is pretty simple. The first instruction sets %eax to zero (which on x86-64 also clears the upper half of %rax), and the second instruction returns from the function. Together they implement the C statement:

return NULL;

But wait. What's going on here? The code makes sense. But why did we jump to 0x4182be? That's not even part of PyObject_Malloc()! I looked at the ELF sections, and according to both objdump and nm this is some weird code that doesn't belong to any real function. The objdump utility lists the code at 0x4182be as sitting among the very first bytes of the .text section and attributes it to <PyDescr_NewMethod-0x20e0>. That means it isn't code in PyDescr_NewMethod(); it's code that comes 0x20e0 bytes before it and doesn't belong to any function at all. So it looks like GCC has generated a kind of prologue at the start of .text, holding commonly used instructions that it can jump to.

Apparently GCC decided that there's a bunch of this return NULL business going on, and instead of generating the code over and over again, it generates it just once and then jumps to that code. That kind of makes sense, since it saves space, but it makes stack traces hard to understand and definitely makes the disassembly much harder to follow.

I recompiled Python 2.7.9 on a new machine that has GCC 5.3.1. It generates code that looks a lot more like what you'd expect. Here's what it looks like:

Dump of assembler code for function PyObject_Malloc:
   0x0000000000460f70 <+0>:    test   %rdi,%rdi
   0x0000000000460f73 <+3>:    js     0x4610d8 <PyObject_Malloc+360>

So now it's jumping to code within PyObject_Malloc(). The code there is like:

   0x00000000004610d8 <+360>:    xor    %eax,%eax
   0x00000000004610da <+362>:    retq

Just as we expect.

At first I was convinced that this was some weird thing that older GCC versions were doing. But, it turns out, that's not the case! When I compiled Python 2.7.9 from source on Jessie using GCC 4.9.2 I got essentially the same code as I did when compiling with GCC 5.3.1, i.e. without the weird jump to the .text prologue.

I also rebuilt the code with gcc -O3 on Jessie and once again, did not get the weird jump to the .text prologue, so it's not a matter of what optimization level you use.

I looked at the Debian patches, and I believe the compiler option causing this is -flto, which enables link-time optimization. However, when I try to compile Python myself with CFLAGS=-flto, I get errors when the build invokes the ar command. I still have problems when exporting AR=gcc-ar, which is the recommended workaround. I will update this post if I figure out how to build Python with -flto enabled, so I can verify that this is the option that generates this code.