If you’ve ever looked at the disassembly output for C or C++ code, you’ll probably
notice that there are a lot of push/pop instructions. And if you pay close
enough attention, you’ll notice that the compiler prefers to use certain
registers over others. In particular, compilers will prefer pushing “old”
registers like RBP (i.e. the ones available on 32-bit x86 CPUs) instead of the
“new” registers like R15 (which aren’t available in 32-bit mode).
The C calling conventions on x86 systems specify that callees need to save
certain registers. There are a few different names for these kinds of registers,
such as nonvolatile registers, callee-saved registers, and so on.
If a function needs to use some registers, it’s best to use the volatile
(caller-saved) registers, since they don’t require additional push/pop
instructions. However, there are only a few volatile registers. If a function
needs additional registers, it will have to dip into the nonvolatile set. These nonvolatile registers must
be pushed on function entry, and popped on function exit. So that explains the
first part: why these registers are pushed/popped at all.
But what about the second part: why does the compiler prefer pushing/popping the
old registers RBX, RBP, RDI, RSI, and RSP over the new registers R12, R13, R14,
and R15?
The answer lies in the historical legacy of how instruction encoding worked on
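32-bit systems. First, let’s look at a chart of the general purpose registers
and the number that the instruction encoding uses for each one:

    Original registers      New in 64-bit mode
     0  RAX                   8  R8
     1  RCX                   9  R9
     2  RDX                  10  R10
     3  RBX                  11  R11
     4  RSP                  12  R12
     5  RBP                  13  R13
     6  RSI                  14  R14
     7  RDI                  15  R15
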
Since the designers of x86 knew that these registers were going to be
pushed/popped all the time, they wanted to try to make the push/pop instructions
really compact. So they reserved one-byte instruction encodings to push/pop
every register. This is pretty unusual: there aren’t too many instructions that
can be encoded with a single byte. The one-byte instruction encodings are only
used for the most common instructions.
To push a register, you take the number in the chart above and add it
to 0x50. So if you want to push RSP,
the instruction is 0x54, which is 0x50 + 4.
To pop a register, you take the number in the chart above and add it
to 0x58. So if you want to pop RSP, the
instruction is 0x5c, which is 0x58 + 4.
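Here is a minimal C sketch of the two one-byte rules. The enum and function names are hypothetical, chosen only for illustration, and the enum assumes the standard register numbering (RAX = 0 through RDI = 7).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the eight original registers, numbered 0..7. */
enum legacy_reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/* One-byte PUSH: 0x50 plus the register number. */
static uint8_t push_opcode(enum legacy_reg reg) {
    assert(reg <= RDI); /* only registers 0..7 fit in this form */
    return (uint8_t)(0x50 + reg);
}

/* One-byte POP: 0x58 plus the register number. */
static uint8_t pop_opcode(enum legacy_reg reg) {
    assert(reg <= RDI); /* only registers 0..7 fit in this form */
    return (uint8_t)(0x58 + reg);
}
```

For example, push_opcode(RSP) returns 0x54 and pop_opcode(RSP) returns 0x5C, matching the arithmetic above. push_opcode(RBP) returns 0x55, the byte that begins many function prologues.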
As you can see, they only reserved space for eight registers when pushing; the
same is true when popping. This makes sense, because at the time there were only
eight general purpose registers. However, it became a problem once 64-bit x86
added eight more registers, because no opcodes were left over for register
numbers 8 through 15.
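A small C sketch makes the lack of room concrete. The helper names are hypothetical. Applying the push rule naively to register 8 (R8) gives 0x58, which is already the one-byte encoding of pop RAX, so the push and pop ranges sit back to back with no gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naively extend the one-byte PUSH rule (0x50 + n) to any register number. */
static uint8_t naive_push_opcode(int reg) {
    assert(reg >= 0 && reg <= 15); /* x86-64 has 16 general purpose registers */
    return (uint8_t)(0x50 + reg);
}

/* True if the byte is one of the one-byte POP opcodes (0x58 + n, n = 0..7). */
static bool is_one_byte_pop(uint8_t opcode) {
    return opcode >= 0x58 && opcode <= 0x5F;
}
```

naive_push_opcode(7) gives 0x57 (push RDI), but naive_push_opcode(8) gives 0x58, for which is_one_byte_pop returns true.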
When AMD designed the 64-bit version of x86 (x86-64), it came up with a clever
solution for this problem, aimed at keeping backwards compatibility with 32-bit
systems. It added new prefix bytes, called REX prefixes, which indicate that
certain fields refer to the extended registers. The details of how this works are
a bit complicated, but for a push or pop instruction the prefix 0x41 means that
the register should be treated as one of the extended registers, and eight is
subtracted from the register number before it is encoded.
Here’s an example. Suppose we want to push R9. The first byte of the instruction
is 0x41. The second byte is 0x50 + (9 - 8) = 0x51. Thus the full encoding will
be 0x4151.
Suppose we want to pop R14. The encoding will be 0x41 followed by 0x58 + (14 -
8) = 0x5e. Thus the fully encoded instruction will be 0x415e.
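The full rule, including the prefix, fits in a short C function. This is a sketch with hypothetical names. It assumes 64-bit mode, where push and pop already operate on 64-bit registers by default, so only the B bit of the REX prefix is needed (0x41 = 0x40 with the B bit set). It reproduces the R9 and R14 examples above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for all sixteen general purpose registers, numbered 0..15. */
enum reg {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8,  R9,  R10, R11, R12, R13, R14, R15
};

#define PUSH_BASE 0x50
#define POP_BASE  0x58
#define REX_B     0x41 /* REX prefix (0x40) with the B bit set */

/* Writes the encoding of PUSH or POP (chosen by base) into out and returns
   its length: 1 byte for RAX..RDI, 2 bytes for R8..R15. */
static size_t encode_push_pop(uint8_t base, enum reg r, uint8_t out[2]) {
    assert(base == PUSH_BASE || base == POP_BASE);
    assert(r <= R15);
    if (r < R8) {
        out[0] = (uint8_t)(base + r); /* original one-byte form */
        return 1;
    }
    out[0] = REX_B;                     /* mark the register as extended */
    out[1] = (uint8_t)(base + (r - 8)); /* encode the number minus eight */
    return 2;
}

static size_t encode_push(enum reg r, uint8_t out[2]) {
    return encode_push_pop(PUSH_BASE, r, out);
}

static size_t encode_pop(enum reg r, uint8_t out[2]) {
    return encode_push_pop(POP_BASE, r, out);
}
```

For instance, encode_push(R9, buf) writes 0x41 0x51 and returns 2, while encode_push(RBX, buf) writes the single byte 0x53 and returns 1.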
As you can see, the original eight registers have more compact encodings: they
can each be pushed and popped with a single byte, whereas the new registers
require two bytes to push/pop. This applies to certain other instructions too,
not just push/pop. The time it takes to execute a push/pop is the same
either way, so no CPU cycles are saved directly. But smaller instructions
mean a slightly smaller executable, less data for the decoding pipeline to
process, and a better chance that the code stays in the instruction cache. So if
you can, it’s better to use the old registers: you’ll save
a byte for each push/pop.
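To put a number on the savings, here is a rough C sketch that totals the push/pop bytes a function would spend saving and restoring a set of nonvolatile registers. The function names and register choices are hypothetical, and real prologues and epilogues usually contain more than just pushes and pops.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one PUSH or POP of register number 0..15 in 64-bit mode:
   1 for the original eight, 2 (REX prefix + opcode) for R8..R15. */
static size_t push_pop_size(int reg) {
    assert(reg >= 0 && reg <= 15);
    return reg < 8 ? 1 : 2;
}

/* Total bytes to push each register on entry and pop it again on exit. */
static size_t save_restore_size(const int *regs, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += 2 * push_pop_size(regs[i]); /* one push plus one pop */
    }
    return total;
}
```

Using the standard register numbers, saving RBX, RBP, RSI, and RDI (3, 5, 6, 7) costs 8 bytes of push/pop code, while saving R12 through R15 costs 16.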