The Curious Case of Position Independent Executables

Most Unix systems, including Linux, use the ELF format for executables and object files. Normally the details of ELF files are invisible to developers, but certain tasks can call for one to peer into their inscrutable depths. One reason that you might need to parse ELF files is when trying to find symbols in another process using the ptrace(2) system call. In particular, by resolving the symbols in the ELF file for a remote process you can do things like figure out which symbols are available in the remote process and where in memory they can actually be found.

Another reason to explore this route is to do hacky things with ELF executables, which is what I describe in this article. While going down this road I learned some interesting and poorly documented things that I hope to shed some light on for other developers.

Quick Aside: Parsing ELF Files

ELF files are composed of a bunch of sections and headers that can be expressed as C structs. That means that you can use something like mmap(2) to directly map the on-disk representation of an ELF file into C data structures. The ELF specification goes into great detail about how all of these structures work. As it turns out, GNU libc (a.k.a. glibc) ships with a header file called elf.h that contains the C struct definitions for you.

GNU libc is licensed under the LGPL, which means that if you aren't modifying it you can dynamically link against it without affecting the licensing terms of your own code. You can therefore use elf.h (and the rest of glibc) freely in your own code. However, this approach is rather low-level: you will have to learn a lot of the intricacies of the ELF format to find your way around the various sections and tables.

The details of elf.h are documented in elf(5), meaning that you can read the documentation with the invocation man 5 elf.
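To give a feel for how directly the on-disk bytes map onto the structures in elf.h, here is a minimal sketch that decodes the 16-byte e_ident array at the start of a file. I've written it in Python with the struct module rather than C to keep it short; the function name parse_ident is my own and is not part of any library.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASS = {1: "ELF32", 2: "ELF64"}                  # e_ident[EI_CLASS]
ELF_DATA = {1: "little-endian", 2: "big-endian"}      # e_ident[EI_DATA]


def parse_ident(header: bytes) -> dict:
    """Decode the first 16 bytes (e_ident) of an ELF file."""
    if len(header) < 16 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Bytes 4, 5 and 6 are EI_CLASS, EI_DATA and EI_VERSION.
    ei_class, ei_data, ei_version = struct.unpack_from("BBB", header, 4)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "version": ei_version,
    }
```

To try it on a real file, read the first 16 bytes of any binary opened in "rb" mode and pass them to parse_ident.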

If you have the option, I strongly encourage you to instead look at GNU BFD, the obscurely named GNU "Binary File Descriptor" library. BFD provides the basis for GNU binutils, the standard command-line utilities for working with object files. In particular, GNU BFD is the library used by tools such as ld (the GNU linker), as (the GNU assembler), gdb (the GNU debugger), and others you may have heard of like nm and objdump. (readelf is the exception: it is deliberately written without BFD.) This means that you're using the exact same library that is used by the linker and debugger on your machine. Additionally, BFD provides a high-level abstraction so you can just open a file and do things like get symbols without worrying about the low-level details of the ELF format.

There is one catch with BFD, which is that it is licensed under the terms of the GPL. This means that if you use GNU BFD and you want to distribute your work you will have to license your own code under the GPL, which is not the case if you include elf.h when building your application.

ELF File Types

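In the header of every ELF file there's a field called e_type which indicates what type of ELF file it is. The ones you should expect to see (besides ET_NONE) are ET_REL (a relocatable object file, such as a .o file), ET_EXEC (an executable), ET_DYN (a shared object), and ET_CORE (a core file). Position independent executables, covered below, are also marked ET_DYN.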
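The e_type field is easy to read by hand, because it is the 16-bit value that immediately follows the 16-byte e_ident array. The sketch below is one way to do it; the numeric values are the ones defined in elf.h, while the function name elf_type is just something I made up for illustration.

```python
import struct

# e_type values as defined in elf.h
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the symbolic name of e_type from the start of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5) gives the byte order: 1 = little, 2 = big.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field at offset 16, right after e_ident.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))
```

Only the first 18 bytes of the file are needed, so there is no need to read or map the whole thing just to check the type.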

Symbol Tables

Within the ELF file there will be a number of "sections". Two section types hold symbol tables: elf.h calls them SHT_SYMTAB and SHT_DYNSYM. An ELF file can have more than one symbol table.

By convention, a symbol table called .dynsym (of type SHT_DYNSYM) holds the dynamic symbols. These are the symbols the dynamic linker works with, and they are built in such a way that they can be relocated to arbitrary virtual memory addresses. Typically if you look at a .so shared object file you'll see that all of the visible symbols are put into .dynsym.

By convention, a symbol table called .symtab (of type SHT_SYMTAB) holds non-relocatable or non-allocatable symbols. In practice it usually lists every symbol in the file, including the ones that also appear in .dynsym, and it is the table that strip(1) removes.
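If you want to see which symbol tables a given file actually carries, you can walk the section header table yourself. The following is a rough sketch under some simplifying assumptions: it only handles little-endian ELF64 files (what you get on x86-64 Linux), it expects the whole file in memory, and it ignores the extended numbering used by files with a huge number of sections. The function name symbol_table_sections is mine.

```python
import struct

SHT_SYMTAB = 2    # full symbol table, conventionally named .symtab
SHT_DYNSYM = 11   # dynamic symbol table, conventionally named .dynsym


def symbol_table_sections(data: bytes) -> list:
    """Return (name, sh_type, symbol_count) for each symbol table section.

    Assumes a little-endian ELF64 file.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # Elf64_Ehdr: e_shoff is at offset 40; e_shentsize, e_shnum and
    # e_shstrndx are three consecutive 16-bit fields starting at offset 58.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)

    def section(i):
        # Elf64_Shdr fields, in order: sh_name, sh_type, sh_flags, sh_addr,
        # sh_offset, sh_size, sh_link, sh_info, sh_addralign, sh_entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    # Section names live in a string table whose index is e_shstrndx.
    strtab_off = section(e_shstrndx)[4]

    found = []
    for i in range(e_shnum):
        sh = section(i)
        sh_name, sh_type, sh_size, sh_entsize = sh[0], sh[1], sh[5], sh[9]
        if sh_type in (SHT_SYMTAB, SHT_DYNSYM):
            start = strtab_off + sh_name
            name = data[start:data.index(b"\0", start)].decode()
            count = sh_size // sh_entsize if sh_entsize else 0
            found.append((name, sh_type, count))
    return found
```

Running this over a typical unstripped shared object should report both a .dynsym and a .symtab entry, while a stripped one should only report .dynsym.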

One of the fields in the ELF header holds the "entry point" for the file, which is the virtual memory address that the system will transfer control to when starting an executable. That is, the entry point holds the virtual memory address of the code that bootstraps process startup. You can think of this as your main() function, although in fact the toolchain supplies some stub code (conventionally called _start) that gets invoked before main().
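The entry point is stored in the e_entry field of the ELF header. As a small illustration, this sketch pulls it out of the raw header bytes; entry_point is a name I picked, and the only thing it relies on is the field layout from elf.h.

```python
import struct


def entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if header[5] == 1 else ">"
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2) and
    # e_version (4), so it starts at offset 24. It is 8 bytes wide in
    # ELF64 (EI_CLASS == 2) and 4 bytes wide in ELF32.
    fmt = "Q" if header[4] == 2 else "I"
    (e_entry,) = struct.unpack_from(endian + fmt, header, 24)
    return e_entry
```

Printing the result with hex() makes it easy to compare against the entry point address that readelf -h reports for the same file.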

When you create an executable all of the code that you write will typically be placed at fixed, non-relocatable addresses. The way this works is the linker decides that it's going to put the entry point at an address like 0x400410 and then other symbols will be put at nearby memory locations. So you might end up with your function foo at 0x400800 and your function bar at 0x400896. If foo calls bar, the static linker can emit machine code that transfers control straight to address 0x400896, which is simple and fast.

When you create a shared library you don't know up front what memory addresses will be available to you. Your library might want to put a symbol at 0x400800, but if the program loading your library already has that address mapped then things won't work right. You can imagine what would happen: either the library would jump into arbitrary positions in the program, or the program would jump into arbitrary positions in the library code. Either case would lead to an unpredictable program that would quickly crash with a segfault or illegal instruction. There are different techniques to solve this problem, but the short version is that the machine code generated for a shared library doesn't use hard-coded memory addresses. Instead, it looks up the real address of the target function at runtime. It is the job of the dynamic linker to make sure that everything is set up properly when these symbols are loaded into memory, so that any stub code has the correct addresses.
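This is also why finding a symbol in a running process takes two pieces of information: the value recorded in the symbol table and the address the object was actually loaded at. On Linux the second piece can be read from /proc/<pid>/maps. The sketch below shows the arithmetic under a simplifying assumption, namely that the object's lowest loadable segment has virtual address 0 (typical for shared objects). The function names and the library path used in the usage note are hypothetical.

```python
def load_base(maps_text: str, path: str) -> int:
    """Return the lowest address at which `path` is mapped.

    `maps_text` is the contents of /proc/<pid>/maps, where each line looks
    like "start-end perms offset dev inode pathname".
    """
    starts = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname field and are skipped.
        if len(fields) == 6 and fields[5].rstrip() == path:
            starts.append(int(fields[0].split("-")[0], 16))
    if not starts:
        raise LookupError(path + " is not mapped")
    return min(starts)


def runtime_address(maps_text: str, path: str, st_value: int) -> int:
    """Translate a symbol's st_value into an address in the running process.

    Assumes the object's lowest loadable segment has virtual address 0.
    """
    return load_base(maps_text, path) + st_value
```

For example, if a hypothetical /usr/lib/libdemo.so is mapped starting at 0x7f0000000000 and its symbol table gives a function the value 0x1800, the function lives at 0x7f0000001800 in that process.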

There is some overhead for relocating code like this, so in general it's faster to use hard-coded addresses, but this technique cannot be applied to shared objects.

Things like debugging symbols can also be put into the symbol tables. Since debugging symbols do not need to be mapped into virtual memory at runtime these symbols are called non-allocatable.

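To recap: .dynsym holds the relocatable symbols that the dynamic linker works with, while .symtab holds non-relocatable and non-allocatable symbols such as debugging symbols.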

When the dynamic linker loads a shared object it will only look at the .dynsym table. The same is true of dlopen(3) and dlsym(3), which will only find symbols that are defined in the .dynsym table.
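You can poke at this from Python, since ctypes is a thin wrapper around dlopen(3) and dlsym(3). The sketch below assumes a Linux system with a dynamically linked Python interpreter; has_dynamic_symbol and main_program are names I chose for illustration.

```python
import ctypes


def has_dynamic_symbol(handle: ctypes.CDLL, name: str) -> bool:
    """Report whether dlsym(3) can resolve `name` through `handle`."""
    try:
        getattr(handle, name)   # ctypes performs the dlsym() lookup here
        return True
    except AttributeError:
        return False


# CDLL(None) corresponds to dlopen(NULL): a handle for the running program
# together with the shared objects it has already loaded.
main_program = ctypes.CDLL(None)
```

Looking up a libc function such as getpid through main_program should succeed because libc exports it as a dynamic symbol, whereas a name that only exists in some file's .symtab would not be found this way.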

Position Independent Executables

You can pass -fPIE when compiling and -pie when linking with GCC to create what is called a "position independent executable" (a.k.a. PIE). When you do this GCC will only generate relocatable code. The ELF header still records an entry point, but in a PIE it is an offset from the address the binary is loaded at rather than a hard-coded absolute address.

The main use case for this is ASLR hardening. When a PIE starts up, the kernel picks a random virtual memory address at which to load the program's code. This makes it harder to exploit a large class of security vulnerabilities common to C/C++ programs. Historically, most Linux distributions did not compile binaries with this option because there is real, measurable overhead to invoking relocatable functions. Distributions like Debian and Ubuntu compiled only particularly security-sensitive binaries (e.g. ssh) as PIEs, although more recent releases of both enable PIE by default. (Traditional non-PIE binaries still get some ASLR on Linux, for example for dynamic libraries, but the executable's own code stays at a fixed address.)

There is another interesting use case here though. As I mentioned earlier, the dynamic linker and dlopen(3) can find symbols in the .dynsym table but not symbols in the .symtab table. However, by default executables don't put their symbols into .dynsym. If an executable is created as a PIE, the linker has the option to put the symbols into the .dynsym table. If this is done then the symbols will be available to the executable itself as well as to the dynamic linker and dlopen.

By default GNU ld will not put symbols into .dynsym for PIEs, even though the symbols are relocatable. However, by invoking ld -E (the long form is --export-dynamic, and through GCC it is spelled -Wl,-E) you can ask the linker to export the symbols as dynamic symbols. This doesn't change the generated code; it simply adds the symbols to the .dynsym table, which takes up a small amount of additional disk space. If you want to export only certain symbols you can use the --dynamic-list option to control the exported symbols.
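Putting the flags together, a build step for such a binary might look roughly like the helper below. This is only a sketch of how the options combine: the helper name pie_link_command and the file names in the usage note are hypothetical, and the flags themselves are the GCC and GNU ld options discussed above.

```python
def pie_link_command(sources, output, dynamic_list=None):
    """Build a gcc command line for a PIE whose symbols land in .dynsym."""
    cmd = ["gcc", "-fPIE", "-pie"]
    if dynamic_list is None:
        # Export every global symbol as a dynamic symbol.
        cmd.append("-Wl,-E")
    else:
        # Export only the symbols named in the given list file.
        cmd.append("-Wl,--dynamic-list=" + dynamic_list)
    return cmd + ["-o", output] + list(sources)
```

The resulting list can be handed to subprocess.run(cmd, check=True), for instance with sources=["main.c"] and output="demo", on a machine that has GCC installed.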

By doing this you can create an ELF executable that can both be run on the command line and be loaded dynamically by dlopen(3). There are a lot of strange things you can use this for. For instance, I am using this technique to create an executable that is also a Python module. This is mostly for fun (I could just as easily set up the build system to create an executable and a shared object separately), but I think it's pretty neat.

I have put a simple example demonstrating the concept on GitHub. If you modify the Makefile to not use -Wl,-E you will see that the dl program will fail to load the symbols.