Parsing ELF symbol tables

February 9, 2016

If you've done any weird low-level ELF debugging, you're probably familiar with nm, objdump, and readelf, and perhaps some other tools I don't know of.

Well, what if you want to read the symbol table for an ELF executable or library programmatically? Linux systems come with a header called <elf.h>, and there's this whole man 5 elf thing that explains in obtuse terms how to use <elf.h> to decode an ELF executable.

In practice, I found it pretty difficult to figure out how to decode the symbol table. My goal was to decode the symbol table for a "statically" built /usr/bin/python, meaning one where the interpreter is linked into the executable itself instead of living in a shared libpython (which is the default way that Debian/Ubuntu compile python).

From readelf I saw:

$ readelf -a $(which python) | grep PyObject_Malloc
619: 0000000000499750   539 FUNC    GLOBAL DEFAULT   13 PyObject_Malloc

So I already knew that PyObject_Malloc was to be found at index 619 in the symbol table (an entry number, not a byte offset), and that it should be loaded into memory by /usr/bin/python at virtual address 0x499750.
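Since 619 is an entry number, turning it into a position in the file takes one multiplication. Here is a minimal sketch, assuming a 64-bit ELF file and assuming you already have the section header for the symbol table in hand. The function name symbol_file_offset is made up for illustration and is not taken from any library.

```c
#include <elf.h>
#include <stdint.h>

/* Byte offset within the file of symbol table entry `index`.
 * `symtab` is the section header of the symbol table section.
 * sh_entsize is the size of one entry, which is sizeof(Elf64_Sym)
 * (24 bytes) for a 64-bit ELF file. */
uint64_t symbol_file_offset(const Elf64_Shdr *symtab, uint64_t index)
{
    return symtab->sh_offset + index * symtab->sh_entsize;
}
```

So entry 619 sits 619 * 24 = 14856 bytes past wherever the symbol table starts in the file.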

I wrote a program that can actually decode $(which python) and give the same output. In particular, I see:

$ ./parse_elf $(which python)
...
SYMBOL TABLE ENTRY 619
st_name = 18442 (PyObject_Malloc)
st_info = 18
st_other = 0
st_shndx = 13
st_value = 0x499750
st_size = 539
...

This matches in all of the relevant fields: you can see that it correctly finds PyObject_Malloc at symbol table entry 619, that st_value is 0x499750, that st_shndx (which holds the section header table index of the section the symbol is defined in) is 13, and that the size of the object code is correctly identified as 539 bytes.
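The one field that doesn't obviously line up with the readelf output is st_info = 18. That byte packs two things together: the binding in the high four bits and the type in the low four bits. <elf.h> provides the ELF64_ST_BIND and ELF64_ST_TYPE macros to pull them apart. The sketch below uses those macros; the helper names symbol_binding and symbol_type are my own, and only the most common values are handled.

```c
#include <elf.h>

/* Binding lives in the high four bits of st_info. */
const char *symbol_binding(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Type lives in the low four bits of st_info. */
const char *symbol_type(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}
```

For st_info = 18 (0x12), the high nibble is 1 (STB_GLOBAL) and the low nibble is 2 (STT_FUNC), which is the "FUNC GLOBAL" that readelf printed.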

You can find the code on GitHub at eklitzke/parse-elf. Again, this isn't code I'm super proud of, but if you're in a situation similar to mine, this should at least help you figure out how to decode all of the relative offsets, which should be enough to get you started.
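To give a flavor of the offset chasing involved, here is a rough sketch of the chain for a 64-bit ELF file. This is not the code from the repository, just an illustration under some assumptions: the whole file is already in memory (say, from mmap) in a buffer called image, the function names find_section and symbol_name are invented here, and the bounds checks on sh_offset and sh_size against the buffer length are left out for brevity, so don't run it on files you don't trust. The chain is: the ELF header gives e_shoff, which locates the section headers; the SHT_SYMTAB section header gives the symbol table's file offset; its sh_link field names the string table section; and st_name is an offset into that string table.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the first section header of the given type, or NULL.
 * `image` is the entire ELF file in memory, `len` its size. */
const Elf64_Shdr *find_section(const unsigned char *image, size_t len,
                               uint32_t type)
{
    if (len < sizeof(Elf64_Ehdr) || memcmp(image, ELFMAG, SELFMAG) != 0)
        return NULL;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
    if (eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shentsize != sizeof(Elf64_Shdr))
        return NULL;
    /* Make sure the section header table fits inside the buffer. */
    if (eh->e_shoff > len ||
        (len - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
        return NULL;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(image + eh->e_shoff);
    for (uint16_t i = 0; i < eh->e_shnum; i++)
        if (sh[i].sh_type == type)
            return &sh[i];
    return NULL;
}

/* Name of symbol table entry `index`, or NULL if it can't be resolved. */
const char *symbol_name(const unsigned char *image, size_t len, uint64_t index)
{
    const Elf64_Shdr *symtab = find_section(image, len, SHT_SYMTAB);
    if (!symtab || symtab->sh_entsize != sizeof(Elf64_Sym))
        return NULL;
    if (index >= symtab->sh_size / symtab->sh_entsize)
        return NULL;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
    if (symtab->sh_link >= eh->e_shnum)
        return NULL;
    /* sh_link of a symbol table is the index of its string table section. */
    const Elf64_Shdr *strtab =
        (const Elf64_Shdr *)(image + eh->e_shoff) + symtab->sh_link;
    const Elf64_Sym *sym =
        (const Elf64_Sym *)(image + symtab->sh_offset) + index;
    if (sym->st_name >= strtab->sh_size)
        return NULL;
    /* st_name is a byte offset into the string table. */
    return (const char *)image + strtab->sh_offset + sym->st_name;
}
```

With the file mapped, symbol_name(image, len, 619) would be the call corresponding to the entry shown above. A stripped binary has no SHT_SYMTAB section at all, in which case you would look for SHT_DYNSYM instead.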