If you've done any weird low-level ELF debugging, you're probably familiar with
the tools nm, objdump, and readelf; and perhaps some others I don't know of.
Well, what if you want to read the symbol table for an ELF executable or library
programmatically? Linux systems come with a header called <elf.h>, and there's
this whole man 5 elf thing that explains in obtuse terms how to use <elf.h> to
decode an ELF executable.
In practice, I found it pretty difficult to figure out how to decode the symbol
table. My goal was to decode the symbol table for a "statically" built
/usr/bin/python (which is the default way that Debian/Ubuntu compile Python).
From readelf I saw:
$ readelf -a $(which python) | grep PyObject_Malloc
619: 0000000000499750 539 FUNC GLOBAL DEFAULT 13 PyObject_Malloc
So I already knew that PyObject_Malloc was to be found at offset 619 in the
symbol table, and that it should be loaded into memory by /usr/bin/python at
0x499750.
I wrote a program that can actually decode $(which python) and give the same
output. In particular, I see:
$ ./parse_elf $(which python)
...
SYMBOL TABLE ENTRY 619
st_name = 18442 (PyObject_Malloc)
st_info = 18
st_other = 0
st_shndx = 13
st_value = 0x499750
st_size = 539
...
Which matches in all of the relevant fields: you can see that it correctly finds
PyObject_Malloc at symbol table entry 619, that st_shndx (which holds the index
of the section header for the section containing the symbol) is 13, and that
the size of the object code is correctly identified as 539 bytes.
You can find the code on GitHub at eklitzke/parse-elf. Again, this isn't code I'm super proud of, but if you're in a similar situation it should at least help you figure out how to decode all of the relative offsets, which should be enough to get you started.