Until recently I understood the exact model of how ASLR and other memory protection mechanisms work on Linux only at a high level. Lately I've done a lot of work that required bypassing these mechanisms (in a cooperative setting), and I want to explain for readers exactly how the memory protection model on x86/Linux systems works, what it protects against, and the ways it can be bypassed.
There are two major mechanisms in place to protect memory access that are turned on by default on most x86-64 Linux systems. The first is the so-called NX bit, a setting that gives finer-grained permissions to mapped memory regions. The second is address space layout randomization (ASLR), which randomizes where certain parts of a program are loaded into memory. I'll discuss these two topics separately since, while they complement each other, they are completely orthogonal mechanisms.
The NX Bit
Traditional computer architectures that implemented memory protection mechanisms typically had two states that a mapped memory region could be in: read-only or writable. The exact details of the memory protection capabilities depend on the CPU architecture, but what I've described applies to most 32-bit x86 systems. In addition to this read/write toggle the system would also implement some other basic memory protections, such as ensuring that different processes cannot read or write each other's memory regions.
The way this system is implemented is that each process has a sparse data structure allocated in the kernel called a page table. The page table contains a mapping from virtual memory regions to the associated physical memory (or to an offset in the swap area if the data has been paged out to disk). For each page in the page table there's also a read/write bit that the MMU uses to enforce permissions for areas that ought to be read-only. Memory protection between processes comes from the fact that the kernel arranges for the MMU to use a given process' page table when that process is scheduled to run, meaning that in the absence of kernel bugs a process can't enter a state where it is using another process' page table.
There are a few different use cases for read-only memory regions, but probably the most important use case is read-only text segments. A text segment is the part of a process' memory that holds the actual machine code for the process. The reason it's desirable to map this area read-only is that it helps to mitigate attacks that attempt to inject malicious code into a process. If an attacker can inject malicious code into a process' code area then they can execute arbitrary code with the permissions of the process itself. In many cases these permissions are considerable, even if the process is not running as root. For instance, an attacker that can inject code can run arbitrary filesystem operations as the process' effective UID. When the text area is read-only one can be confident that the code in the text area that is executed has not been tampered with.
A big limitation here is that while we might think of memory as divided into read-only code and writable data, in fact some information about what code is executing is stored in the data area. In particular, the calling convention on x86 works by storing the return address for a function call on the stack, and the stack must be mapped with write permissions. Thus an attacker that can stomp on memory on the stack can alter the return address for a function. This is an extremely well known problem and is known as a stack buffer overflow. Typically when people talk about "buffer overflows" they're specifically referring to stack buffer overflows, since those are usually the most dangerous kind of buffer overflow (although not the only kind).
On classic 32-bit x86 systems, if an attacker can overflow a buffer allocated on the stack then they can actually arrange for arbitrary code to run. The way this works is that the attacker overflows the buffer with malicious x86 code and then arranges for the return address on the stack to make the function return to that malicious code. Thus in this setting any stack buffer overflow can easily be turned into arbitrary code execution.
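To make the classic scenario concrete, here is a contrived sketch of my own (not taken from any real program) showing the kind of code that is vulnerable: strcpy(3) performs no bounds check, so a long enough argument writes past the end of the 16-byte stack buffer and clobbers the saved return address of vulnerable().
#include <string.h>

// Deliberately unsafe: if input is longer than the buffer, strcpy(3)
// writes past the end of buf, overwriting other data on the stack,
// including the saved return address of vulnerable().
void vulnerable(const char *input) {
  char buf[16];
  strcpy(buf, input);  // no bounds check
}

int main(int argc, char **argv) {
  if (argc > 1) {
    vulnerable(argv[1]);
  }
  return 0;
}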
The NX bit, which is present on all 64-bit x86 systems (and some later 32-bit x86 systems), is an important measure that helps mitigate this attack. Systems that implement this functionality have three permission bits: read, write, and execute. These permissions are analogous to the filesystem read/write/execute bits. On these systems text areas are mapped readable + executable, and data areas are mapped readable + writable but not executable. It's also possible to map data areas to be just readable.
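You can see these permission bits directly with mmap(2). The following standalone sketch (my own illustration) maps one page the way a data area is mapped and another page the way a text area is mapped; writing through the first pointer is fine, while writing through the second would fault.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
  size_t page = (size_t)sysconf(_SC_PAGESIZE);

  // A typical data mapping: readable and writable, but not executable.
  void *data = mmap(NULL, page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  // The permissions a text segment gets: readable and executable,
  // but not writable.
  void *text = mmap(NULL, page, PROT_READ | PROT_EXEC,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  if (data == MAP_FAILED || text == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  printf("rw- mapping at %p, r-x mapping at %p\n", data, text);

  // Writing to the data mapping works; the same store through the
  // r-x mapping would terminate the process with SIGSEGV.
  *(char *)data = 1;

  munmap(data, page);
  munmap(text, page);
  return 0;
}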
With the NX bit, the simplest and most dangerous stack buffer overflow attack is prevented because any x86 code injected onto the stack cannot be executed. An attacker can still arrange for an arbitrary return address to be used as the function return address, but unless that return address actually points to a valid offset in the text area the process will segfault (i.e. terminate with SIGSEGV). The process can also terminate with SIGILL if an invalid offset in the text area is used, since any offset that isn't aligned with an actual instruction boundary will likely decode into an invalid instruction sequence quickly.
The most dangerous attack that is still possible with stack buffer overflows when the NX bit is enabled is called a return-to-libc attack. The way this works is that if the conditions are just right an attacker can arrange for a function's return address to be the address of a sequence of instructions in libc that will cause the program to do something dangerous (e.g. exec a shell). This attack is dangerous because libc has helper routines for nearly every system call, plus numerous other high level functions that can be rather powerful. It is much more difficult to execute than a typical stack buffer overflow because not only must the attacker know exactly what address in libc to return to, they must also arrange for the registers and the stack to be set up just right so the libc code is in the correct state to actually execute the malicious instructions.
Address Space Layout Randomization (ASLR)
ASLR is a mechanism that is technically complementary to the NX bit, but it becomes much more powerful when the NX bit is enabled. The reason is that ASLR is designed precisely to make return-to-libc attacks (and returns into any other shared library) much more difficult to execute.
The way that ASLR works is that when shared libraries are mapped into a process' memory they are loaded at randomized locations. For instance:
$ for x in {1..5}; do grep 'r-xp .*/libc' /proc/self/maps; done
7f2f5469b000-7f2f5484a000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
7fba4ce49000-7fba4cff8000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
7f063b82c000-7f063b9db000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
7f45cbbd9000-7f45cbd88000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
7f1c25f74000-7f1c26123000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
The leftmost field in this output shows the address range that libc is mapped at in five different processes (each grep in the loop runs as a separate process). As you can see, in each invocation libc is loaded at a different memory offset. This means that the address of a given function in libc (e.g. the wrapper for unlink(2)) will differ in every process invocation. If an attacker wants to execute a function like unlink(2) they can't easily know exactly where the code for unlink(2) will actually be loaded in memory.
Another way to make this more apparent is to write a C program that prints out the address of a libc function each time it's run. Consider the following program listing:
#define _GNU_SOURCE  // needed for RTLD_NOLOAD
#include <dlfcn.h>
#include <stdio.h>

// must use the exact version for libc!
const char libc_name[] = "libc-2.19.so";

int main(int argc, char **argv) {
  // find the libc that is already loaded in memory
  void *handle = dlopen(libc_name, RTLD_LAZY | RTLD_NOLOAD);
  if (handle == NULL) {
    printf("failed to find libc in memory!\n");
    return 1;
  }

  // locate the unlink(2) wrapper
  printf("unlink(2) is loaded at %p\n", dlsym(handle, "unlink"));
  return 0;
}
You can compile this with an invocation like gcc unlink.c -ldl. On a sequence of five different invocations I get five different addresses for unlink(2):
$ for x in {1..5}; do ./a.out; done
unlink(2) is loaded at 0x7f1d5501e540
unlink(2) is loaded at 0x7f384fbfe540
unlink(2) is loaded at 0x7f2c8f5de540
unlink(2) is loaded at 0x7f5669055540
unlink(2) is loaded at 0x7f8efd897540
Bypassing Memory Protection
Both of the protection mechanisms I just mentioned can be bypassed in a number of different ways, particularly in the "cooperative" case where the program author is intentionally trying to bypass the protections.
The NX bit can be bypassed by a caller using the mprotect(2) system call. The prototype for that function looks like this:
int mprotect(void *addr, size_t len, int prot);
The prot argument consists of zero or more of the read/write/execute bits ORed together. After making this system call, the specified memory region will be remapped with the new permissions given in prot.
One fun thing that a caller can do is to set the text regions for a process to be writable. If this is done it completely bypasses the read-only state that code areas are usually mapped in. The caller can also map data regions (such as the stack) to be executable, which also effectively bypasses the usual protection mechanism. To actually do either of these things a caller must know the exact address they want to change the permissions for. Doing this is somewhat hampered by the fact that many parts of memory live at randomized offsets due to ASLR.
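To make the cooperative case concrete, here's a small sketch of my own (not from any real exploit) that uses mprotect(2) to make the page containing one of the program's own functions writable as well as executable; a hardened kernel or strict LSM policy may refuse the call.
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// An arbitrary function whose machine code we'll make writable.
static int the_answer(void) { return 42; }

int main(void) {
  size_t page = (size_t)sysconf(_SC_PAGESIZE);

  // mprotect(2) requires a page-aligned address, so round the
  // function's address down to the start of its page.
  uintptr_t addr = (uintptr_t)the_answer & ~(uintptr_t)(page - 1);

  // Remap the page containing the_answer() as read+write+execute.
  if (mprotect((void *)addr, page, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
    perror("mprotect");
    return 1;
  }
  printf("text page at %p is now writable (value: %d)\n",
         (void *)addr, the_answer());
  return 0;
}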
Since ASLR is the main protection mechanism that prevents an attacker from trivially remapping memory regions with mprotect(2), the rest of this section will discuss ways that ASLR can be bypassed.
As I demonstrated briefly earlier, a process can see its exact memory layout by looking at /proc/self/maps. This file is the easiest way to work around ASLR protections. Since the file has a well known name and an easily parseable format, reading it makes it easy to know exactly where everything is loaded and what permissions are enforced on each memory region. In a "cooperative" setting where you are intentionally reading your own maps file it just takes a couple of lines of C or C++ code to find things and remap them.
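As an illustration of how little code this takes, here's a sketch of my own that scans /proc/self/maps for the executable libc mapping and prints its address range (it assumes, as is typical, that the libc path contains the string "libc"):
#include <stdio.h>
#include <string.h>

int main(void) {
  FILE *fp = fopen("/proc/self/maps", "r");
  if (fp == NULL) {
    perror("fopen");
    return 1;
  }

  char line[512];
  while (fgets(line, sizeof(line), fp) != NULL) {
    // Each line looks like:
    //   7f2f5469b000-7f2f5484a000 r-xp 00000000 fe:00 31376 /lib64/libc-2.19.so
    // We want the executable (r-xp) mapping whose path mentions libc.
    if (strstr(line, "r-xp") != NULL && strstr(line, "libc") != NULL) {
      unsigned long start, end;
      if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
        printf("libc text is mapped at 0x%lx-0x%lx\n", start, end);
      }
      break;
    }
  }
  fclose(fp);
  return 0;
}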
In practice it would be difficult for an attacker to use this file, because doing so would require actually injecting code to open, parse, and interpret it, and normally the NX + ASLR mechanisms themselves ought to be sufficient to protect against that. In some cases an attacker might be able to use another process running as the same UID to parse a given process' maps file, but even that seems far-fetched since the attacker still needs another side channel to get the data into the process. There are also various ways to isolate the process directories in /proc to protect against this kind of side channel attack.
Processing the maps file in /proc is not the only way to bypass ASLR. Processes can also locate functions in memory by looking at function symbol tables. The demonstration C file I listed shows how to do this with dlopen(3) and dlsym(3), which makes things easy since libdl implements the actual low-level bits that know how to do this symbol parsing. However it's worth noting that libdl isn't doing anything magic here, and in the usage I showed it does not look at /proc/self/maps.
The way libdl works with glibc is that it contains the code that knows how to parse an actual ELF file. When you call dlopen(3) and dlsym(3) in the manner I demonstrated, libdl will locate libc-2.19.so by searching the library search path and then parse its symbol table to see what offset the unlink(2) function is located at. It can then combine that offset with the address the library was loaded at to find the function in the process' address space. Therefore it is possible in principle for an attacker to use this kind of mechanism even if an executable does not link against libdl.
Again, in an actual malicious scenario it is unlikely that an attacker could actually accomplish this, since the logic to parse an ELF file and its symbol table is considerable and would have to be injected into the process.
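The same kind of information is exposed through other glibc interfaces as well. For instance, this sketch (again just an illustration of mine) uses dl_iterate_phdr(3) to walk the list of loaded objects and print each one's load base, which is essentially what an ELF-parsing attacker would need to recover:
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

// Called once per loaded object (the main executable, libc, the dynamic
// loader, and so on). dlpi_addr is the object's load base, which is 0
// for a non-PIE main executable and randomized for shared libraries.
static int show_object(struct dl_phdr_info *info, size_t size, void *data) {
  (void)size;
  (void)data;
  const char *name = info->dlpi_name[0] ? info->dlpi_name : "(main executable)";
  printf("%s loaded at base 0x%lx\n", name, (unsigned long)info->dlpi_addr);
  return 0;  // returning nonzero would stop the iteration
}

int main(void) {
  dl_iterate_phdr(show_object, NULL);
  return 0;
}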
There is another mechanism that is not well understood by many people; it is also what I think would be the most likely way for an attacker to try to bypass ASLR in real scenarios. For typical executables only shared libraries have their locations randomized: the text area for the executable itself is not loaded at a randomized location. Therefore if an attacker has a copy of an executable, they can analyze its code and determine effective places to return to. For instance, suppose an attacker wants to unlink a file. If the executable itself has code that calls unlink(2), then instead of trying to return to unlink(2) directly, the attacker can return to the code that calls unlink(2), which will exist at a well known location. For many attacks this is still difficult since the attacker has to get registers and/or memory into the right state to pass the right arguments to the target function. Still, this is an effective way to bypass the randomization that ASLR introduces. In some cases one could even use this mechanism when the executable doesn't actually call the intended C function: the attacker only needs to find a sequence of executable bytes that can be interpreted as the desired function call. Thus one might deliberately jump to a byte offset that will eventually crash with SIGILL, knowing that before the illegal instructions are reached there is a short run of bytes that decodes as the desired instructions.
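You can see the lack of randomization for yourself with a trivial program of my own devising that prints the address of one of its own functions; compiled the traditional way (on a toolchain that doesn't default to building PIEs), it prints the same address on every run:
#include <stdio.h>

int main(void) {
  // In a non-PIE executable, main lives at a fixed address chosen at
  // link time, so this prints the same value on every run.
  printf("main is at %p\n", (void *)main);
  return 0;
}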
If you are worried about this last attack you can build the program as a position independent executable (PIE) by compiling with -fPIE and linking with -pie. This causes the executable itself to be loaded at a randomized base address under ASLR, just like a shared library. The executable still has a well-known entry point as an offset within the binary, but that entry point just arranges for the C runtime (a.k.a. CRT) initialization functions to run, and at run time everything ends up at an unpredictable memory address. Building as a PIE does have non-negligible overhead, particularly on 32-bit systems, but it's indispensable for security-sensitive executables such as ssh.
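For example, if the address-printing listing above is saved as addr.c (the file name is arbitrary), rebuilding it with gcc -fPIE -pie addr.c makes the printed address of main change on every run, just as the libc addresses did earlier.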
One final way to bypass memory protections, which I will mention just for completeness, is the ptrace(2) system call. This system call lets one process read and write another process' memory without restriction (i.e. it is not necessary to mprotect(2) anything); it is what GDB uses to probe other processes. The ptrace(2) system call is very powerful, but because it's so powerful its usage is typically quite restricted. At a minimum you need to have the same effective UID as the target process in order to ptrace it. Additionally there is a file called /proc/sys/kernel/yama/ptrace_scope which can be used to restrict the ptrace(2) system call further. By default Debian and Ubuntu ship kernels with the Yama ptrace scope configured so that an unprivileged process can only ptrace its own descendants, and only the superuser can relax that setting. In general, if you can ptrace another process you have unlimited ability to do anything as that process, so while ptrace can be used to bypass memory protections, any situation in which you can ptrace another process is a situation in which you are already able to compromise that process.
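To round things out, here's a minimal sketch of my own showing what a cooperating tracer looks like, assuming you already know the target's PID and an address of interest (both hypothetical here) and that the permission checks above allow the attach: it attaches, reads one word of the target's memory, and detaches.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
    return 1;
  }
  pid_t pid = (pid_t)atoi(argv[1]);
  void *addr = (void *)strtoul(argv[2], NULL, 16);

  // Attach to the target, which stops it; wait for the stop to be reported.
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
    perror("PTRACE_ATTACH");
    return 1;
  }
  int status;
  waitpid(pid, &status, 0);

  // Read one word of the target's memory at the given address.
  errno = 0;
  long word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
  if (word == -1 && errno != 0) {
    perror("PTRACE_PEEKDATA");
  } else {
    printf("word at %p in pid %d: 0x%lx\n", addr, (int)pid, word);
  }

  // Detach and let the target continue running.
  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  return 0;
}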