Memory Protection and ASLR on Linux

Until recently I understood how ASLR and other memory protection mechanisms work on Linux only at a high level. Lately I've done a lot of work that has required bypassing these mechanisms (in a cooperative setting), so I want to explain exactly how the memory protection model on x86/Linux systems works, what it protects against, and the ways it can be bypassed.

There are two major mechanisms protecting memory access that are turned on by default on most x86-64 Linux systems. The first is the so-called NX bit, which provides finer-grained permissions on mapped memory regions. The second is address space layout randomization (ASLR), which randomizes where certain parts of a program are loaded into memory. I'll discuss these two topics separately since, while they complement each other, they are completely orthogonal mechanisms.

The NX Bit

Traditional computer architectures that implemented memory protection mechanisms typically had two states that a mapped memory region could be in: read-only or writable. The exact details of the memory protection capabilities depend on the CPU architecture, but what I've described applies to most 32-bit x86 systems. In addition to this read/write toggle the system would also implement some other basic memory protections, such as ensuring that different processes cannot read or write each other's memory regions.

The way this system is implemented is that each process has a sparse data structure allocated in the kernel called a page table. The page table maps virtual memory pages to the associated physical memory (or to an offset in the swap area if the data has been paged out to disk). Each entry in the page table also carries a read/write bit that the MMU uses to enforce permissions for areas that ought to be read-only. Memory protection between processes follows from the fact that the kernel arranges for the MMU to use a given process' page table whenever that process is scheduled to run, meaning that in the absence of kernel bugs a process can never find itself using another process' page table.
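
To make this a bit more concrete, here's a simplified sketch (mine, not the kernel's actual definitions) of the permission-related bits in an x86-64 page table entry. The real entry has many more fields, but the bits below are the ones that matter for this discussion:

#include <stdbool.h>
#include <stdint.h>

/* Simplified sketch of an x86-64 page table entry (PTE). A real PTE
 * is a 64-bit word with more fields than are shown here. */
#define PTE_PRESENT  (1ULL << 0)   /* page is mapped in physical memory */
#define PTE_WRITABLE (1ULL << 1)   /* page may be written; otherwise read-only */
#define PTE_USER     (1ULL << 2)   /* page is accessible from user mode */
#define PTE_NX       (1ULL << 63)  /* the "no execute" bit, discussed below */

/* Bits 12..51 of a present entry hold the physical frame number. */
static inline uint64_t pte_frame(uint64_t pte) {
  return (pte >> 12) & 0xFFFFFFFFFFULL;
}

static inline bool pte_is_readonly(uint64_t pte) {
  return (pte & PTE_PRESENT) && !(pte & PTE_WRITABLE);
}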

There are a few different use cases for read-only memory regions, but probably the most important use case is read-only text segments. A text segment is the part of a process' memory that holds the actual machine code for the process. The reason it's desirable to map this area read-only is that it helps to mitigate attacks that attempt to inject malicious code into a process. If an attacker can inject malicious code into a process' code area then they can execute arbitrary code with the permissions of the process itself. In many cases these permissions are considerable, even if the process is not running as root. For instance, an attacker that can inject code can run arbitrary filesystem operations as the process' effective UID. When the text area is read-only one can be confident that the code in the text area that is executed has not been tampered with.

A big limitation here is that while we might think of memory as divided into read-only code and writable data, in fact some information about what code is executing is stored in the data area. In particular, the calling convention on x86 stores the return address for a function call on the stack, and the stack must be mapped with write permissions. Thus an attacker who can stomp on stack memory can alter the return address of a function. This is an extremely well known problem and is known as a stack buffer overflow. Typically when people talk about "buffer overflows" they're specifically referring to stack buffer overflows, since those are usually the most dangerous kind of buffer overflow (although not the only kind).
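
To make the failure mode concrete, here's a minimal (and deliberately buggy) sketch of the kind of code that produces a stack buffer overflow; the function name and buffer size are made up for the example:

#include <string.h>

/* "buf" lives on the stack a few bytes below greet()'s saved return
 * address, and strcpy(3) performs no bounds checking, so an "input"
 * longer than the buffer overwrites the return address that greet()
 * will eventually jump back to. */
void greet(const char *input) {
  char buf[64];
  strcpy(buf, input);   /* no length check: anything past 63 bytes overflows */
}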

On classic 32-bit x86 systems, an attacker who can overflow a buffer allocated on the stack can actually arrange for arbitrary code to run. The way this works is that the attacker overflows the buffer with malicious x86 code and then arranges for the return address on the stack to point back into that malicious code. Thus in this setting any stack buffer overflow can easily be turned into arbitrary code execution.

The NX bit, which is present on all 64-bit x86 systems (and some later 32-bit x86 systems), is an important measure that helps mitigate this attack. Systems that implement this functionality have three permission bits: read, write, and execute. These permissions are analogous to the filesystem read/write/execute bits. On these systems text areas are mapped readable + executable, and data areas are mapped readable + writable but not executable. It's also possible to map data areas as read-only.
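
These permissions show up directly in the mmap(2) interface. As a rough sketch, the following maps one anonymous page the way a data region is mapped (read + write) and another the way a text region is mapped (read + execute):

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
  /* A writable, non-executable page, like the heap or stack. */
  void *data = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  /* A readable, executable, non-writable page, like a text segment. */
  void *text = mmap(NULL, 4096, PROT_READ | PROT_EXEC,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (data == MAP_FAILED || text == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  printf("rw page at %p, rx page at %p\n", data, text);
  return 0;
}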

With the NX bit the simplest and most dangerous stack buffer overflow attack is prevented, because any x86 code injected into the stack cannot be executed. An attacker can still arrange for an arbitrary address to be used as the function's return address, but unless that address actually points to a valid offset in the text area the process will segfault (i.e. terminate with SIGSEGV). The process can also terminate with SIGILL if an invalid offset in the text area is used, since an offset that doesn't fall on an actual instruction boundary will likely decode into an invalid instruction sequence quickly.

The most dangerous attack that is still possible with stack buffer overflows when the NX bit is enabled is called a return-to-libc attack. The way this works is that if the conditions are just right an attacker can arrange for a function's return address to be the address of a sequence of instructions in libc that will cause the program to do something dangerous (e.g. exec a shell). The reason this attack is dangerous is that libc has wrapper routines for nearly every system call, plus numerous other high level functions that can be rather powerful. The reason it is much more difficult to execute than a typical stack buffer overflow is that not only must the attacker know exactly what address in libc to return to, they must also arrange for the registers and stack contents to be just right so the libc code is in the correct state to actually do the malicious work.

Address Space Layout Randomization (ASLR)

ASLR is a mechanism that is technically complementary to the NX bit, but is made much more powerful when the NX bit is enabled. The reason is that ASLR is designed precisely to make return-to-libc (or return to any other shared library) much more difficult to execute.

The way that ASLR works is that when shared libraries are mapped into a process' memory they are loaded at randomized locations. For instance:

$ for x in {1..5}; do grep 'r-xp .*/libc' /proc/self/maps; done
7f2f5469b000-7f2f5484a000 r-xp 00000000 fe:00 31376                      /lib64/libc-2.19.so
7fba4ce49000-7fba4cff8000 r-xp 00000000 fe:00 31376                      /lib64/libc-2.19.so
7f063b82c000-7f063b9db000 r-xp 00000000 fe:00 31376                      /lib64/libc-2.19.so
7f45cbbd9000-7f45cbd88000 r-xp 00000000 fe:00 31376                      /lib64/libc-2.19.so
7f1c25f74000-7f1c26123000 r-xp 00000000 fe:00 31376                      /lib64/libc-2.19.so

The leftmost hex range in this output shows where libc's text was mapped in each of the five grep processes spawned by the loop (/proc/self resolves to the process reading it, which here is grep). As you can see, each invocation loads libc at a different memory offset. This means that the address of a given function in libc (e.g. the wrapper for unlink(2)) will differ in every process invocation. An attacker who wants to call a function like the unlink(2) wrapper can't easily know where its code will actually be loaded in memory.

Another way to make this more apparent is to write a C program that prints out the address of a libc function each time it's run. Consider the following program listing:

#include <dlfcn.h>
#include <stdio.h>

// must use the exact version for libc!
const char libc_name[] = "libc-2.19.so";

int main(int argc, char **argv) {
  // find the libc that is already loaded in memory
  void *handle = dlopen(libc_name, RTLD_LAZY | RTLD_NOLOAD);
  if (handle == NULL) {
    printf("failed to find libc in memory!\n");
    return 1;
  }
  // locate the unlink(2) wrapper
  printf("unlink(2) is loaded at %p\n", dlsym(handle, "unlink"));
  return 0;
}

You can compile this with an invocation like gcc unlink.c -ldl (the -ldl needs to come after the source file on toolchains that link with --as-needed). On a sequence of five different invocations I get five different addresses for unlink(2):

$ for x in {1..5}; do ./a.out; done
unlink(2) is loaded at 0x7f1d5501e540
unlink(2) is loaded at 0x7f384fbfe540
unlink(2) is loaded at 0x7f2c8f5de540
unlink(2) is loaded at 0x7f5669055540
unlink(2) is loaded at 0x7f8efd897540

Bypassing Memory Protection

Both of the protection mechanisms I just mentioned can be bypassed in a number of different ways, particularly in the "cooperative" case where the program author is intentionally trying to bypass the protections.

The NX bit can be bypassed by a caller using the mprotect(2) system call. The prototype for that function looks like this:

int mprotect(void *addr, size_t len, int prot);

Here the prot argument is zero or more of the read/write/execute bits (PROT_READ, PROT_WRITE, PROT_EXEC) OR'd together. After the system call the specified memory region will be remapped with the new permissions given by prot.

One fun thing that a caller can do is to set the text regions for a process to be writable. If this is done it completely bypasses the read-only state that code areas are usually mapped in. The caller can also map data regions (such as the stack) to be executable, which also effectively bypasses the usual protection mechanism. To actually do either of these things a caller must know the exact address they want to change the permissions for. Doing this is somewhat hampered by the fact that many parts of memory live at randomized offsets due to ASLR.
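In the cooperative case both halves of this are easy to demonstrate. The following sketch (x86-64 specific, with the machine code bytes hard-coded for the example) writes a tiny routine into an ordinary read/write mapping, flips the page to read + execute with mprotect(2), and then calls into it:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
  /* x86-64 machine code for: mov eax, 42; ret */
  static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

  /* Start with an ordinary read/write (non-executable) anonymous page. */
  void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  memcpy(page, code, sizeof(code));

  /* Flip the page to read+execute; without this call the jump below
   * would fault because of the NX bit. */
  if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
    perror("mprotect");
    return 1;
  }

  int (*fn)(void) = (int (*)(void))page;
  printf("injected code returned %d\n", fn());
  return 0;
}
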

Since ASLR is the main protection mechanism that prevents an attacker from trivially remapping memory regions with mprotect(2), the rest of this section will discuss ways that ASLR can be bypassed.

As I demonstrated briefly earlier, a process can see its exact memory layout by looking at /proc/self/maps. This file is the easiest way to work around ASLR protections. Since the file has a well known name and an easily parseable format, reading it makes it easy to know exactly where everything is loaded and what permissions are enforced on each memory region. In a "cooperative" setting where you are intentionally reading your own maps file, it takes just a couple of lines of C or C++ code to find things and remap them.
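
As a rough illustration of what those couple of lines look like, the following scans /proc/self/maps for the executable libc mapping and prints the base address it was loaded at (the parsing is deliberately crude and assumes a line format like the grep output shown earlier):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  FILE *maps = fopen("/proc/self/maps", "r");
  if (maps == NULL) {
    perror("fopen");
    return 1;
  }
  char line[512];
  while (fgets(line, sizeof(line), maps) != NULL) {
    /* Look for the executable mapping of libc, e.g.
     * "7f2f5469b000-7f2f5484a000 r-xp ... /lib64/libc-2.19.so" */
    if (strstr(line, "r-xp") && strstr(line, "libc")) {
      unsigned long base = strtoul(line, NULL, 16);
      printf("libc text is mapped at 0x%lx\n", base);
      break;
    }
  }
  fclose(maps);
  return 0;
}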

In practice it would be difficult for an attacker to use this file because doing so would require injecting code to open, parse, and interpret it. Normally the NX + ASLR mechanisms themselves ought to be sufficient to protect against this. In some cases an attacker might be able to use another process running as the same UID to parse a given process' maps file, but even that seems far-fetched since the attacker still needs another side channel to get the data into the target process. There are also various ways to isolate the process directories in /proc to protect against this kind of side-channel attack.

Processing the maps file in /proc is not the only way to bypass ASLR. Processes can also locate functions in memory by looking at the function symbol tables. The demonstration C file I listed shows how to do this with dlopen(3) and dlsym(3) which makes things easy since libdl implements the actual low level bits that know how to do this function symbol parsing. However it's worth noting that libdl isn't doing anything magic here, and in the usage I showed it does not look at /proc/self/maps.

The way libdl works with glibc is that it contains the code that knows how to parse an actual ELF file. When you call dlopen(3) and dlsym(3) in the manner I demonstrated, libdl finds the copy of libc-2.19.so that is already mapped into the process, looks up unlink in that library's symbol table to get its offset, and adds the library's load address to produce the pointer you saw printed. Therefore it is possible in principle for an attacker to use this kind of mechanism even if an executable does not link against libdl. Again, in an actual malicious scenario it is unlikely that an attacker could accomplish this, since the logic to parse an ELF file and its symbol table is considerable and would have to be injected into the process.

There is another mechanism that is not well understood by many people; it is also what I think would be the most likely way an attacker would try to bypass ASLR in a real scenario. For typical executables only shared libraries have their locations randomized; the text area of the executable itself is not loaded at a randomized location. Therefore if an attacker has a copy of an executable, they can analyze its code and determine effective places to return to. For instance, suppose an attacker wants to unlink a file. If the executable itself has code that calls unlink(2), then instead of trying to return to unlink(2) in libc, the attacker can return to the code that calls unlink(2), which exists at a well known location. For many attacks this is still difficult since the attacker has to get registers and/or memory into the right state to pass the right arguments to the target function. Still, this is an effective way to bypass the randomization that ASLR introduces. In some cases one can use this mechanism even if the executable doesn't actually call the intended C function: the attacker only needs to find a sequence of executable bytes that can be interpreted as the desired call. One can even intentionally jump to an offset that doesn't fall on a real instruction boundary and accept that the process will eventually crash with SIGILL, so long as the bytes decoded before the illegal instruction is reached form a short instruction run that does what the attacker wants.
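
In the cooperative setting it's easy to see what such a fixed location looks like: for a non-PIE binary the addresses objdump(1) prints are the same addresses the code occupies at run time, so a command like the following (./victim is just a placeholder for the binary being examined) lists the call sites an attacker would aim a corrupted return address at:

$ objdump -d ./victim | grep 'call.*unlink'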

If you are worried about this last attack you can compile with the -fPIE flag (and link with -pie), which stands for position independent executable. This causes the executable itself to be loaded at a randomized base address, so its code and data get the same ASLR treatment as shared libraries. The executable's ELF header still records a fixed entry point offset, but since the whole image is loaded at a randomized base, the entry point, the C runtime (a.k.a. CRT) initialization code it runs, and everything else end up at unpredictable memory addresses. Compiling as a PIE does have non-negligible overhead, particularly on 32-bit systems, but it's indispensable for security-sensitive executables such as ssh.
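
The difference is easy to observe with a trivial program that prints the address of one of its own functions (the file name here is just an example):

/* pie_demo.c */
#include <stdio.h>

int main(void) {
  /* Without PIE this address is fixed at link time; with -fPIE/-pie it
   * is randomized by ASLR on every execution. */
  printf("main is at %p\n", (void *)main);
  return 0;
}

Compiling it twice, e.g. gcc pie_demo.c -o fixed and gcc -fPIE -pie pie_demo.c -o randomized, and running each a few times makes the difference obvious (on newer toolchains PIE may already be the default, in which case -no-pie recovers the fixed behavior).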

One final way to bypass memory protections, which I will mention just for completeness, is the ptrace(2) system call. This system call lets one process read and write another process' memory without restriction (i.e. it is not necessary to mprotect(2) anything). It is what GDB uses to probe other processes. The ptrace(2) system call is very powerful, but because it's so powerful its usage is typically quite restricted. At a minimum you need to have the same effective UID as the target process in order to ptrace it. Additionally there is a file called /proc/sys/kernel/yama/ptrace_scope which can be used to restrict or completely neuter the ptrace(2) system call. By default Debian and Ubuntu ship kernels with the Yama ptrace scope configured so that ordinary processes can only ptrace their own descendants, while the superuser can still ptrace anything (and changing this setting requires root). In general if you can ptrace another process you have unlimited ability to act as that process, so while ptrace can be used to bypass memory protections, any situation in which you can ptrace another process is a situation in which you can already compromise that process.
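
To give a sense of what this looks like in practice, here is a minimal cooperative sketch that attaches to a process whose pid and target address are supplied on the command line and reads a single word of its memory; the error handling is abbreviated:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
    return 1;
  }
  pid_t pid = (pid_t)atoi(argv[1]);
  void *addr = (void *)strtoul(argv[2], NULL, 16);

  /* Attach to the target; this stops it and requires either the same
   * effective UID (subject to the Yama ptrace_scope setting) or root. */
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
    perror("PTRACE_ATTACH");
    return 1;
  }
  waitpid(pid, NULL, 0);

  /* Read one word of the target's memory; no mprotect(2) is needed. */
  errno = 0;
  long word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
  if (word == -1 && errno != 0) {
    perror("PTRACE_PEEKDATA");
  } else {
    printf("word at %p in pid %ld: 0x%lx\n", addr, (long)pid, word);
  }

  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  return 0;
}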