An Introduction to Valgrind Memcheck

February 21, 2017

Valgrind is an extremely powerful tool for debugging and profiling programs written in C or C++. Valgrind has many built-in tools for various tasks, such as checking for memory errors, profiling memory usage, and profiling CPU branch prediction. In this post I'll run through the basics of the default tool, a memory checker called memcheck. The Valgrind memory checker is capable of detecting many kinds of memory leaks, access to uninitialized memory, and out-of-bounds accesses to heap blocks.

How Valgrind Works

Unlike many other similar tools, Valgrind does not require the program it is checking to be compiled or linked specially. You can use Valgrind with any program.

Valgrind works by doing a just-in-time (JIT) translation of the input program into an equivalent version that has additional checking. For the memcheck tool, this means it literally looks at the x86 code in the executable, and detects what instructions represent memory accesses. These instructions are sandboxed, so that it can trap accesses to uninitialized memory (even if those accesses do not cause a page fault or segmentation fault). The memcheck tool also has special knowledge of routines like malloc() and free(), so it can track what memory is still "reachable" when the program terminates.

The design of Valgrind means that there is no penalty for running instructions that do not touch main memory (or that are provably already checked). This is pretty good, although in practice instructions touching main memory are pretty common. After all, 64-bit x86 systems only have 16 general purpose registers. In my experience it's typical for the Valgrind memory checker to slow programs down by about 10x. This is usually fast enough for development work, although you wouldn't want to use Valgrind in production.

To get the most effective results out of Valgrind you should compile your program with debug symbols. This will allow Valgrind to track full call stacks for memory violations. Of course, when developing your code you should be compiling your code with debug symbols anyway.

If you're interested in an extremely technical description of how Valgrind works, I refer you to The Design and Implementation of Valgrind, written by Julian Seward, the original author of Valgrind.

Debugging A Simple Memory Leak

We're going to first look at a simple (but incorrect) program that detects if ~/.bashrc exists:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 1000

int main() {
  char *path = calloc(BUF_SIZE, 1);
  strcat(path, getenv("HOME"));
  strcat(path, "/.bashrc");
  FILE *foo = fopen(path, "r");
  return foo == NULL;
}

Here are the bugs I see in this program:

The program does not check if calloc() returns NULL
The unsafe function strcat() is used, instead of strncat()
The fopen() call does not have a corresponding fclose()
The calloc() call does not have a corresponding free()

Since I'm pedantic, I'll also note that using getenv() in this way isn't strictly correct. A correct program should get the user's home directory from /etc/passwd using getpwent() or fgetpwent(). The version here is given for brevity.

Valgrind can detect the errors noted above that are related to bad memory accesses or allocation/deallocation patterns. In the default mode, here's what we see:

$ valgrind ./a.out
==22633== Memcheck, a memory error detector
==22633== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22633== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==22633== Command: ./a.out
==22633==
==22633==
==22633== HEAP SUMMARY:
==22633==     in use at exit: 1,552 bytes in 2 blocks
==22633==   total heap usage: 2 allocs, 0 frees, 1,552 bytes allocated
==22633==
==22633== LEAK SUMMARY:
==22633==    definitely lost: 1,000 bytes in 1 blocks
==22633==    indirectly lost: 0 bytes in 0 blocks
==22633==      possibly lost: 0 bytes in 0 blocks
==22633==    still reachable: 552 bytes in 1 blocks
==22633==         suppressed: 0 bytes in 0 blocks
==22633== Rerun with --leak-check=full to see details of leaked memory
==22633==
==22633== For counts of detected and suppressed errors, rerun with: -v
==22633== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Here Valgrind is telling us it thinks there are still 1,552 bytes from two allocations on the heap when the program terminates. That means there are two memory leaks. Valgrind knows that 1,000 of those bytes are "definitely lost", and 552 bytes are "still reachable".

As the output says, we should run with --leak-check=full to get a full error report. Actually we need a few more options; the full invocation should be:

$ valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./a.out
==22679== Memcheck, a memory error detector
==22679== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22679== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==22679== Command: ./a.out
==22679==
==22679==
==22679== HEAP SUMMARY:
==22679==     in use at exit: 1,552 bytes in 2 blocks
==22679==   total heap usage: 2 allocs, 0 frees, 1,552 bytes allocated
==22679==
==22679== 552 bytes in 1 blocks are still reachable in loss record 1 of 2
==22679==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==22679==    by 0x4EA905C: __fopen_internal (in /usr/lib64/libc-2.24.so)
==22679==    by 0x400657: main (fopen.c:11)
==22679==
==22679== 1,000 bytes in 1 blocks are definitely lost in loss record 2 of 2
==22679==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==22679==    by 0x4005EC: main (fopen.c:8)
==22679==
==22679== LEAK SUMMARY:
==22679==    definitely lost: 1,000 bytes in 1 blocks
==22679==    indirectly lost: 0 bytes in 0 blocks
==22679==      possibly lost: 0 bytes in 0 blocks
==22679==    still reachable: 552 bytes in 1 blocks
==22679==         suppressed: 0 bytes in 0 blocks
==22679==
==22679== For counts of detected and suppressed errors, rerun with: -v
==22679== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Great, now we know where the memory leaks come from. Valgrind tells us that the 552 still reachable bytes were allocated from line 11 which called __fopen_internal, which called malloc. It also tells us that the 1,000 definitely lost bytes come from line 8 which calls calloc.

We can fix those with this diff:

--- fopen1.c    2017-02-20 13:32:20.442466791 -0800
+++ fopen.c    2017-02-20 13:32:23.176399966 -0800
@@ -9,5 +9,7 @@
   strcat(path, getenv("HOME"));
   strcat(path, "/.bashrc");
   FILE *foo = fopen(path, "r");
+  fclose(foo);
+  free(path);
   return foo == NULL;
 }

Now when we run Valgrind it will give us a clean heap summary:

==22786== HEAP SUMMARY:
==22786==     in use at exit: 0 bytes in 0 blocks
==22786==   total heap usage: 2 allocs, 2 frees, 1,552 bytes allocated
==22786==
==22786== All heap blocks were freed -- no leaks are possible

This program still has bugs, as the return values of various functions aren't checked. Notably, the return value of calloc() isn't checked for NULL, and fclose(foo) is called even when fopen() failed and returned NULL. But this is a good start.

"Definitely Lost" vs. "Still Reachable"

Before the leak was fixed, Valgrind said that the memory allocated by calloc() was "definitely lost", whereas the memory allocated by fopen() was "still reachable". The Valgrind FAQ explains the difference between the two. The short version is that "definitely lost" memory is just that: definitely an error, because no pointer to the block could be found when the program exited. Memory that is "still reachable" is the least worrisome kind of memory leak: the program still held a pointer to it at exit, so it could have been freed but wasn't. Such memory usually isn't allocated in an unbounded way.

What's actually happening here is that when the file is opened, glibc allocates a structure for the FILE* object. This structure holds things like the underlying file descriptor number, pointers into the file's read and write buffers, and usually a mutex (depending on how glibc was compiled). Since there are a bounded number of file descriptors you can actually open, there's also a bounded amount of memory you can leak in this way before the program would fail to open new files.

Furthermore, many people don't care if their program exits with open file descriptors. For instance, do you close stdin, stdout, and stderr before exiting your program? Those are also FILE* objects allocated by glibc. No one bothers to close these before exiting, and it's not a big deal.

I don't know the exact heuristics that Valgrind uses to tell which category a memory leak is in, but my advice is to make a best effort to fix them all, but to ignore "still reachable" errors if they're too difficult to fix. In particular, you may find yourself using certain libraries that trigger "still reachable" or "possibly lost" memory errors when run through Valgrind. The most common case is a library that allocates a small, bounded amount of data when initialized, but exposes no method to deallocate that memory. Take a critical look at these and Google the errors you find, but remember that perfect is the enemy of good, and use your programming time productively.

Accessing Uninitialized Memory

C has "undefined behavior" when you read from uninitialized memory, or when you write to data on the heap outside of a valid range. This is really bad. If you're lucky this situation will cause a segfault to occur. If you're not lucky, you'll read invalid data or corrupt the heap.

We can trigger this in our program by setting BUF_SIZE to a small number:

--- fopen1.c    2017-02-20 13:40:10.985125910 -0800
+++ fopen.c    2017-02-20 13:40:13.636064803 -0800
@@ -2,7 +2,7 @@
 #include <stdlib.h>
 #include <string.h>

-#define BUF_SIZE 1000
+#define BUF_SIZE 1

 int main() {
   char *path = calloc(BUF_SIZE, 1);

Now the calloc() call will allocate a buffer that is too small. What's interesting is that in practice, if you run this new program with GCC/glibc you're unlikely to actually observe an error. This is because the glibc memory allocator won't actually ask for just one byte from the operating system: in practice it will allocate a larger chunk. Thus this program will overrun its allocated buffer, but probably won't segfault. In this simple program this may not be an issue; in a more complicated program this will almost certainly cause problems, including memory corruption.

Here's what Valgrind has to say about the new program:

$ valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./a.out
==23581== Memcheck, a memory error detector
==23581== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==23581== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==23581== Command: ./a.out
==23581==
==23581== Invalid write of size 1
==23581==    at 0x4C30890: strcat (vg_replace_strmem.c:303)
==23581==    by 0x40069C: main (fopen.c:9)
==23581==  Address 0x5200041 is 0 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 1
==23581==    at 0x4C3089F: strcat (vg_replace_strmem.c:303)
==23581==    by 0x40069C: main (fopen.c:9)
==23581==  Address 0x520004a is 9 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid read of size 1
==23581==    at 0x4006B3: main (fopen.c:10)
==23581==  Address 0x5200041 is 0 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 8
==23581==    at 0x4006D0: main (fopen.c:10)
==23581==  Address 0x520004a is 9 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 1
==23581==    at 0x4006D3: main (fopen.c:10)
==23581==  Address 0x5200052 is 17 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Syscall param open(filename) points to unaddressable byte(s)
==23581==    at 0x4F319B0: __open_nocancel (in /usr/lib64/libc-2.24.so)
==23581==    by 0x4EB5BC2: _IO_file_open (in /usr/lib64/libc-2.24.so)
==23581==    by 0x4EB5E84: _IO_file_fopen@@GLIBC_2.2.5 (in /usr/lib64/libc-2.24.so)
==23581==    by 0x4EA90B3: __fopen_internal (in /usr/lib64/libc-2.24.so)
==23581==    by 0x4006E7: main (fopen.c:11)
==23581==  Address 0x5200041 is 0 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581==
==23581== HEAP SUMMARY:
==23581==     in use at exit: 0 bytes in 0 blocks
==23581==   total heap usage: 2 allocs, 2 frees, 553 bytes allocated
==23581==
==23581== All heap blocks were freed -- no leaks are possible
==23581==
==23581== For counts of detected and suppressed errors, rerun with: -v
==23581== ERROR SUMMARY: 23 errors from 6 contexts (suppressed: 0 from 0)

What's interesting here is that even though Valgrind gives us a clean heap summary ("All heap blocks were freed -- no leaks are possible"), there are still a lot of errors. In fact, we see that there are 23 errors from 6 contexts. There are three types of errors reported:

Invalid reads
Invalid writes
Invalid parameter to syscall "open"

In all of these cases we can see that the calloc() call on line 8 is indicated as the source of the bad pointer in the stack trace. We can also see which function did the read or write on that pointer.

Bonus: GCC Builtin Functions

This is probably enough information to fix the bug: the call to calloc() is indicated as the source of our problems, and in practice that's probably enough information to know that the allocation was too small. But let's look more closely at the output, because we can see a very interesting thing that GCC is doing when optimizing this program.

On line 9 of the input program two invalid writes are indicated. For reference, here's line 9 of the source program:

strcat(path, getenv("HOME"));  // line 9

And here are the error contexts, as reported by Valgrind:

==23581== Invalid write of size 1
==23581==    at 0x4C30890: strcat (vg_replace_strmem.c:303)
==23581==    by 0x40069C: main (fopen.c:9)
==23581==  Address 0x5200041 is 0 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 1
==23581==    at 0x4C3089F: strcat (vg_replace_strmem.c:303)
==23581==    by 0x40069C: main (fopen.c:9)
==23581==  Address 0x520004a is 9 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)

This is saying the original allocation happened through the calloc() call, and the function doing the bad writes is the strcat() call on line 9. This is what we expect.

However, let's look at the next three contexts, which are all for line 10. For reference, here's line 10 of the source program:

strcat(path, "/.bashrc");  // line 10

And here are the error contexts, as reported by Valgrind:

==23581== Invalid read of size 1
==23581==    at 0x4006B3: main (fopen.c:10)
==23581==  Address 0x5200041 is 0 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 8
==23581==    at 0x4006D0: main (fopen.c:10)
==23581==  Address 0x520004a is 9 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)
==23581==
==23581== Invalid write of size 1
==23581==    at 0x4006D3: main (fopen.c:10)
==23581==  Address 0x5200052 is 17 bytes after a block of size 1 alloc'd
==23581==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==23581==    by 0x40067C: main (fopen.c:8)

What's interesting here is that we don't see strcat() attributed as the source of the read/write, even though our original program called that function! What's going on?

In the strcat() on line 9, the "source" string was dynamic: it comes from getenv(). So GCC does a regular strcat() here. However, on line 10 the strcat() call uses a "source" string that is known ahead of time to the compiler. In this case GCC will actually omit the strcat() call and replace it with a "builtin" inline version. In other words, GCC has special cased calling strcat() with a constant source string, and will just inline the logic to do the concatenation. That's why Valgrind doesn't report a call to strcat()!

This behavior can be changed with the flag -fno-builtin, which causes GCC to avoid this optimization. One reason you might want to know about this is when debugging a program. If you set a GDB breakpoint on strcat() then you might be surprised to see that the breakpoint isn't always hit.

GCC can apply builtin optimizations for many functions. Here is a complete list, from the GCC documentation.