Strace

November 4, 2015

Strace is one of my favorite commands. I use it probably almost every day at work, and regularly use it on my home computers to debug all manner of things. People using OS X or BSD like to talk about how great DTrace is, but I think that for 99% of the things I want to do strace is vastly simpler and therefore is actually much more useful, despite being more limited in nature.

How To Use It

Previously I wrote an article about setting up eCryptfs on Fedora. When I was first setting this up, I didn't correctly have my user in the ecryptfs group and after running ecryptfs-setup-private I got an error like this:

/usr/bin/ecryptfs-setup-private: line 450: /sbin/mount.ecryptfs_private: Permission denied
ERROR:  Could not mount private ecryptfs directory

What this error was actually telling me was that my ecryptfs directory had been successfully created but couldn't be mounted (which would have been solved by just restarting my login session). If you know this is what's going on the error message makes sense, but it doesn't necessarily make sense if you don't know that up front. So after restarting my session I tried running ecryptfs-setup-private again and instead got this error:

ERROR:  wrapped-passphrase file already exists, use --force to overwrite.

This is equally cryptic if you don't know what's going on. It's kind of clear that there's some sort of file somewhere that exists, and the setup script is trying to not overwrite that file. But it's not at all clear where that file is since the error message doesn't actually include the path.

This is an example of the kind of problem that you can debug in a minute or so once you know how to use strace. Running

strace ecryptfs-setup-private

and scanning the output will show that there aren't any suspicious failed system calls or access checks, but you will see that the command calls clone(2) and receives a SIGCHLD. This indicates that the ecryptfs-setup-private comand is creating child processes or threads that are doing the actual work, so we need to trace those:

strace -f ecrypts-setup-private 2>&1 | less

This will cause the child processes/threads to be traced as well and will send all the output to less. This yields nearly 3000 lines of output on my computer, which is a lot to scan through, but we can guess that since the error message says that the "file already exists" we probably need to look for a system call that returns EEXIST. When I searched for this, it immediately took me to this section of the strace output:

[pid  4798] mkdir("/home", 0777)        = -1 EEXIST (File exists)
[pid  4798] chdir("/home")              = 0
[pid  4798] mkdir("evan", 0777)         = -1 EEXIST (File exists)
[pid  4798] chdir("evan")               = 0
[pid  4798] mkdir(".ecryptfs", 0700)    = -1 EEXIST (File exists)
[pid  4798] stat(".ecryptfs", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
[pid  4798] close(1)                    = 0
[pid  4798] close(2)                    = 0
[pid  4798] exit_group(0)               = ?
[pid  4798] +++ exited with 0 +++

and then about ten lines below that I see

stat("/home/evan/.ecryptfs/wrapped-passphrase", {st_mode=S_IFREG|0600, st_size=58, ...}) = 0

This is pretty obviously the issue that the error message is talking about. I got a little lucky here by searching for EEXIST which wasn't actually the root of the error but was close to where the error occurs. I could have also searched for wrapped-passphrase which was mentioned as the filename in the error message. After doing this I poked around in a bash shell looking at the ~/.ecryptfs directory and figured out what was going on.

Again I got lucky because I had a good guess of what the actual error was; that said, this is the type of thing you'll quickly develop an intuition for if you use strace regularly.

If you don't get as lucky with guessing the error code or filename to look for, usually the right thing to do is to look backwards in the strace output for calls to exit(2) or exit_group(2). Shortly before the exit_group(2) call typically you will see what the program was doing when it decided to bail. Another technique that works is looking for system calls that have negative return codes (again, typically you want to search backwards since you want to look for failures near the end of the program run). Another technique is to look for the write call to stdout or stderr that prints the error message, and try to work backwards from there. In any case when you get stuck the rule of thumb is to try to look at the end of the output first, since most programs exit (or at least print an error) right after encountering a problem.

If the program is really complicated or does a lot of stuff before exiting then pipeing the output to less will be unwieldly. In this case you want to do something like:

strace -ff -o x ecryptfs-mount-private

This will create a bunch of files named x.<PID> with the actual PID substituted. Afterwards you can usally use grep to figure out which files you want to investigate further, and once you've narrowed it down you can analyze the files with less, vi, etc. This technique works well when you have a lot of child processes to trace, since isolating the output of each process makes following the trace of each of the processes simpler as you don't have to deal with interleaved output.

There are a couple of other options you'll want to know. To trace an already running process, use:

strace -p <pid>

To get timing information, you'll want to -tt to get absolute timestamps and/or -T to get elapsed time. I frequently use them together, e.g.

strace -tt -T ls

The leading portion here will show the timestamp and the trailing part will show how long the system call took. This is useful for debugging slow programs, e.g. you can look for system calls that take more than one second easily by grepping the output of this.

By default strings will be truncated after only 32 characters which isn't that many; to get more characters use strace -s <number.

If you know what system calls you're interested in you can filter with -e. For instance, let's say I want to know how the ping program creates a socket for sending ICMP packets; I might look at this with

strace -e socket ping www.google.com

You can specify mulitple system calls this way, e.g.

strace -e socket,setsockopt curl -s www.google.com >/dev/null

There are a myriad of other options that strace will accept, but the ones I've listed here are the ones I most frequently use.

When To Use It

A lot of people know about strace, but in my opinion fail to use it effectively because they don't use it at the right time.

Any time a program is crashing right after you start it, you can probably figure out what's going on with strace. The most common situation here is that the program is making a failed system call and then exiting after the failed system call. Frequently this will be something like a missing file or a permissions error (e.g. ENOENT or EPERM), but it could be almost any failed system call. As I said before, try to work backwards from the end of the strace output.

Any time you have a program that appears to be hung, you should use strace -p to see what system call it's stuck on (or what sequence of system calls it's stuck on). Any time you have a program that appears to be running really slowly, you should use strace or strace -p and see if there are system calls that are taking a long time.

In both of the cases I just listed, there are a few common patterns that you'll see:

Programs that are stuck in a wait* call are waiting for a child process to complete, proceed by tracing the child processes.
Programs that are stuck in select(2) or epoll_wait(2) calls or loops without making other system calls are typically waiting forever for network data. To debug, try to figure out what file descriptors are in the select or epoll (you can typically figure this out with more strace, by looking in /proc, or using lsof). This is potentially kind of tricky to do with edge-triggered epoll loops, you may need to use gdb or restart the program if you don't have sufficient context.
Programs that are stuck in calls like connect(2), read(2), write(2), sendto(2), recvfrom(2), etc. are also stuck doing I/O, and once again you should use something like lsof to figure out what the other end of the file descriptor is.
Programs that are stuck in the futex(2) system call typically have some sort of multi-threading related problem as this is the primitive that Pthreads uses on Linux. You'll see this also in higher level programs written in languages like Python, since Python threads are implemented on Linux using Pthreads. This can't easily be debugged with strace but often knowing that it's a threading problem is sufficient to point you in the right direction. If you need to do more low level debugging, proceed with gdb (I would recommend starting by attaching to the process and getting a backtrace).

Another really common use case for me is programs that aren't loading the right files. Here's one that comes up a lot. I frequently see people who have a Python program that has an import statement that is either loading the wrong module, or that is failing even though the developer thinks that the module should be able to be imported. How can you figure out where Python is attempting to load the module from? You can use strace and then look for calls to open(2) (or perhaps acces(2)), and then grep for the module name in question. Quickly you'll see what files the Python interpreter is actually opening (or attempting to open), and that will dispell any misconceptions about what may or may not be in your sys.path, will identify stale bytecode files, etc.

If there aren't any slow system calls or the system calls don't appear to be looping then the process is likely stuck on some sort of userspace thing. For instance, if the program is computing the value of pi to trillions of decimal places it will probably not be doing any system calls (or doing very few system calls that complete quickly). In this case you need to use a userspace debugger like gdb.

One final piece of advice I have is to try to make an effort to learn as many system calls as you can and figure out why programs are making them. If you've never used strace before the output is going to be really overwhelming and confusing, particularly if you don't have much Unix C programming experience. However, there actually aren't that many system calls that are regularly used or that are useful when debugging these issues. There are also common sequences or patterns to system calls (e.g. for things like program initialization, memory allocation, network I/O, etc.) that you can pick up quickly that will make scanning strace output really fast. All of the system calls you see will have man pages available that explain all of the arguments and return codes, so you don't even need to leave your terminal or have an internet connection to figure out what's going on.

Happy debugging, and may the strace gods bless you with quick results.