Ptrace (continued)

February 1, 2016

I made some significant improvements to my userspace ptrace prober today. The new way it works is pretty interesting.

There are essentially two major changes. The first is that instead of single stepping through the program, I am now using the x86 TRAP facility (a.k.a. INT 3) to pause at the right point more efficiently. This is the same mechanism that a debugger like GDB uses to implement breakpoints. The idea is pretty simple. You write the byte 0xcc over the first byte of the instruction that you want to break on. Then when the program hits that instruction the kernel will deliver SIGTRAP to the process. From the tracer process you can use wait(), waitpid(), or waitid() to wait until the child receives SIGTRAP. At that point you can do whatever you need to do.
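To make the mechanism concrete, here is a minimal sketch in C of planting a trap and waiting for it. It assumes x86-64 Linux and a tracee that is already attached and stopped. The helper names (patch_int3, insert_breakpoint, continue_to_trap) are made up for illustration and are not taken from the actual tracer.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Replace the lowest byte of a word with 0xcc (INT 3). x86 is little-endian,
 * so the lowest byte of the word is the byte stored at the poked address. */
long patch_int3(long word) {
  return (long)(((unsigned long)word & ~0xffUL) | 0xccUL);
}

/* Hypothetical helper: plant a breakpoint at addr in an attached, stopped
 * tracee. The original word is saved in *saved so it can be restored later.
 * Returns 0 on success, -1 on failure. */
int insert_breakpoint(pid_t pid, void *addr, long *saved) {
  errno = 0;
  long word = ptrace(PTRACE_PEEKTEXT, pid, addr, NULL);
  if (word == -1 && errno != 0) return -1; /* -1 alone can be valid data */
  *saved = word;
  if (ptrace(PTRACE_POKETEXT, pid, addr, (void *)patch_int3(word)) == -1)
    return -1;
  return 0;
}

/* Hypothetical helper: resume the tracee and block until its next stop.
 * Returns 0 if it stopped with SIGTRAP, -1 otherwise. */
int continue_to_trap(pid_t pid) {
  int status;
  if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) return -1;
  if (waitpid(pid, &status, 0) == -1) return -1;
  return (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP) ? 0 : -1;
}
```

Once continue_to_trap() returns 0, %rip points one byte past the 0xcc. A debugger-style tracer would poke the saved word back and rewind %rip by one before resuming the original code.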

Using a trap is more efficient because in single stepping mode the process gets interrupted after literally every x86 instruction it executes, which imposes a significant overhead as you might imagine. When using a trap there is essentially no overhead until the trap is actually hit; the only cost is processing the trap itself.

The second change fixes a problem with how I injected code. I was encoding the CALL instruction to fprintf(), and the format string data, directly into the code segment of the executable at the current instruction pointer. This actually works fine, but there's one caveat. If you ran my tracer exactly when the tracee was in the middle of executing fprintf() (or a routine called by fprintf()), then the fprintf() code would be corrupted and the program would crash.

Here's how the new tracer works:

Attach to the tracee as usual.
Insert a SYSCALL instruction at %rip (this takes two bytes, 0x0f 0x05).
Insert a JMP %rax instruction right after SYSCALL (this is also two bytes, 0xff 0xe0).
Modify the registers to encode the arguments to the mmap(2) system call to allocate a 4096 byte anonymous page that is marked PROT_READ | PROT_EXEC.
Use PTRACE_SINGLESTEP to execute the system call.
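Verify that the mmap(2) call worked by reading the address out of %rax and checking that it is not an error value. (A raw system call reports failure as a negative errno in %rax, rather than the -1 that the libc wrapper returns.)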
Use PTRACE_SINGLESTEP to execute the JMP %rax instruction we already poked into memory; conveniently, the address is already in %rax since that is where the kernel returns system call results.
Copy a CALL instruction into the mmap'ed region followed by a TRAP instruction.
Copy the fprintf() format string right after the TRAP instruction.
Set %rax, %rdi, %rsi, and %rdx to the values needed for the fprintf() call.
Run PTRACE_CONT and then wait() for the process to return to the TRAP we inserted.
Replace the TRAP with a JMP %rax that returns to the original value of %rip, and use PTRACE_SINGLESTEP to execute the jump.
The current instruction will still be the SYSCALL we encoded earlier for our mmap(2) system call.
Set up the registers to hold the right arguments to munmap(2) to unmap the page we allocated earlier.
Execute the munmap(2) system call by using PTRACE_SINGLESTEP.
Verify that the munmap(2) call worked by checking that %rax is 0.
Poke the saved word back at the original %rip, restoring the bytes we previously overwrote with the SYSCALL and JMP %rax instructions.
Restore the original register state.
Use PTRACE_DETACH to detach from the process.
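
The data-preparation parts of the steps above can be sketched in C as follows. This is only an illustration under the assumption of x86-64 Linux, not the code from the repository. The function names are hypothetical, and the MAP_PRIVATE | MAP_ANONYMOUS flags and the -1 file descriptor are my assumption of what an anonymous mapping would use, since the steps above only specify the size and protection.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/user.h>

/* Overwrite the low four bytes of a word with SYSCALL (0f 05) followed by
 * JMP %rax (ff e0). Little-endian: 0x0f ends up at the lowest address. */
long patch_syscall_jmp(long word) {
  return (long)(((unsigned long)word & ~0xffffffffUL) | 0xe0ff050fUL);
}

/* Registers for mmap(NULL, 4096, PROT_READ | PROT_EXEC,
 *                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0).
 * x86-64 Linux syscall ABI: number in rax; args in rdi, rsi, rdx, r10, r8, r9.
 * The flags and fd are assumptions, see the paragraph above. */
struct user_regs_struct mmap_call_regs(struct user_regs_struct regs) {
  regs.rax = SYS_mmap;
  regs.rdi = 0;                           /* let the kernel pick the address */
  regs.rsi = 4096;                        /* one page */
  regs.rdx = PROT_READ | PROT_EXEC;
  regs.r10 = MAP_PRIVATE | MAP_ANONYMOUS;
  regs.r8 = (unsigned long long)-1;       /* no file descriptor */
  regs.r9 = 0;                            /* offset */
  return regs;
}

/* Registers for munmap(page, 4096). */
struct user_regs_struct munmap_call_regs(struct user_regs_struct regs,
                                         unsigned long long page) {
  regs.rax = SYS_munmap;
  regs.rdi = page;
  regs.rsi = 4096;
  return regs;
}

/* A raw system call signals failure with a value in [-4095, -1] in rax
 * (a negated errno), not with the -1 that the libc wrappers return. */
int syscall_failed(unsigned long long rax) {
  return rax >= (unsigned long long)-4095;
}
```

In a tracer, the tracee's registers would first be read with PTRACE_GETREGS and saved, the modified copy written with PTRACE_SETREGS before each PTRACE_SINGLESTEP, and the saved copy written back at the end.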

This new tracer is both faster and more correct than the previous implementation. As before, you can find the code on GitHub at eklitzke/ptrace-call-userspace.