Most programming languages offered buffered I/O features by default, since it makes generating output much more efficient. These buffered I/O facilities typically “Just Work” out of the box. But sometimes they don’t. When we say they “don’t work” what we mean is that excess buffering occurs, causing data not to be printed in a timely manner. This is typically fixed by explicitly putting a “flush” call in the code, e.g. with something like sys.stdout.flush() in Python, fflush(3) in C, or std::flush in C++.
Frequently when people are confused about the rules of buffering their code becomes littered with unnecessary flush statements, an example of cargo-cult programming. In this post I’ll explain the buffering rules for stdout, so you’ll never be confused again.
Why Buffering Exists
As already discussed, the problem with buffering is that it can cause output to be delayed. So why does it exist at all?
At the underlying system call level, data is written to file descriptors using write(2). This method takes a file descriptor and a byte buffer, and writes the data in the byte buffer to the file descriptor.
Most languages have very fast function calls. The overhead for a function call in a compiled language like C or C++ is just a few CPU cycles. In these languages it’s common to think of functions call overhead as negligible, and only in extreme cases are functions marked as inline. However, a system call is much more expensive. A system call on Linux takes closer to a thousand CPU cycles and implies a context switch. Thus system calls are significantly more expensive than regular userspace function calls.
The main reason why buffering exists is to amortize the cost of these system calls. This is primarily important when the program is doing a lot of these write calls, as the amortization is only effective when the system call overhead is a significant percentage of the program’s time.
Let’s consider what happens when you
use grep to search for a
pattern in an input file (or stdin). Suppose you’re grepping nginx logs for a
pattern—say lines from a particular IP address. A typical line length in
these nginx logs might be 100 characters. That means that if buffering wasn’t
used, for each matching line in the input file that grep needs to print, it
would invoke the
write(2) system call. This would happen over and over again,
and each time the average buffer size would be 100 bytes. If, instead, a
4096-byte buffer size is used then data won’t be flushed until the 4096-byte
buffer fills up. This means that in this mode the grep command would wait until
it had about 40 lines of input before the byte buffer filled up. Then it would
flush the buffer by invoking
write(2) with a pointer to the 4096-byte buffer.
This effectively transforms forty system calls into one, yielding a 40x decrease
in system call overhead. Not bad!
If the grep command is sending a lot of data to stdout you won’t even notice the buffering delay. And a grep command matching a simple pattern can easily spend more time trying to print data than actually filtering the input data. But suppose instead the grep pattern occurs very infrequently. Suppose it’s so uncommon that a matching input line is only found once every 10 seconds. Then we’d have to wait about 400 seconds (more than six minutes!) before seeing any output, even though grep actually found data within the first ten seconds.
This buffering can be especially insidious in certain shell pipelines. For instance, suppose we want to print the first matching line in a log file. The invocation might be:
# BAD: grep will buffer output before sending it to head grep RAREPATTERN /var/log/mylog.txt | head -n 1
Going with the previous example, we would like this command to complete within ten seconds, since that’s the average amount of time it will take grep to find the input pattern in this file. But if buffering is enabled then the pipeline will instead take many minutes to run. In other words, in this example buffering makes the program strictly slower, not faster!
Even in cases where the output isn’t being limited by a command like
if output is very infrequent then buffering can be extremely annoying and
provide essentially zero performance improvement.
When Programs Buffer, And When They Don’t
There are typically three modes for buffering:
- If a file descriptor is unbuffered then no buffering occurs whatsoever, and function calls that read or write data occur immediately (and will block).
- If a file descriptor is fully-buffered then a fixed-size buffer is used, and read or write calls simply read or write from the buffer. The buffer isn’t flushed until it fills up.
- If a file descriptor is line-buffered then the buffering waits until it
sees a newline character. So data will buffer and buffer until a
\nis seen, and then all of the data that buffered is flushed at that point in time. In reality there’s typically a maximum size on the buffer (just as in the fully-buffered case), so the rule is actually more like “buffer until a newline character is seen or 4096 bytes of data are encountered, whichever occurs first”.
GNU libc (glibc) uses the following rules for buffering:
|stdout (not a TTY)||output||fully-buffered|
As you can see, the behavior for stdout is a bit unusual: the exact behavior for stdout depends on whether or not it appears to be a TTY. The rationale here is that when stdout is a TTY it means a user is likely watching the command run and waiting for output, and therefore printing data in a timely manner is most important. On the other hand, if the output isn’t a TTY the assumption is that the data is being processed or saved for later use, and therefore efficiency is more important.
Most other programming languages have exactly the same rules: either because those languages implement function routines as calls to buffered libc output commands (such as printf(3)), or because they actually implement the same logic.
More Grep Examples
Grep is a special case for buffering because a grep command can turn a large amount of input data into a slow and small stream of output data. Therefore grep is particularly susceptible to buffering frustration. Knowing when grep will buffer data is easy: it follows the glibc buffering rules described above.
If the output of grep is a TTY then it will be line-buffered. If the output of grep is sent to a file or a pipe, it will be fully-buffered, as the output destination is not a TTY.
This grep command will be line-buffered, since stdout is a TTY:
# line-buffered grep RAREPATTERN /var/log/mylog.txt
If stdout is redirected to a file then stdout is no longer a TTY, and output will be fully-buffered. This is usually fine:
# fully-buffered grep RAREPATTERN /var/log/mylog.txt >output.txt
One situation where the previous example isn’t ideal is if you have another
terminal output that is trying to
tail -f the output file.
Suppose we want to search the file backwards by piping tac(1) to grep. This will be line-buffered, as grep is still the last command in the pipeline and thus stdout is still a TTY:
# line-buffered tac /var/log/mylog.txt | grep RAREPATTERN
But what if we want to filter the output of grep? If we use a shell pipeline this will cause the grep output to become buffered. For instance, consider the following:
# fully-buffered grep RAREPATTERN /var/log/mylog.txt | cut -f1
The issue here is that when we put a pipe after the grep command, grep’s stdout is now the file descriptor for a pipe. Pipes are not TTYs, and thus grep will go into fully-buffered mode.
For the grep command the solution is to use the
--line-buffered option to
# forced line-buffering grep --line-buffered RAREPATTERN /var/log/mylog.txt | cut -f1
As noted earlier, you may also want to use this when redirecting grep output to
a file and then consuming the file in another session using
If you’re writing your own C code, you can control the buffering for
streams using setbuf(3).
Using this you can force behavior such as always line-buffering stdout. You can
also use this for disk-backed files, so you can do things like write a file to
disk and have fprintf(3)
be automatically line-buffered.
GNU coreutils comes with a program
allows you to change the default buffering behavior of programs you don’t
control. There are a few caveats for target programs: the programs must use C
FILE* streams, and the programs can’t use the explicit buffer control routines
There’s one further gotcha that typically pops up in C++ programs. Many C++ programmers are accustomed to using std::endl for newlines. For instance,
// Two ways to print output with a newline ending. std::cout << "Hello, world!\n"; std::cout << "Hello, world!" << std::endl;
These are not the same. The difference is that when
std::endl is used it
automatically forces the output stream to be flushed, regardless of the output
mode of the stream. For instance,
// Subject to normal buffering rules. std::cout << "Hello, world!\n"; // These are equivalent and are *always* line-buffered. std::cout << "Hello, world!\n" << std::flush; std::cout << "Hello, world!" << std::endl;
Thus if you’re using
std::endl a lot then the usual buffering rules don’t
std::endl is effectively forcing line-buffering! This can be
important in certain performance sensitive programs, since using
inadvertently disable buffering.
My suggestion is: only use
std::endl when you actually want to flush the
output stream. If you don’t know if the stream should be forcibly flushed then
stick to using a regular
\n sequence in your code.