repz stosb


Joey Hess has a collection of general purpose Unix utilities called moreutils. This is available on pretty much every Linux distribution I know of under the unsurprising package name moreutils. There are a few good things in here, but my favorite by far is the sponge(1) command.

Explaining what sponge is and why it's useful is easiest with an example. Suppose that you've done some complex shell pipeline and redirected the output to a file called file.txt. You realize that you've accidentally included some extra lines in the file, and you want to remove them using grep. Conceptually what you want to do is:

# incorrectly try to remove lines from file.txt
grep -v badpattern file.txt > file.txt

The issue is that this doesn't work, or at least it doesn't do what you might expect. When you use the > operator, the shell immediately opens the file for writing, truncating it. This happens before the command itself runs: in this case, before grep is invoked. This means that when grep runs it will see file.txt as an empty file, and consequently this shell invocation will always leave you with an empty file.
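
To see the truncation for yourself, here's a minimal sketch that runs the broken command against a throwaway file. The scratch directory and the sample lines are made up for illustration.

```shell
# Hypothetical scratch directory and sample contents, for illustration only.
workdir=$(mktemp -d)
file="$workdir/file.txt"
printf 'keep 1\nbadpattern here\nkeep 2\n' > "$file"

# The shell truncates file.txt before grep starts, so grep has nothing to read.
# Some grep implementations may also notice that the input file is the output
# file and exit with an error, hence the "|| true".
grep -v badpattern "$file" > "$file" || true

# Either way, the original contents are gone.
if [ ! -s "$file" ]; then
    echo "file.txt is now empty"
fi
```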

One way to fix this is to use an intermediate output file, perhaps like this:

# correct version, using a temporary file
grep -v badpattern file.txt > tmp.txt
mv tmp.txt file.txt

This works because the output is fully written to a new file before replacing the input file. But it's two steps, and if you're like me you're likely to end up with a lot of random semi-processed files that you need to clean up later.
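
If you do go the temporary file route, a slightly more careful sketch looks like the following. It assumes mktemp is available and uses made-up sample data; the exit status check is there because grep returns 1 when it selects no lines, which is not an error for this purpose.

```shell
# Hypothetical scratch directory and sample contents, for illustration only.
workdir=$(mktemp -d)
file="$workdir/file.txt"
printf 'keep 1\nbadpattern here\nkeep 2\n' > "$file"

# Create the temporary file next to the target so the mv is a rename
# on the same filesystem.
tmp=$(mktemp "$workdir/file.txt.XXXXXX")

grep -v badpattern "$file" > "$tmp"
status=$?

# grep exits 0 (lines selected) or 1 (no lines selected) on success,
# and 2 or greater on a real error.
if [ "$status" -le 1 ]; then
    mv "$tmp" "$file"
else
    rm -f "$tmp"
fi
```

One thing to keep in mind with this approach is that the replaced file takes on the permissions of the temporary file, which mktemp creates readable and writable only by its owner.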

There's a better way to do this using sponge. You do it like this:

# correct version, using sponge
grep -v badpattern file.txt | sponge file.txt

What happens here is that sponge buffers all of the data it gets on stdin into memory. When it detects EOF on stdin, it writes all of the buffered data to the file named by its argument. Typically you would use the input file as this argument. The end result is that file.txt won't be truncated until after it's been fully read by the left hand side of the pipe. The only caveat to be aware of is that because the output is first buffered into memory, you may run into problems if the output is too large (i.e. larger than the amount of free memory you have). As far as I know, newer versions of sponge fall back to a temporary file when the input gets large, which makes this less of a concern. In any case I've very rarely found it to be a problem, and I'm a happy regular user of sponge.
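
The soak-then-write idea can be approximated with a small shell function, which is handy on a machine where moreutils isn't installed. This is only a sketch: soak is a hypothetical name, and it buffers in a temporary file instead of in memory, so it's a stand-in for sponge and not a description of how sponge is implemented.

```shell
# soak FILE: read all of stdin, then write it to FILE.
# Hypothetical helper; the body runs in a subshell so "tmp" doesn't leak.
soak() (
    tmp=$(mktemp) || exit 1
    # Read until EOF on stdin. FILE is not touched during this step.
    cat > "$tmp"
    # Only now is FILE opened for writing (and truncated).
    cat "$tmp" > "$1"
    rm -f "$tmp"
)

# Made-up sample data, for illustration only.
workdir=$(mktemp -d)
file="$workdir/file.txt"
printf 'keep 1\nbadpattern here\nkeep 2\n' > "$file"

# grep finishes reading file.txt before soak overwrites it.
grep -v badpattern "$file" | soak "$file"
```

Unlike the mv approach, this writes into the existing file in place, so the file keeps its permissions.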