Joey Hess has a collection of general purpose Unix utilities called
moreutils. This is available on pretty much every Linux distribution I know of
under the unsurprising package name moreutils. There are a few good things in
here, but my favorite by far is the sponge(1) command.
Explaining what sponge is and why it's useful is easiest with an example.
Suppose that you've done some complex shell pipeline and redirected the output
to a file called file.txt. You realize that you've accidentally included some
extra lines in the file, and you want to remove them using grep. Conceptually
what you want to do is:
# incorrectly try to remove lines from file.txt
grep -v badpattern file.txt > file.txt
The issue is that this doesn't work, or at least it doesn't do what you might
expect. The problem is that when you use the > operator in bash, the shell
immediately opens the file for writing, truncating it. This happens before the
command itself runs: in this case, before grep is invoked. This means that when
grep runs it will see file.txt as an empty file, and consequently this shell
invocation will always leave you with an empty file.
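To see the truncation for yourself, here is a small sketch you can run safely. The scratch directory and the sample contents are made up for the demonstration; only the grep line comes from the example above.

```shell
# Work in a scratch directory so nothing real gets clobbered.
cd "$(mktemp -d)" || exit 1

# Sample data, invented for this demonstration.
printf 'keep 1\nbadpattern here\nkeep 2\n' > file.txt
wc -c < file.txt    # non-zero size before the redirect

# The shell truncates file.txt before grep starts. GNU grep may also
# complain that the input file is also the output, so ignore its exit status.
grep -v badpattern file.txt > file.txt || true

wc -c < file.txt    # size after the redirect
```

The second wc should report zero bytes: the good lines are gone along with the bad ones.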
One way to fix this is to use an intermediate output file, perhaps like this:
# correct version, using a temporary file
grep -v badpattern file.txt > tmp.txt
mv tmp.txt file.txt
This works because the output is fully written to a new file before replacing the input file. But it's two steps, and if you're like me you're likely to end up with a lot of random semi-processed files that you need to clean up later.
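If you do go the temporary file route, a slightly more careful sketch might look like the following. It assumes mktemp is available, and the scratch directory, sample contents, and the tmp and status variable names are just for illustration.

```shell
# Work in a scratch directory with made-up sample data.
cd "$(mktemp -d)" || exit 1
printf 'keep 1\nbadpattern here\nkeep 2\n' > file.txt

# Create the temporary file next to the target so mv is a simple rename.
tmp=$(mktemp file.txt.XXXXXX) || exit 1

grep -v badpattern file.txt > "$tmp"
status=$?

# grep exits 0 or 1 on success (1 just means no lines were selected)
# and 2 or higher on a real error.
if [ "$status" -le 1 ]; then
    mv "$tmp" file.txt
else
    rm -f "$tmp"
fi

cat file.txt
```

Checking the exit status means a failed grep doesn't replace the original, and the temporary file is cleaned up either way. One thing to keep in mind is that mktemp creates the file with restrictive permissions, so the replaced file.txt may end up with a different mode than it started with.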
There's a better way to do this using sponge. You do it like this:
# correct version, using sponge
grep -v badpattern file.txt | sponge file.txt
What happens here is that sponge will buffer all of the data it gets on stdin
into memory. When it detects the EOF condition on stdin it will then write all
of the data it buffered to a file named by its argument. By convention you would
use the input file as this argument. The end result is that file.txt won't be
truncated until after it's been fully read by the left hand side of the pipe.
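The read-everything-then-write idea can be sketched in plain shell. This is not how sponge itself is implemented: soak is a hypothetical stand-in written for illustration, and it buffers in a temporary file rather than in memory. The ordering is the point: the target is only opened after stdin hits EOF.

```shell
# Hypothetical stand-in for sponge, for illustration only.
soak() {
    buf=$(mktemp) || return 1
    cat > "$buf"         # read stdin until EOF
    cat "$buf" > "$1"    # only now open (and truncate) the target
    rm -f "$buf"
}

# Work in a scratch directory with made-up sample data.
cd "$(mktemp -d)" || exit 1
printf 'keep 1\nbadpattern here\nkeep 2\n' > file.txt

# grep finishes reading file.txt before soak touches it.
grep -v badpattern file.txt | soak file.txt

cat file.txt
```

With sponge installed you would simply replace soak with sponge in the pipeline.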
The only caveat to be aware of is that because the output is first buffered into
memory, you may run into problems if the output is too large (i.e. larger
than the amount of free memory you have). However, I've very rarely found that to
be the case, and I'm a happy regular user of sponge.