I've been writing a lot of Go recently at work, and I made an interesting tool
for navigating my Go projects. The problem I ran into is that Go enforces a
rigid directory structure that is very nested. I have my GitHub projects in
~/go/src/github.com/ORG/REPO, my work projects in
~/go/src/my.work.place/WORKORG/REPO, and various other more complicated
directory layouts for projects cloned from other places. Remembering where I had
cloned things was a challenge, and getting there took a lot of typing.
I wanted to be able to type something like gocd foo and then automatically be
taken to the right project directory for foo, wherever that was. My first
attempt at this was in Bash and looked like this:
gocd() {
    # Pick the shortest path under $GOPATH/src whose final component is $1.
    local bestdir=$(find "${GOPATH}/src" -type d -name "$1" | awk '{print length, $0}' | sort -n | head -n 1 | cut -d' ' -f2-)
    if [ -n "$bestdir" ]; then
        cd "$bestdir"
    else
        return 1
    fi
}
This finds the shortest matching directory in my $GOPATH and worked great. The
problem is that it also explores .git/ and vendor/ directories. This isn't a
problem when my filesystem caches are warmed up, but when they're cold (e.g.
after booting) this command runs slowly if there are a lot of repos checked out.
I had the idea to rewrite this in C++ with the following design constraints:
- When encountering a directory containing a .git/ subdirectory, don't descend
  into any of its subdirectories, as this is a cloned project. This logic will
  implicitly ignore vendor/ directories.
- Only explore the shortest path option at any point while walking the
  filesystem. By doing this the first directory encountered whose name matches
  the search target will also be the best match. This means that a lot of I/O
  can be saved if there are a large number of cloned projects under $GOPATH.
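Here is a minimal sketch of these two rules, assuming C++17 and std::filesystem. It is an illustration rather than the actual implementation: the name find_project is hypothetical, the candidates are kept in a std::priority_queue only to keep the example short, and skipping symlinks is an extra assumption to avoid cycles.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <queue>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the shortest path under `root` whose final
// component equals `target`, or nullopt if nothing matches.
std::optional<std::string> find_project(const std::string& root,
                                        const std::string& target) {
  assert(!target.empty());

  // Comparator that puts the shortest path string on top of the queue.
  auto longer = [](const std::string& a, const std::string& b) {
    return a.size() != b.size() ? a.size() > b.size() : a > b;
  };
  std::priority_queue<std::string, std::vector<std::string>, decltype(longer)>
      candidates(longer);
  candidates.push(root);

  while (!candidates.empty()) {
    std::string dir = candidates.top();
    candidates.pop();

    // Children are always longer than their parent, so the first match
    // popped is also the shortest match overall.
    if (fs::path(dir).filename().string() == target) return dir;

    std::error_code ec;
    // A directory containing .git/ is a cloned project: do not descend.
    if (fs::is_directory(fs::path(dir) / ".git", ec)) continue;

    fs::directory_iterator it(dir, ec), end;
    for (; !ec && it != end; it.increment(ec)) {
      // Assumption: symlinks are skipped so a link cycle cannot loop forever.
      if (it->is_directory(ec) && !it->is_symlink(ec)) {
        candidates.push(it->path().string());
      }
    }
  }
  return std::nullopt;
}
```

Calling find_project with "$GOPATH/src" as the root and "foo" as the target would return the project directory for foo if one exists. Anything below a directory that contains .git/, such as vendor/, is never read from disk.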
I coded this all up and it's indeed very fast. I can find a project in under 20 milliseconds with my filesystem caches completely cold, compared to more than one second before. When the filesystem caches are warm the command completes in one or two milliseconds.
The current implementation uses a std::vector<std::string> of directory
candidates, where the vector is sorted by string size so that the shortest path
sits at the back. This allows finding the best candidate efficiently, and for
the same reason each candidate can be removed from the search list efficiently,
since it just requires popping the back element from the vector. In theory this
could be made even more efficient by either using my own custom merge sort
(since I have one sorted vector and one unsorted vector), or by using a min-heap
keyed on string size. In practice I'm not sure that either of these would
actually improve things, since my current solution already minimizes disk I/O,
which is by far the most expensive part of the search.
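To make the candidate list concrete, here is a small sketch of a vector kept sorted longest-first, so that the shortest path is always at the back. The class name CandidateList and its methods are hypothetical. Combining the sorted list with a freshly read, unsorted batch via std::sort plus std::inplace_merge is one possible approach, not necessarily what the tool does.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container, named for illustration only.
class CandidateList {
 public:
  bool empty() const { return dirs_.empty(); }

  // Removes and returns the shortest candidate, which sits at the back.
  std::string pop_best() {
    assert(!dirs_.empty());
    std::string best = std::move(dirs_.back());
    dirs_.pop_back();
    return best;
  }

  // Adds an unsorted batch of paths while keeping longest-first order.
  void add(std::vector<std::string> fresh) {
    // Sort only the new batch; the existing list is already in order.
    std::sort(fresh.begin(), fresh.end(), longer_first);
    // Append the batch; `middle` marks where the second sorted run begins.
    auto middle = dirs_.insert(dirs_.end(),
                               std::make_move_iterator(fresh.begin()),
                               std::make_move_iterator(fresh.end()));
    // Merge the two sorted runs into a single sorted vector.
    std::inplace_merge(dirs_.begin(), middle, dirs_.end(), longer_first);
  }

 private:
  static bool longer_first(const std::string& a, const std::string& b) {
    return a.size() > b.size();
  }

  std::vector<std::string> dirs_;
};
```

A search loop would call pop_best() to pick the next directory to read and add() to push that directory's children. A std::priority_queue with a size-based comparator would offer the same interface with a heap underneath.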
To actually use this I install the C++ executable as _smartcd and then have a
Bash wrapper called gocd() to tie things together:
gocd() {
    if [ -z "${GOPATH}" ]; then
        return 1
    fi
    local best=$(_smartcd "${GOPATH}/src" "$1")
    if [ -n "$best" ]; then
        cd "$best"
    else
        return 1
    fi
}
If you find this type of thing useful you can find it on my GitHub at github.com/eklitzke/tools. There are some other CLI tools in there, also written in C++, that I use for other things in my Bash prompt. As with all my other projects, all of the code is licensed under the GPL.