Gocd

September 23, 2016

I've been writing a lot of Go at work recently, and I made an interesting tool for navigating my Go projects. The problem I ran into is that Go enforces a rigid, deeply nested directory structure. I have my GitHub projects in ~/go/src/github.com/ORG/REPO, my work projects in ~/go/src/my.work.place/WORKORG/REPO, and various other more complicated directory layouts for projects cloned from other places. Remembering where I had cloned things was a challenge, and getting there took a lot of typing.

I wanted to be able to type something like gocd foo and then automatically be taken to the right project directory for foo, wherever that was. My first attempt at this was in Bash and looked like this:

gocd() {
    local bestdir=$(find "${GOPATH}/src" -type d -name "$1" | awk '{print length, $0}' | sort -n | head -n 1 | cut -d' ' -f2-)
    if [ -n "$bestdir" ]; then
        cd "$bestdir"
    else
        return 1
    fi
}

This finds the shortest matching directory under my $GOPATH, and it worked great. The problem is that find also descends into .git/ and vendor/ directories. This isn't a problem when my filesystem caches are warmed up, but when they're cold (e.g. after booting) the command runs slowly if there are a lot of repos checked out.

I had the idea to rewrite this in C++ with the following design constraints:

- When encountering a directory that contains a .git/ subdirectory, don't descend into any of its subdirectories, as this is a cloned project. This logic implicitly ignores vendor/ directories.
- Always explore the shortest unexplored path next while walking the filesystem. Because a subdirectory's path is always longer than its parent's, the first directory encountered whose name matches the search target is also the best (shortest) match. This means that a lot of I/O can be saved if there are a large number of cloned projects under $GOPATH.
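
As a rough illustration, the two constraints above could be sketched like this in C++17. This is not the code from the actual tool: the function name find_project is hypothetical, std::filesystem is assumed to be available, and skipping symlinks is an extra assumption made here to avoid loops.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not taken from the real tool): returns the shortest
// directory path under `root` whose last component equals `target`, or an
// empty string when nothing matches.
std::string find_project(const fs::path& root, const std::string& target) {
    // Directories still to explore, kept sorted longest-first so that the
    // shortest path is always at the back.
    std::vector<std::string> frontier{root.string()};

    while (!frontier.empty()) {
        std::string dir = std::move(frontier.back());
        frontier.pop_back();

        // Paths are visited shortest-first, so the first name match wins.
        if (fs::path(dir).filename().string() == target) return dir;

        // A .git/ subdirectory marks a cloned project: do not descend.
        std::error_code git_ec;
        if (fs::is_directory(fs::path(dir) / ".git", git_ec)) continue;

        // Queue the child directories; unreadable directories are skipped.
        std::error_code ec;
        for (fs::directory_iterator it(dir, ec), end; !ec && it != end;
             it.increment(ec)) {
            std::error_code type_ec;
            // Assumption: symlinks are ignored to avoid cycles.
            if (it->is_symlink(type_ec)) continue;
            if (it->is_directory(type_ec)) {
                frontier.push_back(it->path().string());
            }
        }

        // Restore the longest-first order after adding the children.
        std::sort(frontier.begin(), frontier.end(),
                  [](const std::string& a, const std::string& b) {
                      return a.size() > b.size();
                  });
    }
    return "";
}
```

Note that the name check happens before the .git/ check, so a cloned project's own directory can still match even though nothing beneath it is explored.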

I coded this all up and it's indeed very fast. I can find a project in under 20 milliseconds with my filesystem caches completely cold, compared to more than one second before. When the filesystem caches are warm the command completes in one or two milliseconds.

The current implementation uses a std::vector<std::string> of directory candidates, where the vector is sorted by string size, longest first. This allows finding the best (shortest) candidate efficiently, since it's at the back of the vector. For the same reason each candidate can be removed from the search list efficiently, since it just requires popping the back element from the vector. In theory this could be made even more efficient by either using my own custom merge step (since after each expansion I have one sorted vector and one unsorted vector of new candidates), or by using a min-heap keyed on string size. In practice I'm not sure that either of these would actually improve things, since my current solution already optimizes for minimizing disk I/O, which is by far the most expensive part of the search.
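
For reference, the min-heap alternative mentioned above might look something like the following. It is only a sketch of the idea, not something the tool does today: the names LongerPath, Frontier, and pop_shortest are made up for illustration.

```cpp
#include <queue>
#include <string>
#include <vector>

// std::priority_queue puts the "largest" element on top according to its
// comparator. Treating longer strings as "smaller" therefore makes the
// shortest path come out first, which is a min-heap keyed on string size.
struct LongerPath {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() > b.size();
    }
};

// Hypothetical alias for the set of directories still to explore.
using Frontier =
    std::priority_queue<std::string, std::vector<std::string>, LongerPath>;

// Hypothetical helper: removes and returns the shortest candidate.
// The caller must make sure the frontier is not empty.
std::string pop_shortest(Frontier& frontier) {
    std::string shortest = frontier.top();
    frontier.pop();
    return shortest;
}
```

With this layout each push and pop costs O(log n) instead of re-sorting the vector after every directory expansion. Whether that matters in practice is a separate question, given that disk I/O dominates the search.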

To actually use this I install the C++ executable as _smartcd and then have a Bash wrapper called gocd() to tie things together:

gocd() {
    if [ -z "${GOPATH}" ]; then
        return 1
    fi
    local best=$(_smartcd "${GOPATH}/src" "$1")
    if [ -n "$best" ]; then
        cd "$best"
    else
        return 1
    fi
}

If you find this type of thing useful you can find it on my GitHub at github.com/eklitzke/tools. There are some other CLI tools in there, also written in C++, that I use for other things in my Bash prompt. As with all my other projects, all of the code is licensed under the GPL.