My Philosophy on "Dot Files"

July 7, 2015

This is my philosophy on dot files, based on my 10+ years of being a Linux user and my professional career as a sysadmin and software engineer. It is also partly based on what I've seen in developers' dotfiles at the company I currently work for, which has a system for managing and installing the dotfiles of nearly 1000 engineers.

When I started using Linux, like every new Unix user I started cribbing dot files from various parts of the internet. Predictably, I ended up with a mess. By doing this, you get all kinds of cool stuff in your environment, but you also end up with a system that you don't understand, that is totally nonstandard, and that is almost always of questionable portability.

In my experience this is less of a problem for software engineers who don't have to do a lot of ops/sysadmin work. A lot of software engineers only do development on their OS X-based computer, and possibly on a few Linux hosts that are all running the exact same distro. So if they have an unportable mess, they don't really know and it doesn't affect them. That's great for those people.

When you start doing ops work, you end up having to do all kinds of stuff in a really heterogeneous environment. It doesn't matter if you work at a small shop or a huge company: if you do any amount of ops work you're going to admin multiple Linux distros, probably various BSD flavors, and so on. Besides that (or even if you have a more homogeneous environment), you end up having to admin hosts that are in various states of disrepair (e.g. failed partway through provisioning) and therefore might as well be different distros.

Early on, the (incorrect) lesson I got out of this was that I needed to focus on portability. This is really hard to do if you actually have to admin a really heterogeneous environment, for a few reasons. For starters, even the basic question of "What kind of system am I on?" is surprisingly hard to answer. The "standard" way to do it is to use the lsb_release command... but as you would guess, this only works on Linux, and it only works on Linux systems that are recent enough to have an lsb_release command. If you work around this problem, you still have the problem that it's easy to end up with a huge unreadable soup of if statements that at best is hard to understand, and frequently is too specific to really be correct anyway. You might think that you could work around this by doing "feature testing", which is actually the right way to solve the problem, but this is notoriously hard to do in a shell environment and can again easily make the configuration unreadable or unmaintainable.
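To make the idea of "feature testing" concrete, here is a minimal sketch. It asks whether a command or option actually works on the current host instead of trying to work out which OS or distro it is. The helper name have and the variable distro are made up for illustration; they are not from any particular dot file.

```shell
# have: hypothetical helper, succeeds if the named command can be found.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Only alias ls if this system's ls accepts --color (GNU ls does; some
# other implementations do not). We test the flag, not the OS name.
if ls --color=auto -d / >/dev/null 2>&1; then
    alias ls='ls --color=auto'
fi

# Only call lsb_release where it exists, and tolerate it failing.
if have lsb_release; then
    distro=$(lsb_release -si 2>/dev/null) || distro=unknown
else
    distro=unknown
fi
```

Even this small example shows how quickly the checks pile up, which is the readability problem described above.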

It gets even worse for things like terminal emulators. The feature set of different terminal emulators like xterm, aterm, rxvt, and so on varies widely. And it gets even more complicated if you're using a "terminal multiplexer" like screen or tmux. God forbid you try to run something in a vim shell or Emacs eshell/ansi-term. Trying to detect what terminal emulator you're under and what features it actually supports is basically impossible. Even if you could do this reliably (which you can't because a lot of terminal emulators lie), the feature set of these terminal emulators has varied widely over the years, so simply knowing which terminal emulator you're using isn't necessarily enough to know what features it supports.
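For illustration, the closest thing to a feature test here is asking terminfo, via tput, what the terminal named by $TERM claims to support. This sketch is not a fix for the problem above: it only reports what the terminfo entry says, so an emulator that sets a misleading $TERM will still get a misleading answer. The variable name colors is just an example.

```shell
# Ask terminfo how many colors the terminal named by $TERM claims to have.
# If tput is missing or $TERM is unknown, assume no color at all.
colors=$(tput colors 2>/dev/null) || colors=0

# Treat empty or non-numeric output (including -1) as "no color".
case "$colors" in
    ''|*[!0-9]*) colors=0 ;;
esac

if [ "$colors" -ge 256 ]; then
    echo "terminfo claims 256 colors"
elif [ "$colors" -ge 8 ]; then
    echo "terminfo claims basic colors"
else
    echo "no color support claimed"
fi
```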

As I became a more seasoned Linux/Unix user, what I learned was that I should try to customize as little as possible. Forget those fancy prompts, forget the fancy aliases and functions, and forget the fancy 256-color terminal emulator support. The less you customize the less you rely on, and the easier it becomes to work on whatever $RANDOMSYSTEM you end up on. For a number of years the only customization I would do at all was setting PS1 to a basic colorized prompt that included the username, hostname, and current working directory---and nothing else.
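A prompt of that kind can be a single line of bash. The following is a sketch rather than the exact prompt I used back then; the color choices are arbitrary.

```shell
# Minimal colorized bash prompt: user@host:cwd$ and nothing else.
# \[ and \] wrap the non-printing color codes so that bash can compute
# the visible width of the prompt correctly.
PS1='\[\033[1;32m\]\u@\h\[\033[0m\]:\[\033[1;34m\]\w\[\033[0m\]\$ '
```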

Recently I've softened on this position a bit, and I now have a reasonable amount of configuration. In the oldest version of my .bashrc that I still track with version control (from 2011; sadly I don't have the older versions anymore), the file had just 46 lines. It had a complicated __git_ps1 function I cribbed from the internet to get my current git branch/state if applicable, set up a colorized PS1 using that function, and did nothing else. By 2012-01-01 this file had expanded to 64 lines, mostly to munge my PATH variable and set up a few basic aliases. On 2013-01-01 it was only one line longer at 65 lines (I added another alias). On 2014-01-01 it was still 65 lines. At the beginning of this year, on 2015-01-01, it was 85 lines due to the addition of a crazy function I wrote that had to wrap the arc command in a really strange way. Now as I write this in mid-2015, it's nearly twice the size, at a whopping 141 lines.

What changed here is that I learned to program a little more defensively, and I also got comfortable enough with my bash-fu and general Unix knowledge. I now know what things I need to test for, what things I don't need to test for, and how to write good, portable, defensive shell script. The most complicated part of my .bashrc file today is setting up my fairly weird SSH environment (I use envoy and have really specific requirements for how I use keys/agents with hosts in China, and also how I mark my shell as tainted when accessing China). Most of my other "dot files" are really simple, ideally with as little configuration as possible. Part of this trimming down of things has been aided by setting up an editor with sensible defaults: for real software engineering stuff I use Spacemacs with a short .spacemacs file and no other configuration, and for ops/sysadmin stuff I use a default uncustomized vi or vim environment.

Which brings me to the next part of this topic. As I mentioned before, the company I work at has nearly 1000 engineers. We also have a neat little system where people can have customized dot files installed on all of our production hosts. The way it works is there's a specific git repo that people can clone and then create or edit content in a directory that has the same name as their Unix login. The files they create in that directory will be installed on all production hosts via a cron job that runs once an hour. A server-side git hook prevents users from editing content in other users' directories. This system means that generally users have their dot files installed on all hosts (with a few exceptions not worth going into here), and also everyone can see everyone else's checked-in dot files since they're all in the same repo.

People abuse this system like you would not believe. The main offenders are people who copy oh-my-zsh and a ton of plugins into their dot files directory. There are a few other workalike systems like Bashish (which I think predates oh-my-zsh), but they're all basically the same: you copy thousands of lines of shell code of questionable provenance into your terminal, cross your fingers and hope it works, and then have no idea how to fix it if you later encounter problems. Besides that, I see a ton of people with many hundreds of lines in their bash/zsh/vim/emacs configuration that are clearly copied from questionable sources all over the internet.

This has given me a pretty great way to judge my coworkers' technical competency. On the lowest rung are the gormless people who have no dot files set up and therefore either don't give a shit at all or can't be bothered to read any documentation. Just above that are the people who have 10,000 lines of random shell script and/or vimscript checked into their dot files directory. At the higher levels are people who have a fairly minimal setup, which you can generally tell just by looking at the file sizes in their directory.

If you want to see what differentiates the really competent people, here are a few things I sometimes look for:

proper and safe quoting!
understanding what variables ought to be exported
correct usage and quoting of $@ and $*, which comes up a lot in more advanced bash functions
usage of aliases and functions that demonstrates that the person actually understands which is appropriate to use in which circumstance
anyone who knows how to use an if statement with a command other than [
bonus points to anyone who has a .profile or .bash_profile that actually does what a login shell is supposed to do (e.g. doesn't source .bashrc or export things like prompts)
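A few of these points are easier to see in code. The sketch below uses made-up helper names (count_args, forward_each, forward_joined) purely for illustration.

```shell
# count_args: hypothetical helper that prints how many arguments it got.
count_args() {
    echo "$#"
}

# "$@" forwards each argument as its own word, preserving spaces.
forward_each() {
    count_args "$@"
}

# "$*" joins all arguments into a single word.
forward_joined() {
    count_args "$*"
}

forward_each "a b" c      # prints 2
forward_joined "a b" c    # prints 1

# if accepts any command, not just [ ... ]; it branches on the exit status.
if printf '%s\n' "$PATH" | grep -q '/bin'; then
    echo "PATH contains a bin directory"
fi

# Export what child processes need; leave shell-only settings unexported.
export EDITOR=vi
PS1='\u@\h:\w\$ '         # used only by the interactive shell itself
```

The difference between "$@" and "$*" only shows up once an argument contains whitespace, which is exactly when unquoted or wrongly quoted code breaks.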