missives from evan klitzke…



I just want to quote Kanye West from his most recent interview in Vanity Fair:

I think if Michelangelo was alive or Da Vinci was alive, there's no way that they wouldn't be working with shoes, as a part of what they work on. Definitely one of the things they'd work on would be shoes. I've gone three years without a phone. I don't go a day without shoes.

The Thrill Of Riding


There is one experience that I will never be able to explain to another human being—the thrill of riding a bicycle. Some people get it, and some people don't. If you get it, you know exactly what I'm talking about, and if you don't, well you don't.

My Background

I grew up in a middle class community in the East Bay called San Ramon. Most people who are from the Bay Area have heard of San Ramon, but don't know a lot about it. What you need to know is that it was mostly built up in the '70s by a bunch of conservative yuppies who couldn't afford to live in SF or the South Bay, and who didn't want to slum it up in Oakland/San Leandro/Castro Valley/etc. In 1983 SBC (formerly Southwestern Bell Corporation, now a part of AT&T) moved its headquarters to San Ramon. In 2011 Chevron moved its headquarters from San Francisco to San Ramon. Ironically, I'm probably about to move into the old Chevron headquarters at 555 Market as part of my current job, which is going to be weird for me, knowing so many people whose parents were big deals at Chevron in San Ramon.

So growing up in San Ramon, bikes and bike lanes are not really a thing. Everyone drives to work at Chevron or SBC (AT&T), or else commutes by car to tech and biotech jobs in the South Bay. So everyone has a car. Nearby there is the Iron Horse Trail, which is something like 40 miles end to end, but is honestly pretty lame. It is flat the entire distance, so no serious cyclist cares about it, and it's full of little kids on tricycles learning how to ride their bikes and taking up the entire trail without regard for people like me trying to blast down it at high speed. It sucks. The hardcore cyclists are out ascending Mt Diablo, but Mt Diablo is classified as an HC climb, and while I might be close to being able to do something like that now, I definitely wasn't even close when I was in high school.

The Scene

When I started college at UC Berkeley I bought my first "real" bike—a Surly Cross Check that I bought through the awesome folks at Missing Link Bicycle Cooperative. That bike ended up getting stolen during a road trip I was on with a friend, while I had it locked up outside the (fucking awesome) India House Hotel, which was in a seriously hood neighborhood of New Orleans. But that bike taught me the thrill of riding in real urban environments—down Shattuck and Telegraph in Berkeley; down Telegraph, San Pablo, MLK, and Adeline in Oakland; down one of my favorite streets in the world, Peralta in Oakland; and many others. There's something just incredible about weaving through city traffic, avoiding railroad tracks, avoiding 3'x3' potholes, all while blasting through at max speed. Plus, in Oakland you will see the craziest, most legit graffiti pieces in the industrial areas of West Oakland, where one block has a huge multi-day graffiti piece and the buildings nearby have bullet holes in the windows. Literally the corner store I lived next to had bullet holes in its windows, and my nearby go-to burrito joint had bullet-proof glass you had to slide money through to pay for your meal. I remember walking through that neighborhood and seeing an altercation 100 ft away from me where someone pulled out a handgun at point blank range on another homie; and I remember getting the fuck out of there as fast as I could. This is also when I started carrying a pocket knife on me at all times. Riding through these neighborhoods during the day is a trip in itself, but being restless and blasting through the neighborhood with insomnia at 3am when no one else is out is fucking beyond incredible.

Around this time is when I started doing community rides like SF Critical Mass, East Bay Bike Party, SF Bike Party, and Midnight Mystery Ride. You will meet the most fun people, from super casual people just checking it out to hardcore fucked up 19 y/o kids attending dead end community colleges who are serious beasts in terms of strength and ride only brakeless fixed gear bikes.

I've had fun living in Berkeley, Oakland, and Pasadena/Los Angeles and going crazy in the streets there, but by far the most fun solo rides have been the ones since I moved to San Francisco. In San Francisco you have to deal with serious fucking traffic, crazy cab drivers who don't give a shit about you, MUNI tracks everywhere that will fuck you up if you're not careful, and intense hills everywhere. I've gotten up pretty much every hill in the city (including Twin Peaks), and at this point if I'm going from point A to point B I know the fastest route, the least hilly route, which ways have bike lanes or bike-designated streets, and so on. Finally, after almost ten years of aggressive city riding, I know how to get out of the saddle and really throw my weight around to do crazy turns on a dime. If a car pulls in front of me and throws its brakes on, I know how to throw my weight around and put mad English on the bike to swerve around pretty much anything.

I fucking love my crazy commute every morning from Valencia down Market St, which is full of the most intense car weaving, snaking through lines of cars, and running lights you can imagine. But beyond that, sometimes at 11pm or 12am I'll get restless and decide to check out the night life scene in the city. I'll go down Mission, check out 16th St, bike past all of the shitty night "clubs" on or near Folsom, and then go up to Natoma to check out Tempest. Then I'll do my favorite thing, which is to check out Polk St, which is all kinds of fucked up. Usually I come southbound from North Point and ride Polk all the way down to Market. From the top of Polk, basically the whole way down to O'Farrell is shitty frat bro/sorority girl bars, and everyone is drunk as fuck and wandering out in the street; taxis, Ubers, Lyfts, and SFPD are going crazy weaving around trying to pick people up or arrest them, and it's insane on a bike. Once you hit O'Farrell, if you have no fear and really go HAM on your bike you can go 25+ mph and make every light from O'Farrell down to Market, which is fun as hell and scary as fuck.

Blasting through the Sunset is amazing too. You can get onto Kirkham and it's a bike lane the whole way. There's not a lot of traffic, but there are stop signs everywhere. If you get up to the right speed you can blast down Kirkham, watching for oncoming cross-traffic, and make crazy speed getting up and down the hills. And that takes you into The Wiggle, which is a whole separate class of weaving and scofflawing.

The point being—biking is a whole thrill in and of itself. When I'm doing more road biking I get really into my vertical elevation gain, and really into how many vertical feet I'm doing a week. But the scene is so much more than that. Endangering your life in intense city traffic is one of the most exhilarating things I can think of. It's a fucking blast.

If you're not a serious cyclist, and want to get into it, hit me up and I'll show you a good time.

BitlBee & HipChat


The chat system we use at work is HipChat. HipChat offers a web client (HTML and Javascript) as well as native clients for Windows, OS X, and Linux. It also offers an XMPP gateway, so if you don't want to use the native client you can use any other XMPP chat client. The XMPP gateway is a little hacky because HipChat has extended the XMPP protocol with proprietary attributes to add custom features; e.g. HipChat has both the concept of a display name (i.e. the user's full real name) as well as an @mention name that is used to alert people in chats. HipChat does not have a native IRC gateway.

I was really unhappy with the native Linux HipChat client for a number of reasons. I found it to be really slow and it used a ton of memory. It also hasn't gotten the same amount of attention as the other native clients, and lags behind in a number of areas. Besides that, I've been using IRC for years and I've already built up a nice workflow with weechat (and previously with irssi) that I wanted to keep using.

If you want to use weechat, irssi, or some other IRC client to connect to HipChat, it turns out there's a way—BitlBee provides an XMPP/IRC gateway. The way it works is that BitlBee acts as an IRC server from your client's point of view, speaking the IRC protocol to you. On the backend it translates between IRC and XMPP (or one of many other chat protocols like AIM, ICQ, Twitter, etc.).

About three years ago when I first tried using BitlBee with HipChat, it was really rough around the edges. It worked, but barely. There were a lot of problems with how it displayed user names and with the workflow for adding and joining channels. Thankfully in the last few years this has gotten a lot better. This article will explain how to get set up with BitlBee and HipChat. Once you get everything working, you'll get a neat IRC setup like this (assuming you're using a console based client):

how it looks

Apologies for the impossibly small text, but you probably get the idea.

How To Use Bitlbee With HipChat

First I would recommend taking a look at the official docs here. This will give you an overview of the state of the world, and below I will provide some more specific advice as to the workflow I use.

If you're feeling adventurous, in the upstream git repo there is a branch called feat/hip-cat that gives you almost native support for HipChat. If you use this branch, when you connect to the server you'll see users show up with their IRC nick set to their @mention name. BitlBee will apply a mangling heuristic to room names to make a best guess as to what to name them (by lowercasing the room name and removing spaces and special characters). What this means is that if there's a HipChat room with a name like "Engineering" you'll probably be able to join it with an IRC command like /join #engineering, which is not true if you're on the master branch. I ran this hip-cat branch for about six months (until just now), and while it mostly works it is rough around the edges. I found problems in the following areas:

Additionally, you'll find that the master branch gets a lot more commits made to it than the hip-cat branch. Due to these bugs, and the fact that I wanted to follow along with all of the latest stuff in master, I have switched to the master branch and it's what I recommend.

Setting It Up

First install BitlBee. I would recommend getting it from git and building it yourself, but that's obviously optional and if you want you can use a version packaged by your favorite distro. If you compile BitlBee from source, make sure that you have an appropriate /var/lib directory to store your profile. I had to manually create /var/lib/bitlbee and set it up with the correct permissions even though I configured BitlBee to use an alternate prefix (i.e. it seems not to respect a command like ./configure --prefix=$HOME/opt in this regard, and you'll find that the ./configure script is not a real autoconf script).

You can start the server with a command like:

bitlbee -D -p 6667 -i LISTEN_ADDRESS

This will daemonize BitlBee and have it listen on the port given with -p, bound to the address given with -i. If this works successfully, fire up your IRC client, connect to the address and port you set up, and you should see a greeting banner from BitlBee.
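For example, with weechat (the client I use) the connection setup might look something like this; the server name "bitlbee" is just an arbitrary label, and the host/port are whatever you started the daemon with:

/server add bitlbee localhost/6667
/connect bitlbee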

Once you've done this, the following sequence will set up a HipChat account for you:

register YOUR_BITLBEE_PASSWORD
account add hipchat you@yourcompany.com YOUR_HIPCHAT_PASSWORD
account hipchat on

Note that the BitlBee password does not have to be the same as your HipChat password, and in fact it's a good idea to make them different. If you get an error in the register command above about not being able to write a file (as I initially did), make sure that you have a /var/lib/bitlbee directory on your system and that it's writeable by the BitlBee process.

In the future when you connect to BitlBee you'll be able to re-authenticate by using:

identify YOUR_BITLBEE_PASSWORD
This will log you in and restore all of the channels that you've set up.

Managing Channels

Now that you're logged in and you've created an account, it's time to add some channels. Go to YOURCOMPANY.hipchat.com/account/xmpp to get a channel list and the XMPP names. Let's say that a channel is named 9999_foo and you want it to be mapped locally to the IRC channel whose name will be #bar. To do that, you'd use the following command in the BitlBee control window:

chat add hipchat 9999_foo@conf.hipchat.com #bar

After this you should be able to /join #bar and join the channel.

In the future, you may want to delete channels you've created, change their configuration, set up auto-joining, etc. This is a little bit cumbersome. What you need here is the internal BitlBee channel id that was assigned to the channel. You can see that with:

channel list

This will print the list of channels and their numbers.

To delete a channel:

channel CHANNEL_ID delete

To set a channel to auto-join:

channel CHANNEL_ID auto_join true

Chatting With Users

Users will show up as their @mention name. If you need to know a user's real name, you can use the standard /whois command in IRC to get that information. I do not know how to do the opposite, that is search for a user's @mention name (i.e. IRC nick) based on their real name.

Once you know a person's nick you can /msg them in the usual way to chat.

Mission Bicycle w/ SRAM Automatix Hub


Today I ordered a new bicycle through MASH SF. I bought a fixed gear track bike built up from the Cinelli MASH Histogram frame. I'm going to write about this bike once I've had it for a while, but for now I want to write about the bike that the Histogram will be semi-replacing.

About a year and a half ago I bought a bike from Mission Bicycle Company. If you're not familiar with them, they make their own steel frames with horizontal dropouts, which means that their bikes are built as single-speed/fixed gear or with internally geared rear hubs (in contrast to how most road bikes are built with a rear derailleur). I already had a Surly Pacer and was looking to build something a bit cooler and more unique. Thus, I built up this bike in a pretty unusual way: I asked them to build it with an SRAM Automatix hub. This is an internally geared rear hub with a "centrifugal clutch", and is the only such hub on the market that I know of. The way it works is that when the rear wheel is spinning at about 11.5 mph the hub shifts "up" with a 137% step. These are pretty unusual—I ride a lot and check out other people's bikes somewhat obsessively as I'm riding (and walking) around, and since I got this bike I've only seen one other bike built up with this hub. Mine is built with a 46t front chainring and a 19t cog on the rear hub. This means that when the bike shifts up, it's similar to riding with a 46t/14t ratio or thereabouts (19t divided by the 1.37 step is roughly 13.9t). Since the shifting is automatic, there's no cabling to the rear wheel. I also built the bike with only a front brake, so the only cabling at all on the bike is a small cable going from the handlebars to the front brake caliper. The hub is really small (much smaller than other internally geared hubs you may have seen), and the small size of the hub and lack of cabling aesthetically make it look like I'm riding a fixed gear or single-speed bike.

my bike

The rest of the bike was built with mid tier components and with bull handlebars. I got security bolts for the front wheel and security bolts for the seat. The whole thing cost something like $1300 with tax which is kind of pricey for an unbranded bike with two gears, but includes the price of the hub and a fancy seat. One change I did make a few weeks after buying the bike is to get the traditional pedals with steel toe clips swapped out for double-sided SPD mountain biking pedals. I thought I was going to want standard pedals so I could get around town with regular shoes, but I found that after riding for so long with SPD cleats on my road bike I just liked the feeling of being clipped in too much to go without. I just throw an extra pair of shoes in my backpack if I'm heading out (and at work I have an extra pair of regular shoes for the office).

The main function of this bike has been as a commuter bike, although I have done a few intense weekend rides (2000' elevation gain) on this bike. As a commuter bike, this bike is awesome. I'm really happy with it. Because it's not branded and is built up with security bolts everywhere, I feel like I can lock it up anywhere in the city with only the rear wheel/triangle locked. This means I only carry a single mini U-Lock and I haven't had any issues.

I can go surprisingly fast on this thing. I'm a pretty aggressive rider in general, so I'm always sprinting wherever I go, but that said I'm able to pass people in spandex on expensive carbon fiber bikes all the time on straight stretches. Over the distances you can typically go in the city between lights, the lack of gears isn't a problem at all, and actually when the higher gear is engaged the ratio is pretty high. I've timed my work commute on my road bike, and it takes me exactly the same amount of time to commute on both bikes (the commute time is mostly dependent on the lights I hit or don't hit). I have a weekend loop that I do, and the loop that I do from my apartment to the top of the Legion of Honor takes 40-45 minutes on my road bike and is about 700' of climbing. The loop takes about 50 minutes on the Mission Bicycles bike, and I'm able to get up from the bottom of Sea Cliff to the top of Legion of Honor which is a Category 4 climb on Strava. The steepest segment I've gotten up on this bike is the 43rd Ave Climb which is 275' and is an 11% grade on the steep block. My time on this bike on the 43rd Ave segment is actually only a few seconds off from my road bike, although subjectively the climb is a lot harder on the Automatix hub.

Overall I'm pretty happy with the Automatix hub, but there are a few minor issues with it. The first is that because it engages the higher gear at a particular speed, there are certain grades where it's really hard to not engage it. What I mean is that there are certain grades (5% or 6%? I'm not sure) where I naturally want to get out of the saddle to climb. However, getting out of the saddle delivers enough power that I end up hitting the speed threshold, and all of a sudden the climb is beyond my max power output. If this happens I have to stop pedaling for a fraction of a second and then try to re-engage the bike on the lower gear. I've never actually fallen this way, but I've gotten pretty close a few times, and it's pretty scary when it happens. To counteract this I end up having to be really careful about my cadence on certain hills which is annoying and inefficient. This tends to be an issue for me when navigating The Wiggle where I have to bike at a really unnatural speed unless I want to go all out and bike up in the high gear.

Another problem is that sometimes the gear can engage unexpectedly at lower speeds. This typically happens when going over potholes or bumps in the road, especially when I'm close to the shifting speed. Again, this is particularly annoying on hills, where it can be dangerous. Another thing I've noticed is that when going below the shifting speed, if you keep pressure applied to the cranks in just the right way you can keep the hub in the higher gear basically all the way down to when you're stopped. This doesn't happen too frequently, but it's caught me off guard when coming to a rolling stop at stop signs. What happens is I'll get down to nearly 0 mph, and then try to start pedaling with the cranks vertical. When the cranks are vertical I have almost no mechanical advantage, so I've almost fallen before when this happens.

The last issue is that the hub has the tendency to rattle when going over bumps/potholes. I haven't had any mechanical problems as a result of this, so I don't think it indicates a real problem, but it's definitely disconcerting. On a previous bike I once had the bottom bracket slip out when going over a pothole, and I occasionally get flashbacks of this happening, or worry that my rear wheel is going to slip out of the rear dropouts, since a loose rear wheel can also cause rattling.

Despite these minor issues, I want to reiterate that I like this bike a lot, and I plan to keep it and probably still mostly commute on it. I will definitely continue to use this bike for around-town chores and when I have to go places where I'll have to lock my bike up outside. The only thing I'd really consider changing if I were to rebuild it is the lack of a rear brake. 99% of the time I don't need the additional braking power, but descending from the Legion of Honor or down Clipper from the top of Portola can be really terrifying since the front brake will get really hot and loud and it's literally my only braking mechanism.

The Commodification of Databases


There's a conversation that I've had a few times in the last ~10 years that I've been a software engineer, and it goes something like this. Databases are known to be a hard problem. Databases are the central scalability bottleneck of many large engineering organizations. At the same time, major tech companies like Google and Facebook have famously built in-house proprietary databases that are extremely robust, powerful, and scalable, and are the lynchpins of their business. Likewise, both open source and commercial database offerings have been rapidly evolving to meet the scalability problems that people face. So the question is: will we reach a point where one or a small number of open source or commercial databases become so good, so powerful, so highly available, and so horizontally scalable that other solutions will fall by the wayside? That is, will we converge on just a few commodified databases?

A number of smart people I've worked with seem to think that this is the case. However, I'm skeptical, or think that if it is the case it will be a very long time coming.

The first part of my reasoning here hinges on why the database world has become so fragmented today. When the first relational databases came to market in the late seventies and early eighties, they came to market in co-evolution with the SQL standard. The first commercial SQL database on the market was from Oracle in 1979, followed very closely by SQL databases from other companies like IBM (who published their first SQL database in 1981). The reason that these companies came to market with SQL databases at nearly the same time is that SQL had been a number of years in the making at that point. E. F. Codd published his seminal paper "A Relational Model of Data for Large Shared Data Banks" in 1970, and throughout the 1970s people at IBM and elsewhere had been designing what would become the SQL standard.

For at least 25 years SQL reigned supreme as the de facto way to query and interact with large databases. The standard did evolve, and there were always other offerings, especially in the embedded world or in special purpose applications like graph databases or hierarchical databases. But the reality is that for several decades, when people were talking about large scale databases they were nearly always talking about large scale SQL databases.

However, if you want to build a really big highly partitioned database, SQL is not a great way to go. The relational data model makes partitioning difficult—you generally need application specific knowledge to efficiently partition a database that allows joins. Also, in many cases the SQL standard provides too much functionality. If you relax some of the constraints that SQL imposes there are massive performance and scaling wins to be had.

The thing is, while SQL certainly has a lot of deficiencies, it's at least a standard. You can more or less directly compare the feature matrix of two relational databases. While every benchmark has to be taken with a grain of salt, the fact that SQL is a standardized language means it's possible to create benchmarks for different SQL databases and compare them that way. The standardization of SQL is a big part of the reason why it has kept its hold for so long.

Non-relational databases and NewSQL databases like Riak, Cassandra, HBase, CockroachDB, RocksDB, and many others all take the traditional SQL database model and modify it in different ways, in some cases drastically. This means these databases are hard to directly compare to each other because their features and functionality differ so much.

There are also orders-of-magnitude improvements to be had by relaxing certain durability guarantees and by having certain special purpose data structures. In the most extreme example, a database that consists of an append-only log can max out hard drives on write speed even though such a database would be infeasibly inefficient to query. You could still think of such a thing as a database, and this is similar to what something like syslog or Scribe or Kafka is. Likewise, a database that consists of a properly balanced hash table can provide extremely efficient reads for single keys at the cost of sacrificing the ability to do range queries and at the cost of potentially expensive rebalancing operations. For instance, in the most extreme example a read-only database like cdb can do reads with one disk seek for misses and two disk seeks for successful lookups. There are so many different tradeoffs here in terms of durability guarantees, data structures, read performance, write performance, and so on that it's impossible to prove that one database is more efficient than another.
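To make the append-only extreme concrete, here's a deliberately naive Python sketch (a toy, not a real database): every write is a single sequential append, but answering "what is the current value for this key?" means scanning the whole log.

import json

def log_append(path, key, value):
    # One sequential append per update; no seeks, no in-place mutation.
    with open(path, "a") as f:
        f.write(json.dumps({"k": key, "v": value}) + "\n")

def log_lookup(path, key):
    # Reads degrade to an O(n) scan of the entire log, keeping only the
    # last value seen for the key.
    result = None
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record["k"] == key:
                result = record["v"]
    return result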

Even in more general purpose databases that do log-structured writes and can efficiently perform range queries, the specifics of how you do writes, how you structure your keys, the order you insert keys, how pages are sized, etc. can make huge differences in efficiency.

One final thing to consider on this efficiency front: missing indexes can turn operations that should be blazing fast into ridiculously inefficient table scans. It's easy to completely saturate the I/O on a hard drive (even an SSD) with a few concurrent table scans. Some of the smarter databases will prevent you from doing this altogether (i.e. they will return an error instead of scanning), but even then it's easy to accidentally write something tantamount to a table scan with an innocent-looking for loop. In many cases it's straightforward to look at a query and, with some very basic cardinality estimates, come up with a reasonable suggested index for that query, so in theory it's possible to automatically suggest or build appropriate indexes. However, one can just as easily run the risk of having too many indexes, which can be just as deleterious to database performance. Automatically detecting the database-query-in-a-for-loop case is certainly possible but not trivial. In other words, there is no silver bullet to the indexing problem.
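Here's a sketch of the query-in-a-for-loop problem using Python's built-in sqlite3 module; the schema is hypothetical and just for illustration. Each query in the loop looks innocent, but without an index on orders.user_id every one of them scans the orders table, and we issue one per user.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# The innocent-looking for loop: one full scan of orders per user.
totals = {}
for (user_id,) in conn.execute("SELECT id FROM users"):
    row = conn.execute(
        "SELECT SUM(total) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()
    totals[user_id] = row[0]

# With an index (and a single join), the same answer is one indexed query
# instead of N scans.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
rows = conn.execute(
    "SELECT u.id, SUM(o.total) FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id"
).fetchall()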

The point is, once you move beyond SQL all bets are off. If we as an industry were to standardize on a new database query protocol it might be possible to return to a somewhat normal world where it's possible to compare different database products in a sane way—but today it's not. Even internally at Google, which is famous for probably having the most state of the art database products, there isn't a single database used—different products there are built on different technologies like BigTable, Megastore, Spanner, F1, etc.

Maybe one day we will reach the point where servers are so cheap and so powerful that people can have massive petabyte scale databases with microsecond lookups on commodity hardware. Then maybe the hardware will truly be so fast that the specifics of the software implementation will be irrelevant (although even then, the data structures will have different runtime complexity and therefore benchmarks will still show "big" wins for one technology or another). However, I think that those days are quite a ways off. In the meantime, there will be a lot of companies that need terabyte and petabyte scale databases, there will be a lot of small startups that evolve into tech giants, and there will be a need to build special, custom purpose databases.

I'm not prescient, and the computer industry is one where long term predictions are notoriously difficult. However, at this point databases have been considered a difficult problem for more than fifty years. If nothing else, purely based on the longevity of the challenge thus far I think it's reasonable to assume that it will still be considered a challenging problem in computer science at least a decade from now, if not a few decades from now.



Today I wrote some unit tests for the code that does the static html/xml generation for this blog. I was motivated to do this after my friend James Brown pointed out some bugs he had noticed on the site.

To add the tests, I had to significantly refactor things. Previously the whole thing was a single 251 line Python script. I had to refactor it into an actual Python module with different components, create a setup.py file with console_scripts entry points, create a requirements.txt file, set up pytest, and so on. The tests validate a bunch of things, and use lxml and XPath queries to check what should and should not be present in the generated files. All in all, the refactored code is a lot easier to test and reason about, but it's also a lot more complicated, which is a bit unfortunate.
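As a rough sketch of what these tests look like (the file names and XPath expressions here are hypothetical stand-ins, not the actual test code), pytest plus lxml makes this kind of validation pretty terse:

import lxml.html

def test_hidden_pages_not_linked_from_index():
    # Parse the generated index page and collect every link target.
    tree = lxml.html.parse("out/index.html")
    hrefs = tree.xpath("//a/@href")
    assert "some-hidden-page.html" not in hrefs

def test_posts_have_titles():
    tree = lxml.html.parse("out/index.html")
    # Every post entry should be an anchor with non-empty text.
    titles = [a.text_content().strip() for a in tree.xpath("//li/a")]
    assert titles and all(titles)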

The reason I expended this considerable amount of work, instead of just dropping in the original one-line fix, is that the bug is what one might consider a security or privacy bug. A while back I had written some content that I wanted to share with friends, but that I didn't want linked anywhere or crawled. I figured a clever solution here would be to reuse the blog generation stuff I already have, so I'd get the nice CSS and whatnot, but add a mode that would cause these pages to not be linked from the index page. I never created a robots.txt file for these pages, since by the nature of its existence such a file publicizes the secret URLs.

This all worked great, except for one little bug. When I generate the static site content, I also generate a file /index.rss which is picked up by RSS readers for feed syndication. The code generating the RSS file didn't know about the hidden page feature, so these hidden pages ended up in the RSS feed. I didn't notice this since I don't subscribe to the RSS feed for my own site. As a result of this bug, not only was the content visible to people browsing via RSS, it was also actually indexed by Googlebot. I was able to confirm this by doing a Google query with site:eklitzke.org my-specific-search-term. Interestingly, these pages were not indexed by Yahoo or Bing which suggests to me that the crawling backend for Google is unified with their RSS crawler, whereas the same is not true of Yahoo/Bing.

Besides fixing the root bug, all pages I generate in this way (i.e. not linked from the main index page) now use the meta noindex feature (a <meta name="robots" content="noindex"> tag in the page head), just in case they are ever linked to again. This is functionally similar to a robots.txt file but doesn't publicize the URLs. I also registered my site with Google Webmaster Tools and explicitly requested that they take down the URLs I didn't want indexed.

All is good now. I guess the moral of the story is that for any program that is even remotely interesting, it's worth spending a bit of time to write tests. And hat tip to James for reporting the issue in the first place.

Rendering Videos From OpenGL Is Hard


Lately I've been getting into OpenGL, since I love looking at awesome animations and visualizations and want to start making my own. After a few false starts, I've settled on writing these programs in C++. Besides its great performance, C++ seems to have the widest array of bindings to different parts of the graphics and math ecosystem. In C++ I have native bindings to all of the usual C libraries, plus great C++ libraries like GLM that are high performance, actively maintained, and tailored for graphics programming. The stuff I'm doing right now isn't stressing the capabilities of my graphics subsystem by any stretch of the imagination, but I have ambitions to do more high performance stuff in the future, and familiarity with the C++ OpenGL ecosystem will come in handy then.

The downside is that it's kind of hard to share these programs with others. This is one area where WebGL really shines—you can just send someone a link to a page with WebGL and Javascript on it and it will render right there in their browser without any futzing around. On the other hand, I've seen a lot of videos and animated gifs on Tumblr that are fairly high quality, so I thought that perhaps I could figure out one of these techniques and share my animations this way.

As it turns out, this is surprisingly difficult.

Animated GIFs

I've seen a lot of awesome animations on Tumblr using animated GIFs. For instance, this animation and this animation are pretty good in my mind. After I looked into this more, I realized that these images are very carefully crafted to look as good as possible given the limitations of GIF, and that this is a really bad solution for general purpose videos.

The most important limitation of GIF is that it's limited to a color palette of 256 colors. This means that for images with fine color gradients you need to use dithering which looks bad and is kind of hard to do anyway. It is possible in GIF to use what is called a "local color table" to provide a different 256 color palette for each frame in the animation, so in a global sense you can use more than 256 colors, but within each frame pixels are represented by 8 bits each and therefore you're limited to 256 colors.

Besides the color limitation, generating animated GIFs is pretty difficult. I was pleased at first to find GIFLIB, but if you look at the actual API it gives you it's incredibly low level and difficult to use. For instance, it can't take raw RGB data and generate a GIF automatically: it's your job to generate the color palette, dither the input data, and write out the raw frame data.

There are a few other libraries out there for working with GIF images, but the main alternative, and I suspect what most people are using, seems to be ImageMagick/GraphicsMagick. What you would do in this model is generate a bunch of raw image frames and then stitch them together into an animation using the convert command. There are some great documents on how to do this, for instance this basic animation guide and this optimized animation guide. However, once I really started looking into this it started seeming rather complicated, slow, and weird.
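As an aside, if your frames are already coming out of a script, the Python Pillow library will do the palette quantization and frame stitching for you. I didn't end up going this route, but a minimal sketch looks something like this (the frames/*.png pattern is a hypothetical dump from the renderer):

import glob
from PIL import Image

# Load the rendered frames in order.
frames = [Image.open(p) for p in sorted(glob.glob("frames/*.png"))]

# Quantize each frame down to GIF's 256-color limit; ADAPTIVE computes a
# per-frame (local) palette rather than using one fixed global palette.
paletted = [f.convert("P", palette=Image.ADAPTIVE, colors=256) for f in frames]

# Stitch everything into an animated GIF at roughly 25 fps, looping forever.
paletted[0].save(
    "out.gif",
    save_all=True,
    append_images=paletted[1:],
    duration=40,
    loop=0,
)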

The other thing that I realized is that the good looking GIF animations I was seeing on Tumblr are mostly images that are carefully stitched together in a loop. For instance, if you look at the images I posted previously on a site like GIF Explode you'll see that there's only a small number of frames in the animation (20-30 or so). This is a lot different from posting a 10 second animation at 24fps which will be 240 frames, potentially each with their own local color table.

As a result of these limitations, I decided to abandon the GIF approach. If I do any animations that can be represented as short loops I will probably revisit this approach.

HTML5 Video

The other option I explored was generating an mp4 video using the ffmpeg command line tool. This is an attractive option because it's really easy to do. Basically what you do is call glReadPixels() to read your raw RGB data into a buffer, and then send those pixels over a Unix pipe to the ffmpeg process. When you invoke ffmpeg you give it a few options to tell it about the image format, color space, and dimensions of the input data. You also have to tell it to vertically flip the data (since OpenGL's row order is upside down relative to what ffmpeg expects). The actual invocation in the C++ code ends up looking something like:

FILE *ffmpeg = popen("/usr/bin/ffmpeg -vcodec rawvideo -f rawvideo -pix_fmt rgb24 -s 640x480 -i pipe:0 -vf vflip -vcodec h264 -r 60 out.avi", "w");

Then data can be sent to the file object in the usual way using fwrite(3) to send over the raw image data. After the pipe is closed the ffmpeg process will quit, and there will be a file out.avi with the encoded video.

This actually generates an output file that looks pretty decent (although the colors end up looking washed out due to some problem related to linear RGB vs. sRGB that I haven't figured out). There are definitely noticeable video encoding artifacts compared to the actual OpenGL animation, but it doesn't seem too unreasonable.

The problem here is that when I upload a video like this to Tumblr the video ends up getting re-encoded, and then on the Tumblr pages it's resized yet again in the browser and the animation looks really bad.

In the actual animation, a frame will look like this:

a great image

When I encode the video and send it up to Tumblr, it ends up looking like this. If you look at the source mp4 file on Tumblr it's definitely a lot worse than the reference image above, but it's not as bad as the way it's rendered on the Tumblr post page.

I may end up sticking with this technique and just hosting the files myself, since my source videos are of good enough quality (better than the re-encoded source mp4 file above), and I don't actually have that many visitors to this site so the bandwidth costs aren't a real issue.

tl;dr Posting high-quality OpenGL animations to the internet is hard to do, and I'm still trying to figure out the best solution.

Linear Algebra; or, The Folly of Youth


I was a really poor student in college. In fact, I was such a poor student that I ended up dropping out of school. I don't regret not having a degree at all—it hasn't hurt me one bit, and I've had tons of great career opportunities despite my lack of diploma. However, I do frequently regret a lot of the amazing learning opportunities that I passed up.

This pain has been felt most poignantly when I look back on the survey course in linear algebra that I took at UC Berkeley, Math 110. This was a class that I took because it was a requirement for my major. It was also supposed to be the "easiest" upper division math class at Cal, and thus it was the first upper division class I took that had real proofs. I spent more time grappling with my inability to understand or create proofs than I did with the actual material. That might have been OK, but it was a subject I didn't particularly care about, and therefore I kind of fumbled through the class from week to week and finished without having really learned very much at all. Later on, when I had gotten the learning-how-to-learn-things aspect of math down, I did a lot better and actually came away from my other classes with a lot more knowledge.

As it turns out, knowing a little bit of linear algebra goes a long way when it comes to a wide range of fields in computer science. For instance, it is at the root of signal processing and computer graphics. It comes up in other places too. Recently for fun I wrote an n-body simulator. Guess what? All of those vector quantities like position and velocity that you can mostly ignore when doing analytic solutions in your college physics classes are really important when you're writing a computer simulation. Now I've been trying to get into graphics programming with OpenGL, and despite the fact that the type of work I'm doing is mostly orthographic projections, there are still vectors and matrices popping up all over the place. Linear algebra is also really helpful if you want to do number crunching on the GPU, since GPU shaders have all of this dedicated hardware for vector and matrix processing. All stuff I'm coming to learn the hard way.

Surprising Things Found While Exploring Bash


A few days ago I was making some changes to my .bashrc file and noticed a few interesting things regarding bash aliases and functions.

In my actual .bashrc file I had only the following lines that were related to setting up aliases:

alias grep='grep --color=auto'
alias ls='ls --color=auto'

if which vim &>/dev/null; then
    alias vi=vim
fi

But here's what I got when I typed alias:

$ alias
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias vi='vim'
alias which='(alias; declare -f) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot'
alias xzegrep='xzegrep --color=auto'
alias xzfgrep='xzfgrep --color=auto'
alias xzgrep='xzgrep --color=auto'
alias zegrep='zegrep --color=auto'
alias zfgrep='zfgrep --color=auto'
alias zgrep='zgrep --color=auto'

Weird, right? The only ones I had defined in my .bashrc were the aliases for grep, ls, and vi. Well, it turns out that my distribution has already decided to add the --color=auto stuff for me for ls, which is pretty reasonable, and I found bug 1034631 which is the origin of all of the weird grep variants automatically aliased for me. That seems a little weird, but I do understand it. However, I do find it amusing that the obscure ls variant vdir isn't colorized even though it is part of coreutils and supports colorization (perhaps I should file an RFE).

But WTF is going on with that which alias? Actually, what's going on here is pretty neat. This alias pipes the list of aliases and bash functions defined in the shell to which, so you can see where the things you run actually come from. And if I run declare -f on my system, I see that there's actually a lot of stuff in there:

$ declare -f | wc -l

A couple of those are functions I defined in my own .bashrc file, but that accounts for perhaps 50 of those nearly 2500 lines of shell script.

Nearly all of the functions that I see listed by declare -f appear to be functions that are installed for bash command completion. There are some real gems in here. Check out this one:

_known_hosts_real ()
    local configfile flag prefix;
    local cur curd awkcur user suffix aliases i host;
    local -a kh khd config;
    local OPTIND=1;
    while getopts "acF:p:" flag "$@"; do
        case $flag in 
    [[ $# -lt $OPTIND ]] && echo "error: $FUNCNAME: missing mandatory argument CWORD";
    let "OPTIND += 1";
    [[ $# -ge $OPTIND ]] && echo "error: $FUNCNAME("$@"): unprocessed arguments:" $(while [[ $# -ge $OPTIND ]]; do printf '%s\n' ${!OPTIND}; shift; done);
    [[ $cur == *@* ]] && user=${cur%@*}@ && cur=${cur#*@};
    if [[ -n $configfile ]]; then
        [[ -r $configfile ]] && config+=("$configfile");
        for i in /etc/ssh/ssh_config ~/.ssh/config ~/.ssh2/config;
            [[ -r $i ]] && config+=("$i");
    if [[ ${#config[@]} -gt 0 ]]; then
        local OIFS=$IFS IFS='
' j;
        local -a tmpkh;
        tmpkh=($( awk 'sub("^[ \t]*([Gg][Ll][Oo][Bb][Aa][Ll]|[Uu][Ss][Ee][Rr])[Kk][Nn][Oo][Ww][Nn][Hh][Oo][Ss][Tt][Ss][Ff][Ii][Ll][Ee][ \t]+", "") { print $0 }' "${config[@]}" | sort -u ));
        for i in "${tmpkh[@]}";
            while [[ $i =~ ^([^\"]*)\"([^\"]*)\"(.*)$ ]]; do
                __expand_tilde_by_ref j;
                [[ -r $j ]] && kh+=("$j");
            for j in $i;
                __expand_tilde_by_ref j;
                [[ -r $j ]] && kh+=("$j");
    if [[ -z $configfile ]]; then
        for i in /etc/ssh/ssh_known_hosts /etc/ssh/ssh_known_hosts2 /etc/known_hosts /etc/known_hosts2 ~/.ssh/known_hosts ~/.ssh/known_hosts2;
            [[ -r $i ]] && kh+=("$i");
        for i in /etc/ssh2/knownhosts ~/.ssh2/hostkeys;
            [[ -d $i ]] && khd+=("$i"/*pub);
    if [[ ${#kh[@]} -gt 0 || ${#khd[@]} -gt 0 ]]; then
        if [[ "$awkcur" == [0-9]*[.:]* ]]; then
            if [[ "$awkcur" == [0-9]* ]]; then
                if [[ -z $awkcur ]]; then
        if [[ ${#kh[@]} -gt 0 ]]; then
            COMPREPLY+=($( awk 'BEGIN {FS=","}
            /^\s*[^|\#]/ {
            sub("^@[^ ]+ +", ""); \
            sub(" .*$", ""); \
            for (i=1; i<=NF; ++i) { \
            sub("^\\[", "", $i); sub("\\](:[0-9]+)?$", "", $i); \
            if ($i !~ /[*?]/ && $i ~ /'"$awkcur"'/) {print $i} \
            }}' "${kh[@]}" 2>/dev/null ));
        if [[ ${#khd[@]} -gt 0 ]]; then
            for i in "${khd[@]}";
                if [[ "$i" == *key_22_$curd*.pub && -r "$i" ]]; then
        for ((i=0; i < ${#COMPREPLY[@]}; i++ ))
    if [[ ${#config[@]} -gt 0 && -n "$aliases" ]]; then
        local hosts=$( sed -ne 's/^['"$'\t '"']*[Hh][Oo][Ss][Tt]\([Nn][Aa][Mm][Ee]\)\{0,1\}['"$'\t '"']\{1,\}\([^#*?%]*\)\(#.*\)\{0,1\}$/\2/p' "${config[@]}" );
        COMPREPLY+=($( compgen -P "$prefix$user" -S "$suffix" -W "$hosts" -- "$cur" ));
    if [[ -n ${COMP_KNOWN_HOSTS_WITH_AVAHI:-} ]] && type avahi-browse &> /dev/null; then
        COMPREPLY+=($( compgen -P "$prefix$user" -S "$suffix" -W "$( avahi-browse -cpr _workstation._tcp 2>/dev/null | awk -F';' '/^=/ { print $7 }' | sort -u )" -- "$cur" ));
    COMPREPLY+=($( compgen -W "$( ruptime 2>/dev/null | awk '!/^ruptime:/ { print $1 }' )" -- "$cur" ));
    if [[ -n ${COMP_KNOWN_HOSTS_WITH_HOSTFILE-1} ]]; then
        COMPREPLY+=($( compgen -A hostname -P "$prefix$user" -S "$suffix" -- "$cur" ));
    __ltrim_colon_completions "$prefix$user$cur";
    return 0

Huh? It turns out that when the functions are loaded into bash they're stripped of comments, so the declare -f output makes it look a bit more cryptic than it really is. If you look at the bash-completion source code, you'll see that the original is actually well commented, and that the other weird functions showing up in my shell (including a bunch of non-underscore-prefixed functions like quote and dequote) come from this package. It's still fun to look at. You know you're in for a treat whenever you see IFS being redefined. You can see a lot of fun things like hacks for the lack of case-insensitive regular expressions in awk and sed, and the rather remarkable fact that avahi can be used for host discovery from bash. From this code I also learned about the existence of bash regular expressions.

It's also interesting to see just how many commands have completion, and how much code has gone into all of this:

$ complete -p | wc -l

$ find /usr/share/bash-completion -type f | wc -l

$ find /usr/share/bash-completion -type f | xargs wc -l
40545 total

Wow. That's a lot of code. Who wrote it all? Does it all work? Is everything quoted properly? Are there security bugs?

There are even amusing things in here like completion code for cd. That's right, the shell builtin cd has custom completion code. You might think that to complete cd you just have to look at the directories in the current working directory... not so. The builtin cd has two important parameters that can affect its operation: the CDPATH variable (which defines paths other than the current working directory that should be searched), and the shopt option called cdable_vars which allows you to define variables that can be treated as targets to cd. The bash-completion package has code that handles both of these cases.

A lot has already been written about how Unix systems, which used to be quite simple and easy to understand, have turned into incredibly baroque and complicated systems. This is a complaint that people particularly levy at Linux and GNU utilities, which are perceived to be particularly complex compared to systems with a BSD lineage. After finding things like this, it's hard to disagree. At the same time, this complexity is there for a reason, and I'm not sure I'd want to give any of it up.

Cars And Turn Restrictions On Market St In San Francisco


I'm a little bit late to this issue, but on June 16, 2015, the SFMTA approved changes in a plan called the "Safer Market St Project" to limit the movement that private cars can make on Market St in San Francisco. SFGate has some good coverage of the issue here and here. I want to share my thoughts on the matter.

The origin of this legislation is something called Vision Zero SF, a group that is trying to reduce traffic deaths in SF; their declared goal is to achieve zero traffic deaths in SF by 2024. This is an ambitious goal. San Francisco is a city with nearly a million permanent residents, and on weekdays, when people from around the Bay Area commute into the city, the population regularly exceeds that number.

Whether or not you believe it's possible to achieve the Vision Zero goal, what is interesting is to look at statistics on which intersections in the city have the most deaths. This is particularly the case for deaths caused by cars hitting pedestrians or cyclists, since anyone who walks or bikes regularly in the city knows that there are certain areas where road layout, lights, turn restrictions, etc. encourage cars to engage in behavior that is dangerous to pedestrians and cyclists. These are the areas where changes or traffic restrictions can most effectively save lives. What Vision Zero has found is that a hugely disproportionate percentage of fatalities take place on a few blocks of Market St. According to the document I just linked to, a portion of SOMA that accounts for 12% of San Francisco's streets accounts for 70% of total crashes. This is consistent with similar (and older) data from the San Francisco Bicycle Coalition on accidents involving cyclists. The SFMTA did its own analysis and gave this public hearing presentation, which outlines its research and the proposed turn restrictions (which ended up actually being implemented in June).

Opponents of the legislation pointed out that restrictions of any kind on traffic on Market St will likely lead to higher congestion on nearby streets and generally higher congestion in SOMA in general. It's hard to deny this theory: these restrictions will have a serious impact on drivers and their ability to navigate in the area around Market St. And yet, that's kind of the whole point of the legislation. The legislation is being passed to save the lives of pedestrians and cyclists, not to decrease congestion.

I was disappointed in how a lot of the legislation I read portrayed the matter as one between cars and cyclists. As you can see in the Vision Zero literature, there are many more pedestrian deaths than cycling deaths. The report I linked to earlier showed that in 2014 there were 17 pedestrians, nine motorists, and only three cyclists killed in traffic related deaths. They do point out that San Francisco is one of the few cities in the country where cycling related deaths are increasing, but I think the fact that there are more than five times as many pedestrian fatalities speaks for itself.

As someone who rides a bicycle up and down Market St every week as part of my work commute, I strongly support the legislation that the SFMTA passed. I hope that they are able to move ahead with similar legislation in the future. Market St is incredibly dangerous to bike down, and I have frequent close encounters with cars there. My personal take is that of the encounters I have with motorized vehicles, very seldom do I have problems with MUNI buses or taxis. Most of Market St in SOMA already has designated lanes for buses and taxis that put them in the center lane, away from cyclists. Market St is especially dangerous due to the divided lanes for bus stops. This lane division was not originally anticipated when Market St was built, and consequently the rightmost lane is very narrow in many places and does not have sufficient space for cyclists. Additionally, many pedestrians who are getting on or off MUNI will cross the rightmost lane to get to the divided bus stop area, and in doing so cross into the path of traffic. The restrictions that the SFMTA passed apply primarily to private vehicles that are in the rightmost lane or turning into the rightmost lane, right where they're most likely to hit pedestrians and cyclists.

Being Self-Taught


Something I've been thinking about a lot recently is this curious aspect of the profession of programming—what is it that makes some programmers better than others? Obviously this is a complicated question, and a full analysis is outside of the scope of a short blog post (not that I have all the answers anyway).

However, one thing I've been thinking about recently is this weird aspect of programming: it is almost entirely self-taught. A lot of the people I work with, including many of the brightest ones working on the hardest problems, don't have formal training in "computer science". Most of them have college degrees in other disciplines. But by the same token, a lot of them don't have college degrees at all.

From talking to people who actually did pursue a degree in "computer science" at a university, frequently they did not learn in a classroom about any of the following things:

To be successful in the field as a programmer you have to know how to do most or all of the above.

The thing that's really interesting to me is that despite this, it's hard for me to think of any profession that's easier to get into. In fact, it's probably more correct to say: because of this it's hard for me to think of any profession that's easier to get into. Everyone in this field is primarily self-taught.

There is more content on the internet about programming than almost any other field I can think of. If you browse the front page of Hacker News regularly you will find a constant stream of new, high quality, interesting programming-related articles. Stack Overflow is full of what I can only guess must be millions of questions and answers, and if you have a new question you can ask it. For those who take it upon themselves to learn how to use an IRC client, there are hundreds or thousands of public programming-related IRC channels where you can ask questions and get feedback from people in real time.

I frequently get questions from people asking how I learned about X. How did I learn the difference between a character device and a block device? How did I learn how virtual memory systems work? How did I learn about databases? The answer is the same as how almost every programmer learned what they know: it's mostly self-taught. I spent a ton of time reading things on the internet, and I've spent a lot of time in front of a computer writing code and analyzing the runtime behavior of programs.

Another important and related observation to this is that while there is definitely a correlation between intelligence and programming ability and success, from what I have seen the correlation is a lot weaker than you might think. There are a ton of exceptionally brilliant people that I know who work in the field of computer programming who are poor programmers or who aren't successful in their career. Usually these are people who have decided to focus their interests on something else, who have settled down into a particular niche and haven't branched out of it, or who aren't interested in putting in the hard work that it would take to improve as a programmer-at-large.

It definitely helps a lot to work with smart people. It definitely helps a lot to talk and interact with other people, and to get code reviews and feedback from them. But in my estimation, the lion's share of what there is to learn comes from the amount of work that a person spends alone in front of a computer with a web browser, a text editor, and a terminal. I hope that more people who want to become computer programmers or who want to improve take this into consideration.

Resource Management in Garbage Collected Languages


I was talking to a coworker recently about an issue where npm was opening too many files and was then failing with an error related to EMFILE. For quick reference, this is the error that system calls that create file descriptors (e.g. open(2) or socket(2)) return when the calling process has hit its file descriptor limit. I actually don't know the details of this problem (other than that it was "solved" by increasing a ulimit), but it reminded me of an interesting but not-well-understood topic related to the management of resources like file descriptors in garbage collected languages.

I plan to use this post to talk about the issue, and ways that different languages work around the issue.

The Problem

In garbage collected languages there's generally an idea of running a "finalizer" or "destructor" when an object is collected. In the simplest case, when the garbage collector collects an object it reclaims that object's memory somehow. When a finalizer or destructor is associated with an object, in addition to reclaiming the object's memory the GC will execute the code in the destructor. For instance, in Python you can do this (with a few important caveats about what happens when the process is exiting!) by implementing a __del__() method on your class. Not all languages natively support this. For instance, JavaScript does not have a native equivalent.
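
As a minimal sketch (the RawFile class here is my own toy example, not anything from the standard library), a finalizer in Python looks like this:

import os


class RawFile(object):
    """Toy wrapper that owns a raw file descriptor."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)

    def __del__(self):
        # Runs when the GC reclaims the object -- eventually, not promptly,
        # and with the usual caveats during interpreter shutdown.
        os.close(self.fd)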

While the JavaScript language doesn't let you write destructors, real world JavaScript VMs do have a concept of destructors. Why? Because real JavaScript programs need to do things like perform I/O and have to interact with real world resources like browser assets, images, files, and so forth. These constructs are provided by the host environment, like the DOM in a browser or the APIs in Node, and are not part of the JavaScript language itself; they are typically implemented within these VMs by binding against native C or C++ code.

Here's an example. Suppose that you have a Node.js program that opens a file and does some stuff with it. Under the hood, what is happening is that the Node.js program has created some file wrapper object that does the open(2) system call for you and manages the associated file descriptor, does async read/write system calls for you, and so on. When the file object is garbage collected, the file descriptor associated with the object needs to be released to the operating system using close(2). There's a mechanism in V8 to run a callback when an object is collected, and the C++ class implementing the file object type uses it to register a destructor callback that invokes the close(2) system call when the file object is GC'ed.

A similar thing happens in pretty much every garbage collected language. For instance, in Python if you have a file object you can choose to manually invoke the .close() method to close the object. But if you don't do that, that's OK too—if the garbage collector determines that the object should be collected, it will automatically close the file if necessary in addition to actually reclaiming the memory used by the object. This works in a similar way to Node.js, except that instead of this logic being implemented with a V8 C++ binding it's implemented in the Python C API.

So far so good. Here's where the interesting issue is that I want to discuss. Suppose you're opening and closing lots of files really fast in Node.js or Python or some other garbage collected language. This will generate a lot of objects that need to be GC'ed. Despite there being a lot of these objects, the objects themselves are probably pretty small—just the actual object overhead plus a few bytes for the file descriptor and maybe a file name.

The garbage collector determines when it should run and actually collect objects based on a bunch of magic heuristics, but these heuristics are all related to memory pressure—e.g. how much memory it thinks the program is using, how many objects it thinks are collectable, how long it's been since a collection, or some other metric along these lines. The garbage collector itself knows how to count objects and track memory usage, but it doesn't know about extraneous resources like "file descriptors". So what happens is you can easily have hundreds or thousands of file descriptors ready to be closed, but the GC thinks that the amount of reclaimable memory is very small and thinks it doesn't need to run yet. In other words, despite being close to running out of file descriptors, the GC doesn't realize that it can help the situation by reclaiming these file objects since it's only considering memory pressure.

This can lead to situations where you get errors like EMFILE when instantiating new file objects, because despite your program doing the "right thing", the GC is doing something weird.
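
One blunt workaround, sketched below, is to force a collection and retry when you hit EMFILE. Note that open_retrying_on_emfile is a hypothetical helper of my own, not something Python or npm provides:

import errno
import gc


def open_retrying_on_emfile(path):
    # If the process is out of file descriptors, force a GC pass so that any
    # unreachable file objects get finalized (releasing their fds), then retry.
    try:
        return open(path)
    except (IOError, OSError) as e:
        if e.errno != errno.EMFILE:
            raise
        gc.collect()
        return open(path)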

This gets a lot more insidious with other resources. Here's a classic example. Suppose you're writing a program in Python or Ruby or whatever else, and that program is using bindings to a fancy C library that does heavy processing for tasks like linear algebra, computer vision, or machine learning. To be concrete, let's pretend it's using bindings to a C library that does really optimized linear algebra on huge matrices. The bindings will make some calls into the C library to allocate a matrix when an object is instantiated, and likewise will have a destructor callback to deallocate the matrix when the object is GC'ed. Since these are huge matrices, they could easily be hundreds of megabytes, or even many gigabytes in size, and all of that data will actually be page faulted in and resident in memory. So what happens is the Python GC is humming along, and it sees this PyObject that it thinks is really small, e.g. it might think that the object is only 100 bytes. But the reality is that the object has an opaque handle to a 500 MB matrix that was allocated by the C library, and the Python GC has no way of knowing that, or even of knowing that there was 500 MB allocated anywhere at all! This happens because the C library is probably using malloc(3) or its own allocator, while the Python VM uses its own separate memory allocator.

So you can easily have a situation where the machine is low on memory, and the Python GC has gigabytes of these matrices ready to be garbage collected, but it thinks it's just managing a few small objects and doesn't GC them in a timely manner. This example is a bit counterintuitive because it can appear like the language is leaking memory when it's actually just an impedance mismatch between how the kernel tracks memory for your process and how the VM's GC tracks memory.

Again, I don't know if this was the exact problem that my coworker had with npm, but it's an interesting thought experiment—if npm is opening and closing too many files really quickly, it's totally possible that the issue isn't actually a resource leak, but actually just related to this GC impedance mismatch.

Generally there's not a way to magically make the GC for a language know about these issues, because there's no way that it can know about every type of foreign resource, how important that resource is to collect, and so on. Typically what you should do in a language like Python or Ruby or JavaScript is to make sure that objects have an explicit close() method or similar, and that method will finalize the resource. Then if the developer really cares about when the resources are released they can manually call that method. If the developer forgets to call the close method then you can opt to do it automatically in the destructor.
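
Here's a minimal sketch of that idiom in Python (the ManagedFile class is made up for illustration): an explicit close() for callers who care about timeliness, with the destructor as a safety net:

import os


class ManagedFile(object):
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        self.closed = False

    def close(self):
        # Callers who care about timely release should call this explicitly.
        if not self.closed:
            os.close(self.fd)
            self.closed = True

    def __del__(self):
        # Fallback for callers who forgot: runs whenever the GC gets to it.
        self.close()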

C++ Solution

C++ has a really elegant solution to this problem that I want to talk about, since it inspired a similar solution in Python that I'm also going to talk about. The solution has the cryptic name RAII -- Resource Acquisition Is Initialization. In my opinion this is a really confusing name, since most people really care more about finalization than initialization, but that's what it's called.

Here's how it works. C++ has a special kind of scoping called block scoping. This is something that a lot of people think that syntactically similar languages like JavaScript have, since those languages also have curly braces, but the block scoping concept is actually totally different. Block scoping means that variables are bound to the curly brace block they're declared in. When the context of the closing curly brace is exited, the variable is out of scope. This is different from JavaScript (or Python) for instance, because in JavaScript variables are scoped by the function call they're in. So a variable declared in a for loop inside of a function is out of scope when the loop exits in C++, but it is not out of scope until the function exits in JavaScript.

In addition to the block scoping concept, C++ also has a rule that says that when an object goes out of scope its destructor is called immediately. It even goes further and specifies what order the destructors are invoked in: if multiple objects go out of scope at once, the destructors are called in reverse order of construction. So suppose you have a block like this:

  Foo a;
  Bar b;

The following will happen, in order: a is constructed, then b is constructed; when the block is exited, b's destructor is called first, and then a's destructor is called.

This is guaranteed by the language. So here's how it works in the context of managing things like file resources. Suppose I want to have a simple wrapper for a file. I might implement code like this:

#include <fcntl.h>   // open(2)
#include <unistd.h>  // close(2)

void bar(); int baz();  // defined elsewhere

class File {
 public:
  File(const char *filename) : fd_(open(filename, O_RDONLY)) {}
  ~File() { close(fd_); }
 private:
  int fd_;
};

int foo() {
  File my_file("hello.txt");
  bar();
  return baz();  // my_file is closed right after baz() returns
}

(Note: I'm glossing over some details like handling errors returned by open(2) and close(2), but that's not important for now.)

In this example, we open the file hello.txt, do some things with it (the call to bar()), and then it automatically gets closed after the call to baz(). So what? This doesn't seem that much better than explicitly opening the file and closing it. In fact, it would certainly have been a lot less code to just call open(2) at the start of foo(), and then add one extra line at the end to close the file before calling baz().

Well, besides being error prone, there is another problem with that approach. What if bar() throws an exception? If we had an explicit call to close(2) at the end of the function, then an exception would mean that line of code would never be run. And that would leak the resource. The C++ RAII pattern ensures that the file is closed when the block scope exits, so it properly handles the case of the function ending normally, and also the case where some exception is thrown somewhere else to cause the function to exit without a return.

The C++ solution is elegant because once we've done the work to write the class wrapping the resource, we generally never need to explicitly close things, and we also get the guarantee that the resource is always finalized, and that this is done in a timely manner. And it's automatically exception safe in all cases. Of course, this only works with resources that can be scoped to the stack, but this is true a lot more often than you might suspect.

This pattern is particularly useful with mutexes and other process/thread exclusion constructs where failure to release the mutex won't just cause a resource leak but can cause your program to deadlock.

Python Context Managers

Python has a related concept called "context managers" and an associated syntax feature called a with statement.

I don't want to get too deep into the details of the context manager protocol, but the basic idea is that an object can be used in a with statement if it implements two magic methods called __enter__() and __exit__() which have a particular interface. Then when the with statement is entered the __enter__() method is invoked, and when the with statement is exited for any reason (an exception is thrown, a return statement is encountered, or the last line of code in the block is executed) the __exit__() method is invoked. Again, there are some details I'm eliding here related to exception handling, but for the purpose of resource management what's interesting is that this provides a similar solution to the C++ RAII pattern. When the with statement is used we can ensure that objects are automatically and safely finalized by making sure that finalization happens in the __exit__() method.
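
Here's a minimal sketch of the protocol (the ScopedFile class is my own example, not from the standard library; real file objects already implement this for you):

import os


class ScopedFile(object):
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.fd = os.open(self.path, os.O_RDONLY)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs no matter how the with block exits: falling off the end,
        # a return statement, or an exception propagating out.
        os.close(self.fd)
        return False  # don't suppress exceptions


with ScopedFile("hello.txt") as f:
    data = os.read(f.fd, 100)

Python's built-in file objects already implement this protocol, which is why using open() in a with statement is the idiomatic way to work with files.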

The most important difference here compared to the C++ RAII approach is that you must remember to use the with statement with the object to get the context manager semantics. With C++, RAII typically implies that the object is allocated on the stack, which means that it is automatically reclaimed and there's no chance for you to forget to release the resource.

Go Defer

Go has a syntax feature called defer that lets you ensure that some code is run when the surrounding function exits. This is rather similar to the Python context manager approach, although syntactically it works a lot differently.

The thing that is really nice about this feature is that it lets you run any code at the time the function is exited, i.e. you can defer an arbitrary function call, including an anonymous closure wrapping whatever code you like. This makes the feature incredibly flexible—in fact, it is a lot more flexible than the approach that Python and C++ have.

There are a few downsides to this approach in my opinion.

The first downside is that like with Python, you have to actually remember to do it. Unlike C++, it will never happen automatically.

The second downside is that because it's so flexible, it has more potential to be abused or used in a non-idiomatic way. In Python, if you see an object being used in a with statement you know that the semantics are that the object is going to be finalized when the with statement is exited. In Go the defer statement probably occurs close to object initialization, but doesn't necessarily have to.

The third downside is that the defer statement isn't run until the function it's defined in exits. This is less powerful than C++ (because C++ blocks don't have to be function-scoped) and also less powerful than Python (because a with statement can exit before the enclosing function does).

I don't necessarily think this construct is worse than C++ or Python, but it is important to understand how the semantics differ.


JavaScript

JavaScript doesn't really have a true analog of the C++/Python/Go approaches, as far as I know. What you can do in JavaScript is use a try statement with a finally clause. Then in the finally clause you can put your call to fileObj.close() or whatever the actual interface is. Actually, you can also use this approach in Python if you wish, since Python also has the try/finally construct.
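
Since the construct is the same in Python, here's a quick sketch of the pattern (using a plain file as the stand-in resource):

f = open("hello.txt")
try:
    data = f.read()
    # ... do something with the data ...
finally:
    # Runs whether the try block falls through, returns, or raises.
    f.close()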

Like with Go defer statements, it is the caller's responsibility to remember to do this in every case, and if you forget to do it in one place you can have resource leaks. In a lot of ways this is less elegant than Go because the finalization semantics are separated from the initialization code, and this makes the code harder to follow in my opinion.

My Philosophy on "Dot Files"


This is my philosophy on dot files, based on my 10+ years of being a Linux user and my professional career as a sysadmin and software engineer. This is partially also based on what I've seen in developers' dotfiles at the company I currently work for, which has a system for managing and installing the dotfiles of nearly 1000 engineers.

When I started using Linux, like every new Unix user I started cribbing dot files from various parts of the internet. Predictably, I ended up with a mess. By doing this, you get all kinds of cool stuff in your environment, but you also end up with a system that you don't understand, is totally nonstandard, and is almost always of questionable portability.

In my experience this is less of a problem for people who are software engineers who don't have to do a lot of ops/sysadmin work. A lot of software engineers only do development on their OS X based computer, and possibly a few Linux hosts that are all running the exact same distro. So what happens is if they have an unportable mess, they don't really know and it doesn't affect them. That's great for those people.

When you start doing ops work, you end up having to do all kinds of stuff in a really heterogeneous environment. It doesn't matter if you work at a small shop or a huge company; if you do any amount of ops work you're going to admin multiple Linux distros, probably various BSD flavors, and so on. Besides that (or even if you have a more homogeneous environment), you end up having to admin hosts that are in various states of disrepair (e.g. failed partway through provisioning) and therefore might as well be different distros.

Early on, the (incorrect) lesson I got out of this was that I needed to focus on portability. This is really hard to do if you actually have to admin a really heterogeneous environment, for a few reasons. For starters, even the basic question of "What kind of system am I on?" is surprisingly hard to answer. The "standard" way to do it is to use the lsb_release command... but as you would guess, this only works on Linux, and it only works on Linux systems that are recent enough to have an lsb_release command. If you work around this problem, you still have the problem that it's easy to end up with a huge unreadable soup of if statements that at best is hard to understand, and frequently is too specific to really be correct anyway. You might think that you could work around this by doing "feature testing", which is actually the right way to solve the problem, but this is notoriously hard to do in a shell environment and can again easily make the configuration unreadable or unmaintainable.

It gets even worse for things like terminal emulators. The feature set of different terminal emulators like xterm, aterm, rxvt, and so on varies widely. And it gets even more complicated if you're using a "terminal multiplexer" like screen or tmux. God forbid you try to run something in a vim shell or Emacs eshell/ansi-term. Trying to detect what terminal emulator you're under and what features it actually supports is basically impossible. Even if you could do this reliably (which you can't, because a lot of terminal emulators lie), the feature set of these terminal emulators has varied widely over the years, so simply knowing which terminal emulator you're using isn't necessarily enough to know what features it supports.

As I became a more seasoned Linux/Unix user, what I learned was that I should try to customize as little as possible. Forget those fancy prompts, forget the fancy aliases and functions, and forget the fancy 256-color terminal emulator support. The less you customize the less you rely on, and the easier it becomes to work on whatever $RANDOMSYSTEM you end up on. For a number of years the only customization I would do at all was setting PS1 to a basic colorized prompt that included the username, hostname, and current working directory—and nothing else.

Recently I've softened on this position a bit, and I now have a reasonable amount of configuration. In the oldest version of my .bashrc that I still track with version control (from 2011, sadly I don't have the older versions anymore), the file had just 46 lines. It had a complicated __git_ps1 function I cribbed from the internet to get my current git branch/state if applicable, set up a colorized PS1 using that function, and did nothing else. By 2012-01-01 this file had expanded to 64 lines, mostly to munge my PATH variable and set up a few basic aliases. On 2013-01-01 it was only one line longer at 65 lines (I added another alias). On 2014-01-01 it was still 65 lines. At the beginning of this year, on 2015-01-01, it was 85 lines due to the addition of a crazy function I wrote that had to wrap the arc command in a really strange way. Now as I write this in mid-2015, it's nearly twice the size, at a whopping 141 lines.

What changed here is that I learned to program a little more defensively, and I also got comfortable enough with my bash-fu and general Unix knowledge. I now know what things I need to test for, what things I don't need to test for, and how to write good, portable, defensive shell script. The most complicated part of my .bashrc file today is setting up my fairly weird SSH environment (I use envoy and have really specific requirements for how I use keys/agents with hosts in China, and also how I mark my shell as tainted when accessing China). Most of my other "dot files" are really simple, ideally with as little configuration as possible. Part of this trimming down of things has been aided by setting up an editor with sensible defaults: for real software engineering stuff I use Spacemacs with a short .spacemacs file and no other configuration, and for ops/sysadmin stuff I use a default uncustomized vi or vim environment.

Which brings me to the next part of this topic. As I mentioned before, the company I work at has nearly 1000 engineers. We also have a neat little system where people can have customized dot files installed on all of our production hosts. The way it works is there's a specific git repo that people can clone and then create or edit content in a directory that is the same as their Unix login. The files they create in that directory will be installed on all production hosts via a cron that runs once an hour. A server-side git hook prevents users from editing content in other users' directories. This system means that generally users have their dot files installed on all hosts (with a few exceptions not worth going into here), and also everyone can see everyone else's checked in dot files since they're all in the same repo.

People abuse this system like you would not believe. The main offenders are people who copy oh-my-zsh and a ton of plugins into their dot files directory. There are a few other workalike systems like Bashish (which I think predates oh-my-zsh), but they're all basically the same: you copy thousands of lines of shell code of questionable provenance into your terminal, cross your fingers and hope it works, and then have no idea how to fix it if you later encounter problems. Besides that, I see a ton of people with many-hundreds-of-lines of configuration in their bash/zsh/vim/emacs configuration that are clearly copied from questionable sources all over the internet.

This has given me a pretty great way to judge my coworkers' technical competency. On the lowest rung are the gormless people who have no dot files set up and therefore either don't give a shit at all or can't be bothered to read any documentation. Just above that are the people who have 10,000 lines of random shell script and/or vimscript checked into their dot files directory. At the higher levels are people who have a fairly minimal setup, which you can generally tell just by looking at the file sizes in their directory.

If you want to see what differentiates the really competent people, here are a few things I sometimes look for:

LaRouche PAC


So the LaRouche PAC has been posted up the last few weeks at 3rd & Market, with a bunch of aggravating signs about Greece, Glass-Steagall, and so forth. In classic LaRouche fashion, they have these stupid posters with Obama sporting a Hitler moustache and other nonconstructive inflammatory things designed to provoke people and pull them into discussions/arguments so they can espouse their conspiracy theories.

I had a friend in college (UC Berkeley) that joined and left the LaRouchies multiple times, and dropped out of school as a result. They are a really scary cult. I want to write about that story, so people realize how fucked up they are.

Basically what would happen is my friend would start going to the LaRouche discussion meetings in the evenings that are open to the public, and would get really caught up in that. Then he'd start also going on their weekend retreats from time to time. The way these work, you go out to some cabin or something in the middle of nowhere, you don't have your cell phone or a way to contact the outside world, and they have this highly structured schedule where you talk about LaRouche politics and ideology all weekend for 16 hours a day. Then since he was spending all of this time in the evening meetings and weekend retreats, he'd stop going to classes and would spend time canvassing with the group. At one point, he had effectively moved out of the Berkeley co-op he was in and was living in some house that they had somewhere in Berkeley or Oakland that they let people stay in who are sufficiently dedicated to the cause (and who are spending some minimum amount of time canvassing, donating money, or doing who knows what else).

I remember the first time he joined the group. At first he was telling me about this cool group that had these interesting math/geometry discussions, and didn't mention it was the LaRouche movement. Maybe he didn't even know at first. Apparently there's some sort of shadow organization where they'll do these tracks focused on less political stuff like math/philosophy, and then they try to use that to get you to start going to the more "philosophy" focused discussions, and then that leads into the actual political arm of the organization which is their real interest. He'd be telling me about how they were doing these interesting Euclidean geometry discussions, and talking about logic, and somehow this math/geometry thing was related to the general concept of rationality, reasoning, and higher level thought and philosophy. Anyway, I was like "yeah that sounds cool maybe I'll check it out some time" and never went for whatever reason, I guess just because it sounded too weird to me. Then over the course of the semester, he started telling me about more of the stuff they were discussing, and started getting into the politics of it. At the time I was fairly up to date with what was going on with national and international politics, but not nearly as knowledgeable as someone who spends all day reading/talking about this stuff, so we'd get into these discussions where he'd be espousing these weird views about whatever was the topic of the moment and I would just be like "OK, whatever, clearly I don't know as much about this issue as you but I still disagree—I'm not going to debate you on this, let's talk about something else."

Then basically the last time I hung out with my friend that semester, we were walking around talking about stuff, and he started telling me this really crazy shit about how Lyndon LaRouche actually was controlling the Democratic Party, and somehow also had Congress and the GOP under his thumb, and all of this really out there stuff. Lyndon LaRouche would issue these internal memos where he'd be predicting various political/economic events, most of which either sounded to me very vague or were not substantiable. From my point of view, the things he predicted correctly would be used to "prove" that he was controlling the Democratic Party or whatever, and then for the stuff that didn't come to fruition there would be some excuse about how something had changed at the last minute and LaRouche had to steer in a different direction. My friend didn't realize how bullshit this was. I was just like WTF I don't even know how to explain how crazy this is, and didn't really see him for a few weeks after that.

The next time I saw him was during the finals week of that semester. I was going down into the UC Berkeley main stacks (read: huge campus library) to do some studying or whatever. So I randomly run into him, and we're talking and he asked me what I was doing down there. So I was like "Uh..... I'm studying for finals... why else would I be down in main stacks during finals week?" and during the ensuing discussion I realized that he didn't even know that it was finals week at school, and had completely stopped going to classes or following anything related to actually being a student at UC Berkeley.

What happened is he failed all of his classes that semester and was put on academic probation. His parents found out, because they were paying his tuition and rent and whatnot. They found out about the LaRouche stuff and freaked out, and they got him to take a semester off of school, live at home with them, and they got him out of the cult. He basically came to his senses, realized that the LaRouche thing was ruining his life, and decided to quit the movement and go back to school again.

The next semester that he was actually back in school we start hanging out again and he filled me in on what happened, how the LaRouche movement is a cult (duh, I had already figured it out by this point), and all of that stuff. But these people from the LaRouche movement kept calling him. We'd be hanging out and he'd get a phone call and be like "hold on, I need to take this", and then he'd spend an hour talking to the person about how he had left the group and wasn't interested in going to their meetings. I don't know why he didn't just stop taking the phone calls, or hang up immediately, but somehow he'd always get dragged into a long discussion.

Predictably what happened is at some point he ended up going to one of their meetings, didn't tell me or his parents about it, and got dragged right back into the cult. Then he stopped going to his classes again and cut off contact with me and the other few friends he had (although looking back, I think I might have been his only friend outside of the LaRouche movement at the time). Then he got kicked out of school since he had failed all of his classes one semester, took a semester off, and then failed all of his classes again. His parents found out and lost their shit again. My friend and his parents were from Bulgaria (I think he had come over to the United States in middle school), and they got him to somehow move to Bulgaria so that he could actually get the fuck out of the LaRouche cult and try to get a job or go to school there. I'm not really sure of the details because he had deleted his Facebook (or maybe that's when I had deleted mine), so I didn't really keep in touch. I did hear a few years later that he was still in Bulgaria, so I think things worked out.

tl;dr Fuck the LaRouche movement. It's a fucked up cult. Do not try to talk with them or engage with them, it's a waste of your time. The best thing you can do is ignore them, and if you see anyone reading their literature tell them they're a fucking cult. There's a bunch of this stuff documented on the internet. Usually the Wikipedia articles are informative on the matter, but the LaRouchies have been in a multi-year edit war with Wikipedia trying to remove any damaging facts about their organization, so what's on Wikipedia is not necessarily trustworthy at any given moment.

An Important Difference Between mysql(1) and MySQLdb


I keep forgetting about this thing, and then every six to twelve months when I have to do it again, I waste a bunch of time rediscovering it. It's important enough that I'm going to blog it.

If you're used to using PostgreSQL, you'll know that with Postgres you can connect over the local AF_UNIX socket using peer authentication. This means that as the evan user I can automagically connect to the evan database without a password. Likewise, to become the Postgres superuser, I simply need to do sudo -u postgres psql. This works using some magic related to either SO_PEERCRED or SCM_CREDENTIALS which let you securely get the credentials of the other end of a connected AF_UNIX socket.

MySQL also has a local AF_UNIX socket, and you can use this socket to make connections to MySQL. This is pretty handy, and for many reasons you may prefer to connect to MySQL over the local socket rather than using a TCP connection to localhost.

However, MySQL does not do the peer authentication thing. It doesn't matter if you're the root user connecting over a local socket. If the root user is configured to require a password (which is what I strongly recommend), then you must supply a password, even if you have sudo privileges on the host.

Fortunately, there's an easy workaround here that prevents you from having to type the root password all the time if you're doing a lot of MySQL administration. When you use the mysql CLI program, it will look for a file called ~/.my.cnf and use it to look up various connection settings. In particular, in this file you can set a default user and password. So let's say you've done this nice thing and made a file called /root/.my.cnf that has the root user's MySQL credentials, and you have the file set to mode 600 and all that and everything is great. You can type sudo mysql and you won't have to supply the root MySQL password (just possibly the root sudo password).

Here is a really important thing to know: the behavior of reading ~/.my.cnf is something that the mysql CLI program implements, it is not something implemented by libmysqlclient.so!

What that means is that when you are writing some script to frob MySQL using Python and MySQLdb, this will not work:

conn = MySQLdb.connect(unix_socket='/run/mysqld/mysql.sock',
                       user='root')

You might think that if you ran this script as the root user, it could authenticate. Not so. Instead what you want is this:

conn = MySQLdb.connect(unix_socket='/run/mysqld/mysql.sock',
                       read_default_file='/root/.my.cnf')

By the way, using the read_default_file option like this is definitely the best way to authenticate to MySQL from Python in general. You should not be putting database passwords in your Python projects---neither in your source code, nor in your project configs. By using a file in the filesystem like this you can move all of the database credentials into Puppet/Chef/whatever and secure the files so that most users can't read them. It may not seem like a big win today, but a few years later, when you're given the task of auditing everything for passwords, knowing that passwords have only lived in your configuration management software is going to help a lot.

How To Be An Urban Cyclist—Part 1


This blog series is going to explain my advice on being an urban cyclist. The difficulty I've seen with other people is that while a lot of people know how to ride a bike, they may not feel comfortable riding in heavy traffic, on poorly paved roads, or in poorly lit areas. These posts are based on my experience over the last six or seven years of my life cycling mostly around Berkeley, Oakland, San Francisco, and Los Angeles.

The first post in the series will cover what kind of bicycle I recommend, and what kind of gear you need to ride.

First you should have a well maintained bicycle. If you're buying a new bicycle, I strongly recommend getting a road bike with drop bars rather than a cheapo mountain bike. Road bikes are simply a lot faster, and if you don't feel fast you're not going to want to bike. There's nothing more frustrating than seeing people whiz by you on their bikes while you're struggling on yours. Simply put: if you don't feel good on your bike, you're not going to use it.

You can get a decent used steel frame bicycle in the Bay Area for $500-$600 or cheaper, depending on exactly what size frame you need, what type of components you want, etc. If you live elsewhere, you can probably get one cheaper. A decent new road bike will be something like $1000 or more if you want to get really fancy. If you're buying a new bike, I'm a big fan of Surly Bikes, but there's nothing wrong with getting a used bike. If you get a used bike, make sure you ride it and test that it can shift smoothly and brake quickly.

You should get and wear a helmet. You'll easily exceed speeds of 20 mph on your bike, and even in dense urban areas cars frequently exceed speeds of 30 mph or more. For a comparison, falling off the top of a two story building entails an impact of about 20 mph. At 20 mph, much less at higher speeds, you can very easily die in a head-on collision.
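
For the curious, that comparison is just free-fall arithmetic; assuming a two-story drop of roughly 5 meters:

v = \sqrt{2gh} \approx \sqrt{2 \cdot 9.8\,\mathrm{m/s^2} \cdot 5\,\mathrm{m}} \approx 9.9\,\mathrm{m/s} \approx 22\,\mathrm{mph}

which is in the same ballpark as the 20 mph figure above.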

Next, make absolutely sure that you have both a front and rear light if you're going to ride in any kind of low light conditions. Riding in the dark without a light is incredibly dangerous, because you'll be moving quickly, be hard to see, and be making very little noise. I like the silicone lights that don't require any mounting gear that you can put on and off your bike really easily (mine are "Snugg" brand and cost $15 for a pair on Amazon). These are great for riding around and being seen. However, they're not going to illuminate the road in front of you. If you plan on biking in really dark areas you'll want a bigger/brighter clip on light—I'd recommend the ones that are 1 watt or higher power output (most of the ones in the store will be 0.5 watts, which isn't ideal). Make sure you always remove your lights when locking your bike outdoors.

For locks, at the minimum you need a U-lock and cable lock.[1] The U-lock will lock your rear wheel and frame, the cable will lock your front wheel. Note that all of the cables you buy can be cut fairly easily (in a few minutes perhaps); the point of the cable is to deter someone from stealing the front wheel (which is fairly cheap), the U-lock is what will actually be securing your frame. I highly recommend the 5" Mini Kryptonite U-Lock. The 5" locks are not only the smallest ones, but they're also the most secure. U-locks can be easily broken by someone with a jack, if there's enough space to get the jack in between the bars of the lock to bend it. The 5" locks don't admit enough space for someone with a jack to get a hold on the lock. However, you'll really need an adequate rack to lock your bike with a 5" lock. For instance, it's generally not possible to lock your bike to a parking meter with a 5" lock whereas you can with a larger size. When you lock your bike, you need to place the U-lock so that it secures the rear wheel through the rear triangle of the bike. You generally should not directly lock the frame. By locking the rear wheel through the rear triangle, the U-lock is actually going through both the frame and the rear wheel (although it may not look like it!). The cable loops through the front-wheel and back around the U-lock.

In areas with high rates of bike theft, such as San Francisco, you'll need some way to secure your seat as well. I biked and locked my bike outdoors for years in Berkeley, Oakland, and Los Angeles and never had a problem with seat theft. As soon as I started biking in San Francisco, I got my seat stolen twice in the course of a month (both times having left the bike alone for less than an hour). So whether or not you need this really depends on where you live. Bike stores will sell special locks for seats. You can keep the lock on the seat all of the time because you'll only need to remove it in the rare situations when you need to adjust the seat height. If you don't like the look of a seat lock, or want to spend less money, you can also try securing the seat post bolt by using security bolts or hot gluing a BB into the bolt head.

If you're going to ride in the rain, I strongly recommend a detachable rear fender. Otherwise you're going to get a muddy butt. I've never found a front fender to be necessary; if it's rainy enough to need one, you're going to get drenched anyway.

[1] If you have security bolts for your front wheel, you can probably omit buying and carrying a cable lock.

On Not Having A LinkedIn Account


I don't have a LinkedIn account, which some people find to be a bit strange. I'd like to talk a bit about that.

As a software engineer with an awesome job, I really do not need a constant barrage of recruiter spam. Here are the specifics:

My experience with LinkedIn is I'd get a torrential inflow of recruiter spam (i.e. "Join our HOT VC-backed stealth startup!!!") that wasn't useful to me at all.

Worse, I found that some people would "stalk" me on LinkedIn before coming in for job interviews. As in, I'd go in to a job interview, and someone would mention something about my past that they had looked up on LinkedIn. This has happened once with my Twitter account too, which is even creepier.

Since LinkedIn provides no value to me and is yet-another-way-to-track-me, I don't have an account with them. EZPZ.

Final Conflict—Ashes To Ashes


I was recently turned on to Final Conflict's seminal album Ashes To Ashes from this Pitchfork album review. The album review made the album sound awesome, and I'm pretty into some of the other acts from the 80's LA/OC hardcore scene (e.g. Black Flag, TSOL, Adolescents), so I had to check it out.

Put simply, this album is fucking great. I personally have a strong preference for the hardcore sound (i.e. compared to thrash/black/heavy metal) because that's the shit I grew up on, so even though the whole scene was a bit before my time I get nostalgic for it. That said, there are some pretty prominent metal influences in this album that clearly place the album in the late 80s. For instance, the track Abolish Police features some awesome wailing guitar sections not as common in the earlier hardcore stuff (but seen for example in the later Black Flag material). Some of the tracks like Shattered Mirror strongly evoke the sound of some other LA/OC acts like TSOL or Adolescents; in particular, this track reminds me of some of the tracks from the Adolescents' debut album. There are some awesome samples of Reagan-era political speeches on tracks like Political Glory and The Last Sunrise.

tl;dr if you're into hardcore stuff, check this album out.



I added an RSS feed to this "blog", again using Python's excellent lxml module. This ended up being really convenient because of the way I was already using lxml to generate the articles from markdown. There's a method .text_content() on the lxml html nodes, so I can already take the marked up article content and extract the text content from it. Thus, the generator script (lovingly called generate.py) ends up being a thin wrapper that generates HTML from the markdown files, then does some element tree gymnastics, and magically out of this comes an HTML element tree that's rendered as the blog content itself, and an RSS element tree.
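
The gist of it is something like the following sketch (this is a simplification, not the actual generate.py):

import markdown
from lxml import etree
from lxml.html import fromstring


def article_to_trees(md_text):
    # Render the markdown to HTML, then parse it into an lxml element tree.
    body = fromstring(markdown.markdown(md_text))

    # The same tree serves both outputs: the blog page keeps the markup,
    # while the RSS item only needs the extracted text.
    item = etree.Element("item")
    etree.SubElement(item, "description").text = body.text_content()
    return body, item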

tl;dr magic happens and then blog.

Cloud Nothings—Attack On Memory


Right now this album is my favorite thing. Especially the first two tracks, holy shit.



Those of you who know me well know that while my music interests are varied, lately (as in, the past few years) I've mostly been listening to hip hop music. I wanted to do a review of a new album I've been really into lately that isn't a hip hop album. That album is Sunbather by Deafheaven.

I found this album somehow by stumbling across links on Pitchfork. I think I was checking out some bands I had found on Vimeo, and a Deafheaven link came up at the bottom of one of the pages. Anyway, I saw the Pitchfork album review, saw that it was rated well and read the album description, and I decided to check out the album. It's an incredibly easy album to get into because the opening track, Dream House, is so powerful. It's very atmospheric with fast-paced guitars and percussion, and very emotive-but-subdued "screamo" vocals. The next track, Irresistible, blends in perfectly with the first track and provides a really nice contrast; it is a very melodic entirely instrumental track. The album generally follows this pattern of a long black metal/emo/screamo track usually followed by a shorter more melodic track.

I can't really do the full album review the same justice as the experts can, so I refer you to the already linked Pitchfork review, as well as The Needle Drop's album review.

What I really love about this album is how accessible and melodic it is, and yet how emotive and powerful a lot of the tracks are. I don't listen to a lot of black metal (which is I guess how the band labels themselves), and I think black metal is generally a somewhat inaccessible genre for outsiders. Yet I was able to pick this album up really easily. This may be because the album is non-traditional for the genre, but I like it.

I'm especially excited because I'm attending the Pitchfork Music Festival in July, and I found out (having already bought tickets) that Deafheaven will be performing there. I'm looking forward to seeing them live!

Hello World


I made a simple static site generator for my new blog incarnation. The generator works using Markdown and lxml to generate sites. I am not using any normal templating tools like jinja or mustache.

Since I think it's kind of interesting, articles are structured as separate files in a directory, and an article itself looks like this:

metadata, e.g. the article date
more metadata

blog content starts here

In other words, there is a preamble section of metadata, a blank line, and then the actual markdown text. I parse the metadata, generate HTML using the Python markdown module, and then transform that into an lxml element tree. The lxml element tree is munged to insert the metadata (e.g. the article date).
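
A sketch of how a file in this format might be parsed (illustrative only, not the real generate.py):

import markdown
from lxml.html import fromstring


def parse_article(path):
    with open(path) as f:
        lines = f.read().split("\n")

    # Everything up to the first blank line is metadata; the rest is markdown.
    split = lines.index("")
    metadata = lines[:split]
    body_md = "\n".join(lines[split + 1:])

    # Render the markdown and parse it into an lxml tree so the metadata
    # (e.g. the article date) can be spliced into the page later.
    tree = fromstring(markdown.markdown(body_md))
    return metadata, tree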

I decided on this format because

Mostly I intend on using this space to talk about music, bicycles, computers, life, work, and all of that good stuff.