In Defense of C++

I really like C++. It’s not the right tool for every job, but it’s what I reach for when I need to do any kind of systems programming. Unfortunately, C++ has often been cited as a “bad” language, and I feel like that reputation has worsened over time. Recent languages like Go and Rust have taken over a lot of mind share—at least on sites like Hacker News. These are great languages, and I’m happy to see them succeed. But I think a lot of people miss the point of what makes C++ such a great language, and in this post I want to explain things from my point of view.

To make my point, I’m going to focus initially on some commonly cited reasons why C++ is an awful language. Then we’ll see why these supposedly awful things are intentional language design decisions that actually make the language rather interesting and unique.

C++ Is Designed For Performance

When you’re designing a programming language, there are a lot of different trade-offs you have to balance. Things like expressiveness, readability, consistency, and performance are among the factors under consideration. C++ is what you get when, at every turn in the language design road, you choose to take the path that leads to higher performance.

One significant difference between C++ and most other programming languages is that C++ has been designed by the people writing C++ compilers. There is no reference implementation of C++. Instead, there are a lot of different C++ compilers written by different vendors. All of these compilers implement the language slightly differently, generate slightly different code, and usually extend the language slightly.

Historically it’s been the case that in many commercial environments, the people implementing the C++ compiler are working closely with the people using the compiler to write systems code. For instance, Microsoft writes a C++ compiler called Visual C++. This is the same compiler that’s used at Microsoft by a lot of different teams, including the Windows kernel team, the SQL Server team, etc. Those teams want to write really highly optimized code. So there’s a tight feedback loop where people working on, say, SQL Server, have direct access to give feedback to the Visual C++ team to guide them in different possible optimizations, language extensions, etc. This is also true in other environments. For instance, Apple is one of the main forces behind the LLVM project, which produces clang, a compiler for C, C++, and Objective-C which Apple uses internally. The engineers at Apple want highly optimized code, and therefore they can work with the LLVM developers to get their needs met. Intel also makes a popular C++ compiler, and they work closely with customers to try to produce the most highly optimized code. And so on—there are plenty of other examples if you go looking.

These compiler writers are the same people who are on the C++ Standards Committee. Thus, the actual semantics for the rules of the language are primarily dictated by the people working on making highly optimized compilers.

Usually the people working on C++ applications choose the language specifically because performance is considered paramount. Because honestly, it’s a pretty strange language with a lot of warts, and why would you use it for any other reason? This creates a feedback cycle where people are using C++ to get high performance, and that drives the language evolution towards confusing “features” (and idiosyncrasies) that are done in the name of making it possible to generate more highly optimized code.

In the following sections I’m going to dive into a bit more detail on several different language aspects (templates, undefined behavior, and copy elision) that show this principle at work.

Templates

Templates exemplify a lot of what I’ve just discussed. If you are fortunate enough to not be too familiar with them, templates are a mechanism provided by the language for programming with generic types.

One reason generic programming in C++ is a strange idea is that C++ has a rather weak type system, one inherited with relatively few changes from C. There’s no concept of “type classes” or “interfaces” or anything like that. Instead templates provide an advanced way to do syntactic, macro-like substitution. You literally write a “template” for what the code should look like, and then at compile time the compiler substitutes in all of the types and generates native code. If you’re familiar with C preprocessor macros, you can think of templates as their really weird, estranged cousin.

This system is really awful, for a lot of reasons.

Let’s say I have a template that works on some type T. In the template I have a variable of type T, and I call the Foo() method on it. That’s fine, but what it means is that the template will only work with types that actually declare a Foo() method with the right type signature. Except that, at least before concepts arrived in C++20, there’s no way to actually express that constraint in the template itself. So if I try to use this template with the wrong type, after a bunch of template and macro expansion the compiler will print out a cryptic error showing the templated code and the failed type substitution. Templates can be both nested and recursive, which makes understanding such compiler errors nearly impossible. Even if you don’t write advanced templates, you might use libraries such as Boost (or even just the STL) that do.
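To make this concrete, here’s a minimal sketch of the situation. The names Widget, Gadget, and CallFoo are made up for illustration and don’t come from any real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration only.
struct Widget {
    std::string Foo() const { return "widget"; }
};

struct Gadget {
    int value = 0;  // note: no Foo() method
};

// Nothing in this signature says that T needs a Foo() method. The
// requirement only surfaces when the body is instantiated for some T.
template <class T>
std::string CallFoo(const T &t) {
    return t.Foo();
}

// Small usage example.
void demo_call_foo() {
    assert(CallFoo(Widget{}) == "widget");  // fine: Widget has Foo()
    // CallFoo(Gadget{});  // would not compile: Gadget has no Foo()
}
```

If you uncomment the Gadget call, the compiler complains about t.Foo() inside the body of CallFoo, and you have to follow the instantiation notes back to the call that caused it. With one level of templates that’s manageable; with several nested levels it gets ugly fast.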

Templates also make compiling code really slow. Let me give a concrete example, involving C++ vectors. One of the most widely used types from the STL is std::vector, which is the standard C++ implementation of a vector (or “array”) type. You’ll see std::vector all over the place in real C++ code, just as you see list objects everywhere when you write Python. The template defining std::vector looks like this:

// STL definition of std::vector.
template<class T,
         class Allocator = std::allocator<T>
        > class vector;

Each vector is templated by an element type and an allocator type. You can already see that even for this primitive type we have a nested template, since for an element type T there’s a default allocator type called std::allocator<T>, which is itself a template. Things are already getting hairy.

A large C++ program will have vectors with lots of different element types. There will be vectors of integers, vectors of strings, vectors of floats, vectors of vectors, vectors of pointers, and so on. A large program could have hundreds of these types. The compiler needs to generate code for each of these types. Most of the types have the exact same implementation, but the compiler has to regenerate the code each time anyway. For example, the compiler will generate a method called front() which accesses the first element in the vector, and it has to generate the code for this method for each different type used in a vector, hundreds of times, even though the code to do this is always the same. Actually, things are even more complicated. Short, common methods like front() may be declared “inline”. In this case the compiler not only has to expand the template hundreds of times for all of the different element types used, it also has to expand and optimize the inline method definitions at every call site!
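Here’s a small sketch that makes the “one copy per type” behavior visible. The helper first_calls is hypothetical, invented just for this example; the static counter inside it exists once per instantiation, so each element type ends up with its own independent copy of the function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Omitting the allocator is the same as spelling out the default.
static_assert(std::is_same<std::vector<int>,
                           std::vector<int, std::allocator<int>>>::value,
              "default allocator is std::allocator<T>");

// Different element types give unrelated vector types.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "each element type is a distinct instantiation");

// Hypothetical helper: copies the first element into `out` and returns
// how many times this particular instantiation has been called.
template <class T>
int first_calls(const std::vector<T> &v, T &out) {
    static int calls = 0;  // one counter per distinct T
    out = v.front();
    return ++calls;
}

// Small usage example.
void demo_first_calls() {
    std::vector<int> ints{1, 2};
    std::vector<std::string> strings{"x"};
    int i = 0;
    std::string s;
    assert(first_calls(ints, i) == 1);     // first call of the int version
    assert(first_calls(strings, s) == 1);  // the string version counts separately
}
```

Multiply this by every method of std::vector and every element type in a large program, and you get a sense of how much work the compiler is doing.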

All of this makes compiling C++ code really slow. C++ requires a linking phase, and templates are even more brutal on the linker, since during the final link phase the linker needs to hold all of the object code in memory. To give you an idea of what this is like in the real world, the reference implementation for Bitcoin is written in C++, and requires 1.5 GB of free memory to build. Mozilla recommends that you have at least 8 GB of memory to compile Firefox, another C++ application. You’ll see the same phenomenon with other big open source projects like LibreOffice or the Chromium web browser. From experience, I know things get even worse for closed-source internal C++ projects with huge code bases.

Now I’ve just said a lot of bad things about templates. So why does C++ have this awful meta-programming system?

Well, templates generate really fast code. With templates everything is independently compiled and optimized with type-specific specialization. For instance, let’s say you want to sort a vector of integers. To do this quickly you want to use the native instructions on the CPU for integer comparison, and you want to inline those comparison operations. On the other hand, if you want to sort a vector of strings you’re going to have to use string comparison routines, which don’t have native CPU implementations (but which can still be inlined). With templates the compiler has the option of maximally optimizing every specific templated type to take advantage of things like integers having native comparison operators.
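As a rough sketch of the difference, compare std::sort with C’s qsort. The wrapper names sorted_copy, compare_ints, and qsorted_copy are invented for this example. Whether a given compiler actually inlines the comparison depends on the compiler and its optimization settings; the point is only that the template version gives it the opportunity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Template-based sort: std::sort is instantiated separately for each
// element type, so the comparison (operator< by default) is known at
// compile time and is a candidate for inlining.
template <class T>
std::vector<T> sorted_copy(std::vector<T> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// C-style alternative for contrast: qsort receives the comparison as a
// function pointer and sees the elements only as untyped bytes.
int compare_ints(const void *a, const void *b) {
    const int lhs = *static_cast<const int *>(a);
    const int rhs = *static_cast<const int *>(b);
    return (lhs > rhs) - (lhs < rhs);
}

std::vector<int> qsorted_copy(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

// Small usage example.
void demo_sorting() {
    const std::vector<int> expected{1, 2, 3};
    assert(sorted_copy(std::vector<int>{3, 1, 2}) == expected);
    assert(qsorted_copy(std::vector<int>{3, 1, 2}) == expected);
}
```

Both versions produce the same result. The difference is in what the optimizer gets to see: one sort routine specialized for int, versus one generic routine that calls back through a pointer for every comparison.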

Templates are objectively extremely difficult to understand and use correctly, but they also make it a lot easier for the compiler to optimize the snot out of your code compared to other generic programming systems.

Undefined Behavior

C and C++ share a distinctive characteristic: their language specifications are littered with what is called “undefined behavior” (UB). What this means is that when these situations are triggered by a C or C++ program, the behavior is literally undefined by the standard. Here’s an example of a short program that is UB in both C and C++:

// Code that compiles, but reading the uninitialized x is UB.
// (The surplus printf argument is evaluated and ignored, which is well defined.)
int x;
printf("x = %d\n", x, x);

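There are a lot of things that are considered UB; a few prominent examples include indexing an array out of bounds, dereferencing a null or otherwise invalid pointer, overflowing a signed integer, and violating the strict aliasing rules.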

What actually happens in these cases is that the compiler is free to generate code as if these types of undefined conditions will never happen. Thus if such conditions do happen, the result is undefined, because the compiler may have generated invalid code. This is done specifically because it allows the compiler to generate more highly optimized code, i.e. in the name of performance. I’ll examine a few of the easier-to-understand ones to see why.

Let’s say the C++ compiler had to check every array indexing operation to ensure it was in bounds. That would make every array indexing operation slower, even though most of them are correct! And what would the program do when a check failed anyway, throw an exception? This is a feasible idea, but many C++ programmers disable exceptions for performance. The same argument applies to pointer dereferencing and integer overflows. And that’s not the only optimization UB allows the compiler to make: the compiler can generate faster code in common integer-based for loops if it’s assumed that the loop variable cannot overflow. Similar arguments apply to strict aliasing and so on.
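The standard library actually lets you pick either side of this trade-off for std::vector, which makes for a handy illustration. In the sketch below the wrapper names (unchecked_get, checked_get, is_out_of_range) are my own; operator[] and at() are the real std::vector members.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked access: an out-of-range index here is undefined behavior,
// so the compiler is free to emit a plain load with no comparison.
int unchecked_get(const std::vector<int> &v, std::size_t i) {
    return v[i];
}

// Checked access: at() compares the index against size() on every call
// and throws std::out_of_range when the index is too large.
int checked_get(const std::vector<int> &v, std::size_t i) {
    return v.at(i);
}

// Reports whether a checked access at index i would throw.
bool is_out_of_range(const std::vector<int> &v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}

// Small usage example.
void demo_access() {
    const std::vector<int> v{10, 20, 30};
    assert(unchecked_get(v, 1) == 20);
    assert(checked_get(v, 1) == 20);
    assert(is_out_of_range(v, 3));  // index 3 is one past the end
}
```

The checked version needs exceptions enabled and pays for a comparison on every access. The unchecked version pays nothing, and in exchange calling unchecked_get(v, 3) on that three-element vector is simply UB.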

Most programmers would agree that UB makes the language confusing and difficult to understand. The situation is particularly bad because literally anything can happen during UB, and thus UB makes it very difficult to reason about degenerate program states. But we live with this, because it allows compilers to make a number of assumptions that improve performance, and this makes our programs fast.

Copy Elision

Consider the following code:

#include <iostream>
#include <vector>

// Point is assumed here to be a plain struct of two ints.
struct Point {
    int x;
    int y;
};

// Create a vector with one element.
std::vector<Point> make_singleton_point(const Point &p) {
    return std::vector<Point>{p};
}

// Actually use make_singleton_point().
int main(int argc, char **argv) {
    std::vector<Point> x = make_singleton_point({1, 2});
    std::cout << x.size() << "\n";
    return 0;
}

Here’s a question for you: how many times is a Point object instantiated, and how many times is a std::vector<Point> object instantiated? Are copy constructors for either type called, and if so, how many times? Are the rules here affected by whether or not Point is a POD type?

The C++ standard allows something called copy elision, which basically allows the compiler to omit unnecessary copies. In particular, a common type of copy elision is return value optimization, which can be applied above. When copy elision happens, an object’s copy constructor may not be called in places where you would typically expect it to be. This can lead to observable differences in behavior, e.g. if you have code with side effects in an object’s copy constructor or destructor.
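One way to see the observable part is to count constructor calls. This is only a sketch: Tracker, make_tracker, and count_copies_and_moves are hypothetical names made up for the example, and the number you get back depends on your compiler and language standard.

```cpp
#include <cassert>

// Hypothetical type that records how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker &) { ++copies; }
    Tracker(Tracker &&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

Tracker make_tracker() {
    return Tracker{};  // candidate for return value optimization
}

// Returns how many copy or move constructor calls were actually made
// while initializing one Tracker from make_tracker().
int count_copies_and_moves() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker t = make_tracker();
    (void)t;
    return Tracker::copies + Tracker::moves;
}

// Small usage example.
void demo_elision() {
    const int n = count_copies_and_moves();
    assert(n >= 0 && n <= 2);  // 0 when both copies are elided
}
```

With elision applied the function returns 0. A compiler that skips elision (for example GCC or Clang with -fno-elide-constructors in a pre-C++17 mode) can report as many as 2. Same source, different observable behavior.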

One thing that’s confusing about copy elision is that while the language has a strict set of rules about when copy elision can be applied, actually applying it is optional. And the rules that do exist are pretty complicated; I can never seem to get them quite right. The actual situations in which a compiler will do copy elision also depend on what version of the compiler and language standard you’re using.

Back to my original question about how many objects from the code listing are instantiated and copied. I confess that I actually cannot tell you just from looking at the code. There are a bunch of possibilities here, and which ones are legal is confusing. This is the subject of countless Stack Overflow postings.

If you thought this was fun, wait until you get into C++11, which introduces move constructors and “emplace” methods, which make this already-confusing subject even more difficult to understand.
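For a small taste, here’s a sketch comparing push_back with emplace_back. Counted and the two helper functions are hypothetical names invented for the example; reserve() is called first so that reallocation doesn’t add extra moves to the count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts its own copies and moves.
struct Counted {
    static int copies;
    static int moves;
    int x;
    int y;
    Counted(int x_, int y_) : x(x_), y(y_) {}
    Counted(const Counted &o) : x(o.x), y(o.y) { ++copies; }
    Counted(Counted &&o) noexcept : x(o.x), y(o.y) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// push_back: a temporary is built first, then moved into the vector.
int moves_for_push_back() {
    Counted::copies = Counted::moves = 0;
    std::vector<Counted> v;
    v.reserve(1);
    v.push_back(Counted(1, 2));
    return Counted::copies + Counted::moves;
}

// emplace_back: the arguments are forwarded and the element is
// constructed directly in the vector's storage.
int moves_for_emplace_back() {
    Counted::copies = Counted::moves = 0;
    std::vector<Counted> v;
    v.reserve(1);
    v.emplace_back(1, 2);
    return Counted::copies + Counted::moves;
}

// Small usage example.
void demo_emplace() {
    assert(moves_for_push_back() == 1);
    assert(moves_for_emplace_back() == 0);
}
```

So the two calls look nearly identical at the call site, but one of them runs an extra constructor. Now imagine keeping track of that across a whole code base.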

The Joy Of C++

Now that I’ve excoriated C++, let me tell you why I love it. I really like writing efficient code. It’s what gets me excited. Some people are really into testing, or refactoring, or applying “design patterns”, or any number of other things. But for me—I like writing fast, efficient code. My love of x86 assembly comes from the same place. I really like writing code where I feel like I have full control of the CPU and memory and can use them to the fullest.

Once you understand what the compiler is doing, the rules for C++ become a lot more manageable. C++ is a language designed to let compilers generate efficient code. If you know what lets compilers generate efficient code, a lot of the time it will help you understand what the rules for C++ are. At the minimum, it will help you know what situations are likely to be nuanced. Over time, if you’re like me, you may begin to experience Stockholm syndrome and start to actually enjoy writing C++. Writing fast code is fun, and while micro-optimizing memory copies and minimizing pointer indirection isn’t for everyone, it can be enjoyable if you have a knack for it.

There are a lot of things I wouldn’t use C++ for. Frequently, it really is too low-level. You can make a basic HTTP request in one line of Python or Go, but doing the same in C++ is not easy. The language gives you a lot of rope to hang yourself with. This is particularly problematic when you have to work on a big code base with many programmers, a lot of whom may not be language experts.

But if you’re willing to engage in some masochism and want to write code that’s really, really fast, C++ is hard to beat. And I don’t see that changing any time soon.