C++ and Thoughts On Java, Go, and Rust

Historically, most programs that called for high performance, particularly in systems programming, would have been written in C or C++. That has changed a lot over the last twenty-five years or so. Java has increasingly been the language of choice for high performance server applications. More recently, Go and Rust have made inroads as alternatives to Java that claim to offer similar or better performance. Yet I still find that I really love writing C++, and at least for situations where speed or memory utilization is important I prefer writing C++ to any other language. In this post I will explore some of the aspects of C++ that I find appealing, and why Java, Go, and Rust don’t make the cut for me.

Why Not C?

I’ll start with the easiest question that I am frequently asked when the topic of C++ comes up, which is why I don’t like writing C. Actually I don’t mind writing C, particularly for smaller programs. Except for a few mostly unimportant corner cases you can think of C as a subset of C++, and therefore a lot of the things I like about C++ also apply to C.

Given the choice though, I do prefer C++. The most important reason for this preference is that C++ has much simpler error handling than C. A commonly occurring pattern in all programming languages is the need to clean up resources after an error has occurred. In languages with garbage collection this will usually be handled by the garbage collector. C and C++ do not have garbage collectors, but C++ does have an elegant solution to this problem by way of object destructors. Resources that must be freed on error are typically wrapped in C++ by a class or struct whose destructor frees the resource; this pattern is called RAII. The RAII pattern ensures that you can return out of a function at any point and any objects allocated on the stack prior to the return will have their destructors run immediately. C has no equivalent mechanism. Instead, C encourages the use of goto statements for error handling. In general I find that these statements make the flow of execution harder to understand, especially if a function allocates more than one or two resources. The goto approach also requires the programmer to remember the cleanup path in every situation where a resource is allocated, whereas C++ destructors always run automatically, leaving fewer chances to make mistakes.
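A minimal sketch of the RAII pattern described above. The FileHandle class and the copy_first_line function are hypothetical names made up for illustration; the point is only that every early return closes whatever has been opened so far, with no goto and no explicit cleanup code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical RAII wrapper around a C FILE*; the destructor closes the file.
class FileHandle {
 public:
  FileHandle(const char* path, const char* mode)
      : fp_(std::fopen(path, mode)) {}
  ~FileHandle() {
    if (fp_ != nullptr) std::fclose(fp_);
  }
  // Non-copyable, so the same file cannot be closed twice.
  FileHandle(const FileHandle&) = delete;
  FileHandle& operator=(const FileHandle&) = delete;

  bool ok() const { return fp_ != nullptr; }
  std::FILE* get() const {
    assert(fp_ != nullptr);  // callers are expected to check ok() first
    return fp_;
  }

 private:
  std::FILE* fp_;
};

// Copies the first line of src into dst; returns false on any failure.
bool copy_first_line(const char* src, const char* dst) {
  FileHandle in(src, "r");
  if (!in.ok()) return false;   // nothing was opened, nothing to clean up
  FileHandle out(dst, "w");
  if (!out.ok()) return false;  // `in` is closed by its destructor
  char buf[256];
  if (std::fgets(buf, sizeof buf, in.get()) == nullptr) return false;  // both closed
  return std::fputs(buf, out.get()) >= 0;  // both closed on the normal path too
}
```

The equivalent C function would need a cleanup label (or repeated fclose calls) for each of the three failure points.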

Another important reason I prefer C++ to C is that C++ offers a much more complete standard library, particularly with respect to high performance data structures. The C standard library does not really give you any data structures at all. If you want to use things like dynamically sized vectors or hash tables you either need to implement them yourself, or you need to pull in a nonstandard third party library like GLib. As just discussed, these data structure libraries require the programmer to do a lot of manual error handling to ensure that resources are freed in all cases.

Most C data structure libraries make extensive use of preprocessor macros and/or function pointers to implement polymorphism. Both of these approaches have serious limitations. Preprocessor macros operate on a textual level and therefore can easily lead to surprising bugs and compiler errors if used incorrectly. Many text editors and IDEs have difficulty understanding macros for a few inescapable reasons. For instance, it’s common to specify preprocessor defines as compilation flags, meaning that if your editor isn’t tightly integrated with your build system it has no way to actually understand how macros will be expanded. Function pointers have a different set of problems. For one, the syntax for using function pointers is widely regarded as confusing. The other problem with function pointers is that they usually cause the program to have extra indirection for function invocation because they require an extra pointer lookup. Function pointers also prevent inlining, an important compiler optimization. For many data structure applications this causes significant performance overhead compared to C++ data structures implemented using templates. Some C programs try to work around this problem by way of extensive code generation which allows the compiler to inline functions, but doing this effectively with the cpp preprocessor is either impossible or incredibly verbose and error prone.

Due to these data structure problems, most C++ programs using the STL are faster than C programs using equivalent data structures. Furthermore templates offer a lot of safety features not available to C programs using macros/function pointers. It’s always possible to write C code that is as fast as C++, but frequently doing so requires either manually inlining things or the use of novel code generation tools.
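To make the function pointer versus template comparison concrete, here is a small sketch that sorts the same data both ways. The function names are made up for this example. With qsort the comparator is reached through a pointer and receives untyped void pointers; with std::sort the comparator's type is a template parameter, so the compiler is free to inline it and the element type is checked at compile time.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort calls it through a function pointer and the
// arguments arrive as untyped void pointers that must be cast back.
int compare_ints(const void* a, const void* b) {
  const int x = *static_cast<const int*>(a);
  const int y = *static_cast<const int*>(b);
  return (x > y) - (x < y);
}

// Sorts a copy of v using the C library's qsort.
std::vector<int> sort_with_qsort(std::vector<int> v) {
  if (!v.empty()) std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
  return v;
}

// Sorts a copy of v using std::sort; the lambda is part of the
// instantiated template, so no pointer indirection is required.
std::vector<int> sort_with_template(std::vector<int> v) {
  std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
  return v;
}
```

Both functions return the same result; whether the template version is measurably faster depends on the compiler, the optimization level, and the data, so it is worth benchmarking rather than assuming.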

The final reason that I prefer C++ to C is that in cases where one does want to call into C code, it’s incredibly easy to do so from C++. All major C libraries that I can think of are annotated to allow use from C++ code. Typically this just requires a small amount of boilerplate at the top and bottom of the header file, like this:

#ifdef __cplusplus
extern "C" {
#endif

/* C code goes here */

#ifdef __cplusplus
}
#endif

What this does is ensure that the C++ compiler won’t attempt to do name mangling for code in the extern "C" block. This not only allows C++ code to include these C libraries, it can also be used to write C wrappers to C++ code. The ease of calling C code from C++ means that there aren’t generally features or libraries that would be available to you in C but not available from C++; and if you do find such a library, it’s trivial to modify it to be compatible with C++.
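The reverse direction, a C wrapper around C++ code, can be sketched as follows. The Counter class and the counter_* functions are hypothetical; the idea is that C callers only ever see an opaque handle and unmangled function names.

```cpp
#include <new>

namespace {
// Hypothetical C++ class that we want to make usable from C.
class Counter {
 public:
  void add(int n) { total_ += n; }
  int total() const { return total_; }

 private:
  int total_ = 0;
};
}  // namespace

extern "C" {

// Opaque handle type; C callers never see the class definition.
typedef struct counter counter;

// Returns NULL on allocation failure instead of throwing across the C boundary.
counter* counter_create(void) {
  return reinterpret_cast<counter*>(new (std::nothrow) Counter());
}

void counter_add(counter* c, int n) {
  reinterpret_cast<Counter*>(c)->add(n);
}

int counter_total(const counter* c) {
  return reinterpret_cast<const Counter*>(c)->total();
}

void counter_destroy(counter* c) {
  delete reinterpret_cast<Counter*>(c);
}

}  // extern "C"
```

In a real library the declarations would live in a header guarded by the #ifdef __cplusplus boilerplate shown above, and the definitions would stay in a .cpp file.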

Why C++

C++ has a number of unique features that aren’t available in competing languages like Java, Go, and Rust. Some of the features I’ll outline here are available in some of these environments, but none of these languages offer all of these features.

The first feature that I really like is C++ templates. This is a widely reviled language feature, and indeed when overused templates can cause a lot of problems, particularly with respect to understandable code and compiler error messages. That being said, C++ templates are an extremely powerful metaprogramming capability, and much more powerful than C macros. I’d rather be given enough rope to hang myself than not have it at all. This is also my main objection to Go: Go has no serious generic metaprogramming capabilities unless you count go generate (which is arguably a lot more confusing than C++ templates). The ability to do polymorphic metaprogramming is an essential part of DRY. As a programmer there’s nothing I hate more than writing tedious boilerplate that a compiler could implement automatically (and more succinctly than I could). Templates are an extremely powerful way to write short, correct, and fast programs.
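As a small illustration of the kind of boilerplate templates remove, the sketch below defines one generic function (largest is a made-up name) that works for any element type and any ordering. The compiler generates a separate, fully typed version for each combination that is actually used.

```cpp
#include <string>
#include <vector>

// Returns a pointer to the largest element according to `less`,
// or nullptr if the vector is empty.
template <typename T, typename Less>
const T* largest(const std::vector<T>& items, Less less) {
  const T* best = nullptr;
  for (const T& item : items) {
    if (best == nullptr || less(*best, item)) best = &item;
  }
  return best;
}
```

Calling it with a vector of int and a numeric comparison, or with a vector of std::string and a comparison by length, needs no casts and no per-type copy of the loop.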

Another feature that I’ve really come to appreciate recently is first class support for assembly language. The __asm__ keyword is reserved in both C and C++ explicitly to allow compilers to provide the ability to inline assembly code. This is a relatively unique feature. Of the alternative languages I mentioned, only Rust supports inline assembly. One should be judicious when using inline assembly, but in cases where it comes up it is absolutely indispensable.
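For a sense of what this looks like, here is a deliberately trivial sketch using the GNU extended asm syntax supported by GCC and Clang. It assumes an x86-64 target and AT&T operand order, and falls back to plain C++ elsewhere; add_u64 is a made-up name, and a real use would be something the compiler cannot express on its own.

```cpp
#include <cstdint>

// Adds two 64-bit integers. On x86-64 with GCC or Clang the addition is
// written as inline assembly; on other targets it is ordinary C++.
std::uint64_t add_u64(std::uint64_t a, std::uint64_t b) {
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
  // "+r"(a): a is both read and written, kept in a register.
  // "r"(b):  b is an input in a register.
  // "cc":    the add instruction modifies the flags register.
  __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
  return a;
#else
  return a + b;
#endif
}
```

The constraint strings are how the compiler learns which registers the assembly reads and clobbers, which is what lets it keep optimizing the surrounding code.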

Aside from supporting inline assembly, all major C and C++ debuggers have extremely high quality support for inspecting generated assembly and stepping through assembly. This is a critical feature for understanding how code is optimized and for debugging certain situations. For instance, if you want to understand how indirect jumps work, what’s happening in LTO code, or how inlined code works you need to be able to inspect and step through the generated assembly. Once again Rust is the only alternative language out there that really provides first class support for this, in Rust’s case by piggy backing off of GDB/LLDB, the GNU and LLVM C/C++ debuggers.

C++ doesn’t have an included runtime, by which I specifically mean that there’s no garbage collection that will interfere with your optimized code. This is a big deal for a couple of reasons. The first is that for high performance code garbage collectors can easily get in the way much more than they help. It’s not uncommon to hear of Java programmers doing things like majorly refactoring their code to make use of things like object pools to try to reduce the frequency with which the garbage collector runs. While this can be an effective technique for mitigating GC pauses, it’s an example of a situation where the “feature” of garbage collection causes developers to actively work around their programming environment. I contend that this is especially bad because it causes developers to express application logic in unintuitive ways.

The other major problem with garbage collected languages is that they have serious problems dealing with large heap sizes. This is particularly a problem for databases and database-like applications. By database here I mean any program that has a working data set larger than a few gigabytes, and in particular any program where the data size might exceed the amount of memory on a machine. If you try to actually map all of this memory into a runtime like Java the garbage collector will have huge problems with latency (or throughput) as it scans the heap for unreferenced objects. A common way that programs will work around this is by using a structured on-disk format and then using regular disk I/O to access the data. This can be somewhat effective because the kernel page cache will cache recently accessed data in memory. The page cache, however, is always less effective than mapping data into RSS memory because accessing data via the page cache requires extra system calls to be made. This is precisely the reason that high performance database engines like InnoDB (which is written in C++!) map this type of data into userspace memory and avoid using the page cache.

One other problem I’ve noticed with modern C++ alternatives is that they often have poor support for advanced operating system primitives. For instance, if we continue considering the use case of implementing a database, the Linux kernel offers a number of advanced features for optimizing I/O. Most people should be familiar with the system calls read(2), write(2), and lseek(2). These are the basic foundations of doing disk I/O. There are some less well known alternatives to these that are designed to optimize the number of context switches applications need to make. For instance, the system calls pread(2) and pwrite(2) allow applications to coalesce read/seek and write/seek into one system call in each case respectively. Likewise, Linux offers “vectorized I/O” capabilities via the system calls readv(2) and writev(2) which allow a read or write system call to specify the input/output data as a set of multiple buffers which allows the application to avoid doing extra memory copies (or extra system calls). So far the system calls I’ve mentioned are not too well known, but critical for writing high performance applications. Linux even goes so far as to implement the baroque system calls preadv(2) and pwritev(2) which allow combining read/write operations with seeking and vectorized I/O, all in a single system call! This is extra fast but also extra unportable; for instance, OS X does not implement preadv(2) or pwritev(2). And all of these things go out the window when trying to target Windows which has a totally different set of system calls for doing these types of operations. Therefore a lot of these system calls are unavailable in runtimes that try to offer portability as a major selling point. As far as I can tell you can’t invoke preadv(2) or pwritev(2) from Java or Go (but you can in Rust if you don’t mind calling into unsafe code). 
Java and Go do implement pread(2) and pwrite(2) but only if you use special interfaces; for instance, with Java you must use java.nio to use pread(2). These languages are at a fundamental disadvantage for implementing high performance database applications because the high performance system calls are either unavailable or difficult to use.
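By contrast, calling these system calls from C++ is just a function call. The sketch below assumes a Linux (or other platform providing preadv) build; read_at and read_header_and_body are made-up helper names, and the header/body split is only an illustrative record layout.

```cpp
#include <cstddef>
#include <fcntl.h>      // open(2), for callers that need a descriptor
#include <string>
#include <sys/types.h>
#include <sys/uio.h>    // struct iovec, preadv(2)
#include <unistd.h>     // pread(2)

// Reads up to len bytes at offset without moving the file position.
// Returns an empty string on error.
std::string read_at(int fd, off_t offset, std::size_t len) {
  std::string out(len, '\0');
  const ssize_t n = pread(fd, &out[0], len, offset);
  if (n < 0) return std::string();
  out.resize(static_cast<std::size_t>(n));
  return out;
}

// Fills two pre-sized buffers from consecutive bytes starting at offset,
// using a single preadv call. Returns false on error or short read.
bool read_header_and_body(int fd, off_t offset,
                          std::string& header, std::string& body) {
  struct iovec iov[2];
  iov[0].iov_base = &header[0];
  iov[0].iov_len = header.size();
  iov[1].iov_base = &body[0];
  iov[1].iov_len = body.size();
  const ssize_t n = preadv(fd, iov, 2, offset);
  return n == static_cast<ssize_t>(header.size() + body.size());
}
```

A caller would open the file with open(2), size header and body to the expected lengths, and make one call; the file position is left untouched, so several threads can share the descriptor without coordinating seeks.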

I’ve used some of the advanced I/O capabilities in Linux as an example here, but there are simpler examples that fall apart too. Try calling fork(2), a Unix system call coming up on 50 years of age, from Go, Rust, or Java. None of these languages support it directly, because forking a process requires careful handling of the file descriptors that survive the fork. As far as I know the specific obstacle is that these runtimes internally use evented I/O loops and no one knows how to expose this in a sane way to application developers. This is unfortunate because one of the main use cases for forking is forking a process early in its life cycle, before there even are a lot of file descriptors to worry about, for instance to create a daemon process. You might also run into problems forking a complex C++ application that has a lot of open file descriptors, but at least you have the option to handle the matter, which you don’t have at all in other environments.
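For reference, a bare-bones use of fork(2) from C++ looks like the sketch below. The run_in_child helper is a made-up name; it runs a function in a child process and reports the child's exit status to the parent. It assumes a POSIX system and ignores everything a real daemon would also need (setsid, closing inherited descriptors, and so on).

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs work() in a forked child and returns the child's exit status,
// or -1 if the fork or wait failed or the child did not exit normally.
int run_in_child(int (*work)()) {
  const pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed, no child was created
  if (pid == 0) {
    // Child: _exit skips atexit handlers and stdio flushing that
    // belong to the parent's state.
    _exit(work());
  }
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return -1;  // retry only when interrupted by a signal
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A captureless lambda converts to the required function pointer, so run_in_child([]() -> int { return 7; }) hands the parent the value 7.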

Why Not Rust

From the analysis I’ve presented so far it should be somewhat apparent that of the alternative languages to C++, Rust is the one that offers the most features I enjoy from C++. And indeed, I do think that Rust is pretty cool, and it’s a lot more palatable to me than Go or Java. Rust is definitely more targeted towards C++ programmers than either Java or Go, which follows from its history as an attempt to develop a real world alternative to C++ for use at Mozilla.

The main gripe I have with Rust is that while it does have a lot of really powerful features, it’s unclear to me what major advantages it’s trying to offer over C++. For instance, one of the major language features touted in Rust is pointer ownership with move semantics, which is a powerful way to avoid memory leaks via static verification that memory is freed, and with no runtime overhead. This is indeed a cool feature. It’s also essentially equivalent to a C++ std::unique_ptr. Therefore if you want to write code that statically verifies correctness with respect to releasing allocated memory and with no runtime overhead you can just as easily use C++ as you can use Rust.
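A short sketch of the std::unique_ptr ownership transfer being referred to. Buffer, make_buffer, and consume are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical resource type used only for this example.
struct Buffer {
  explicit Buffer(std::size_t n) : bytes(n) {}
  std::vector<char> bytes;
};

// The caller receives sole ownership of the new Buffer.
std::unique_ptr<Buffer> make_buffer(std::size_t n) {
  return std::unique_ptr<Buffer>(new Buffer(n));
}

// Takes ownership by value; the Buffer is freed when `buf` goes out of
// scope at the end of this function, with no explicit delete.
std::size_t consume(std::unique_ptr<Buffer> buf) {
  return buf->bytes.size();
}
```

Passing the pointer requires an explicit std::move, as in consume(std::move(b)), and copying it is a compile error. One difference from Rust worth knowing: the C++ compiler does not reject a later use of b, which is simply left null after the move.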

In fact, if you’ve been following the recent language developments in C++ (e.g. the recent C++11 and C++14 language standards) you’ll find that many of the advanced Rust language features were added to C++ before or at around the same time as their introduction in Rust. Unique pointers are one example of this, but it also applies to other features such as built-in concurrency primitives and move semantics. There are some unique things about Rust such as a lack of null pointers, but even that guarantee goes out the window if you need to call unsafe code, which you will if you are doing advanced systems programming. If you’ve kept abreast of the latest C++ developments you might appreciate the simplicity of Rust compared to C++, but it’s hard to be impressed by the actual language features.

I’m going to keep my eye on Rust because I do think that it shows a lot of promise, but for the time being the language features it offers over C++ aren’t enough in my opinion to give up the tight integration C++ has with C and the native debugging support that GDB offers with C++.

Conclusions

The number of C++ programmers out there is probably diminishing, and certainly the core group of C++ developers in the wild is getting older. This is a direct consequence of the fact that the space historically occupied by C++ is being crowded by other high performance languages. That being said, I don’t see C++ going anywhere anytime soon. C++ has been extremely successful at incorporating new features and modernizing itself over the past few years. C++ is also unique in its facilities for tight interoperation with C and assembly.

If there’s any unifying principle in the C++ language specification and the way the language has been evolved, it’s been to make C++ fast. A lot of the aspects of C++ that people dislike are areas where the language has chosen to give more options to allow compilers to generate fast code at the expense of making the language more complicated or less understandable. It’s natural that many developers will feel like that’s the wrong tradeoff: that developers spend more time trying to understand and debug programs than they spend trying to eke out every last bit of performance. That’s definitely true in many applications, but the fact also remains that there are a lot of specialized situations where performance is paramount. In these situations it’s hard to beat C++.