A Funny Thing About C and C++

June 13, 2016

I recently read an interesting paper about the difference between the ISO C standard and C as implemented by compilers, and it got me thinking more about how people learn C and C++. Most languages are very complex, and for most questions that developers have it's impractical to look for answers in the actual language specification. For one thing, you often can't find the relevant passage in a standard unless you already know the correct terminology. Even when you do know the terminology to look for, many language standards are so verbose that consulting them is impractical for a non-expert.

Instead, what most developers do when they have a question about a language corner case is write a program that exercises the corner case and then observe what the program actually does. Most languages have a de facto reference implementation, and typically the reference implementation mirrors the specification almost exactly. For instance, if you have a question about how an obscure aspect of Python works, you can almost always run a test program under CPython to learn the specified behavior. There are a couple of obscure things that are implementation-specific (e.g. what is allowed with respect to garbage collection), but in the vast majority of cases this approach works fine. This approach also works well with other languages like Ruby, PHP, JavaScript, and Go.

C and C++ are very different. There is no de facto reference implementation of C or C++. There are a lot of different C/C++ compilers out there, and unlike many other languages the C and C++ standards are frequently finalized before any compiler fully implements the new semantics (this is particularly true for C++). Additionally, there is a huge amount of "undefined behavior" allowed by the language specifications. Therefore, when you write a test program in C or C++, you can't be sure whether the behavior you observe is actually part of the language specification or simply the behavior of a particular compiler. The problem is compounded by the sheer complexity of the language specifications. The last time I looked, the ISO C standard was 700 pages long. That's just C! I can't even fathom how many pages the C++ standard must be, if it even exists in a single document.
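To make this concrete, here is a small sketch of the kind of test code people write. Each function below returns an answer that the C standard leaves up to the implementation, so the output only tells you what your compiler and target do. The function names are my own, invented for illustration. Note that these cases are implementation-defined rather than undefined. Undefined behavior is worse still, because what a test program prints tells you nothing reliable at all.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain `char` is signed on this implementation, 0 otherwise.
   The standard lets each implementation choose. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Right-shifting a negative signed value is implementation-defined in C.
   Many compilers do an arithmetic shift here, but that is the compiler's
   choice, not a guarantee of the language. */
int shift_negative_right(void) {
    return -8 >> 1;
}

/* Prints what this particular compiler and target happen to do. */
void report_implementation_choices(void) {
    printf("plain char is %s\n",
           plain_char_is_signed() ? "signed" : "unsigned");
    printf("sizeof(long) = %zu\n", sizeof(long));
    printf("-8 >> 1 = %d\n", shift_negative_right());
}
```

Calling report_implementation_choices() from main on two different platforms can legitimately print different things, and neither result is "what C does".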

Another interesting thing about C and C++ is that for real-world programs to execute you must link them, and most of the details of linking are not specified by the language standards. This is extremely evident if you look at complex C or C++ code that attempts to be portable between Windows and Unix: generally there will be a plethora of preprocessor macros near function declarations and definitions to ensure that functions have similar linkage semantics on the two platforms. In a number of cases there are linking behaviors available on Windows but not on Unix, and vice versa.
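A rough sketch of what those macros tend to look like is below. The names MYLIB_API, MYLIB_BUILDING, and mylib_add are hypothetical, made up for this example. The compiler-specific pieces are real, though: __declspec(dllexport) and __declspec(dllimport) on Windows, and the visibility attribute on GCC-compatible compilers. None of them are part of the C standard.

```c
/* Normally the build system defines this while compiling the library itself
   (hypothetical name). It is defined here so the file stands alone. */
#define MYLIB_BUILDING 1

#if defined(_WIN32)
  /* Windows: symbols must be explicitly exported from, or imported into, a DLL. */
  #if defined(MYLIB_BUILDING)
    #define MYLIB_API __declspec(dllexport)
  #else
    #define MYLIB_API __declspec(dllimport)
  #endif
#elif defined(__GNUC__)
  /* GCC and Clang: keep the symbol visible even if the library is built
     with -fvisibility=hidden. */
  #define MYLIB_API __attribute__((visibility("default")))
#else
  /* Unknown compiler: fall back to ordinary external linkage. */
  #define MYLIB_API
#endif

/* Public declaration, as it would appear in the library's header. */
MYLIB_API int mylib_add(int a, int b);

/* Definition, as it would appear in the library's source file. */
MYLIB_API int mylib_add(int a, int b) {
    return a + b;
}
```

The function itself is plain standard C. Everything interesting about how it gets linked lives in the macro, which is exactly the part the language standard doesn't cover.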

What most people learn when they write C or C++ is how a particular compiler works, not how the languages are actually specified. This is why writing portable C or C++ code is so challenging. It's no wonder there are so few true experts out there in the C and C++ communities.