I recently read
an interesting paper about
the difference between the ISO C standard and C as implemented by compilers, and
it got me thinking more about how people learn C and C++. Most languages are
very complex, and for most questions that developers have it’s impractical to
try to find answers in the actual language specification. To start, you often
can’t find the relevant passage in a standard without already knowing the
correct terminology. Even when you do know the terminology to look for, many
language standards are so verbose that consulting them is impractical for a
non-expert.
Instead, what most developers do when they have a question about a language
corner case is to write a program that exercises the corner case and then
observe what the program actually does. Most languages have a de facto reference
implementation, and typically the reference implementation mirrors the
specification almost exactly. For instance, if you have a question about how an
obscure aspect of Python works, you can almost always run a test program under
CPython to see the specified behavior. There are a couple of obscure
things that are implementation-specific (e.g. what is allowed with respect to
garbage collection), but in the vast majority of cases this approach works fine.
This approach also works well with other languages like Ruby, PHP, JavaScript,
and Go.
C and C++ are very different. There is no de facto reference implementation of C
or C++. There are a lot of different C and C++ compilers out there, and unlike
many other languages, the C and C++ standards are frequently finalized before
any compiler fully implements the new semantics (this is particularly true
for C++). Additionally, the language specifications allow a huge amount of
“undefined behavior”, along with behavior that is merely unspecified or
implementation-defined. Therefore when you write a test program in C or
C++, you can’t be sure if the observed behavior you get is actually part of the
language specification or simply the behavior of a particular compiler. The
problem is compounded by the sheer complexity of the language specifications.
The last time I looked, the ISO C standard was 700 pages long. That’s just C! I
can’t even fathom how many pages the C++ standard must be, if it even exists in
a single document.
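To make the test-program problem concrete, here is a small sketch in C. It deliberately sticks to implementation-defined behavior rather than outright undefined behavior, so it is safe to run. The function names are my own, made up for illustration. Each function reports a choice that ISO C leaves to the implementation, so whatever it returns tells you about your compiler and target, not about the language.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper names, chosen for this illustration only. */

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   ISO C lets the implementation pick either. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Right-shifting a negative signed value gives an implementation-defined
   result in ISO C. Many compilers do an arithmetic shift, but observing
   that tells you nothing about what the standard requires. */
int shift_negative_one(void) {
    int x = -1;
    return x >> 1;
}

/* Prints what this particular compiler and target happen to do. */
void report_implementation_choices(void) {
    printf("plain char is %s\n",
           plain_char_is_signed() ? "signed" : "unsigned");
    printf("-1 >> 1 == %d\n", shift_negative_one());
    printf("sizeof(long) == %zu\n", sizeof(long));
}
```

Calling report_implementation_choices() from main on two different toolchains can legitimately print different things. For example, plain char is signed with GCC on x86-64 Linux but unsigned on ARM Linux, and long is 8 bytes on 64-bit Linux but 4 bytes with 64-bit MSVC. A reader who only ever ran this on one machine could easily mistake the output for a rule of the language.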
Another interesting thing about C and C++ is that for real-world programs to
execute you must link them. Most of the details of linking are not specified by
the language standards. This is extremely evident if you look at complex C or
C++ code that attempts to be cross-platform compatible between Windows and
Unix. Generally there will be a plethora of preprocessor macros near function
declarations and definitions to ensure that functions have similar linkage
and symbol-export semantics on the two
platforms. In a number of cases there are linking behaviors available on
Windows but not on Unix, and vice versa.
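The kind of macro I have in mind looks roughly like the sketch below. MYLIB_API, MYLIB_BUILD, and mylib_add are hypothetical names I made up for this example. The __declspec and __attribute__ spellings are real compiler extensions, not part of ISO C, which is exactly the point: none of this is covered by the language standard.

```c
/* Hypothetical library "mylib"; all MYLIB_* names are illustrative.
   MYLIB_BUILD would normally be passed by the build system when compiling
   the library itself; it is defined here only to keep the sketch
   self-contained. */
#define MYLIB_BUILD 1

#if defined(_WIN32)
  #if defined(MYLIB_BUILD)
    #define MYLIB_API __declspec(dllexport)  /* building the DLL */
  #else
    #define MYLIB_API __declspec(dllimport)  /* consuming the DLL */
  #endif
#elif defined(__GNUC__)
  /* GCC and Clang: make the symbol visible outside the shared object. */
  #define MYLIB_API __attribute__((visibility("default")))
#else
  #define MYLIB_API  /* unknown toolchain: fall back to the default */
#endif

/* Public declaration, as it would appear in a header. */
MYLIB_API int mylib_add(int a, int b);

/* Definition, as it would appear in the library source. */
int mylib_add(int a, int b) {
    return a + b;
}
```

The function itself is plain, portable C. Everything above it exists only to tell each platform's toolchain how the symbol should be exposed, and the import/export distinction on the Windows side has no direct Unix counterpart.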
What most people learn when they write C or C++ is how a particular compiler
works, not how the languages are actually specified. This is why writing
portable C or C++ code is so challenging. It’s no wonder there are so few true
experts out there in the C and C++ communities.