Declaring C String Constants The Right Way

C string constants can be declared using either pointer syntax or array syntax:

// Option 1: using pointer syntax.
const char *ptr = "Lorem ipsum";

// Option 2: using array syntax.
const char arr[] = "dolor sit amet";

Is there a difference between the two? And which should you choose?

They are different, and the array syntax (option 2) generates smaller, faster code. I'm going to demonstrate this, and show how the two differ, by looking at the x86-64 assembly generated by GCC and Clang for each case.

To see how they're different, I'm going to look at a program that uses both forms, which will let us see how the compiler generates code in each case. Let's continue this program by declaring a bogus function that accepts arbitrary arguments, and then defining two functions that invoke bogus() with the two globals just defined:

// Bogus function, just to see how arguments are passed.
void bogus();

// Invoke bogus using ptr.
void do_ptr() { bogus(&ptr, ptr); }

// Invoke bogus using arr.
void do_arr() { bogus(&arr, arr); }

In case you're not familiar with how this works: a C function declaration with an empty parameter list leaves the parameters unspecified (this changed in C23, where it means "no parameters"). The caller can pass any number of arguments, of any type, and the compiler calls the function much as it would a variadic one. This is how I've declared bogus(), since it ensures that the pointer arguments are not converted to some declared parameter type. The functions do_ptr() and do_arr() pass the address of their variable as the first argument to bogus(), and pass the variable by value as the second argument.

We can't actually run this code, since bogus() is just a forward declaration. But we can generate object files from this code, which is good enough to understand what the compiler does.
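If you do want something you can link and run, one option is to supply a definition of bogus(). The following is only a sketch, not the program analyzed in this post: it gives bogus() two const void * parameters (an assumption, since the original deliberately leaves the parameters unspecified) and records whether both arguments carry the same address. The names last_args_equal, ptr_args_equal(), arr_args_equal() and check_bogus_args() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "dolor sit amet";

// Set by bogus(): did both arguments carry the same address?
static bool last_args_equal;

// Stand-in definition with explicit parameters, so the file links and runs.
void bogus(const void *first, const void *second) {
    last_args_equal = (first == second);
}

// Same call as do_ptr(), but reports what bogus() observed.
bool ptr_args_equal(void) {
    bogus(&ptr, ptr);
    return last_args_equal;
}

// Same call as do_arr(), but reports what bogus() observed.
bool arr_args_equal(void) {
    bogus(&arr, arr);
    return last_args_equal;
}

// Self-check: &ptr and ptr differ, while &arr and arr are the same address.
void check_bogus_args(void) {
    assert(!ptr_args_equal());
    assert(arr_args_equal());
}
```

Giving bogus() a real prototype changes how the call is type-checked, so this is for experimenting with the values being passed rather than for reproducing the exact assembly shown later.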

A Look At Some Assembly

We can see what code is generated by passing the -S flag to the compiler. Here's the code generated by GCC 7.1.1, when using gcc -O2 -S:

;; GCC 7.1.1
do_ptr:
        movq    ptr(%rip), %rsi
        movl    $ptr, %edi
        xorl    %eax, %eax
        jmp     bogus

do_arr:
        movl    $arr, %esi
        xorl    %eax, %eax
        movq    %rsi, %rdi
        jmp     bogus

As you can see, the pointer and array versions involve different generated code. If we compile using Clang 3.9.1, with the invocation clang -O2 -S, we get the exact same code for do_ptr(), but slightly different code for do_arr():

;; Clang 3.9.1
do_ptr:
        movq    ptr(%rip), %rsi
        movl    $ptr, %edi
        xorl    %eax, %eax
        jmp     bogus

do_arr:
        movl    $arr, %edi
        movl    $arr, %esi
        xorl    %eax, %eax
        jmp     bogus

We'll start with an explanation of do_ptr, which is the same under both compilers. The SysV x86-64 ABI requires that the first argument to a function be passed in %rdi, and the second argument be passed in %rsi. In the code here we'll see %rdi and %edi used interchangeably, and same with %rsi and %esi; for this purpose we can treat them as the same. The difference is whether we're referring to the full 64-bit register, or just the lower 32 bits.
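The relationship between the 32-bit and 64-bit register names can be modeled in a few lines of C. This is a toy sketch, not real register access: reg64, write64(), write32(), read32() and movl_then_read64() are hypothetical names. It relies on one x86-64 rule, namely that writing a 32-bit register clears the upper 32 bits of the full 64-bit register, which is why a movl of an address into %edi can leave a usable pointer in %rdi whenever that address fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

// Toy model of one 64-bit general-purpose register, e.g. %rdi.
typedef struct {
    uint64_t bits;
} reg64;

// Like movq: replaces all 64 bits.
void write64(reg64 *r, uint64_t value) { r->bits = value; }

// Like movl: sets the low 32 bits; on x86-64 the upper 32 bits become zero.
void write32(reg64 *r, uint32_t value) { r->bits = (uint64_t)value; }

// Reading %edi is just reading the low half of %rdi.
uint32_t read32(const reg64 *r) { return (uint32_t)(r->bits & 0xFFFFFFFFu); }

// A 32-bit write of an address, followed by a 64-bit read of the register.
uint64_t movl_then_read64(uint64_t old_contents, uint32_t address) {
    reg64 rdi;
    write64(&rdi, old_contents);
    write32(&rdi, address);
    assert(read32(&rdi) == address);
    return rdi.bits;
}
```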

Here's my annotated version of the first two instructions, which are the ones we care about:

;; Code for do_ptr (same in GCC and Clang)

;; Load the value of ptr from memory into %rsi. This is the second parameter, ptr.
movq    ptr(%rip), %rsi

;; Copy the address of ptr, as an immediate value, to %edi. This is the first parameter, &ptr.
movl    $ptr, %edi

Since ptr is a real pointer, it has its own memory address, distinct from the memory it points to. Somewhere in memory is the string "Lorem ipsum", and somewhere else in memory is a 64-bit value pointing to that string. The pointer ptr isn't the string; it's the 64-bit value that points to the string. When ptr is passed by value to bogus(), the compiler has to load that 64-bit value from memory, and that load is what we're seeing in the generated code.

The second function, do_arr, is a lot simpler. Let's start with the Clang version, since it's a little easier to understand. Again, I'm just going to look at the mov instructions right now, since they're what matters. Here's my annotated version:

;; Code for do_arr, Clang 3.9.1

;; Copy the immediate value of &arr to %edi.
movl    $arr, %edi

;; Copy the immediate value of arr into %esi.
movl    $arr, %esi

The array version is much simpler: the address of an array is the same as the value the array's name evaluates to when it is passed as an argument (the address of its first element), and that address is known at link time. Therefore the compiler can do two immediate copies, without loading anything from memory!

GCC does basically the same thing as Clang, except instead of copying the same constant value twice, it copies the constant once, and then does a register-to-register copy:

;; Code for do_arr, GCC 7.1.1

;; Copy the immediate value of arr into %esi. This is the second parameter, arr.
movl    $arr, %esi

;; Copy %rsi into %rdi. This is the first parameter, &arr.
movq    %rsi, %rdi

Pointer Semantics vs. Array Semantics

I mentioned this in passing already, but the key thing about a pointer is that it has its own memory address. A pointer is not the thing it points to, it's a separate thing entirely. This is not true of an array: an array is the data.

If you take the address of a pointer, you get the memory location of the pointer itself. If you want to know the memory address the pointer points to, you take the value of the pointer.

Arrays are not pointers. They are just data. There's no difference between the address of an array, and the value of an array.
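The distinction can be checked directly in C by comparing addresses. This is a minimal sketch using the two globals from the top of the post; the helper names ptr_is_its_data(), arr_is_its_data() and check_semantics() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "dolor sit amet";

// Is the address of the variable itself the same as the address of the characters?
bool ptr_is_its_data(void) {
    return (const void *)&ptr == (const void *)ptr;
}

bool arr_is_its_data(void) {
    return (const void *)&arr == (const void *)arr;
}

// The pointer lives apart from its string; the array is the string.
void check_semantics(void) {
    assert(!ptr_is_its_data());
    assert(arr_is_its_data());
}
```

Note that &arr and arr still have different types (pointer to the whole array versus pointer to its first element), which is why both sides are cast to const void * before comparing.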

When the toolchain finally lays out the data for the declaration of ptr, it puts the string "Lorem ipsum" somewhere in memory, and then creates a separate 64-bit value pointing to that string. It doesn't have to do this for arr: the address of the string data itself is used wherever arr is used.
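A rough way to see the extra storage is to add up the bytes each declaration needs. The sketch below is an approximation that assumes no padding and no merging of identical string literals; ptr_footprint(), arr_footprint() and check_footprints() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "dolor sit amet";

// Pointer form: the pointer object itself, plus the string bytes and terminator.
size_t ptr_footprint(void) {
    return sizeof ptr + strlen(ptr) + 1;
}

// Array form: only the string bytes and terminator.
size_t arr_footprint(void) {
    return sizeof arr;
}

void check_footprints(void) {
    // "Lorem ipsum" is 11 characters, so 12 bytes with the terminator.
    assert(ptr_footprint() == sizeof(const char *) + 12);
    // "dolor sit amet" is 14 characters, so 15 bytes with the terminator.
    assert(arr_footprint() == 15);
}
```

On a typical 64-bit target sizeof ptr is 8, so the pointer form costs 8 bytes on top of its string.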

Going Deeper: Disassembly

Let's really dig in here, by creating an actual object file and disassembling it. That will give us more insight into what's happening, because we'll see how big the instructions are. This actually impacts performance, because more compact instructions occupy less memory, are more likely to fit into cache lines, and so forth. To look at the disassembly we'll use gcc -O2 -c and clang -O2 -c to generate object files. Then we'll pass the object files to objdump -d. Here's the disassembled version with GCC:

;; GCC 7.1.1

;; Implementation takes 19 bytes
0000000000000000 <do_ptr>:
   0:	48 8b 35 00 00 00 00 	mov    0x0(%rip),%rsi        # 7 <do_ptr+0x7>
   7:	bf 00 00 00 00       	mov    $0x0,%edi
   c:	31 c0                	xor    %eax,%eax
   e:	e9 00 00 00 00       	jmpq   13 <do_ptr+0x13>

;; Implementation takes 15 bytes
0000000000000020 <do_arr>:
  20:	be 00 00 00 00       	mov    $0x0,%esi
  25:	31 c0                	xor    %eax,%eax
  27:	48 89 f7             	mov    %rsi,%rdi
  2a:	e9 00 00 00 00       	jmpq   2f <do_arr+0xf>

And here's the disassembly for the code generated by Clang:

;; Clang 3.9.1

;; Implementation takes 19 bytes
0000000000000000 <do_ptr>:
   0:	48 8b 35 00 00 00 00 	mov    0x0(%rip),%rsi        # 7 <do_ptr+0x7>
   7:	bf 00 00 00 00       	mov    $0x0,%edi
   c:	31 c0                	xor    %eax,%eax
   e:	e9 00 00 00 00       	jmpq   13 <do_ptr+0x13>

;; Implementation takes 17 bytes
0000000000000020 <do_arr>:
  20:	bf 00 00 00 00       	mov    $0x0,%edi
  25:	be 00 00 00 00       	mov    $0x0,%esi
  2a:	31 c0                	xor    %eax,%eax
  2c:	e9 00 00 00 00       	jmpq   31 <do_arr+0x11>

Let's start with a brief, but fun, digression: which generates better code, GCC or Clang? The implementation of do_ptr() is the same with both compilers, but the implementation of do_arr() differs between GCC and Clang. Both compilers use two mov instructions in do_arr(), but GCC uses 8 bytes total to encode them, and Clang uses 10 bytes. GCC wins here by two bytes. However, GCC also introduces a register data dependency, because its second mov is a register-to-register copy, whereas Clang's second mov is a constant-to-register copy. That dependency has the potential to affect how CPU pipelining works. Whether or not it actually does is highly dependent on the CPU generation and microarchitecture. Someone who works at Intel or AMD might know the full ramifications, but I am not that person. Therefore I'm going to call this one a wash: the two versions should take about the same number of clock cycles, and there's an argument to be made either way.
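The byte counts quoted here come straight from the objdump listings: each instruction's length is the number of hex bytes on its line. A small sketch that tallies them follows; the array and function names are invented for illustration, and the lengths are copied from the listings above.

```c
#include <assert.h>
#include <stddef.h>

// Encoded length in bytes of each instruction, in listing order.
static const unsigned char do_ptr_lengths[]       = {7, 5, 2, 5}; // mov, mov, xor, jmpq
static const unsigned char do_arr_gcc_lengths[]   = {5, 2, 3, 5}; // mov, xor, mov, jmpq
static const unsigned char do_arr_clang_lengths[] = {5, 5, 2, 5}; // mov, mov, xor, jmpq

// Sum the instruction lengths of one function.
size_t total_bytes(const unsigned char *lengths, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += lengths[i];
    }
    return total;
}

size_t do_ptr_bytes(void)       { return total_bytes(do_ptr_lengths, sizeof do_ptr_lengths); }
size_t do_arr_gcc_bytes(void)   { return total_bytes(do_arr_gcc_lengths, sizeof do_arr_gcc_lengths); }
size_t do_arr_clang_bytes(void) { return total_bytes(do_arr_clang_lengths, sizeof do_arr_clang_lengths); }

// Bytes spent on the two mov instructions of do_arr() alone.
size_t gcc_mov_bytes(void)   { return 5 + 3; }
size_t clang_mov_bytes(void) { return 5 + 5; }

void check_sizes(void) {
    assert(do_ptr_bytes() == 19);
    assert(do_arr_gcc_bytes() == 15);
    assert(do_arr_clang_bytes() == 17);
    assert(clang_mov_bytes() - gcc_mov_bytes() == 2);
}
```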

The do_ptr() function is doing something totally different. It populates %rsi by copying from memory into a register, using the instruction-pointer-relative addressing scheme. This is slower than do_arr(), because it requires a real memory fetch. The overhead of a memory fetch is unpredictable. In the best case, the data is in the L1 cache, and the memory fetch is nearly free. A worse case is when the data isn't in the L1/L2/L3 caches, which requires accessing main memory. The worst case of all is when the data needs to be fetched from main memory and a TLB miss occurs during the virtual-to-physical translation. When a TLB miss happens, not only does the data need to be fetched from main memory, the CPU also has to walk the process's page table to find the physical memory address for the fetch.

Since we're good software engineers, we know that an L1 cache reference is about 1ns, and a main memory fetch is about 100ns. Furthermore, the mov instruction used here is seven bytes wide! So not only is this code slower, the mov instruction used by do_ptr() is two bytes wider than the five-byte immediate mov used by do_arr(). Once again, the explanation is that do_ptr() has to load the value of ptr from memory, which is why it needs a different kind of mov instruction.

If you've been paying close attention, you might wonder why the compiler is using this weird-looking instruction-pointer-relative addressing scheme for do_ptr. Normally instruction-pointer-relative addressing is used for position independent code (PIC), but this code doesn't actually need to be position independent, since the address of the string constant isn't randomized. The reason is much more subtle: the compiler is using this addressing mode as a trick to save a few bytes of space. You'll notice that the mov instruction is 7 bytes wide, but memory addresses on 64-bit machines are 8 bytes in size. Relative addressing is done using a 32-bit relative immediate value. If an immediate 64-bit address were used, the instruction would end up being 10 bytes wide: 2 bytes to encode the mov plus output register, and 8 bytes to encode the 64-bit immediate value.
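To make the arithmetic concrete, here is a small sketch of how the 7-byte encoding breaks down and how the CPU turns the 32-bit displacement into an address. The function names are hypothetical. The byte breakdown matches the 48 8b 35 prefix in the listing (a REX prefix, the opcode, and a ModRM byte) followed by a four-byte displacement, and the target is computed relative to the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

// Pieces of the encoding "48 8b 35 <disp32>", in bytes.
enum { REX_PREFIX = 1, OPCODE = 1, MODRM = 1, DISP32 = 4 };

// Total length of the instruction-pointer-relative mov used by do_ptr().
unsigned rip_relative_mov_length(void) {
    return REX_PREFIX + OPCODE + MODRM + DISP32;
}

// Effective address: address of the next instruction plus the signed displacement.
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

void check_rip_relative(void) {
    assert(rip_relative_mov_length() == 7);
    // Unrelocated object file: instruction at offset 0, displacement 0,
    // which objdump annotates as "# 7 <do_ptr+0x7>".
    assert(rip_relative_target(0, 7, 0) == 7);
}
```

A negative displacement works the same way: an instruction at 0x1000 with displacement -16 refers to 0x1000 + 7 - 16 = 0xff7.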

Conclusion

Since C arrays can "decay" into pointer types, we often think of them as the same, but they're actually different. A pointer has its own memory address, and its value points to data. An array is more like the data itself, and does not have an address distinct from the data it holds. Using the array syntax is slightly better: passing an array doesn't require loading a pointer from memory, and you'll save a few bytes of memory (and disk space) by using an array.

When possible, you should prefer declaring global string constants using array syntax.

P.S. If you were paying really close attention, you may have noticed that the calls to bogus() were preceded by mysterious xorl %eax, %eax instructions, which set the value of %eax to 0, even though that register appears to be unused. In an upcoming blog post, I'll explain why the compiler emits this instruction.