C string constants can be declared using either pointer syntax or array syntax:
// Option 1: using pointer syntax.
const char *ptr = "Lorem ipsum";
// Option 2: using array syntax.
const char arr[] = "dolor sit amet";
Is there a difference between the two? And which should you choose?
They are different, and the array syntax (option two) generates smaller, faster code. I'm going to prove this and demonstrate how they're different, by looking at the x86 generated by GCC and Clang for these two different cases.
To see how they're different, I'm going to look at a program that uses both forms, which will let us see what the compiler generates in each case. Let's continue this program by declaring a bogus function with an empty parameter list, and then defining functions that invoke the bogus function with the two globals just defined:
// Bogus function, just to see how arguments are passed.
void bogus();
// Invoke bogus using ptr.
void do_ptr() { bogus(&ptr, ptr); }
// Invoke bogus using arr.
void do_arr() { bogus(&arr, arr); }
In case you're not familiar with how this works: before C23, a C function declaration with an empty parameter list doesn't say anything about the parameters. The caller can pass any number of arguments, of any type, to such a function, much as with a variadic function. This is how I've declared bogus(), since it ensures that the arguments are never converted to a declared parameter type (only the default argument promotions apply, and those leave pointers alone). The functions do_ptr() and do_arr() pass the address of their variable as the first argument to bogus(), and pass the variable by value as the second argument.
We can't actually run this code, since bogus() is just a forward declaration. But we can generate object files from this code, which is good enough to understand what the compiler does.
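If you do want something you can link and run, here's a minimal sketch of a stand-in for bogus(). It is not part of the program compiled for the listings in this post: the prototype, the seen_first and seen_second variables, and check_arguments() are all hypothetical, added only so we can observe what each wrapper passes. Note that giving bogus() a real prototype changes the declaration, so the generated code may not match the listings exactly.

```c
#include <assert.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "dolor sit amet";

/* Hypothetical stand-in for bogus(): it only records what it was given. */
static const void *seen_first;
static const void *seen_second;

void bogus(const void *first, const void *second) {
    seen_first = first;
    seen_second = second;
}

/* Same calls as in the post, with explicit casts to match the prototype. */
void do_ptr(void) { bogus((const void *)&ptr, (const void *)ptr); }
void do_arr(void) { bogus((const void *)&arr, (const void *)arr); }

/* Calls both wrappers and checks that bogus() saw exactly what was passed. */
void check_arguments(void) {
    do_ptr();
    assert(seen_first == (const void *)&ptr);  /* address of the pointer */
    assert(seen_second == (const void *)ptr);  /* value of the pointer */

    do_arr();
    assert(seen_first == (const void *)&arr);  /* address of the array */
    assert(seen_second == (const void *)arr);  /* array decayed to a pointer */
}
```

Calling check_arguments() from a main() of your own is enough to exercise both wrappers.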
A Look At Some Assembly
We can see what code is generated by passing the -S flag to the compiler. Here's the code generated by GCC 7.1.1, when using gcc -O2 -S:
;; GCC 7.1.1
do_ptr:
movq ptr(%rip), %rsi
movl $ptr, %edi
xorl %eax, %eax
jmp bogus
do_arr:
movl $arr, %esi
xorl %eax, %eax
movq %rsi, %rdi
jmp bogus
As you can see, the pointer and array versions involve different generated code.
If we compile using Clang 3.9.1, with the invocation clang -O2 -S, we get the exact same code for do_ptr(), but slightly different code for do_arr():
;; Clang 3.9.1
do_ptr:
movq ptr(%rip), %rsi
movl $ptr, %edi
xorl %eax, %eax
jmp bogus
do_arr:
movl $arr, %edi
movl $arr, %esi
xorl %eax, %eax
jmp bogus
We'll start with an explanation of do_ptr, which is the same under both compilers. The SysV x86-64 ABI requires that the first integer or pointer argument to a function be passed in %rdi, and the second be passed in %rsi. In the code here we'll see %rdi and %edi used interchangeably, and the same goes for %rsi and %esi; for this purpose we can treat them as the same. The difference is whether we're referring to the full 64-bit register, or just the lower 32 bits.
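To make the register naming concrete, here's a small C model of the relationship between a 64-bit register and its 32-bit half. It's only a sketch: low32(), write32() and check_register_model() are hypothetical names, and the address 0x601040 is made up. The model relies on one x86-64 rule: writing a 32-bit register clears the upper 32 bits of the full register. That's why a movl into %edi can set up a pointer argument in %rdi, assuming (as the listings imply) that the address fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models reading %edi: the low 32 bits of the 64-bit register %rdi. */
uint32_t low32(uint64_t reg64) {
    return (uint32_t)reg64;
}

/* Models a 32-bit write such as `movl $imm, %edi`. On x86-64 the upper
   32 bits of the full register are cleared, so the old value is irrelevant. */
uint64_t write32(uint64_t old_reg64, uint32_t imm) {
    (void)old_reg64;
    return (uint64_t)imm;
}

void check_register_model(void) {
    /* 0x601040 is a made-up address that happens to fit in 32 bits. */
    assert(write32(UINT64_MAX, 0x601040u) == UINT64_C(0x601040));
    assert(low32(UINT64_C(0x1122334455667788)) == 0x55667788u);
}
```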
Here's my annotated version of the first two instructions, which are the ones we care about:
;; Code for do_ptr (same in GCC and Clang)
;; Load the 64-bit value stored in ptr from memory into %rsi. This is the second parameter, ptr.
movq ptr(%rip), %rsi
;; Copy the address of ptr, as an immediate value, to %edi. This is the first parameter, &ptr.
movl $ptr, %edi
Since ptr is a real pointer, it has its own memory address, distinct from the memory it points to. Somewhere in memory is the string "Lorem ipsum", and somewhere else in memory is a 64-bit value pointing to that string. The pointer ptr isn't the string, it's the 64-bit value that points to the string. When ptr is passed by value to bogus(), the compiler has to read that 64-bit value out of memory, and that load is what we're seeing in the generated code.
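One reason the load can't be folded into a constant: ptr as declared above is an ordinary global variable, and only the characters it points to are const. Any code in the program may repoint it at run time, so its current value has to be read from memory. Here's a minimal sketch of that; current_ptr(), retarget_ptr() and check_retarget() are hypothetical helpers, not part of the program in this post.

```c
#include <assert.h>
#include <string.h>

const char *ptr = "Lorem ipsum";      /* a pointer object; only the chars are const */
const char arr[] = "dolor sit amet";  /* the characters themselves */

/* Reads the current value of ptr, like the second argument in do_ptr(). */
const char *current_ptr(void) {
    return ptr;
}

/* Hypothetical helper: repoints ptr at a different string at run time. */
void retarget_ptr(const char *replacement) {
    ptr = replacement;
}

void check_retarget(void) {
    assert(strcmp(current_ptr(), "Lorem ipsum") == 0);
    retarget_ptr("consectetur");
    assert(strcmp(current_ptr(), "consectetur") == 0);
    /* There is no equivalent for arr: `arr = "x";` does not compile,
       because an array cannot be assigned to. */
    assert(strcmp(arr, "dolor sit amet") == 0);
}
```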
The second function, do_arr, is a lot simpler. Let's start with the Clang version, since it's a little easier to understand. Again, I'm just going to look at the mov instructions right now, since they're what matters. Here's my annotated version:
;; Code for do_arr, Clang 3.9.1
;; Copy the immediate value of &arr to %edi.
movl $arr, %edi
;; Copy the immediate value of arr into %esi.
movl $arr, %esi
The array version is much simpler: the address of an array is the same as the address of its first element, which is what the array evaluates to when it's passed as an argument, and the actual address is known at link time. Therefore the compiler can do two immediate copies, without accessing main memory! There's no need to load anything from memory.
GCC does basically the same thing as Clang, except instead of copying the same constant value twice, it copies the constant once, and then does a register-to-register copy:
;; Code for do_arr, GCC 7.1.1
;; Copy the immediate value of arr into %esi. This is the second parameter, arr.
movl $arr, %esi
;; Copy %rsi into %rdi. This is the first parameter, &arr.
movq %rsi, %rdi
Pointer Semantics vs. Array Semantics
I mentioned this in passing already, but the key thing about a pointer is that it has its own memory address. A pointer is not the thing it points to, it's a separate thing entirely. This is not true of an array: an array is the data.
If you take the address of a pointer, you get the memory location of the pointer itself. If you want to know the memory address the pointer points to, you take the value of the pointer.
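Arrays are not pointers. They are just data. When an array is used as a value it evaluates to the address of its first element, and that is the same address as the array itself, so there's no difference between the address of an array and the value of an array.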
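These semantics can be checked directly in C. The sketch below uses the two declarations from the top of the post; the helper names are hypothetical. It compares each object's own address with the address of its character data, and compares sizes: ptr is the size of a pointer, while arr is the size of the string including its terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

const char *ptr = "Lorem ipsum";
const char arr[] = "dolor sit amet";

/* 1 if the object's own address equals the address of its character data. */
int ptr_is_its_data(void) {
    return (const void *)&ptr == (const void *)ptr;
}

int arr_is_its_data(void) {
    return (const void *)&arr == (const void *)arr;
}

void check_semantics(void) {
    assert(!ptr_is_its_data());  /* the pointer lives apart from the string */
    assert(arr_is_its_data());   /* the array is the string */
    assert(sizeof ptr == sizeof(const char *));
    assert(sizeof arr == 15);    /* 14 characters plus the terminating NUL */
}
```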
When the linker finally lays out the data for the declaration of ptr, it puts the string "Lorem ipsum" somewhere in memory, and then creates a 64-bit value pointing to that string. But the linker doesn't have to do this for arr: it actually uses the address of the string data itself wherever arr is used.
Going Deeper: Disassembly
Let's really dig in here, by creating an actual object file and disassembling it. That will give us more insight into what's happening, because we'll see how big the instructions are. This actually impacts performance, because more compact instructions occupy less memory, are more likely to fit into cache lines, and so forth. To look at the disassembly we'll use gcc -O2 -c and clang -O2 -c to generate object files. Then we'll pass the object files to objdump -d. Here's the disassembled version with GCC:
;; GCC 7.1.1
;; Implementation takes 19 bytes
0000000000000000 <do_ptr>:
0: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # 7 <do_ptr+0x7>
7: bf 00 00 00 00 mov $0x0,%edi
c: 31 c0 xor %eax,%eax
e: e9 00 00 00 00 jmpq 13 <do_ptr+0x13>
;; Implementation takes 15 bytes
0000000000000020 <do_arr>:
20: be 00 00 00 00 mov $0x0,%esi
25: 31 c0 xor %eax,%eax
27: 48 89 f7 mov %rsi,%rdi
2a: e9 00 00 00 00 jmpq 2f <do_arr+0xf>
And here's the disassembly for the code generated by Clang:
;; Clang 3.9.1
;; Implementation takes 19 bytes
0000000000000000 <do_ptr>:
0: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # 7 <do_ptr+0x7>
7: bf 00 00 00 00 mov $0x0,%edi
c: 31 c0 xor %eax,%eax
e: e9 00 00 00 00 jmpq 13 <do_ptr+0x13>
;; Implementation takes 17 bytes
0000000000000020 <do_arr>:
20: bf 00 00 00 00 mov $0x0,%edi
25: be 00 00 00 00 mov $0x0,%esi
2a: 31 c0 xor %eax,%eax
2c: e9 00 00 00 00 jmpq 31 <do_arr+0x11>
Let's just start with a brief, but fun, digression: which generates better code, GCC or Clang? The implementation of do_ptr() is the same with both compilers, but the implementation of do_arr() differs between GCC and Clang. Both compilers implement do_arr() using two mov instructions, but GCC uses 8 bytes total to encode the mov instructions, and Clang uses 10 bytes. GCC wins here by two bytes. However, GCC also introduces a register data dependency, because it uses a register-to-register copy for the second mov, whereas Clang uses a constant-to-register copy for the second mov. The register-to-register copy used by GCC has the potential to affect how CPU pipelining works, due to the introduction of a data dependency. Whether or not this actually affects pipelining is highly dependent on the actual CPU generation and microarchitecture. Someone who works at Intel or AMD might know the full ramifications of how this would affect pipelining, but I am not that person. Therefore I'm going to call this one a wash, since they should take the same number of clock cycles, and there's an argument to be made either way.
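The byte counts quoted above can be checked against the objdump listings by adding up the encoded length of each instruction. The sketch below does just that; the array and function names are hypothetical, and the lengths are read straight off the hex columns in the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded length in bytes of each instruction, read off the objdump output. */
static const unsigned do_ptr_len[]       = {7, 5, 2, 5}; /* mov, mov, xor, jmp */
static const unsigned gcc_do_arr_len[]   = {5, 2, 3, 5}; /* mov, xor, mov, jmp */
static const unsigned clang_do_arr_len[] = {5, 5, 2, 5}; /* mov, mov, xor, jmp */

/* Sums the lengths of `count` instructions. */
unsigned total_bytes(const unsigned *len, size_t count) {
    unsigned total = 0;
    for (size_t i = 0; i < count; i++) {
        total += len[i];
    }
    return total;
}

void check_sizes(void) {
    assert(total_bytes(do_ptr_len, 4) == 19);        /* same for both compilers */
    assert(total_bytes(gcc_do_arr_len, 4) == 15);
    assert(total_bytes(clang_do_arr_len, 4) == 17);
    /* The two mov instructions alone: 5 + 3 for GCC, 5 + 5 for Clang. */
    assert(gcc_do_arr_len[0] + gcc_do_arr_len[2] == 8);
    assert(clang_do_arr_len[0] + clang_do_arr_len[1] == 10);
}
```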
The do_ptr() function is doing something totally different. It populates %rsi by copying from memory into a register, using the instruction pointer relative addressing scheme. This is slower than do_arr(), because it requires doing a real memory fetch. The overhead of a memory fetch is unpredictable. In the best case, the data is in the L1 cache, and the memory fetch is nearly free. A worse case would be when the data isn't in the L1/L2/L3 caches, which requires accessing main memory. Actually the worst worst case is where the data needs to be fetched from memory and a TLB miss occurs when doing the virtual-to-physical translation. When a TLB miss happens, not only does the data need to be fetched from main memory, the CPU also has to walk the process's page table to find the physical memory address for the fetch.
Since we're good software engineers, we know that an L1 cache reference is about 1ns, and a main memory fetch is about 100ns. Furthermore, the mov instruction used here is seven bytes wide! So not only is this code slower, the mov instruction used by do_ptr() is two bytes wider than the five-byte immediate mov used by do_arr(). Once again, the explanation here is that do_ptr() has to read ptr out of memory, which is why it needs a different kind of mov instruction.
If you've been paying close attention, you might wonder why the compiler is using this weird looking instruction pointer relative addressing scheme for do_ptr. Normally instruction pointer relative addressing is used for position independent code (PIC), but this code doesn't actually need to be position independent, since the address of the string constant isn't randomized. The reason is much more subtle: the compiler is using this addressing mode as a trick to save a few bytes of space. You'll notice that the mov instruction is 7 bytes wide, but addresses on 64-bit machines are 8 bytes in size. Relative addressing is done using a 32-bit displacement from the instruction pointer. If a full 64-bit address were encoded instead, the instruction would end up being 10 bytes wide: 2 bytes to encode the mov plus output register, and 8 bytes to encode the 64-bit value.
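Here's a short sketch of how an instruction pointer relative operand is resolved: the signed 32-bit displacement is added to the address of the next instruction, not the current one. The function names are hypothetical. The first check uses the numbers from the do_ptr listing, where the mov sits at offset 0, is 7 bytes long, and still has a displacement of 0 in the unlinked object file; that gives 7, matching the "# 7 <do_ptr+0x7>" annotation printed by objdump.

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by an instruction pointer relative operand: the
   displacement is added to the address of the *next* instruction. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len,
                             int32_t disp32) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

void check_rip_relative(void) {
    /* do_ptr's first mov in the object file: offset 0, 7 bytes, disp 0. */
    assert(rip_relative_target(0, 7, 0) == 7);
    /* Made-up values: a negative displacement reaches backwards. */
    assert(rip_relative_target(0x1000, 7, -0x10) == 0xff7);
}
```

The displacement stays zero until the linker fills it in, which is why every operand in the object file disassembly shows up as 0x0.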
Conclusion
Since C arrays can "decay" into pointer types, we often think of them as the same, but they're actually different. A pointer has its own memory address, and its value points to the data. An array is the data itself, and does not have an address distinct from that data. Using the array syntax is slightly better: there's no pointer to load from memory, and you'll save a few bytes of memory (and disk space) by using an array.
When possible, you should prefer declaring global string constants using array syntax.
P.S. If you were paying really close attention, you may have noticed that the calls to bogus() were preceded by mysterious xorl %eax, %eax instructions, which set the value of %eax to 0, even though that register appears to be unused. In an upcoming blog post, I'll explain why the compiler emits this instruction.