C String Literals Are Lvalues

C expressions fall into two value categories: lvalues and rvalues. The intuitive way to think of this is that lvalues are the things allowed on the left side of an assignment, and rvalues are the things allowed only on the right side. For instance, consider the following two assignment statements:

int x;
x = 1;  // OK
1 = x;  // Error

The first assignment statement is legal because x is an lvalue and 1 is an rvalue, and everything appears in the right place. The second assignment statement is illegal because it has an rvalue on the left-hand side. In fact, if you try to compile this code you’ll get an error saying exactly that:

$ cc x.c
x.c: In function ‘main’:
x.c:3:5: error: lvalue required as left operand of assignment
   1 = x;
     ^

Constants like 7, 'a', and 1.0f are rvalues, which means they can only appear on the right side of an assignment. However, there’s one notable exception: string literals (e.g. "hello") are actually lvalues!

To understand why, we need to understand what the type of a string literal is in C. You might assume that a string literal has type char * or const char *, but that’s wrong. In C, a string literal has an array type: char [N], where N is the number of characters plus one for the terminating null byte (written loosely as char [] in the rest of this article). There are a few different reasons why this makes sense, but one way to think about it is to ask how the sizeof operator should work on a string literal. Consider the following C program; what do you think will be printed to the screen?

#include <stdio.h>

int main(void) {
  printf("sizeof hello = %zu\n", sizeof "hello");  // %zu is the conversion for size_t
  return 0;
}

This program actually prints sizeof hello = 6, because 6 is the size of the array that holds "hello": five characters plus the terminating null byte. If string literals were pointers, this would print the size of a pointer instead, which is typically 4 on a 32-bit system or 8 on a 64-bit system. Since string literals are actually arrays, sizeof yields the size of the array itself. This is very convenient, since it makes the size of a string literal available as a compile-time constant, which is needed frequently.
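As a rough illustration of the difference, the sketch below compares sizeof applied to the literal itself with sizeof applied to a pointer to the same characters. The names GREETING, literal_size, and pointer_size are made up for this example, and _Static_assert assumes a C11 compiler.

```c
#include <stddef.h>

#define GREETING "hello"

/* Checked at compile time: five characters plus the terminating null byte. */
_Static_assert(sizeof GREETING == 6, "array size includes the null byte");

/* Size of the array object that the literal designates. */
size_t literal_size(void) {
    return sizeof GREETING;
}

/* Size of a pointer to the same characters; this depends on the platform. */
size_t pointer_size(void) {
    const char *p = GREETING;
    return sizeof p;
}
```

Here literal_size() returns 6 regardless of the platform, while pointer_size() returns whatever sizeof(char *) happens to be on the target.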

Another reason that string literals should be array types, rather than pointer types, is that it lets them be used in array contexts. For instance, you can pass a literal to a function whose parameter is declared with array syntax (although the compiler treats such a parameter as a pointer, so the literal still decays at the call):

// Unusual function declared with a fixed-size array parameter.
void foo(char s[6]) {
    // do something with s
}

int main() {
    foo("hello");
    return 0;
}
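A parameter declared with array type, like s in foo above, is adjusted by the compiler to a pointer, so the 6 does not become part of the function's type. A minimal sketch that checks this at compile time with C11's _Generic follows; the helper name foo_is_pointer_taking is invented for illustration.

```c
/* Same declaration as in the example: the 6 is not part of foo's type. */
void foo(char s[6]) {
    (void)s;  /* unused in this sketch */
}

/* Returns 1 if &foo has type void (*)(char *), and 0 otherwise. */
int foo_is_pointer_taking(void) {
    return _Generic(&foo, void (*)(char *): 1, default: 0);
}
```

This is why the size in the parameter declaration is best read as documentation for the caller rather than something the compiler enforces.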

In most contexts, an expression of array type is automatically converted ("decays") to a pointer to its first element. For instance, the parameter for strlen() is a pointer to const char:

// Takes a pointer type, not an array.
size_t strlen(const char *s);

When passing a string literal like "hello" to strlen(), the value first decays from a char [] to a char *, which is then implicitly converted to const char *. This works because arrays decay into pointers, and a pointer to non-const data can be used wherever a pointer to const data is expected. Thus an expression like strlen("hello") is perfectly valid, even though "hello" is really an array, not a pointer.
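The conversions involved in a call like strlen("hello") can be spelled out one step at a time. In this sketch the function names are hypothetical, and _Generic assumes a C11 compiler with default settings (options that change the type of string literals, such as GCC's -Wwrite-strings, would alter the first result).

```c
#include <string.h>

/* Array-to-pointer decay: char[6] becomes char *. Returns 1 if so. */
int literal_decays_to_char_pointer(void) {
    return _Generic("hello", char *: 1, default: 0);
}

/* Adding const: char * converts implicitly to const char *,
   which is the parameter type that strlen() expects. */
size_t literal_length(void) {
    char *decayed = "hello";         /* decay: points at the 'h' */
    const char *as_const = decayed;  /* adding const needs no cast */
    return strlen(as_const);         /* the null byte is not counted */
}
```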

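An expression that designates an array object is an lvalue, since the object must have an address in memory. A string literal designates exactly such an object (an array with static storage duration), and thus C string literals are also lvalues.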
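One way to observe this directly is with the unary & operator, which requires an lvalue (or a function designator) as its operand. The following sketch takes the address of a literal; the function names are made up for the example.

```c
#include <stddef.h>

/* &"hello" is valid because the literal is an lvalue of type char[6].
   The result is a pointer to the whole array, not to its first element.
   By contrast, &1 would be rejected, since 1 is not an lvalue. */
size_t pointed_to_array_size(void) {
    char (*p)[6] = &"hello";
    return sizeof *p;  /* size of the array that p points to */
}

/* The first element is reachable through the array pointer as usual. */
char first_char(void) {
    char (*p)[6] = &"hello";
    return (*p)[0];
}
```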

Mutating String Literals

The following program compiles, but the C standard says that attempting to modify a string literal is undefined behavior:

// Legal, but naughty.
int main() {
  "foo"[0] = 0;
  return 0;
}

When compiling this program, GCC gives a warning (“warning: assignment of read-only location”), but still produces an executable. The executable terminates with a segmentation fault when run. What actually happens when we run a program like this? GDB shows the following disassembly at the point of the crash:

(gdb) disas
Dump of assembler code for function main:
   0x0000000000400487 <+0>:	push   %rbp
   0x0000000000400488 <+1>:	mov    %rsp,%rbp
=> 0x000000000040048b <+4>:	movb   $0x0,0x9e(%rip)        # 0x400530
   0x0000000000400492 <+11>:	mov    $0x0,%eax
   0x0000000000400497 <+16>:	pop    %rbp
   0x0000000000400498 <+17>:	retq
End of assembler dump.

This shows that the program is trying to write a single zero byte to memory address 0x400530. We can ask GDB what’s mapped at this address, but the output isn’t too useful:

(gdb) info proc mappings
process 29646
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /home/evan/a.out
            ... many more lines ...

This isn’t useful because it shows the range that contains 0x400530, but not the permissions that range is mapped with. To see the permissions, we need to look at /proc:

$ cat /proc/29646/maps
00400000-00401000 r-xp 00000000 fd:02 20710675                /home/evan/a.out
... more lines ...

This shows that memory address 0x400530 is mapped with permissions r-xp, which means that the memory location is readable and executable, but not writable. In other words, the compiler has generated a perfectly ordinary store instruction, but the string literal ended up in a part of the executable that the linker laid out as read-only, so the store segfaults. This is a weird idiosyncrasy of C: the language standard generally pretends that linkers don’t exist, even though in practice linking has a big effect on the runtime behavior of programs.
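The same lookup can be done from inside the process. The sketch below is Linux-specific and assumes /proc/self/maps is readable and uses the line format shown above; the helper name mapping_perms is hypothetical. It scans the file for the range that contains a given address and copies out the permission string.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Looks up the mapping that contains addr in /proc/self/maps.
   On success, copies the permission string (e.g. "r-xp") into perms
   and returns 0. Returns -1 if the file cannot be opened or no
   mapping contains the address. */
int mapping_perms(const void *addr, char perms[5]) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) {
        return -1;
    }

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int result = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char p[5];
        /* Each line begins with "start-end perms ...". */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3 &&
            target >= start && target < end) {
            strcpy(perms, p);
            result = 0;
            break;
        }
    }

    fclose(f);
    return result;
}
```

Passing a string literal as addr, for example mapping_perms("foo", perms), would be expected to report a mapping without the w bit on a typical Linux build, though the exact string depends on the toolchain and how it lays out read-only data.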

String Literals In C++

As a minor (but important) point of trivia, the type of string literals in C++ is different from C. In C, a string literal has type char []. In C++, a string literal has type const char [].

The fact that C++ makes a string literal into a const value improves things somewhat. For instance, if you compile the previous example with a C++ compiler, you should get a compiler error (rather than a warning), since assigning to a const value is strictly prohibited. As always, you can easily circumvent these checks with type casts or aliased pointers, so you still need to exercise caution.
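C code can get a similar level of protection by convention: referring to literals only through const char * makes the compiler reject accidental writes, and initializing an array from a literal yields a private, writable copy. A minimal sketch, with hypothetical function names:

```c
#include <string.h>

/* A writable copy: the array is initialized from the literal, so
   modifying it never touches the literal's own storage. */
size_t truncate_copy(void) {
    char buf[] = "foo";  /* buf has type char[4] and is modifiable */
    buf[0] = '\0';       /* fine: this writes to buf */
    return strlen(buf);  /* the string in buf is now empty */
}

/* A read-only view: a write such as p[0] = '\0' would be rejected
   at compile time, because *p is const. */
char first_char_readonly(void) {
    const char *p = "foo";
    return p[0];
}
```

GCC's -Wwrite-strings option goes a step further by giving string literals a const-qualified type in C, so that assigning one to a plain char * produces a warning.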

For completeness, I should also add that while C++ has lvalues and rvalues, it further subdivides them into a much more complicated value category zoo, with exotic categories like glvalues, prvalues, and xvalues.