Expressions in C fall into two broad categories: lvalues and rvalues. The intuitive way to think of this is that lvalues are the things allowed on the left side of an assignment, and rvalues are the things allowed only on the right side. For instance, consider the following two assignment statements:
int x;
x = 1; // OK
1 = x; // Error
The first assignment statement is legal because x is an lvalue and 1 is an rvalue, and everything appears in the right place. The second assignment statement is illegal because it has an rvalue on the left-hand side. In fact, if you try to compile this code, you'll get a terse error saying exactly this:
$ cc x.c
x.c: In function ‘main’:
x.c:3:5: error: lvalue required as left operand of assignment
1 = x;
^
Literals like 7, true, and 1.0f are normally rvalues. This means that they cannot appear on the left side of an assignment expression. However, there's one corner case: string literals (e.g. "hello") are actually lvalues!
To understand why, we need to understand what the type of a string literal is in C. You might assume that a string literal has type char * or const char *, but that's wrong. In C, a string literal has type char [], that is, an array of char whose length is fixed by the literal (char [6] for "hello"). There are a few different reasons why this makes sense, but one way to think about this is how the sizeof operator is supposed to work on a string literal. Consider the following C program; what do you think will be printed to the screen?
#include <stdio.h>
int main(void) {
printf("sizeof hello = %zu\n", sizeof "hello");
return 0;
}
This program actually prints sizeof hello = 6, because 6 is the size of the array holding the string "hello", including the terminating null byte. If string literals were pointer types, then this would just print the size of a pointer, which would typically be 4 on a 32-bit system or 8 on a 64-bit system. Since string literals are actually arrays, the sizeof operator can instead report the actual size of the array, which is the string length plus one in this case. This is very convenient, since it allows compile-time substitution of the size of string literals, a need that arises frequently.
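Another reason that string literals should be array types, rather than pointer types, is that it lets them be used in array contexts, most notably as array initializers such as char s[] = "hello";. You can also write code like the following, although note that a parameter declared as char s[6] is adjusted by the compiler to char *, so the literal still decays to a pointer at the call: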
// Unusual function taking a fixed-sized character array.
void foo(char s[6]) {
// do something with s
}
int main() {
foo("hello");
return 0;
}
In general, array types in C will automatically decay into pointer types if necessary. For instance, the parameter for strlen() is a const pointer:
// Takes a pointer type, not an array.
size_t strlen(const char *s);
When passing a string literal like "hello" to strlen(), the value decays from a char [] to a char *, which is then implicitly converted to const char *. This works because arrays decay into pointer types, and non-const values can be used in const contexts. Thus an expression like strlen("hello") is perfectly valid, even though "hello" is really an array, not a pointer.
Array objects must have an address in memory, so an expression that designates one is an lvalue. String literals are no exception: the C standard explicitly says that a string literal is an lvalue.
Mutating String Literals
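The following program is accepted by a C compiler, because nothing in the type system forbids the assignment, but it is logically invalid: the C standard leaves the behavior of modifying a string literal undefined.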
// Legal, but naughty.
int main() {
"foo"[0] = 0;
return 0;
}
When compiling this program, GCC gives a warning ("warning: assignment of read-only location"), but still produces an executable. The executable terminates with a segmentation fault when run. What actually happens when we run a program like this? GDB shows the following disassembly at the point where the program crashes:
(gdb) disas
Dump of assembler code for function main:
0x0000000000400487 <+0>: push %rbp
0x0000000000400488 <+1>: mov %rsp,%rbp
=> 0x000000000040048b <+4>: movb $0x0,0x9e(%rip) # 0x400530
0x0000000000400492 <+11>: mov $0x0,%eax
0x0000000000400497 <+16>: pop %rbp
0x0000000000400498 <+17>: retq
End of assembler dump.
This shows that the compiled program is trying to write a single zero byte to memory location 0x400530. We can ask GDB what's mapped at this memory location, but the output isn't too useful:
(gdb) info proc mappings
process 29646
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x400000 0x401000 0x1000 0x0 /home/evan/a.out
... many more lines ...
This isn't useful because we see the range that 0x400530 is mapped at, but not what permissions that range is mapped with. To see the permissions, we need to look at /proc:
$ cat /proc/29646/maps
00400000-00401000 r-xp 00000000 fd:02 20710675 /home/evan/a.out
... more lines ...
This shows that memory address 0x400530 is mapped with permissions r-xp, which means that the memory location is readable and executable, but not writable. In other words, the compiler has generated valid code, but the linker has arranged the memory layout such that the generated code will segfault. This is a weird idiosyncrasy of C: the language standard generally pretends that linkers don't exist, even though in practice linking has a big effect on the runtime behavior of programs.
String Literals In C++
As a minor (but important) point of trivia, the type of string literals in C++ is different from C. In C, a string literal has type char []. In C++, a string literal has type const char [].
The fact that C++ makes a string literal into a const value improves things somewhat. For instance, if you compile the previous example with a C++ compiler, you should get a compiler error (rather than a warning), since mutating const values is strictly prohibited. As always, you can easily circumvent these checks with type casts or aliased pointers, and writing to the literal that way is still undefined behavior, so you still need to exercise caution.
For completeness, I should also add that while C++ has lvalues and rvalues, it further subdivides them into a much more complicated value category zoo, with exotic categories like glvalues and xvalues.