A little-known fact about C is that the C compiler does not consider the following two definitions equivalent:
void hello1() { puts("Hello, world!"); }
void hello2(void) { puts("Hello, world!"); }
The C compiler will treat hello1() as a function with an unspecified argument list, which in practice means it is called as though it were variadic, and will treat hello2() as a function that takes no arguments.
Now what's interesting here is that if you don't use the macros va_start() and va_arg(), the compiler won't actually generate the code for the two functions differently. In other words, because hello1() is not actually variadic, and we didn't put any variadic argument unpacking code into it, the object code for these two will be the same. However, the calling convention is affected.
Looking At Some Object Code
Consider the following program:
#include <stdio.h>
void hello1() { puts("Hello, world!"); }
void hello2(void) { puts("Hello, world!"); }
int main(int argc, char **argv) {
    hello1();
    hello2();
    return 0;
}
I'm going to analyze what happens with gcc -O1, using GCC 5.3.1. (Note: at -O2 and above GCC gets too smart and will just directly embed puts() calls into main(), so that's why using -O1 is necessary here.)
We get the following in the disassembled output:
0000000000400536 <hello1>:
400536: 48 83 ec 08 sub $0x8,%rsp
40053a: bf 10 06 40 00 mov $0x400610,%edi
40053f: e8 cc fe ff ff callq 400410 <puts@plt>
400544: 48 83 c4 08 add $0x8,%rsp
400548: c3 retq
0000000000400549 <hello2>:
400549: 48 83 ec 08 sub $0x8,%rsp
40054d: bf 10 06 40 00 mov $0x400610,%edi
400552: e8 b9 fe ff ff callq 400410 <puts@plt>
400557: 48 83 c4 08 add $0x8,%rsp
40055b: c3 retq
000000000040055c <main>:
40055c: 48 83 ec 08 sub $0x8,%rsp
400560: b8 00 00 00 00 mov $0x0,%eax
400565: e8 cc ff ff ff callq 400536 <hello1>
40056a: e8 da ff ff ff callq 400549 <hello2>
40056f: b8 00 00 00 00 mov $0x0,%eax
400574: 48 83 c4 08 add $0x8,%rsp
400578: c3 retq
400579: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
This isn't that interesting. As you can see, the object code for hello1 and hello2 is identical (other than the displacement encoded in the relative CALL instruction), since neither is truly variadic nor takes any arguments.
But what if we change the order we call them in? If we change the definition of main() like this:
int main(int argc, char **argv) {
    hello2();
    hello1();
    return 0;
}
Then the generated code for main() will be slightly different. The new code is like this:
000000000040055c <main>:
40055c: 48 83 ec 08 sub $0x8,%rsp
400560: e8 e4 ff ff ff callq 400549 <hello2>
400565: b8 00 00 00 00 mov $0x0,%eax
40056a: e8 c7 ff ff ff callq 400536 <hello1>
40056f: b8 00 00 00 00 mov $0x0,%eax
400574: 48 83 c4 08 add $0x8,%rsp
400578: c3 retq
400579: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
You can see something slightly different. Previously the call looked like:
400560: b8 00 00 00 00 mov $0x0,%eax
400565: e8 cc ff ff ff callq 400536 <hello1>
40056a: e8 da ff ff ff callq 400549 <hello2>
But now it looks like:
400560: e8 e4 ff ff ff callq 400549 <hello2>
400565: b8 00 00 00 00 mov $0x0,%eax
40056a: e8 c7 ff ff ff callq 400536 <hello1>
What's interesting here is that we can see that %eax is being cleared before the call to hello1(). But it's not cleared before the call to hello2().
If we remove the call to hello1() altogether, it's even more obvious what's happening. Our new main() is:
int main(int argc, char **argv) {
    hello2();
    return 0;
}
And the new generated code is:
000000000040055c <main>:
40055c: 48 83 ec 08 sub $0x8,%rsp
400560: e8 e4 ff ff ff callq 400549 <hello2>
400565: b8 00 00 00 00 mov $0x0,%eax
40056a: 48 83 c4 08 add $0x8,%rsp
40056e: c3 retq
40056f: 90 nop
As you can see here, %eax is never cleared at all before calling hello2().
The reason this happens is that on x86-64, the calling ABI specifies that %al (i.e. the bottom-most byte of %eax) holds the number of vector registers that are used in a variadic function call (strictly speaking, an upper bound on that number). The vector registers are used when you pass variadic arguments that contain floating point numbers. So even though the generated code for hello1() is exactly the same as the generated code for hello2(), and does not actually inspect the contents of %al, the caller of hello1() must clear the %eax register before calling hello1().
This is an extra instruction that has to happen every time you call a function like this. In practice, the overhead of this is so small that you probably wouldn't be able to measure it. In fact, modern Intel CPUs can execute multiple integer operations in a single clock cycle (as long as there are no register dependencies), so there's a good chance that in some of these cases there would literally be no overhead. For instance, in the first example we saw the instruction stream:
40055c: 48 83 ec 08 sub $0x8,%rsp
400560: b8 00 00 00 00 mov $0x0,%eax
A modern Intel CPU can execute both of these instructions simultaneously, in a single clock cycle, since there is no data dependency between %rsp and %eax.
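For contrast with hello1(), here is a minimal sketch of a function that really does consume floating point variadic arguments, which is the case the %al convention exists for. The name sum_doubles and its count-first signature are made up for illustration. Going by the ABI rule described above, a caller that passes three doubles would be expected to load 3 into %al before the call, although that is not shown here.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic arguments, each of which
   must be a double. The doubles are the arguments that travel in the
   vector registers on x86-64. */
double sum_doubles(int count, ...) {
    assert(count >= 0);

    va_list ap;
    double total = 0.0;

    va_start(ap, count);            /* count is the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, double);
    }
    va_end(ap);

    return total;
}
```

A call such as sum_doubles(3, 1.0, 2.0, 3.5) returns 6.5.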
Making a "Variadic" Call
However, what is scarier is that because of this rule, if you accidentally pass arguments to hello1() the compiler won't generate an error or warning, even if you compile with -Wall! For instance, if you compile this program:
#include <stdio.h>
void hello1() { puts("Hello, world!\n"); }
int main(int argc, char **argv) {
    hello1(1);
    return 0;
}
Then gcc -Wall does not emit a warning even though this is clearly a mistake. The code generated is interesting in this example:
0000000000400536 <hello1>:
400536: 48 83 ec 08 sub $0x8,%rsp
40053a: bf 00 06 40 00 mov $0x400600,%edi
40053f: e8 cc fe ff ff callq 400410 <puts@plt>
400544: 48 83 c4 08 add $0x8,%rsp
400548: c3 retq
0000000000400549 <main>:
400549: 48 83 ec 08 sub $0x8,%rsp
40054d: bf 01 00 00 00 mov $0x1,%edi
400552: b8 00 00 00 00 mov $0x0,%eax
400557: e8 da ff ff ff callq 400536 <hello1>
40055c: b8 00 00 00 00 mov $0x0,%eax
400561: 48 83 c4 08 add $0x8,%rsp
400565: c3 retq
400566: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40056d: 00 00 00
As you can see, the compiler will load 1 into %edi even though hello1() doesn't actually read that value; in fact, hello1() immediately clobbers %edi with the pointer to the string literal that is passed to puts().
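The way to get the compiler to catch this mistake is to write (void) in the parameter list, as hello2() does. The sketch below uses made-up names (hello_checked, call_hello_checked) and returns the result of puts() only so that there is something to check. GCC also has a -Wstrict-prototypes option, which is not enabled by -Wall, that warns about declarations written with empty parentheses.

```c
#include <stdio.h>

/* (void) makes this a real prototype: the compiler knows the function
   takes no arguments and can check every call against that. */
int hello_checked(void) {
    return puts("Hello, world!");
}

/* Hypothetical caller used for illustration. */
int call_hello_checked(void) {
    /* hello_checked(1);    would be rejected: too many arguments */
    return hello_checked();
}
```

With this declaration, the accidental hello_checked(1) becomes a compile error instead of a silently ignored argument.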
Making a Floating Point Variadic Call
Here's what happens if we use a vector register, this time by calling hello1() with a floating point number:
#include <stdio.h>
void hello1() { puts("Hello, world!\n"); }
int main(int argc, char **argv) {
    hello1(1.0f);
    return 0;
}
And the generated code is:
0000000000400536 <hello1>:
400536: 48 83 ec 08 sub $0x8,%rsp
40053a: bf 00 06 40 00 mov $0x400600,%edi
40053f: e8 cc fe ff ff callq 400410 <puts@plt>
400544: 48 83 c4 08 add $0x8,%rsp
400548: c3 retq
0000000000400549 <main>:
400549: 48 83 ec 08 sub $0x8,%rsp
40054d: f2 0f 10 05 bb 00 00 movsd 0xbb(%rip),%xmm0 # 400610 <__dso_handle+0x18>
400554: 00
400555: b8 01 00 00 00 mov $0x1,%eax
40055a: e8 d7 ff ff ff callq 400536 <hello1>
40055f: b8 00 00 00 00 mov $0x0,%eax
400564: 48 83 c4 08 add $0x8,%rsp
400568: c3 retq
400569: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
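As you can see, here the literal value 1 is stored into %eax to indicate that one parameter is passed in a vector register, namely %xmm0. (The load uses movsd even though we wrote 1.0f, because a float argument passed to a function without a prototype is promoted to double.)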
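The same promotion applies to the variadic part of a real variadic function, which is why va_arg() is never asked for a float. The following is a minimal sketch; both function names are invented for this example.

```c
#include <stdarg.h>

/* Hypothetical helper: returns its first variadic argument, which must
   have been passed as (or promoted to) a double. */
double first_variadic_as_double(int unused, ...) {
    va_list ap;
    va_start(ap, unused);
    /* Asking for float here would be wrong: a float argument in the
       variadic part of a call always arrives as a double. */
    double value = va_arg(ap, double);
    va_end(ap);
    return value;
}

/* Hypothetical caller: passes a float, which is promoted at the call. */
double pass_float(void) {
    float f = 1.0f;
    return first_variadic_as_double(0, f);
}
```

Here pass_float() returns 1.0, even though the caller only ever held a float.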
What Makes This Really Weird
This is already kind of strange as it is. But here's what's really interesting. You'll notice that in the example where we invoked hello1(1) that the argument was stored into %edi, but nothing else changed. That's because in C, when you call a variadic function the calling ABI doesn't actually tell you how many arguments you got!
So when you make a call like this:
printf("Hello from %d\n", getpid());
The way that printf() knows that it was passed a single argument is by actually scanning the format string for % formatters (%d in this case), and then counting those. This is why if you call printf() with too few arguments you can get unexpected things printed to stdout, since you'll print whatever happens to be in the registers that printf() is looking for that argument in.
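To make the counting idea concrete, here is a deliberately simplified sketch. It is not how any real printf() is implemented: the function name count_format_args is made up, and it assumes every % other than a literal %% consumes exactly one argument, ignoring things like * widths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many arguments a printf-style format
   string appears to expect. Simplifying assumption: each '%' that is
   not part of "%%" consumes exactly one argument. */
int count_format_args(const char *fmt) {
    assert(fmt != NULL);

    int count = 0;
    for (size_t i = 0; fmt[i] != '\0'; i++) {
        if (fmt[i] != '%') {
            continue;
        }
        if (fmt[i + 1] == '%') {    /* "%%" prints a literal percent sign */
            i++;
            continue;
        }
        if (fmt[i + 1] == '\0') {   /* stray '%' at the end of the string */
            break;
        }
        count++;
    }
    return count;
}
```

For the format string in the call above, count_format_args("Hello from %d\n") returns 1. The callee learns the argument count from the string and never from the calling convention.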
In the very uncommon case when you write a variadic function that doesn't have something like a format string, the convention typically used is that one of the fixed arguments to the function holds the number of arguments to expect. For instance, you might have a function declared like:
void magic(int num_args, ...);
And then to actually call it you'd have to use an invocation like:
magic(3, 1.0f, "foo", 42);
Then the implementation of magic() will have to use the va_start() and va_arg() macros and know to stop calling va_arg() purely based on inspecting num_args, since there is literally no other way for magic() to know how many variadic arguments you passed in.
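A minimal sketch of this count-first convention follows. It uses a made-up function sum_ints rather than magic(), and it assumes every variadic argument is an int. A count alone says nothing about argument types, so a mixed call like the magic() one above would need some further agreement between caller and callee.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums num_args variadic int arguments. The count
   is the only thing telling the loop when to stop calling va_arg(). */
int sum_ints(int num_args, ...) {
    assert(num_args >= 0);

    va_list ap;
    int total = 0;

    va_start(ap, num_args);         /* needs the last named parameter */
    for (int i = 0; i < num_args; i++) {
        total += va_arg(ap, int);
    }
    va_end(ap);

    return total;
}
```

For example, sum_ints(3, 1, 2, 42) returns 45. Passing a count larger than the number of arguments actually supplied is undefined behavior, because the function has no other way to notice.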
This means that considering a function declaration like:
void magic();
as variadic makes almost no sense, because there's no a priori way for the implementation of this version of magic() to know how many variadic arguments it was passed. In fact, to use the va_start() macro you are required to pass it the last non-variadic argument in the function signature. That means that even if you did come up with a weird protocol for figuring out when va_arg() should stop being called, you'd have no way to initialize your va_list using the va_start() macro. Really the only way to do this at all would be to write inline assembly in the C function, which wouldn't make a lot of sense.
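One stopping protocol that does get used in practice is a sentinel value at the end of the argument list, and it shows the same constraint: the function still needs a named parameter to hand to va_start(). The sketch below is illustrative only; total_length is a made-up name, and it assumes every argument is a char * with a null pointer marking the end.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds up the lengths of a list of strings that is
   terminated by a null pointer. Every argument must be a char *. */
size_t total_length(const char *first, ...) {
    size_t total = 0;
    va_list ap;

    va_start(ap, first);            /* impossible without a named parameter */
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        total += strlen(s);
    }
    va_end(ap);

    return total;
}
```

A call like total_length("foo", "ba", (char *)NULL) returns 5. The cast on the sentinel matters, since a bare NULL is not guaranteed to be passed as a pointer in a variadic call.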
C is a rather beautiful language, but it definitely has its warts.