- Implement x86-64 extern(C), hopefully correctly.
- Tried to be a bit smarter about extern(D) while I was there.
Interestingly, this code seems to be generating more efficient code than
gcc and llvm-gcc in some edge cases, like returning a `{ [7 x i8] }` loaded from
a stack slot from an extern(C) function. (gcc generates 7 1-byte loads, while
this code generates a 4-byte, a 2-byte and a 1-byte load)
I also added some changes to make sure structs being returned from functions or
passed in as parameters are stored in memory where the rest of the backend seems
to expect them to be. These should be removed when support for first-class
aggregates improves.
functions.
There's no need to waste cycles with extern(D), which we get to define
ourselves. Fixes tests/mini/asm8.d. (Since the asm abiret code already assumed
{xmm0, xmm1} returns)