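Callee-saved registers as seen through __builtin_eh_return
The question
C++ exception handling looks almost magical: at run time it can cross stack frames and jump from the point where an exception is thrown straight to the point where it is handled. gcc's source shows that a key step of this stack unwinding is the uw_install_context macro shown below, which in turn uses the gcc builtin __builtin_eh_return. There is little material online about this builtin. Judging from gcc's source, my guess is that it behaves much like an ordinary return: when the function finishes, it restores the registers that the function clobbered and is responsible for preserving, that is, the so-called callee-saved registers.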
/* Install TARGET into CURRENT so that we can return to it. This is a
macro because __builtin_eh_return must be invoked in the context of
our caller. */
#define uw_install_context(CURRENT, TARGET) \
do \
{ \
long offset = uw_install_context_1 ((CURRENT), (TARGET)); \
void *handler = uw_frob_return_addr ((CURRENT), (TARGET)); \
_Unwind_DebugHook ((TARGET)->cfa, handler); \
__builtin_eh_return (offset, handler); \
} \
while (0)
Test
An answer on Stack Overflow says the callee must preserve the EBP/ESI/EDI registers, and even Raymond Chen shows up to back it:
The Windows and SystemV calling convention for x86-32 requires functions to preserve the ebx, esi, edi, and ebp registers. But these are just conventions. – Raymond Chen
A quick test suggests otherwise, though: the generated code clearly modifies rsi, yet the code emitted for __builtin_eh_return never restores it.
movq %rdx, %rsi
movq %rax, %rdi
The full test example:
tsecer@harry: cat eh_return.c
void foo(long xx, void * yy)
{
extern int bar(long, long);
bar(xx, xx);
__builtin_eh_return (1111, (void*)(2222L));
}
tsecer@harry: g++ -S eh_return.c
tsecer@harry: cat eh_return.s
.file "eh_return.c"
.text
.globl _Z3foolPv
.type _Z3foolPv, @function
_Z3foolPv:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rdx
pushq %rax
subq $16, %rsp
.cfi_offset 1, -24
.cfi_offset 0, -32
movq %rdi, -24(%rbp)
movq %rsi, -32(%rbp)
movq -24(%rbp), %rdx
movq -24(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call _Z3barll
movl $1111, %edx
movl $2222, %eax
movq %rdx, %rcx
movq %rax, 8(%rbp,%rcx)
movq -16(%rbp), %rax
movq -8(%rbp), %rdx
leaq 8(%rbp,%rcx), %rcx
movq 0(%rbp), %rbp
.cfi_restore 6
.cfi_def_cfa 2, 8
movq %rcx, %rsp
ret
.cfi_endproc
.LFE0:
.size _Z3foolPv, .-_Z3foolPv
.section .note.GNU-stack,"",@progbits
tsecer@harry:
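If this test seems questionable, look at the code in gcc's own runtime library instead. The disassembly shows that the function prologue modifies both rsi and rdi, but the code for __builtin_eh_return restores only rbx and r12 through r15 (rax and rdx are also reloaded, as the listing shows). The modified rsi and rdi are clearly not restored.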
(gdb) disas
Dump of assembler code for function _Unwind_RaiseException:
0x00007ffff73431a0 <+0>: push %rbp
0x00007ffff73431a1 <+1>: mov %rsp,%rbp
0x00007ffff73431a4 <+4>: push %r15
0x00007ffff73431a6 <+6>: push %r14
0x00007ffff73431a8 <+8>: push %r13
0x00007ffff73431aa <+10>: push %r12
0x00007ffff73431ac <+12>: lea -0x3a0(%rbp),%r14
0x00007ffff73431b3 <+19>: push %rbx
0x00007ffff73431b4 <+20>: push %rdx
0x00007ffff73431b5 <+21>: lea 0x10(%rbp),%rsi
0x00007ffff73431b9 <+25>: push %rax
0x00007ffff73431ba <+26>: mov %rdi,%r12
0x00007ffff73431bd <+29>: mov %r14,%rdi
0x00007ffff73431c0 <+32>: lea -0x1c0(%rbp),%r13
###.....
0x00007ffff734344b <+683>: call 0x7ffff7342f00 <_Unwind_RaiseException_Phase2>
0x00007ffff7343450 <+688>: cmp $0x7,%eax
0x00007ffff7343453 <+691>: jne 0x7ffff7343310 <_Unwind_RaiseException+368>
0x00007ffff7343459 <+697>: mov %rbx,%rsi
0x00007ffff734345c <+700>: mov %r14,%rdi
0x00007ffff734345f <+703>: call 0x7ffff7340d80 <uw_install_context_1>
0x00007ffff7343464 <+708>: mov -0x218(%rbp),%r8
0x00007ffff734346b <+715>: mov -0x220(%rbp),%rdi
0x00007ffff7343472 <+722>: mov %r8,%rsi
0x00007ffff7343475 <+725>: call 0x7ffff7343190 <_Unwind_DebugHook>
=> 0x00007ffff734347a <+730>: mov %rax,%rcx
0x00007ffff734347d <+733>: mov %r8,0x8(%rbp,%rax,1)
0x00007ffff7343482 <+738>: mov -0x38(%rbp),%rax
0x00007ffff7343486 <+742>: lea 0x8(%rbp,%rcx,1),%rcx
0x00007ffff734348b <+747>: mov -0x30(%rbp),%rdx
0x00007ffff734348f <+751>: mov -0x28(%rbp),%rbx
0x00007ffff7343493 <+755>: mov -0x20(%rbp),%r12
0x00007ffff7343497 <+759>: mov -0x18(%rbp),%r13
0x00007ffff734349b <+763>: mov -0x10(%rbp),%r14
0x00007ffff734349f <+767>: mov -0x8(%rbp),%r15
0x00007ffff73434a3 <+771>: mov 0x0(%rbp),%rbp
0x00007ffff73434a7 <+775>: mov %rcx,%rsp
0x00007ffff73434aa <+778>: ret
End of assembler dump.
(gdb)
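The last few instructions of the listing above (offsets +730 through +778) are what gcc emitted for __builtin_eh_return in this build. As a rough sketch, the stack arithmetic can be modeled in plain C. The struct and function names are made up for illustration, and the model assumes the frame layout visible in this particular listing (return address at rbp + 8, so the caller's CFA is rbp + 16).

```c
#include <stdint.h>

/* Result of the modeled epilogue. */
struct eh_return_result {
    uint64_t handler_slot;  /* address where the handler address is stored */
    uint64_t rsp_after_ret; /* stack pointer once `ret` has popped that slot */
};

/*
 * Models these instructions for a frame whose base pointer is `rbp`:
 *   mov %r8,0x8(%rbp,%rax,1)    store handler at rbp + 8 + offset
 *   lea 0x8(%rbp,%rcx,1),%rcx   rcx = rbp + 8 + offset
 *   mov %rcx,%rsp               rsp points at the stored handler
 *   ret                         pop handler into rip, rsp += 8
 */
struct eh_return_result model_eh_return(uint64_t rbp, int64_t offset)
{
    struct eh_return_result r;
    r.handler_slot = rbp + 8 + (uint64_t)offset;
    r.rsp_after_ret = r.handler_slot + 8;
    return r;
}
```

With rbp = 0x1000 and offset = 0x20, the handler is written to 0x1028 and rsp ends up at 0x1030, that is, CFA + offset. The first argument of __builtin_eh_return is a stack adjustment and the second is the address that the final ret jumps to.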
Answer
But in x86-64 System V, the designers chose registers from scratch, and (as my answer on that linked question shows) found that using RDI and RSI for the first 2 args saved instructions (when building SPECint with an early x86-64 port of gcc). Probably because gcc at the time liked to inline memset or memcpy using rep stosd, or the library implementation used that.
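Roughly speaking: the 32-bit System V convention for esi/edi is not the same as the 64-bit System V convention for rsi/rdi, and Windows and System V also disagree about rsi/rdi in 64-bit mode, so it is easy to get confused. In x86-64 System V, rdi and rsi carry the first two arguments and are caller-saved. Assuming that because esi/edi are callee-saved in 32-bit code, the corresponding rsi/rdi must also be callee-saved in 64-bit code is exactly that: an assumption, and a wrong one.
Addendum
gdb's source shows that Intel registers are not numbered in strict alphabetical order: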
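To make the difference concrete, here is a small lookup sketch of the callee-saved general-purpose registers under the three conventions discussed. The enum and function names are hypothetical. The register lists are written from the published calling conventions as I understand them, and they ignore vector and x87 registers.

```c
#include <string.h>

enum abi { ABI_SYSV_I386, ABI_SYSV_X86_64, ABI_WIN64 };

/* Returns 1 if `reg` (name without the % prefix) is callee-saved
 * under `abi`, 0 otherwise. General-purpose registers only. */
int is_callee_saved(enum abi abi, const char *reg)
{
    static const char *const sysv_i386[] = {
        "ebx", "esi", "edi", "ebp", "esp", NULL
    };
    static const char *const sysv_x86_64[] = {
        "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15", NULL
    };
    static const char *const win64[] = {
        "rbx", "rbp", "rsp", "rsi", "rdi",
        "r12", "r13", "r14", "r15", NULL
    };
    const char *const *list;

    switch (abi) {
    case ABI_SYSV_I386:   list = sysv_i386;   break;
    case ABI_SYSV_X86_64: list = sysv_x86_64; break;
    case ABI_WIN64:       list = win64;       break;
    default:              return 0;
    }
    for (; *list != NULL; ++list) {
        if (strcmp(*list, reg) == 0)
            return 1;
    }
    return 0;
}
```

For example, is_callee_saved(ABI_SYSV_X86_64, "rsi") is 0 while is_callee_saved(ABI_WIN64, "rsi") is 1. This matches the _Unwind_RaiseException listing, which restores rbx, rbp, and r12 through r15 but not rsi or rdi.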
static const char *att_names64[] = {
"%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
"%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
};
The machine code below shows the same thing: rbx is the fourth register (encoding 3), not the second.
tsecer@harry: cat gcc_inline_push_reg.c
void foo()
{
__asm__(
"push %rax\n\t"
"push %rbx\n\t"
"push %rcx\n\t"
"push %rdx\n\t"
"push %rsp\n\t"
"push %rbp\n\t"
"push %rsi\n\t"
"push %rdi\n\t"
);
}
tsecer@harry: gcc -g -c gcc_inline_push_reg.c
tsecer@harry: gdb gcc_inline_push_reg.o -quiet
Registered pretty printers for UE classes
Registered pretty printers for UE classes
Reading symbols from gcc_inline_push_reg.o...
(gdb) disas/r foo
Dump of assembler code for function foo:
0x0000000000000000 <+0>: 55 push %rbp
0x0000000000000001 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000000004 <+4>: 50 push %rax
0x0000000000000005 <+5>: 53 push %rbx
0x0000000000000006 <+6>: 51 push %rcx
0x0000000000000007 <+7>: 52 push %rdx
0x0000000000000008 <+8>: 54 push %rsp
0x0000000000000009 <+9>: 55 push %rbp
0x000000000000000a <+10>: 56 push %rsi
0x000000000000000b <+11>: 57 push %rdi
0x000000000000000c <+12>: 90 nop
0x000000000000000d <+13>: 5d pop %rbp
0x000000000000000e <+14>: c3 ret
End of assembler dump.
(gdb)
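The opcode bytes in the dump follow a simple rule: for the eight legacy registers, the one-byte push opcode is 0x50 plus the register's encoding number. A minimal sketch of that rule follows. The function names are made up for illustration, the table copies the first eight entries of att_names64, and r8 through r15 are left out because they need an extra prefix byte.

```c
#include <string.h>

/* First eight entries of att_names64, in hardware encoding order. */
static const char *const low_regs64[8] = {
    "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi"
};

/* Returns the 3-bit encoding (0..7) of a legacy register, or -1. */
int reg_encoding(const char *name)
{
    int i;
    for (i = 0; i < 8; ++i) {
        if (strcmp(low_regs64[i], name) == 0)
            return i;
    }
    return -1;
}

/* Returns the one-byte `push` opcode for a legacy register, or -1
 * if the register is not one of the eight handled here. */
int push_opcode(const char *name)
{
    int enc = reg_encoding(name);
    return enc < 0 ? -1 : 0x50 + enc;
}
```

push_opcode("%rbx") gives 0x53 and push_opcode("%rcx") gives 0x51, which agrees with the bytes 53 and 51 in the disassembly above.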
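So why are AX through DX not encoded alphabetically as 0 through 3? Going by these discussions, the letters A through D may be little more than a coincidental mnemonic: they stand for Accumulator, Base, Counter, and Data (DX pairs with the accumulator to hold a wider value). Perhaps, logically, or considering that assembly was the main programming language when the architecture was introduced, the accumulator and the counter were simply used together more often?
This Stack Exchange thread goes into more depth. One view there is that the registers were numbered according to how frequently they are used: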
i always learned these registers as accumulate, count, data, and base. They weren't ordered alphabetically so much as ordered by usage, ax for most arithmetic operations, cx for loop counters, dx for either left over arithmetic (think of the remainder or carry for div/mul) or i/o data, and bx for a base pointer to memory. Roughly, the ACDB is the order of importance for your average use case – Steve Cox Dec 1, 2017 at 18:54
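Someone else points out a detail of the PUSHA instruction: it pushes the registers in the same A, C, D, B order, which indirectly supports the idea that the registers are encoded internally in that order.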
The AX/CX/DX/BX order also makes an appearance in PUSHA, which suggests it might correspond to the internal register file implementation... – Stephen Kitt Dec 1, 2017 at 15:31