C Performance Tuning: The GCC Compiler Option -fomit-frame-pointer
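2013-11-26 21:02 islandscape
While reading the book 《C程序性能优化》 (C Program Performance Optimization), I noticed the author's claim that the gcc option -fomit-frame-pointer can improve program performance. I did not quite see why, so I decided to look into it.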
Suppose we have the following simple program:
#include <stdio.h>

int add(int a, int b)
{
    return a + b;
}

int main()
{
    int sum = 0;
    sum = add(1, 2);
    printf("%d\n", sum);
    return 0;
}
Compiled without -fomit-frame-pointer, the binary disassembles to the following:
00000000 <add>:
   0:  55                     push   %ebp
   1:  89 e5                  mov    %esp,%ebp
   3:  8b 45 0c               mov    0xc(%ebp),%eax
   6:  8b 55 08               mov    0x8(%ebp),%edx
   9:  01 d0                  add    %edx,%eax
   b:  5d                     pop    %ebp
   c:  c3                     ret

0000000d <main>:
   d:  55                     push   %ebp
   e:  89 e5                  mov    %esp,%ebp
  10:  83 e4 f0               and    $0xfffffff0,%esp
  13:  83 ec 20               sub    $0x20,%esp
  16:  c7 44 24 1c 00 00 00   movl   $0x0,0x1c(%esp)
  1d:  00
  1e:  c7 44 24 04 02 00 00   movl   $0x2,0x4(%esp)
  25:  00
  26:  c7 04 24 01 00 00 00   movl   $0x1,(%esp)
  2d:  e8 fc ff ff ff         call   2e <main+0x21>
  32:  89 44 24 1c            mov    %eax,0x1c(%esp)
  36:  b8 00 00 00 00         mov    $0x0,%eax
  3b:  8b 54 24 1c            mov    0x1c(%esp),%edx
  3f:  89 54 24 04            mov    %edx,0x4(%esp)
  43:  89 04 24               mov    %eax,(%esp)
  46:  e8 fc ff ff ff         call   47 <main+0x3a>
  4b:  b8 00 00 00 00         mov    $0x0,%eax
  50:  c9                     leave
  51:  c3                     ret
With -fomit-frame-pointer added, the disassembly becomes:
00000000 <add>:
   0:  8b 44 24 08            mov    0x8(%esp),%eax
   4:  8b 54 24 04            mov    0x4(%esp),%edx
   8:  01 d0                  add    %edx,%eax
   a:  c3                     ret

0000000b <main>:
   b:  55                     push   %ebp
   c:  89 e5                  mov    %esp,%ebp
   e:  83 e4 f0               and    $0xfffffff0,%esp
  11:  83 ec 20               sub    $0x20,%esp
  14:  c7 44 24 1c 00 00 00   movl   $0x0,0x1c(%esp)
  1b:  00
  1c:  c7 44 24 04 02 00 00   movl   $0x2,0x4(%esp)
  23:  00
  24:  c7 04 24 01 00 00 00   movl   $0x1,(%esp)
  2b:  e8 fc ff ff ff         call   2c <main+0x21>
  30:  89 44 24 1c            mov    %eax,0x1c(%esp)
  34:  b8 00 00 00 00         mov    $0x0,%eax
  39:  8b 54 24 1c            mov    0x1c(%esp),%edx
  3d:  89 54 24 04            mov    %edx,0x4(%esp)
  41:  89 04 24               mov    %eax,(%esp)
  44:  e8 fc ff ff ff         call   45 <main+0x3a>
  49:  b8 00 00 00 00         mov    $0x0,%eax
  4e:  c9                     leave
  4f:  c3                     ret
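The code compiled with -fomit-frame-pointer is slightly shorter. The main difference is in add: the frame setup and teardown (push %ebp, mov %esp,%ebp, pop %ebp) are gone, so the caller's frame address is no longer saved and %ebp is left free. In these listings main is unchanged and still sets up %ebp, presumably because it realigns the stack with and $0xfffffff0,%esp and needs %ebp to restore %esp in leave.

Some background: the stack grows from high addresses toward low addresses, while the heap grows from low toward high. On x86, esp points to the top of the stack and ebp to the base of the current frame, so within a function the value of esp is no greater than that of ebp. On a call, the caller first places the arguments on the stack and the call instruction then pushes the return address. With a frame pointer, add reads its arguments as 0x8(%ebp) and 0xc(%ebp); without one, it reads them relative to the stack pointer as 0x4(%esp) and 0x8(%esp).

The cost of -fomit-frame-pointer is that the chain of saved frame addresses no longer exists, so the call sequence cannot be traced by following %ebp from frame to frame. I suspect this chain is how gcc, VS and other toolchains track the call sequence, although debuggers can also unwind the stack from debug information when it is available.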
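To make the idea of a frame chain concrete, here is a minimal sketch in C that simulates it on a toy stack. It does not touch the real machine stack: the array, the indices used as "addresses", the sentinel END_OF_CHAIN, the helper names and the return address 0x1000 are all assumptions made for illustration. Only the layout (saved frame pointer at (%ebp), return address at 4(%ebp), arguments from 8(%ebp)) and the value 0x32 (the instruction after call in the first listing) follow the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack: each array slot is one 4-byte word and an
   "address" is just an index into the array. All names and values here are
   illustrative assumptions, not taken from a real process. */
#define STACK_WORDS   16
#define END_OF_CHAIN  0xFFFFFFFFu  /* marks the outermost frame */

/* Frame layout when a frame pointer is kept (push %ebp; mov %esp,%ebp):
     stack[fp]     saved frame pointer of the caller  ->   (%ebp)
     stack[fp + 1] return address                     ->  4(%ebp)
     stack[fp + 2] first argument                     ->  8(%ebp)
     stack[fp + 3] second argument                    -> 12(%ebp) */

/* Fill the toy stack as if main had called add(1, 2); returns add's fp. */
uint32_t build_demo_stack(uint32_t *stack)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        stack[i] = 0;

    /* Frame of main (older, so it sits at higher indices). */
    stack[12] = END_OF_CHAIN;  /* no frame above main in this model */
    stack[13] = 0x1000;        /* made-up return address into main's caller */

    /* Frame of add, called from main. */
    stack[8]  = 12;            /* saved frame pointer: main's frame */
    stack[9]  = 0x32;          /* return address: instruction after the call */
    stack[10] = 1;             /* argument a */
    stack[11] = 2;             /* argument b */

    return 8;
}

/* Read the i-th argument of the frame at fp, like 8(%ebp), 12(%ebp), ... */
uint32_t frame_arg(const uint32_t *stack, uint32_t fp, size_t i)
{
    return stack[fp + 2 + i];
}

/* Follow the saved-frame-pointer chain starting at fp and collect the return
   address of each frame, innermost first. Returns the number of frames. */
size_t walk_frames(const uint32_t *stack, size_t words, uint32_t fp,
                   uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    while (fp != END_OF_CHAIN && (size_t)fp + 1 < words && n < max) {
        ret_addrs[n++] = stack[fp + 1];  /* return address of this frame */
        fp = stack[fp];                  /* step to the caller's frame */
    }
    return n;
}
```

Starting from the frame returned by build_demo_stack, walk_frames collects two return addresses, 0x32 and then 0x1000. When the frame pointer is omitted, the stack[fp] link is never stored, so a walker of this kind has nothing to follow.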