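When GCC compiles code with optimization enabled, it performs Dead Code Elimination (DCE). Functions that are defined in the source but never called, and that no other module can call, are removed from the intermediate object file (the .o file).
For example, consider the following code: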
#include <stdio.h>

/* static: internal linkage, and nothing in this file ever calls it */
static void test(void) {
    printf("this code is never called.");
}

int main(void) {
    printf("this is main function.");
    return 0;
}
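Here we define a static function and a main function. By C convention, a static function is visible only within the current module (translation unit), whereas a non-static function can be referenced from other modules. A static function that its own module never calls therefore cannot be called from anywhere, so the compiler is free to discard it.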
Next, we compare GCC's assembly output with and without DCE to observe its effect.
Without DCE (no optimization flag):
gcc -S -fno-builtin -fdump-ipa-cgraph test.c -o test.S
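We stop at assembly output because DCE already happens at the compilation stage, before the assembler and linker run, so its effect is visible in the .S file. In the command, -fdump-ipa-cgraph is a debugging dump option that writes a .cgraph file, which we will examine later. -fno-builtin stops GCC from treating printf as a built-in function, so the calls remain ordinary printf calls.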
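Looking at test.S, we can see that the definition of _test appears in the assembly code. This is 32-bit x86 output for Mac OS X, where C symbol names get a leading underscore, so test becomes _test. The string constant in the listing is "Hello World\n" rather than the strings in the source above, so the listing seems to come from a slightly different version of the program. What matters here is only that _test is present: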
    .cstring
LC0:
    .ascii "Hello World\12\0"
    .text
_test:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    call L3
"L00000000001$pb":
L3:
    popl %ecx
    leal LC0-"L00000000001$pb"(%ecx), %eax
    movl %eax, -16(%ebp)
    leave
    ret
    .globl _main
_main:
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    subl $20, %esp
    call L6
"L00000000002$pb":
L6:
    popl %ebx
    leal LC0-"L00000000002$pb"(%ebx), %eax
    movl %eax, (%esp)
    call L_printf$stub
    movl $0, %eax
    addl $20, %esp
    popl %ebx
    leave
    ret
    .section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
L_printf$stub:
    .indirect_symbol _printf
    hlt ; hlt ; hlt ; hlt ; hlt
    .subsections_via_symbols
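If we compile to an object file instead, nm shows that test is also present among the symbols defined in the object file. The lowercase t marks _test as a local (static) function symbol, T marks the global _main, and U marks _printf as undefined, to be resolved by the linker: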
00000017 T _main
U _printf
00000000 t _test
Now turn on optimization, and with it DCE:
gcc -O -S -fno-builtin -fdump-ipa-cgraph test.c -o test.S
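As you can see, the test function is gone from the assembly. The _test symbol has also disappeared from the nm output; apart from _main, only the undefined reference remains: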
U _printf
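Along with the assembly, GCC also writes a .cgraph file that records how it removed functions. The line "Reclaiming functions: test __sputc" is where test is dropped. It goes together with __sputc, an unused static inline helper from the system's stdio.h: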
Initial entry points: main
Unit entry points: main

Initial callgraph:

main/4: 16 insns needed tree inlinable
  called by:
  calls: printf/3
printf/3:
  called by: main/4
  calls:
test/2: tree
  called by:
  calls:
__sputc/1: tree
  called by:
  calls:
__swbuf/0:
  called by:
  calls:

Reclaiming functions: test __sputc

Reclaimed callgraph:

main/4: 16 insns needed tree inlinable
  called by:
  calls: printf/3
printf/3:
  called by: main/4
  calls:
__swbuf/0:
  called by:
  calls:

Marking local functions:

Marked callgraph:

main/4: 16 insns needed tree inlinable
  called by:
  calls: printf/3
printf/3:
  called by: main/4
  calls:
__swbuf/0:
  called by:
  calls:

Deciding on inlining. Starting with 16 insns.

Inlining always_inline functions:

Deciding on smaller functions:

Deciding on functions called once:

Reclaiming functions: __swbuf
Reclaimed 0 insns
Inlined 0 calls, eliminated 0 functions, 16 insns turned to 16 insns.

Optimized callgraph:

main/4: 16 insns needed tree inlinable
  called by:
  calls: printf/3
printf/3:
  called by: main/4
  calls:

Final callgraph:

main/4: 16 insns needed inlinable asm_written
  called by:
  calls:
printf/3:
  called by:
  calls: