
刘文学 + Original work; please credit the source when reposting: http://blog.csdn.net/wdxz6547/article/details/51112486 + "Linux Kernel Analysis" MOOC course http://mooc.study.163.com/course/USTC-1000029000



  1. How does a source file (.c, .cpp, .java, .go) become a binary file?
  2. How is a binary file loaded and run?


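  1. What does the format of a binary file look like? Does the binary format differ between languages? The discussion here focuses on ELF files.
  2. The difference between static linking and dynamic linking
  3. How an executable file maps onto the address space of a process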

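How a source file (.c, .cpp, .java, .go) becomes a binary file

C file -> preprocessing -> compilation into assembly code (.s) -> assembly into object code (.o) -> linking into an executable

  1. Preprocess: pull in the files named by #include and expand macro definitions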

gcc -E -o hello.cpp hello.c

  2. Compile: translate the preprocessed source into assembly code

gcc -x cpp-output -S -o hello.s hello.cpp

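  3. Assemble: produce a binary file (everything before this step is readable text; this step
    produces a binary file that contains machine instructions but is not yet an executable)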

gcc -x assembler -c hello.s -o hello.o

  4. Link (produces an ELF file)

gcc -o hello hello.o                  # dynamically linked by default
gcc -o hello.static hello.o -static   # statically linked

$ readelf -h hello.o

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          320 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         13
  Section header string table index: 10

$ readelf -h hello

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400440
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4504 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 27

$ readelf -h hello.static

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - GNU
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400f4e
  Start of program headers:          64 (bytes into file)
  Start of section headers:          789968 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         6
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28

$ ldd hello
linux-vdso.so.1 => (0x00007fff06ffe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc6c2d40000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc6c3125000)



A.out --> COFF --> PE (Windows)
               --> ELF (Linux)

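Relationship between the ABI and the object file format: object files are often also called ABI files. An object file is already in a binary-compatible format, that is, it already contains binary instructions for one specific CPU architecture.


Object files take part in both linking a program (building it) and executing a program (running it).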

Linking View                       Execution View
============                       ==============
ELF header                         ELF header
Program header table (optional)    Program header table
Section 1                          Segment 1
...                                Segment 2
Section n                          ...
Section header table               Section header table (optional)

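The ELF header sits at the beginning of the file and holds a road map that describes how the file is organized.


Section header table: contains information describing the file's sections. Every section has an entry in this table;
each entry gives the section's name, size, and other information.


When creating or augmenting a process image, the system logically copies a file's segment to a virtual memory segment.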

           File Offset   File                  Virtual Address
           ===========   ====                  ===============
                     0   ELF header
                         Program header table
                         Other information
                 0x100   Text segment          0x8048100
                         0x2be00 bytes         0x8073eff  // 0x8048100 + 0x2be00 - 1
               0x2bf00   Data segment          0x8074f00
                         0x4e00 bytes          0x8079cff  // 0x8074f00 + 0x4e00 - 1
               0x30d00   Other information

How a statically linked ELF executable relates to the address space of a process




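From what the earlier sections covered, we can infer the basic idea of running a binary file:

A new process is started whose main job is to load and run the executable, so there are two parts: loading and running. When the code reaches the point of loading the executable, it calls the execve system call. That call should load the contents of the executable into memory and reset the stack and key registers such as sp and ip, and then run the code the executable designates. This necessarily involves manipulating registers.

The following uses bash as an example to explain how a program gets run (other shells are similar).

  1. The shell passes the command-line arguments and environment variables to bash's main function; main parses the command line and hands it to the execve system call.

First, we type a command in bash.


Because bash is itself a C program, it must have a main function. How the shell gets from there to execve is omitted here.
If you want to see how a program you run goes through execve, trace it with strace. The prototype of execve is:

int execve(const char *filename, char *const argv[], char *const envp[]);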

$ strace ./hello

execve("./hello", ["./hello"], [/* 78 vars */]) = 0
brk(0)                                  = 0xacd000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cc000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=122541, ...}) = 0
mmap(NULL, 122541, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f08182ae000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0817ce7000
mprotect(0x7f0817ea2000, 2093056, PROT_NONE) = 0
mmap(0x7f08180a1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f08180a1000
mmap(0x7f08180a7000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f08180a7000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ad000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ab000
arch_prctl(ARCH_SET_FS, 0x7f08182ab740) = 0
mprotect(0x7f08180a1000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f08182ce000, 4096, PROT_READ) = 0
munmap(0x7f08182ae000, 122541)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 10), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cb000
write(1, "hello kernel", 12hello kernel)            = 12
exit_group(0)                           = ?
+++ exited with 0 +++

main ultimately calls the execve system call to carry out the command.


SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}


int do_execve(struct filename *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{
    /* Wrap the user-space argument and environment pointers */
    struct user_arg_ptr argv = { .ptr.native = __argv };
    struct user_arg_ptr envp = { .ptr.native = __envp };
    return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
}

/*
 * sys_execve() executes a new program.
 */
static int do_execveat_common(int fd, struct filename *filename,
                  struct user_arg_ptr argv,
                  struct user_arg_ptr envp,
                  int flags)
{
    /* ... */
    file = do_open_execat(fd, filename, flags);
    retval = PTR_ERR(file);
    if (IS_ERR(file))
        goto out_unmark;
    /* ... */
    retval = copy_strings(bprm->envc, envp, bprm);
    if (retval < 0)
        goto out;
    retval = copy_strings(bprm->argc, argv, bprm);
    if (retval < 0)
        goto out;
    retval = exec_binprm(bprm);
    if (retval < 0)
        goto out;
    /* ... */
}

static int exec_binprm(struct linux_binprm *bprm)
{
    pid_t old_pid, old_vpid;
    int ret;

    /* Need to fetch pid before load_binary changes it */
    old_pid = current->pid;
    old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));

    ret = search_binary_handler(bprm);
    if (ret >= 0) {
        trace_sched_process_exec(current, old_pid, bprm);
        ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
    }

    return ret;
}

int search_binary_handler(struct linux_binprm *bprm)
{
    /* ... */
    list_for_each_entry(fmt, &formats, lh) {
        if (!try_module_get(fmt->module))
            continue;
        retval = fmt->load_binary(bprm);
        if (retval < 0 && !bprm->mm) {
            /* we got to flush_old_exec() and failed after it */
            force_sigsegv(SIGSEGV, current);
            return retval;
        }
        if (retval != -ENOEXEC || !bprm->file) {
            return retval;
        }
    }
    /* ... */
}

/*
 * This structure defines the functions that are used to load the binary formats that
 * linux accepts.
 */
struct linux_binfmt {
    struct list_head lh;
    struct module *module;
    int (*load_binary)(struct linux_binprm *);
    int (*load_shlib)(struct file *);
    int (*core_dump)(struct coredump_params *cprm);
    unsigned long min_coredump; /* minimal dump size */
};


static struct linux_binfmt elf_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_elf_binary,
    .load_shlib = load_elf_library,
    .core_dump  = elf_core_dump,
    .min_coredump   = ELF_EXEC_PAGESIZE,
};

static int load_elf_binary(struct linux_binprm *bprm)
{
    /* ... */
    start_thread(regs, elf_entry, bprm->p);
    retval = 0;
    /* ... */
}

void
start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
{
    start_thread_common(regs, new_ip, new_sp,
                __USER_CS, __USER_DS, 0);
}


static void
start_thread_common(struct pt_regs *regs, unsigned long new_ip,
            unsigned long new_sp,
            unsigned int _cs, unsigned int _ss, unsigned int _ds)
{
    loadsegment(fs, 0);
    loadsegment(es, _ds);
    loadsegment(ds, _ds);
    /* ... */
    regs->ip        = new_ip;   /* entry point of the new program */
    regs->sp        = new_sp;   /* top of the new user stack */
    regs->cs        = _cs;
    regs->ss        = _ss;
    regs->flags     = X86_EFLAGS_IF;
    /* ... */
}

Binary formats currently supported by Linux

binfmt_script - support for interpreted scripts that start with a #! line;

static struct linux_binfmt script_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_script,
};

binfmt_misc - support different binary formats, according to runtime configuration of the Linux kernel;
binfmt_misc detects binaries via a magic or filename extension and invokes a specified wrapper. This
should obsolete binfmt_java, binfmt_em86 and binfmt_mz.

static struct linux_binfmt misc_format = {
    .module = THIS_MODULE,
    .load_binary = load_misc_binary,
};

binfmt_elf - support elf format;

binfmt_aout - support a.out format;

static struct linux_binfmt aout_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_aout_binary,
    .load_shlib = load_aout_library,
    .core_dump  = aout_core_dump,
    .min_coredump   = PAGE_SIZE
};

binfmt_flat - support for flat format;
binfmt_elf_fdpic - Support for elf FDPIC binaries;

som_format - support for the SOM format used by HP-UX;

static struct linux_binfmt som_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_som_binary,
    .load_shlib = load_som_library,
    .core_dump  = som_core_dump,
    .min_coredump   = SOM_PAGESIZE
};

flat_format - support for the flat format;

static struct linux_binfmt flat_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_flat_binary,
    .core_dump  = flat_core_dump,
    .min_coredump   = PAGE_SIZE
};

binfmt_em86 - support for Intel elf binaries running on Alpha machines.

static struct linux_binfmt em86_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_em86,
};

elf_fdpic_format - support for ELF FDPIC binaries;

static struct linux_binfmt elf_fdpic_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_elf_fdpic_binary,
    .core_dump  = elf_fdpic_core_dump,
    .min_coredump   = ELF_EXEC_PAGESIZE,
};

Each format is registered through register_binfmt.

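execve -> do_execve -> do_execveat_common -> exec_binprm -> search_binary_handler
-> load_elf_binary -> start_thread -> start_thread_common

Here start_thread_common makes the program's entry point the new starting point by overwriting the saved instruction pointer (regs->ip; EIP on 32-bit x86) in the register frame on the kernel stack, so execution resumes there on return to user mode.


For ELF files, see also the load_elf_library function.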








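a) Symbol resolution: object files define and reference symbols.
b) Relocation: the compiler and assembler generate code and data sections that start at address 0. After linking, the virtual address of every segment in the executable is fixed. The linker then rewrites all references to these symbols, thereby relocating the sections.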


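a) Relocatable object file: contains binary code and data. (Of the form name.o)
b) Executable object file: contains binary code and data, and can be copied into memory and run. (Of the form name.out)
c) Shared object file: a special kind of relocatable object file that can be loaded into memory and linked dynamically, either at load time or at run time.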


No; it is done by the ld program.



$ ls

dllibexample.c  dllibexample.h  main.c  shlibexample.c  shlibexample.h

$ gcc -fPIC -shared shlibexample.c -o libshlibexample.so

$ gcc -fPIC -shared dllibexample.c -o libdllibexample.so

$ gcc main.c -o main -L . -lshlibexample -ldl


$ ./main

This is a Main program!
Calling SharedLibApi() function of libshlibexample.so!
This is a shared libary!
Calling DynamicalLoadingLibApi() function of libdllibexample.so!
This is a Dynamical Loading libary!



qemu-system-x86_64 -kernel ../linux-3.18.6/arch/x86/boot/bzImage -initrd ../rootfs.img -S -s

(gdb) file ../linux-3.18.6/vmlinux
Reading symbols from ../linux-3.18.6/vmlinux...done.
(gdb) remote target:1234
Undefined remote command: "target:1234".  Try "help remote".
(gdb) target remote:1234
Remote debugging using :1234
0x0000000000000000 in irq_stack_union ()
(gdb) b sys_execve
Breakpoint 1 at 0xffffffff811626f0: file fs/exec.c, line 1604.
(gdb) b load_elf_binary
Breakpoint 2 at 0xffffffff811aa260: load_elf_binary. (2 locations)
(gdb) b start_thread
Breakpoint 3 at 0xffffffff810013b0: file arch/x86/kernel/process_64.c, line 249.




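  1. Loading an executable is carried out by a system call (execve).


  2. The new program keeps the same PID, and it inherits every file descriptor that was open when execve was called.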

posted @ 2018-01-29 11:52  llguanli