Linux ELF Format Analysis [Repost]

Reposted from: https://www.cnblogs.com/feng9exe/p/6899351.html

http://www.cnblogs.com/hzl6255/p/3312262.html

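ELF (Executable and Linking Format) is a standard file format for executables, object files, shared libraries, and core dumps. It was developed and published by UNIX System Laboratories as part of the ABI (Application Binary Interface).

A brief history:
- UNIX:     originally used the a.out format, which was replaced by COFF in System V and finally by ELF in SVR4.
- Windows:  uses PE, a variant of the COFF format.
- Mac OS X: uses the Mach-O format.

There are four types of ELF file:
1. Relocatable file: a .o file produced by a compiler or assembler, which needs further processing by the linker
2. Executable file: all relocation is done and all symbols are resolved, except perhaps shared library symbols that must be resolved at run time
3. Shared object file: a dynamic library (.so)
4. Core file: a core dump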

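1. ELF File Structure

The structure of an ELF file can be described from two points of view:
1. Compilers, assemblers, linkers: the file is a set of sections described by the section header table
2. System loader: the file is a set of segments described by the program header table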

[Figure: ELF_struct]

TIP:
- A single segment usually consists of several sections.
- Relocatable files must have a section header table. Executable files must have a program header table (a section header table is optional for them, though usually present). Shared object files have both.
- Sections are intended for further processing by a linker, while segments are intended to be mapped into memory.
- Only the ELF header has a fixed position, at the start of the file; the locations of the program header table and the section header table are given by the ELF header.

ELF data representation: six data types (32-bit)

Name           Size  Alignment  Purpose
Elf32_Addr     4     4          Unsigned program address
Elf32_Off      4     4          Unsigned file offset
Elf32_Half     2     2          Unsigned medium integer
Elf32_Word     4     4          Unsigned integer
Elf32_Sword    4     4          Signed integer
unsigned char  1     1          Unsigned small integer
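
The sizes in this table can be checked at compile time. The following is only a sketch, assuming a Linux system where glibc's <elf.h> provides these typedefs and a C11 compiler for static_assert; it is not taken from the original article.

```c
#include <assert.h>   /* static_assert (C11) */
#include <elf.h>      /* Elf32_Addr, Elf32_Off, ... (assumed: glibc on Linux) */

/* Compilation fails if any type does not have the size listed in the table. */
static_assert(sizeof(Elf32_Addr)    == 4, "Elf32_Addr is 4 bytes");
static_assert(sizeof(Elf32_Off)     == 4, "Elf32_Off is 4 bytes");
static_assert(sizeof(Elf32_Half)    == 2, "Elf32_Half is 2 bytes");
static_assert(sizeof(Elf32_Word)    == 4, "Elf32_Word is 4 bytes");
static_assert(sizeof(Elf32_Sword)   == 4, "Elf32_Sword is 4 bytes");
static_assert(sizeof(unsigned char) == 1, "unsigned char is 1 byte");

/* Elf32_Sword is the only signed type of the six. */
static_assert((Elf32_Sword)-1 < 0, "Elf32_Sword is signed");
static_assert((Elf32_Word)-1  > 0, "Elf32_Word is unsigned");
```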

@1: 

ELF header: located at the start of the file, it describes the organization of the whole file and occupies 52 bytes

#define EI_NIDENT (16)
typedef struct
{
  unsigned char e_ident[EI_NIDENT];   /* Magic number and other info */
  Elf32_Half    e_type;               /* Object file type */
  Elf32_Half    e_machine;            /* Architecture */
  Elf32_Word    e_version;            /* Object file version */
  Elf32_Addr    e_entry;              /* Entry point virtual address */
  Elf32_Off     e_phoff;              /* Program header table file offset */
  Elf32_Off     e_shoff;              /* Section header table file offset */
  Elf32_Word    e_flags;              /* Processor-specific flags */
  Elf32_Half    e_ehsize;             /* ELF header size in bytes */
  Elf32_Half    e_phentsize;          /* Program header table entry size */
  Elf32_Half    e_phnum;              /* Program header table entry count */
  Elf32_Half    e_shentsize;          /* Section header table entry size */
  Elf32_Half    e_shnum;              /* Section header table entry count */
  Elf32_Half    e_shstrndx;           /* Section header string table index */
} Elf32_Ehdr;

Let's look at a very basic ELF header

[root@bogon ~]# readelf -h a.out 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80482a0                 /* e_entry */
  Start of program headers:          52 (bytes into file)      /* e_phoff */
  Start of section headers:          1992 (bytes into file)    /* e_shoff: See Starting address of section headers */
  Flags:                             0x0
  Size of this header:               52 (bytes)                /* e_ehsize */
  Size of program headers:           32 (bytes)                /* e_phentsize */
  Number of program headers:         8                         /* e_phnum */
  Size of section headers:           40 (bytes)                /* e_shentsize */
  Number of section headers:         29                        /* e_shnum */
  Section header string table index: 26                        /* e_shstrndx */

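What can we learn from the ELF header? The file class and byte order, the file type and target machine, the entry point, and the offset, entry size, and entry count of both the program header table and the section header table.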
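
As a rough illustration of reading those fields, the sketch below validates the first bytes of a file image and copies out the header. It assumes glibc's <elf.h>, and it assumes the host byte order matches the file's (true for the little-endian x86 example above). The function name read_elf32_header is made up for this example.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS32 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: check the magic number and class of a raw file
   image, then copy the 52-byte ELF32 header into *out.
   Assumption: host and file byte order are the same (no byte swapping). */
bool read_elf32_header(const unsigned char *image, size_t len, Elf32_Ehdr *out)
{
    assert(image != NULL && out != NULL);
    if (len < sizeof(Elf32_Ehdr))
        return false;                          /* too short to hold a header */
    if (memcmp(image, ELFMAG, SELFMAG) != 0)
        return false;                          /* magic is not 7f 45 4c 46 */
    if (image[EI_CLASS] != ELFCLASS32)
        return false;                          /* not a 32-bit ELF file */
    memcpy(out, image, sizeof *out);
    return true;
}
```

After a successful call, out->e_entry, out->e_phoff, out->e_shoff and the other members hold the values that readelf -h prints.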

@2:

Section header: holds information about a section.

Each section header occupies 40 bytes (the value of e_shentsize)

/* Section header.  */
typedef struct
{
  Elf32_Word    sh_name;        /* Section name (string tbl index) */
  Elf32_Word    sh_type;        /* Section type */
  Elf32_Word    sh_flags;       /* Section flags */
  Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf32_Off     sh_offset;      /* Section file offset */
  Elf32_Word    sh_size;        /* Section size in bytes */
  Elf32_Word    sh_link;        /* Link to another section */
  Elf32_Word    sh_info;        /* Additional section information */
  Elf32_Word    sh_addralign;   /* Section alignment */
  Elf32_Word    sh_entsize;     /* Entry size if section holds table */
} Elf32_Shdr;

Section Type(*sh_type*) 

PROGBITS:           This holds program contents including code, data, and debugger information. 
NOBITS:             Like PROGBITS, but it occupies no space in the file. 
SYMTAB and DYNSYM:  These hold symbol tables.                             [See below]
STRTAB:             This is a string table, like the one used in a.out.   [See below]
REL and RELA:       These hold relocation information. 
DYNAMIC and HASH:   These hold information related to dynamic linking. 

Some common sections are listed below:

.text:  (PROGBITS:ALLOC+EXECINSTR)
     Executable code
.data:  (PROGBITS:ALLOC+WRITE)
     Initialized data
.rodata:(PROGBITS:ALLOC)
     Read-only data
.bss:   (NOBITS:ALLOC+WRITE)
     Uninitialized data, zeroed at run time
.rel.text, .rel.data, and .rel.rodata:(REL)
     Relocation information for static linking
.rel.plt: (REL)
     The list of PLT entries that are subject to relocation during dynamic linking (if a PLT is used)
.rel.dyn: (REL)
     The relocations for dynamically linked symbols that do not go through the PLT
.symtab:
     Symbol table
.strtab:
     String table
.shstrtab:
     Section header string table (section names)
.init, .fini: (PROGBITS:ALLOC+EXECINSTR)
     Program initialization and termination code
.interp: (PROGBITS:ALLOC)
     This section holds the path name of a program interpreter. At present, this is used to run the run-time dynamic linker, which loads the program and links in any required shared libraries.
.got, .plt: (PROGBITS)
     The global offset table (.got) and the procedure linkage table (.plt, a jump table) used for dynamic linking.
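
The attribute names in parentheses above (WRITE, ALLOC, EXECINSTR) are bits in the sh_flags field, which readelf abbreviates as W, A, and X. A minimal sketch of that decoding, assuming the SHF_* constants from glibc's <elf.h>; the function section_flag_letters is a made-up name and handles only these three flags.

```c
#include <assert.h>
#include <elf.h>      /* SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR */
#include <stddef.h>

/* Hypothetical helper: write the letters for the three basic section
   flags into buf, in the order W, A, X. buf must hold at least 4 chars. */
void section_flag_letters(Elf32_Word sh_flags, char buf[4])
{
    size_t n = 0;
    assert(buf != NULL);
    if (sh_flags & SHF_WRITE)     buf[n++] = 'W';   /* writable at run time */
    if (sh_flags & SHF_ALLOC)     buf[n++] = 'A';   /* occupies memory at run time */
    if (sh_flags & SHF_EXECINSTR) buf[n++] = 'X';   /* contains machine instructions */
    buf[n] = '\0';
}
```

For .text (ALLOC+EXECINSTR) this yields "AX", and for .data or .bss (ALLOC+WRITE) it yields "WA".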

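TIP: the difference between the symbol table (symtab) and the string table (strtab)
strtab stores the null-terminated name strings used by the ELF file, such as function and variable names (string literals from the program itself live in .rodata, not here)
symtab records the functions and variables (symbols) themselves; it is mainly used at link time to resolve address references between object files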

Below is a basic section header table [0x7c8 = 1992]

[root@bogon ~]# readelf -S a.out 
There are 29 section headers, starting at offset 0x7c8:
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048168 000168 000024 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          0804818c 00018c 000040 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481cc 0001cc 000045 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          08048212 000212 000008 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         0804821c 00021c 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             0804823c 00023c 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             08048244 000244 000010 08   A  4  11  4
  [10] .init             PROGBITS        08048254 000254 000017 00  AX  0   0  4
  [11] .plt              PROGBITS        0804826c 00026c 000030 04  AX  0   0  4
  [12] .text             PROGBITS        080482a0 0002a0 000198 00  AX  0   0 16
  [13] .fini             PROGBITS        08048438 000438 00001c 00  AX  0   0  4
  [14] .rodata           PROGBITS        08048454 000454 00000c 00   A  0   0  4
  [15] .eh_frame_hdr     PROGBITS        08048460 000460 00001c 00   A  0   0  4
  [16] .eh_frame         PROGBITS        0804847c 00047c 000058 00   A  0   0  4
  [17] .ctors            PROGBITS        080494d4 0004d4 000008 00  WA  0   0  4
  [18] .dtors            PROGBITS        080494dc 0004dc 000008 00  WA  0   0  4
  [19] .jcr              PROGBITS        080494e4 0004e4 000004 00  WA  0   0  4
  [20] .dynamic          DYNAMIC         080494e8 0004e8 0000c8 08  WA  5   0  4
  [21] .got              PROGBITS        080495b0 0005b0 000004 04  WA  0   0  4
  [22] .got.plt          PROGBITS        080495b4 0005b4 000014 04  WA  0   0  4
  [23] .data             PROGBITS        080495c8 0005c8 000004 00  WA  0   0  4
  [24] .bss              NOBITS          080495cc 0005cc 000008 00  WA  0   0  4
  [25] .comment          PROGBITS        00000000 0005cc 000114 00      0   0  1
  [26] .shstrtab         STRTAB          00000000 0006e0 0000e5 00      0   0  1
  [27] .symtab           SYMTAB          00000000 000c50 000440 10     28  49  4
  [28] .strtab           STRTAB          00000000 001090 000249 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

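String table:

A string here is a null-terminated character sequence that represents the name of a symbol or a section; it is referenced by its index (a byte offset) into the table.
For the section name string table [.shstrtab], the e_shstrndx member of the ELF header identifies the section that holds it,
and each name's index is stored in the sh_name member of the corresponding Elf32_Shdr.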
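
To make the indexing concrete, the sketch below looks up a name in a string table that has already been read into memory. It assumes glibc's <elf.h>; the function strtab_lookup is a made-up name, and the bounds checks are one possible way to guard against a malformed file.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Word */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the string that starts at byte offset
   `index` (for example a section's sh_name) in a string table of
   `size` bytes, or NULL if the offset is out of range or the string
   is not null-terminated inside the table. */
const char *strtab_lookup(const char *strtab, size_t size, Elf32_Word index)
{
    assert(strtab != NULL);
    if (index >= size)
        return NULL;
    if (memchr(strtab + index, '\0', size - index) == NULL)
        return NULL;
    return strtab + index;
}
```

Because the index is a plain byte offset, it may also point into the middle of an entry: in a table containing ".strtab", an index three bytes past its start yields "rtab".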


Symbol table:

Holds the information needed to locate and relocate a program's symbol definitions and references


Relocation table: holds the relocation entries (sections of type REL and RELA, see the section types above)

@3: 

Program header: describes how to create the process image; the program header table contains one entry per segment

Each program header occupies 32 bytes (the value of e_phentsize)

typedef struct
{
  Elf32_Word    p_type;        /* Segment type */
  Elf32_Off     p_offset;      /* Segment file offset */
  Elf32_Addr    p_vaddr;       /* Segment virtual address */
  Elf32_Addr    p_paddr;       /* Segment physical address */
  Elf32_Word    p_filesz;      /* Segment size in file */
  Elf32_Word    p_memsz;       /* Segment size in memory */
  Elf32_Word    p_flags;       /* Segment flags */
  Elf32_Word    p_align;       /* Segment alignment */
} Elf32_Phdr;

Type of segment(*p_type*)

PT_PHDR:    Specifies the location and size of the program header table itself, both in the file and in the memory image of the program.
PT_LOAD:    This segment is a loadable segment.
PT_DYNAMIC: This array element specifies dynamic linking information.
PT_INTERP:  This element specifies the location and size of a null-terminated path name to invoke as an interpreter.

Below is an example of program headers

[root@bogon ~]# readelf -l a.out 
Elf file type is EXEC (Executable file)
Entry point 0x80482a0
There are 8 program headers, starting at offset 52
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x004d4 0x004d4 R E 0x1000
  LOAD           0x0004d4 0x080494d4 0x080494d4 0x000f8 0x00100 RW  0x1000
  DYNAMIC        0x0004e8 0x080494e8 0x080494e8 0x000c8 0x000c8 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x000460 0x08048460 0x08048460 0x0001c 0x0001c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag 
   06     .eh_frame_hdr 
   07

@4:

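Section: holds the various kinds of information in an object file (instructions, data, symbol table, relocation information, and so on)

2. ELF File Analysis

Many tools can be used to analyze ELF files.

Besides readelf, used above, there are also objdump, objcopy, and others.

# objdump -x /bin/ls                         # show all headers of the ELF file, including its sections
# objdump -j .data -s /bin/ls                # show the contents of the specified section
#
# objcopy -O binary -j .text a.out text.bin  # dump the .text section into the file text.bin

A complete analysis tutorial:  <Linux C编程一站式学习-ELF文件> (the ELF file chapter)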

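3. ELF File Parsing

ELF files are parsed in many places. How Linux loads an ELF file: 

execve() -> sys_execve() -> do_execve() -> search_binary_handler() -elf-> load_elf_binary()/load_elf_library()

readelf in binutils gives a very clear picture of how an ELF file is parsed

The open-source project ELFToolChain

atratus/coLinux/LINE: their ELF loaders are worth studying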

4. References

RefSpecs:          Linux Foundation Referenced Specifications

SysV ABI:          System V ABI

ELF specification: Executable and Linking Format Specification V1.2

ELF format:        ELF Format

PE format:         PE Format

posted @ 2022-02-04 16:37  Sky&Zhang