Chapter 7 Linking

# Wed 27 Dec 18:57:00 GMT 2017

------------------------------------
Part II Running Programs on a System
------------------------------------

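7.1 Compiler Drivers

The compiler driver invokes the language preprocessor,
compiler, assembler, and linker on the user's behalf, as
needed, producing the .i, .s, and .o files and finally the
executable.

To run the program, the shell calls the system's loader,
which copies the code and data of the executable file into
memory and then transfers control to the beginning of the
program.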

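7.2 Static Linking

The linker has two main tasks:

+ symbol resolution
+ relocation

7.3 Object Files

Object files come in three forms:

+ relocatable object file
+ executable object file
+ shared object file: a special type of relocatable object
file that can be loaded into memory and linked dynamically,
at either load time or run time.

Unix and Linux: ELF (Executable and Linkable Format)
Windows: PE (Portable Executable)
Mac OS X: Mach-O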

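7.4 Relocatable Object Files

ELF header
.text      machine code of the compiled program
.rodata    read-only data, such as jump tables
.data      initialized global and static C variables
.bss       uninitialized global and static C variables
.symtab    information about the functions and global
           variables defined and referenced in the program
.rel.text  locations in .text to be modified at link time;
           any instruction that calls an external function
           or references a global variable needs to be
           modified
.rel.data  in general, any initialized global variable whose
           initial value is the address of a global variable
           or externally defined function needs to be
           modified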
.debug
.line
.strtab

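7.5 Symbols and Symbol Tables

The symbol table of a module m contains information about
the symbols that m defines and references:

1. Global symbols defined by m that can be referenced by
other modules, i.e. nonstatic C functions and global
variables.
2. Global symbols defined by some other module and referenced
by m. These are called externals and correspond to nonstatic
C functions and global variables defined in other modules.
3. Local symbols defined and referenced exclusively by m,
i.e. C functions and global variables with the static
attribute. They are visible only within m.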

There are three special pseudosections that don't have
entries in the 'section header table':

+ ABS is for symbols that should not be relocated.
+ UNDEF is for undefined symbols: symbols that are
referenced in this object module but defined else-
where.
+ COMMON is for uninitialized data objects that are
not yet allocated.

Note: the pseudosections exist only in relocatable
object files.

7.6 Symbol Resolution

The linker resolves symbol references by associating
each reference with exactly one symbol definition from the
symbol tables of its input relocatable object files.

7.6.1 resolving duplicate symbol names

At compile time, the compiler exports each global
symbol to the assembler as either 'strong' or 'weak'.

Functions and initialized global variables get strong
symbols. Uninitialized global variables get weak symbols.

Linux linkers then use the following rules for
dealing with duplicate symbol names:

Rule 1. Multiple strong symbols with the same name are
not allowed.
Rule 2. Given a strong symbol and multiple weak symbols
with the same name, choose the strong symbol.
Rule 3. Given multiple weak symbols with the same name,
choose any of the weak symbols.

7.6.2 static libraries

Static libraries are stored on disk in a particular
file format known as an 'archive'.

An archive is a collection of concatenated relocatable
object files, with a header that describes the size and
location of each member object file.

Archive filenames are denoted with the .a suffix.

To create a static library of some functions:


ar rcs libsome.a some.o any.o

7.7 Relocation

Relocation consists of two steps:

1. Relocating sections and symbol definitions.
2. Relocating symbol references within sections.

7.7.1 Relocation Entries

Relocation entries for code and data are placed in
.rel.text and .rel.data respectively.

7.8 executable object files

For any segment s, the linker must choose a starting
address, vaddr, such that:

vaddr mod align = off mod align

where off is the offset of the segment's first section in
the object file and align is the alignment specified in the
program header.

This alignment requirement is an optimization that
enables segments in the object file to be transferred
efficiently to memory when the program executes.

7.9 Loading Executable Object Files

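Every Linux program has a run-time memory image. On Linux
x86-64 systems, the code segment always starts at address
0x400000, followed by the data segment. The run-time heap
follows the data segment and grows upward via calls to the
malloc library. The region after the heap is reserved for
shared modules. The user stack always starts at the largest
legal user address, 2^48 - 1, and grows down toward smaller
memory addresses.

The region above the stack, starting at address 2^48, is
reserved for the kernel, the memory-resident part of the
operating system.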

7.10 Dynamic Linking with Shared Libraries

A shared library is an object module that, at either
run time or load time, can be loaded at an arbitrary
memory address and linked with a program in memory.

This process is known as 'dynamic linking' and is
performed by a program called a 'dynamic linker'.

Shared libraries are also referred to as shared
objects, and on Linux systems they are indicated by the
'.so' suffix. Microsoft operating systems make heavy use
of shared libraries, which they refer to as 'DLLs'
(dynamic link libraries).

Shared libraries are shared in two ways:

1. There is exactly one .so file for a particular
library. The code and data in this .so file are
shared by all of the executable object files that
reference the library.
2. A single copy of the .text section of a shared
library in memory can be shared by different
running processes.

To build a shared library libvector.so :

$: gcc -shared -fpic -o libvector.so addvec.c multvec.c

Once we have created the library, we link it into the program:

$: gcc -o prog21 main2.c ./libvector.so

When the loader loads and runs prog21, it notices that
prog21 contains an '.interp' section, which holds the path
name of the dynamic linker, itself a shared object (e.g.
ld-linux.so on Linux systems). The dynamic linker then
finishes the linking task by:

+ relocating the text and data of libc.so into some
memory segment
+ relocating the text and data of libvector.so into
another memory segment
+ relocating any references in prog21 to symbols
defined by libc.so and libvector.so

Finally, the dynamic linker passes control to the
application (prog21).

From this point on, the locations of the shared
libraries are fixed and do not change during execution
of the program.


Linux systems provide a simple interface to the
dynamic linker that allows application programs to load
and link shared libraries at run time.

#include <dlfcn.h>
void *dlopen(const char *filename, int flag);
returns: pointer to handle if OK, NULL on error

#include <dlfcn.h>
void *dlsym(void *handle, const char *symbol);
returns: pointer to symbol if OK, NULL on error

#include <dlfcn.h>
int dlclose(void *handle);
returns: 0 if OK, -1 on error

#include <dlfcn.h>
const char *dlerror(void);
returns: error message if previous call to dlopen,
dlsym, or dlclose failed;
NULL if previous call was OK

$: gcc -rdynamic -o a.out dll.c -ldl

note: -rdynamic: makes the global symbols in dll.c
available to the dynamic linker for symbol resolution.
-ldl: links against libdl.so.

PIC Data References

No matter where we load an object module (including
shared object modules) in memory, the data segment is
always the same distance from the code segment.

The compiler creates a GOT (global offset table) at the
beginning of the data segment. The GOT contains an 8-byte
entry for each global data object (procedure or global
variable) that is referenced by the object module.


7.12 Position-Independent Code (PIC)

Users generate PIC by passing the -fpic option to gcc.
Shared libraries must always be compiled with this option.


PIC Function Calls

7.13 library interpositioning

7.13.1 compile-time interpositioning

gcc -DCOMPILETIME -c mymalloc.c
gcc -I. -o intc int.c mymalloc.o

7.13.2 Link-time interpositioning

The Linux static linker supports link-time
interpositioning with the --wrap f flag. This flag tells
the linker to resolve references to symbol f as __wrap_f,
and to resolve references to symbol __real_f as f.

gcc -DLINKTIME -c mymalloc.c
gcc -c int.c

gcc -Wl,--wrap,malloc -Wl,--wrap,free -o intl int.o mymalloc.o

7.13.3 run-time interpositioning

This mechanism is based on the dynamic linker's
LD_PRELOAD environment variable.

gcc -DRUNTIME -shared -fpic -o mymalloc.so mymalloc.c -ldl

gcc -o intr int.c

linux> LD_PRELOAD="./mymalloc.so" ./intr # execute

7.14 tools for manipulating object files

The GNU binutils package:

+ ar
+ strings: lists all printable strings in an object file.
+ strip
+ nm: lists the symbols defined in the symbol table of an
object file.
+ size: lists the names and sizes of the sections in an
object file.
+ readelf
+ objdump

ldd: program for manipulating shared libraries.

posted @ 2018-03-28 23:26  孤灯下的守护者