
PE File Format [1]

Posted on 2010-06-14 10:11  淡如水wp

Preface
-------

The PE ("portable executable") file format is the format of executable
binaries (DLLs and programs) for MS Windows NT, Windows 95 and
Win32s; in Windows NT, drivers are in this format, too.
It can also be used for object files and libraries.

The format was designed by Microsoft and standardized by the TIS (Tool
Interface Standard) Committee (Microsoft, Intel, Borland, Watcom, IBM
and others) in 1993, apparently based on a good knowledge of COFF, the
"common object file format" used for object files and executables on
several UNIXes and on VMS.

The Win32 SDK includes a header file, <winnt.h>, containing #defines and
typedefs for the PE format. I will mention the struct member names and
#defines as we go.

You may also find the DLL "imagehlp.dll" helpful. It is part of
Windows NT, but documentation is scarce; some of its functions are
described in the Microsoft Developer Network (MSDN).

=============================================

Preface
-------

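The PE (portable executable) format is the binary executable format of Microsoft's Windows NT, Windows 95 and Win32 platforms.
On Windows NT, drivers also use this format, and it can be used for object files and libraries as well.
The format was designed by Microsoft and standardized in 1993 by the TIS (Tool Interface Standard) Committee (Microsoft, Intel, Borland, Watcom, IBM and others).
It is apparently based on a good knowledge of COFF, the common object file format used for object files and executables on several UNIX systems and on VMS.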

The Win32 SDK has a header file, <winnt.h>, containing #defines and typedefs for the PE format; I will mention the struct member names and #defines as we go.

You may also find imagehlp.dll useful. It is part of Windows NT, but documentation is scarce; some of its functions are described in MSDN.


General Layout
--------------

At the start of a PE file we find an MS-DOS executable ("stub"); this
makes any PE file a valid MS-DOS executable.
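To get past the stub, a reader uses the MS-DOS header itself: it starts with the bytes "MZ", and (per IMAGE_DOS_HEADER in <winnt.h>) its e_lfanew field at offset 0x3C holds the file offset of the PE signature. The following is a minimal sketch, assuming the whole file is already in memory as bytes; the function name pe_header_offset is hypothetical.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ" read as a little-endian WORD
E_LFANEW_OFFSET = 0x3C        # offset of IMAGE_DOS_HEADER.e_lfanew


def pe_header_offset(data: bytes) -> int:
    """Return the file offset of the PE signature, as recorded in the DOS header."""
    if len(data) < E_LFANEW_OFFSET + 4:
        raise ValueError("file too short to hold an MS-DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("no 'MZ' signature: not an MS-DOS/PE file")
    # e_lfanew is declared as a LONG (signed 32-bit) in <winnt.h>.
    (e_lfanew,) = struct.unpack_from("<l", data, E_LFANEW_OFFSET)
    if e_lfanew < 0 or e_lfanew + 4 > len(data):
        raise ValueError("e_lfanew points outside the file")
    return e_lfanew
```

For a real file, the bytes would come from something like `open(path, "rb").read()`.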

After the DOS-stub there is a 32-bit-signature with the magic number
0x00004550 (IMAGE_NT_SIGNATURE).
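The magic number 0x00004550 is simply the four bytes "PE\0\0" read as a little-endian 32-bit value. A small check, assuming `offset` is the file offset of the signature (in practice the value of the DOS header's e_lfanew field at offset 0x3C); has_nt_signature is a hypothetical name.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550  # the bytes b"PE\0\0" read as a little-endian DWORD


def has_nt_signature(data: bytes, offset: int) -> bool:
    """True if the 4 bytes at `offset` are IMAGE_NT_SIGNATURE."""
    if offset < 0 or offset + 4 > len(data):
        return False
    (signature,) = struct.unpack_from("<I", data, offset)
    return signature == IMAGE_NT_SIGNATURE
```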

Then there is a file header (in COFF format) that tells on which
machine the binary is supposed to run, how many sections are in it, the
time it was linked, whether it is an executable or a DLL, and so on. (The
difference between an executable and a DLL in this context is that a DLL
cannot be started but only used by another binary, and that a binary cannot
link to an executable.)
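The file header is the 20-byte IMAGE_FILE_HEADER that immediately follows the signature. The sketch below unpacks it with Python's struct module and answers three of the questions listed above: which machine, DLL or not, and link time. Field order, IMAGE_FILE_DLL (0x2000) and the machine codes 0x014C (i386) and 0x8664 (AMD64) follow <winnt.h>; the helper names are hypothetical, and TimeDateStamp is assumed to hold seconds since 1970-01-01 UTC.

```python
import struct
from datetime import datetime, timezone
from typing import NamedTuple

IMAGE_FILE_DLL = 0x2000
# A small subset of the IMAGE_FILE_MACHINE_* values.
MACHINE_NAMES = {0x014C: "i386", 0x8664: "AMD64"}


class FileHeader(NamedTuple):
    """Fields of IMAGE_FILE_HEADER, in declaration order."""
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


FILE_HEADER_FORMAT = "<HHIIIHH"  # little-endian, 20 bytes, no padding


def read_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Parse the file header that follows the 4-byte signature at pe_offset."""
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, data, pe_offset + 4))


def is_dll(fh: FileHeader) -> bool:
    # Executables and DLLs both carry IMAGE_FILE_EXECUTABLE_IMAGE (0x0002);
    # only DLLs also carry IMAGE_FILE_DLL.
    return bool(fh.characteristics & IMAGE_FILE_DLL)


def machine_name(fh: FileHeader) -> str:
    return MACHINE_NAMES.get(fh.machine, f"unknown ({fh.machine:#06x})")


def link_time(fh: FileHeader) -> datetime:
    # Assumes TimeDateStamp holds seconds since 1970-01-01 00:00 UTC.
    return datetime.fromtimestamp(fh.time_date_stamp, tz=timezone.utc)
```

SizeOfOptionalHeader is worth keeping: the section headers start right after the optional header, so their position depends on it.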

After that, we have an optional header (it is always there but is still
called "optional": COFF uses an "optional header" for libraries but not
for objects, which is why it is called "optional"). This tells us more
about how the binary should be loaded: the starting address, the amount
of stack to reserve, the size of the data segment, etc.
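The loading parameters mentioned above sit at fixed offsets in the optional header. This sketch reads a few of them from a 32-bit (PE32) optional header, recognized by Magic 0x10B; the 64-bit PE32+ variant (Magic 0x20B) uses a different layout and is simply rejected here. Offsets follow IMAGE_OPTIONAL_HEADER32 in <winnt.h>; LoadInfo and read_load_info are hypothetical names.

```python
import struct
from typing import NamedTuple

IMAGE_NT_OPTIONAL_HDR32_MAGIC = 0x10B  # PE32; PE32+ (0x20B) uses another layout


class LoadInfo(NamedTuple):
    size_of_initialized_data: int
    address_of_entry_point: int  # an RVA: relative to image_base
    image_base: int
    size_of_stack_reserve: int

    @property
    def entry_point_va(self) -> int:
        """Virtual address of the entry point when loaded at image_base."""
        return self.image_base + self.address_of_entry_point


def read_load_info(data: bytes, opt_offset: int) -> LoadInfo:
    """Read selected IMAGE_OPTIONAL_HEADER32 fields.

    opt_offset is the file offset of the optional header:
    e_lfanew + 4 (signature) + 20 (IMAGE_FILE_HEADER).
    """
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    if magic != IMAGE_NT_OPTIONAL_HDR32_MAGIC:
        raise ValueError(f"not a PE32 optional header (Magic = {magic:#x})")
    # Byte offsets within IMAGE_OPTIONAL_HEADER32.
    (init_data,) = struct.unpack_from("<I", data, opt_offset + 8)       # SizeOfInitializedData
    (entry,) = struct.unpack_from("<I", data, opt_offset + 16)          # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", data, opt_offset + 28)     # ImageBase
    (stack_reserve,) = struct.unpack_from("<I", data, opt_offset + 72)  # SizeOfStackReserve
    return LoadInfo(init_data, entry, image_base, stack_reserve)
```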

An interesting part of the optional header is the trailing array of
'data directories'; these directories contain pointers to data in the
'sections'. If, for example, the binary has an export directory, you
will find a pointer to that directory in the array member
IMAGE_DIRECTORY_ENTRY_EXPORT, and it will point into one of the
sections.
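Each entry of the data directory array is an IMAGE_DATA_DIRECTORY: a VirtualAddress (an RVA, i.e. an address relative to the image base, which is how it "points into" a section) and a Size, 8 bytes in total. In a PE32 optional header the array starts at offset 96, and NumberOfRvaAndSizes at offset 92 says how many entries are present. A minimal lookup, assuming PE32 and the index constants from <winnt.h>; data_directory is a hypothetical name.

```python
import struct
from typing import Tuple

IMAGE_DIRECTORY_ENTRY_EXPORT = 0
IMAGE_DIRECTORY_ENTRY_IMPORT = 1
IMAGE_DIRECTORY_ENTRY_BASERELOC = 5

NUMBER_OF_RVA_AND_SIZES_OFFSET = 92  # within IMAGE_OPTIONAL_HEADER32
DATA_DIRECTORY_OFFSET = 96           # DataDirectory[0] within IMAGE_OPTIONAL_HEADER32


def data_directory(data: bytes, opt_offset: int, index: int) -> Tuple[int, int]:
    """Return (VirtualAddress, Size) of DataDirectory[index] in a PE32 image.

    (0, 0) means the binary has no such directory.
    """
    (count,) = struct.unpack_from("<I", data, opt_offset + NUMBER_OF_RVA_AND_SIZES_OFFSET)
    if index >= count:
        return (0, 0)
    entry = opt_offset + DATA_DIRECTORY_OFFSET + 8 * index  # 8 bytes per entry
    return struct.unpack_from("<II", data, entry)
```

The returned VirtualAddress is not a file offset; finding the directory in the file requires the section headers.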

Following the headers we find the 'sections', introduced by the 'section
headers'. Essentially, the sections' contents are what you really need to
execute a program, and all the header and directory material is just there
to help you find it.
Each section has some flags about alignment, what kind of data it
contains ("initialized data" and so on), whether it can be shared etc.,
and the data itself. Most, but not all, sections contain one or more
directories referenced through the entries of the optional header's
"data directory" array, like the directory of exported functions or the
directory of base relocations. Directoryless types of contents are, for
example, "executable code" or "initialized data".
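The section headers are 40-byte IMAGE_SECTION_HEADER records starting right after the optional header (at e_lfanew + 4 + 20 + SizeOfOptionalHeader), NumberOfSections of them. Because data directory entries hold RVAs, locating a directory in the file means finding the section whose virtual range contains the RVA. The sketch below parses the headers, names a few characteristics flags, and does that translation; it assumes an image file rather than an object file, ignores the relocation and line-number fields, and its helper names are hypothetical. Flag values come from <winnt.h>.

```python
import struct
from typing import List, NamedTuple, Optional

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # IMAGE_SECTION_HEADER, 40 bytes
SECTION_HEADER_SIZE = 40

# A few IMAGE_SCN_* characteristics flags from <winnt.h>.
SECTION_FLAGS = {
    0x00000020: "code",                # IMAGE_SCN_CNT_CODE
    0x00000040: "initialized data",    # IMAGE_SCN_CNT_INITIALIZED_DATA
    0x00000080: "uninitialized data",  # IMAGE_SCN_CNT_UNINITIALIZED_DATA
    0x10000000: "shared",              # IMAGE_SCN_MEM_SHARED
}


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int      # RVA of the section once loaded
    size_of_raw_data: int
    pointer_to_raw_data: int  # file offset of the section's data
    characteristics: int

    def flags(self) -> List[str]:
        return [label for bit, label in SECTION_FLAGS.items() if self.characteristics & bit]


def read_sections(data: bytes, table_offset: int, count: int) -> List[Section]:
    """Parse `count` section headers starting at table_offset."""
    sections = []
    for i in range(count):
        (raw_name, vsize, rva, raw_size, raw_ptr,
         _reloc_ptr, _line_ptr, _n_relocs, _n_lines, chars) = struct.unpack_from(
            SECTION_HEADER_FORMAT, data, table_offset + SECTION_HEADER_SIZE * i)
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        sections.append(Section(name, vsize, rva, raw_size, raw_ptr, chars))
    return sections


def rva_to_offset(sections: List[Section], rva: int) -> Optional[int]:
    """Map an RVA (e.g. from a data directory) to a file offset, or None."""
    for s in sections:
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            # Past SizeOfRawData the loader zero-fills; nothing is stored in the file.
            return s.pointer_to_raw_data + delta if delta < s.size_of_raw_data else None
    return None
```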

=============================================

Overview
--------------

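At the start of a PE file there is an MS-DOS executable (the "stub"); this makes any PE file a valid MS-DOS executable.
After the stub there is a 32-bit signature with the value 0x00004550 (IMAGE_NT_SIGNATURE).
Then there is a file header (in COFF format) that tells us which hardware platform the binary is meant to run on, how many sections it has, when it was linked, whether it is an executable or a DLL, and so on.
(The difference between an executable and a DLL here is that a DLL cannot be started on its own but only used by another binary, and that a binary cannot link to an executable.)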
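Next comes an "optional header" (it is always present but is still called "optional", because COFF uses an optional header for libraries but not for objects).
It tells us how the binary should be loaded: the starting address, the amount of stack to reserve, the size of the data segment, and so on.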
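An interesting part of the optional header is the trailing array of "data directories"; these directories contain pointers to data in the sections.
For example, if the binary has an export directory, the array entry IMAGE_DIRECTORY_ENTRY_EXPORT holds a pointer to that export directory.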

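Next come the section headers, which describe the sections.
The sections' contents are what is really needed to execute the program; the headers and directories are just there to help you find the sections.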

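Each section has flags describing its alignment, what kind of data it contains, whether it can be shared, and so on, followed by the data itself.
Most, but not all, sections contain one or more directories referenced from the data directory array,
such as the directory of exported functions or the directory of base relocations; contents without a directory are, for example, executable code or initialized data.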

The structure is shown in the diagram below.
    +-------------------+
    | DOS-stub          |
    +-------------------+
    | file-header       |
    +-------------------+
    | optional header   |
    |- - - - - - - - - -|
    |                   |
    | data directories  |
    |                   |
    +-------------------+
    |                   |
    | section headers   |
    |                   |
    +-------------------+
    |                   |
    | section 1         |
    |                   |
    +-------------------+
    |                   |
    | section 2         |
    |                   |
    +-------------------+
    |                   |
    | ...               |
    |                   |
    +-------------------+
    |                   |
    | section n         |
    |                   |
    +-------------------+
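The diagram can be checked against a real file by computing where each box begins. This sketch walks the headers in diagram order and returns (region, file offset) pairs; it assumes a well-formed image, does only minimal validation, and the function name layout is hypothetical. The boxes describe order rather than exact adjacency: there is a 4-byte signature between the stub and the file header, and each section's data begins at its PointerToRawData, which is usually padded to the file alignment.

```python
import struct
from typing import List, Tuple

# Offset of DataDirectory[0] inside the optional header, keyed by its Magic.
DATA_DIRECTORY_OFFSETS = {0x10B: 96, 0x20B: 112}  # PE32, PE32+


def layout(data: bytes) -> List[Tuple[str, int]]:
    """Return (region, file offset) pairs in the order shown in the diagram."""
    if data[:2] != b"MZ":
        raise ValueError("not an MS-DOS/PE file")
    (pe,) = struct.unpack_from("<l", data, 0x3C)                # e_lfanew
    if data[pe:pe + 4] != b"PE\0\0":                            # IMAGE_NT_SIGNATURE
        raise ValueError("missing PE signature")
    file_hdr = pe + 4
    (n_sections,) = struct.unpack_from("<H", data, file_hdr + 2)  # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, file_hdr + 16)   # SizeOfOptionalHeader
    opt_hdr = file_hdr + 20
    (magic,) = struct.unpack_from("<H", data, opt_hdr)
    if magic not in DATA_DIRECTORY_OFFSETS:
        raise ValueError(f"unknown optional header Magic {magic:#x}")
    section_table = opt_hdr + opt_size
    regions = [
        ("DOS-stub", 0),
        ("file-header", file_hdr),
        ("optional header", opt_hdr),
        ("data directories", opt_hdr + DATA_DIRECTORY_OFFSETS[magic]),
        ("section headers", section_table),
    ]
    for i in range(n_sections):
        hdr = section_table + 40 * i                            # 40-byte IMAGE_SECTION_HEADER
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("ascii", errors="replace")
        (raw_ptr,) = struct.unpack_from("<I", data, hdr + 20)   # PointerToRawData
        regions.append((f"section {i + 1} ({name})", raw_ptr))
    return regions
```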