《Peering Inside the PE: A Tour of the Win32 Portable Executable File Format》阅读笔记一
Posted on 2018-11-09 17:17 Ady Lee 阅读(203) 评论(0) 编辑 收藏 举报---恢复内容开始---
The format of an operating system's executable file is in many ways a mirror of the operating system.
Winnt.h是一个非常重要的头文件,其中定义了大部分windows下的内部结构。
The PE format is documented (in the loosest sense of the word) in the WINNT.H header file.About midway through WINNT.H is a section titled "Image Format." This section starts out with small tidbits from the old familiar MS-DOS MZ format and NE format headers before moving into the newer PE information. WINNT.H provides definitions of the raw data structures used by PE files, but contains only a few useful comments to make sense of what the structures and flags mean. Whoever wrote the header file for the PE format (the name Michael J. O'Leary keeps popping up) is certainly a believer in long, descriptive names, along with deeply nested structures and macros. When coding with WINNT.H, it's not uncommon to have expressions like this:
文章作者Matt Pietrek写了一个分析PE格式的程序,开源的,代码地址是:
http://github.com/zed-0xff/pedump
而且,你也可以上传PE文件到网站http://pedump.me/,进行在线的PE格式分析。
在线分析结果如下:
imports:
Let's go over a few fundamental ideas that permeate the design of a PE file (see Figure 1). I'll use the term "module" to mean the code, data, and resources of an executable file or DLL that have been loaded into memory.
The first important thing to know about PE files is that the executable file on disk is very similar to what the module will look like after Windows has loaded it. The Windows loader doesn't need to work extremely hard to create a process from the disk file. The loader uses the memory-mapped file mechanism to map the appropriate pieces of the file into the virtual address space.(exe文件在磁盘上的存储格式和被加载到内存之后的存储格式很相似,所以使用一种memory-mapped机制将exe映射到虚拟地址空间即可)
For Win32, all the memory used by the module for code, data, resources, import tables, export tables, and other required module data structures is in one contiguous block of memory. All you need to know in this situation is where the loader mapped the file into memory. You can easily find all the various pieces of the module by following pointers that are stored as part of the image.
Another idea you should be acquainted with is the Relative Virtual Address (RVA). Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of some item, relative to where the file is memory-mapped. For example, let's say the loader maps a PE file into memory starting at address 0x10000 in the virtual address space. If a certain table in the image starts at address 0x10464, then the table's RVA is 0x464.
To convert an RVA into a usable pointer, simply add the RVA to the base address of the module. The base address is the starting address of a memory-mapped EXE or DLL and is an important concept in Win32. For the sake of convenience, Windows NT and Windows 95 uses the base address of a module as the module's instance handle (HINSTANCE).What's important for Win32 is that you can call GetModuleHandle for any DLL that your process uses to get a pointer for accessing the module's components.(基址加上RVA就得到可用地址,而基址就是DLL或EXE文件的module handle,可以通过GetModuleHandle获取)
The final concept that you need to know about PE files is sections.Unlike segments, sections are blocks of contiguous memory with no size constraints. Some sections contain code or data that your program declared and uses directly, while other data sections are created for you by the linker and librarian, and contain information vital to the operating system. In some descriptions of the PE format, sections are also referred to as objects. The term object has so many overloaded meanings that I'll stick to calling the code and data areas sections.
The PE Header
Like all other executable file formats, the PE file has a collection of fields at a known (or easy to find) location that define what the rest of the file looks like. This header contains information such as the locations and sizes of the code and data areas, what operating system the file is intended for, the initial stack size, and other vital pieces of information.As with other executable formats from Microsoft, this main header isn't at the very beginning of the file. The first few hundred bytes of the typical PE file are taken up by the MS-DOS stub. This stub is a tiny program that prints out something to the effect of "This program cannot be run in MS-DOS mode."
As in other Microsoft executable formats, you find the real header by looking up its starting offset, which is stored in the MS-DOS stub header. The WINNT.H file includes a structure definition for the MS-DOS stub header that makes it very easy to look up where the PE header starts. The e_lfanew field is a relative offset (or RVA, if you prefer) to the actual PE header. To get a pointer to the PE header in memory, just add that field's value to the image base:
Once you have a pointer to the main PE header, the fun can begin. The main PE header is a structure of type IMAGE_NT_HEADERS, which is defined in WINNT.H. This structure is composed of a DWORD and two substructures and is laid out as follows(PE头是一个IMAGE_NT_HEADERS结构类型):
直观点看就是这样的:
再详细点就是这样:
The Signature field viewed as ASCII text is "PE\0\0".
Following the PE signature DWORD in the PE header is a structure of type IMAGE_FILE_HEADER. The fields of this structure contain only the most basic information about the file.
IMAGE_FILE_HEADER Fields
WORD Machine
- The CPU that this file is intended for. The following CPU IDs are defined:
WORD NumberOfSections
- The number of sections in the file.
DWORD TimeDateStamp
- The time that the linker (or compiler for an OBJ file) produced this file. This field holds the number of seconds since December 31st, 1969, at 4:00 P.M.(以前思考过怎么获得exe文件的时间戳,原来PE文件中可以提取)
DWORD PointerToSymbolTable
- The file offset of the COFF symbol table. This field is only used in OBJ files and PE files with COFF debug information. PE files support multiple debug formats, so debuggers should refer to the IMAGE_DIRECTORY_ENTRY_DEBUG entry in the data directory (defined later).
DWORD NumberOfSymbols
- The number of symbols in the COFF symbol table. See above.
WORD SizeOfOptionalHeader
- The size of an optional header that can follow this structure. In OBJs, the field is 0. In executables, it is the size of the IMAGE_OPTIONAL_HEADER structure that follows this structure.
WORD Characteristics
- Flags with information about the file. Some important fields:
Other fields are defined in WINNT.H
The third component of the PE header is a structure of type IMAGE_OPTIONAL_HEADER. For PE files, this portion certainly isn't optional.
The COFF format allows individual implementations to define a structure of additional information beyond the standard IMAGE_FILE_HEADER. The fields in the IMAGE_OPTIONAL_HEADER are what the PE designers felt was critical information beyond the basic information in the IMAGE_FILE_HEADER.
All of the fields of the IMAGE_OPTIONAL_HEADER aren't necessarily important to know about . The more important ones to be aware of are the ImageBase and the Subsystem fields. You can skim or skip the description of the fields.
IMAGE_OPTIONAL_HEADER Fields
WORD Magic
- Appears to be a signature WORD of some sort. Always appears to be set to 0x010B.
BYTE MajorLinkerVersion
BYTE MinorLinkerVersion
- The version of the linker that produced this file. The numbers should be displayed as decimal values, rather than as hex. A typical linker version is 2.23.
DWORD SizeOfCode
- The combined and rounded-up size of all the code sections. Usually, most files only have one code section, so this field matches the size of the .text section.
DWORD SizeOfInitializedData
- This is supposedly the total size of all the sections that are composed of initialized data (not including code segments.) However, it doesn't seem to be consistent with what appears in the file.
DWORD SizeOfUninitializedData
- The size of the sections that the loader commits space for in the virtual address space, but that don't take up any space in the disk file. These sections don't need to have specific values at program startup, hence the term uninitialized data. Uninitialized data usually goes into a section called .bss.
DWORD AddressOfEntryPoint
- The address where the loader will begin execution. This is an RVA, and usually can usually be found in the .text section.
DWORD BaseOfCode
- The RVA where the file's code sections begin. The code sections typically come before the data sections and after the PE header in memory. This RVA is usually 0x1000 in Microsoft Linker-produced EXEs. Borland's TLINK32 looks like it adds the image base to the RVA of the first code section and stores the result in this field.
DWORD BaseOfData
- The RVA where the file's data sections begin. The data sections typically come last in memory, after the PE header and the code sections.
DWORD ImageBase
- When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field, assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to that address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.
DWORD SectionAlignment
- When mapped into memory, each section is guaranteed to start at a virtual address that's a multiple of this value. For paging purposes, the default section alignment is 0x1000.
DWORD FileAlignment
- In the PE file, the raw data that comprises each section is guaranteed to start at a multiple of this value. The default value is 0x200 bytes, probably to ensure that sections always start at the beginning of a disk sector (which are also 0x200 bytes in length). This field is equivalent to the segment/resource alignment size in NE files. Unlike NE files, PE files typically don't have hundreds of sections, so the space wasted by aligning the file sections is almost always very small.
(
SectionAlignment
是在内存中的对齐单位,FileAlignment
是PE格式中的对齐单位,所以从上图中也可以看出来,磁盘中的PE和内存中的PE的大小是不同的)WORD MajorOperatingSystemVersion
WORD MinorOperatingSystemVersion
- The minimum version of the operating system required to use this executable. This field is somewhat ambiguous since the subsystem fields (a few fields later) appear to serve a similar purpose. This field defaults to 1.0 in all Win32 EXEs to date.
WORD MajorImageVersion
WORD MinorImageVersion
- A user-definable field. This allows you to have different versions of an EXE or DLL. You set these fields via the linker /VERSION switch. For example, "LINK /VERSION:2.0 myobj.obj".
WORD MajorSubsystemVersion
WORD MinorSubsystemVersion
- Contains the minimum subsystem version required to run the executable. A typical value for this field is 3.10 (meaning Windows NT 3.1).
DWORD Reserved1
- Seems to always be 0.
DWORD SizeOfImage
- This appears to be the total size of the portions of the image that the loader has to worry about. It is the size of the region starting at the image base up to the end of the last section. The end of the last section is rounded up to the nearest multiple of the section alignment.
DWORD SizeOfHeaders
- The size of the PE header and the section (object) table. The raw data for the sections starts immediately after all the header components.
DWORD CheckSum
- Supposedly a CRC checksum of the file. As in other Microsoft executable formats, this field is ignored and set to 0. The one exception to this rule is for trusted services and these EXEs must have a valid checksum.
WORD Subsystem
- The type of subsystem that this executable uses for its user interface. WINNT.H defines the following values:
WORD DllCharacteristics
- A set of flags indicating under which circumstances a DLL's initialization function (such as DllMain) will be called. This value appears to always be set to 0, yet the operating system still calls the DLL initialization function for all four events.
DWORD SizeOfStackReserve
- The amount of virtual memory to reserve for the initial thread's stack. Not all of this memory is committed, however (see the next field). This field defaults to 0x100000 (1MB). If you specify 0 as the stack size to CreateThread, the resulting thread will also have a stack of this same size.
DWORD SizeOfStackCommit
- The amount of memory initially committed for the initial thread's stack. This field defaults to 0x1000 bytes (1 page) for the Microsoft Linker while TLINK32 makes it two pages.
DWORD SizeOfHeapReserve
- The amount of virtual memory to reserve for the initial process heap. This heap's handle can be obtained by calling GetProcessHeap. Not all of this memory is committed (see the next field).
DWORD SizeOfHeapCommit
- The amount of memory initially committed in the process heap. The default is one page.
DWORD LoaderFlags
- From WINNT.H, these appear to be fields related to debugging support. I've never seen an executable with either of these bits enabled, nor is it clear how to get the linker to set them.
DWORD NumberOfRvaAndSizes
- The number of entries in the DataDirectory array (below).
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]
- An array of IMAGE_DATA_DIRECTORY structures. The initial array elements contain the starting RVA and sizes of important portions of the executable file. Some elements at the end of the array are currently unused. The first element of the array is always the address and size of the exported function table (if present). The second array entry is the address and size of the imported function table, and so on. For a complete list of defined array entries, see the IMAGE_DIRECTORY_ENTRY_XXX #defines in WINNT.H. This array allows the loader to quickly find a particular section of the image , without needing to iterate through each of the images sections, comparing names as it goes along. Most array entries describe an entire section's data. However, the IMAGE_DIRECTORY_ENTRY_DEBUG element only encompasses a small portion of the bytes in the .rdata section.
The Section Table
Between the PE header and the raw data for the image's sections lies the section table. The section table is essentially a phone book containing information about each section in the image. The sections in the image are sorted by their starting address (RVAs), rather than alphabetically.
Before I discuss specific sections, I need to describe the data that the operating system manages the sections with. Immediately following the PE header in memory is an array of IMAGE_SECTION_HEADERs. The number of elements in this array is given in the PE header (the IMAGE_NT_HEADER.FileHeader.NumberOfSections field).
Section table from exe file:
Section table from obj file
IMAGE_SECTION_HEADER Formats
BYTE Name[IMAGE_SIZEOF_SHORT_NAME]
- This is an 8-byte ANSI name (not UNICODE) that names the section. Most section names start with a . (such as ".text"), but this is not a requirement, as some PE documentation would have you believe.
union {
DWORD PhysicalAddress
DWORD VirtualSize
} Misc;
- This field has different meanings, in EXEs or OBJs. In an EXE, it holds the actual size of the code or data. This is the size before rounding up to the nearest file alignment multiple. The SizeOfRawData field (seems a bit of a misnomer) later on in the structure holds the rounded up value.(VirtualSize和SizeOfRawData的区别就在于是否rounded up).For OBJ files, this field indicates the physical address of the section. The first section starts at address 0. To find the physical address in an OBJ file of the next section, add the SizeOfRawData value to the physical address of the current section.
DWORD VirtualAddress
- In EXEs, this field holds the RVA to where the loader should map the section. To calculate the real starting address of a given section in memory, add the base address of the image to the section's VirtualAddress stored in this field. With Microsoft tools, the first section defaults to an RVA of 0x1000. In OBJs, this field is meaningless and is set to 0.
DWORD SizeOfRawData
- In EXEs, this field contains the size of the section after it's been rounded up to the file alignment size. For example, assume a file alignment size of 0x200. If the VirtualSize field from above says that the section is 0x35A bytes in length, this field will say that the section is 0x400 bytes long(此例体现了SizeOfRawData和VirtualSize的区别). In OBJs, this field contains the exact size of the section emitted by the compiler or assembler. In other words, for OBJs, it's equivalent to the VirtualSize field in EXEs.
DWORD PointerToRawData(指该节在文件中的偏移,而VirtualAddress则是在内存中的偏移)
- This is the file-based offset of where the raw data emitted by the compiler or assembler can be found. If your program memory maps a PE or COFF file itself (rather than letting the operating system load it), this field is more important than the VirtualAddress field. You'll have a completely linear file mapping in this situation, so you'll find the data for the sections at this offset, rather than at the RVA specified in the VirtualAddress field.
DWORD PointerToRelocations
- In OBJs, this is the file-based offset to the relocation information for this section. The relocation information for each OBJ section immediately follows the raw data for that section. In EXEs, this field (and the subsequent field) are meaningless, and set to 0. When the linker creates the EXE, it resolves most of the fixups, leaving only base address relocations and imported functions to be resolved at load time. The information about base relocations and imported functions is kept in their own sections, so there's no need for an EXE to have per-section relocation data following the raw section data.
DWORD PointerToRelocations(可以去参看重定位表的定义)
- In OBJs, this is the file-based offset to the relocation information for this section. The relocation information for each OBJ section immediately follows the raw data for that section. In EXEs, this field (and the subsequent field) are meaningless, and set to 0. When the linker creates the EXE, it resolves most of the fixups, leaving only base address relocations and imported functions to be resolved at load time. The information about base relocations and imported functions is kept in their own sections, so there's no need for an EXE to have per-section relocation data following the raw section data.
DWORD PointerToLinenumbers
- This is the file-based offset of the line number table. A line number table correlates source file line numbers to the addresses of the code generated for a given line. In modern debug formats like the CodeView format, line number information is stored as part of the debug information. In the COFF debug format, however, the line number information is stored separately from the symbolic name/type information. Usually, only code sections (such as .text) have line numbers. In EXE files, the line numbers are collected towards the end of the file, after the raw data for the sections. In OBJ files, the line number table for a section comes after the raw section data and the relocation table for that section.
WORD NumberOfRelocations
- The number of relocations in the relocation table for this section (the PointerToRelocations field from above). This field seems relevant only for OBJ files.
WORD NumberOfLinenumbers
- The number of line numbers in the line number table for this section (the PointerToLinenumbers field from above).
DWORD Characteristics
- What most programmers call flags, the COFF/PE format calls characteristics. This field is a set of flags that indicate the section's attributes (such as code/data, readable, or writeable,). For a complete list of all possible section attributes, see the IMAGE_SCN_XXX_XXX #defines in WINNT.H.
参考:
https://msdn.microsoft.com/en-us/library/ms809762.aspx
RVA、VA、Imagebase之间的关系:
http://blog.csdn.net/fantcy/article/details/4474604
As in other Microsoft executable formats, you find the real header by looking up its starting offset, which is stored in the MS-DOS stub header. The WINNT.H file includes a structure definition for the MS-DOS stub header that makes it very easy to look up where the PE header starts. The e_lfanew field is a relative offset (or RVA, if you prefer) to the actual PE header. To get a pointer to the PE header in memory, just add that field's value to the image base:
看雪中的一些参考链接:
http://bbs.pediy.com/showthread.php?threadid=19618
http://bbs.pediy.com/showthread.php?threadid=21932
关于重定位表: