An introduction to symbol files from MSDN 2001, together with Matt Pietrek's articles in MSDN. It is still best to download the DDK.

The following topics describe symbol files and the functionality provided by the DbgHelp functions.

Note that all DbgHelp functions are single threaded. Calls to these functions from more than one thread will therefore likely result in unexpected behavior or memory corruption. To avoid this, you must synchronize all concurrent calls to these functions.

Symbol Files

A symbol file contains the same debugging information that an executable file would contain. However, the information is stored in a debug (.dbg) file or a program database (.pdb), rather than the executable file. Therefore, you can install only the symbol files you will need during debugging. This reduces the file size of the executable, saving load time and disk storage.

Debuggers can determine whether an executable file or DLL contains debugging information by searching for the IMAGE_FILE_DEBUG_STRIPPED characteristic. If this characteristic is present, the debugging information exists in a symbol file.
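
The check described above comes down to testing one bit of the Characteristics field in the image's COFF file header. The following is a minimal sketch in portable C; the helper name and the local macro are illustrative assumptions, and the constant repeats the value that winnt.h gives IMAGE_FILE_DEBUG_STRIPPED.

```c
#include <stdint.h>

/* Same value as IMAGE_FILE_DEBUG_STRIPPED in winnt.h. */
#define DEBUG_STRIPPED_FLAG 0x0200u

/* Hypothetical helper: returns 1 if the Characteristics field says the
   debugging information was stripped into a separate symbol file. */
int debug_info_is_stripped(uint16_t characteristics)
{
    return (characteristics & DEBUG_STRIPPED_FLAG) != 0;
}
```

A caller would read the Characteristics value from the file header of the image and pass it to this helper.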

To create a .dbg file, build your executable file with debugging information according to the directions for your build tools. Next, use the SplitSymbols function or the Rebase tool. The resulting .dbg file uses the PE format.

To create a .pdb file, build your executable file with debugging information according to the directions for your build tools.

The operating system dynamic-link libraries (DLL) have associated symbol (.dbg) files. These files are not installed during installation. To install the system symbol files, create a directory on your hard disk, and copy the files from your system installation compact disc (CD). The symbol files are located in the SUPPORT\DEBUG\I386\SYMBOLS directory tree.

To work with the symbolic debugging information contained in a symbol file, use the symbol handling functions.

*************************************************************

Symbol Handling

The symbol handler functions give applications easy and portable access to the symbolic debugging information of an image. These functions should be used exclusively to ensure access to symbolic information. This is necessary because these functions isolate the application from the symbol format.

For more information, see the following topics:

-----------------------------------------------------

Symbol Handler Initialization

The symbol handler is designed to track various sets of symbol files.

To initialize the symbol handler, call the SymInitialize function. The hProcess parameter can be a unique arbitrary number, a value returned from the GetCurrentProcess function, or the identifier of any running process. The fInvadeProcess parameter indicates whether the symbol handler should enumerate the modules loaded by the process and load symbols for each of its modules. If fInvadeProcess is TRUE, the hProcess parameter must be the value returned from GetCurrentProcess or the identifier of an existing process.

Using fInvadeProcess is a simple way to load all symbol files for a process. However, the symbol handler will not attempt to load symbols for modules subsequently loaded by the LoadLibrary function. You must use the SymLoadModule function in this case.

--------------------------------------------------
Symbol Paths

The library uses the symbol search path to locate debug symbols (.dbg file) for .dll, .exe, and .sys files by appending "\symbols" and "\dll" or "\exe" or "\sys" to the path. For example, the typical location of symbol files for .dll files is c:\mysymbols\symbols\dll. For .exe files, the location is c:\mysymbols\symbols\exe.
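
The directory rule above can be sketched as a small helper that appends "\symbols" and the image's extension to one search path entry. This is only an illustration of the naming rule, not DbgHelp's implementation; the function name is hypothetical, and the sketch assumes a lowercase extension.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<root>\symbols\<ext>" into out, where
   <ext> is the image's extension (dll, exe, or sys, lowercase only).
   Returns 0 on success, -1 if the extension is not one of those three
   or if out is too small. */
int dbg_symbol_dir(const char *root, const char *image, char *out, size_t outlen)
{
    const char *dot = strrchr(image, '.');
    if (dot == NULL)
        return -1;
    const char *ext = dot + 1;
    if (strcmp(ext, "dll") != 0 && strcmp(ext, "exe") != 0 && strcmp(ext, "sys") != 0)
        return -1;
    int n = snprintf(out, outlen, "%s\\symbols\\%s", root, ext);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With the search path c:\mysymbols and an image named kernel32.dll, the helper produces c:\mysymbols\symbols\dll, which matches the example in the text.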

To specify where the symbol handler will search disk directories for symbol files, call the SymSetSearchPath function. Alternatively, you can specify a symbol search path in the UserSearchPath parameter of the SymInitialize function.

The UserSearchPath parameter in SymInitialize and the SearchPath parameter in SymSetSearchPath take a pointer to a null-terminated string that specifies a path, or series of paths separated by a semicolon. The symbol handler uses these paths to search for symbol files. If this parameter is specified as a non-null value, the symbol handler searches only the paths set by the application. If this parameter is NULL, the symbol handler first searches the current working directory of the application, then the system root directory (%windir%). If you set the _NT_SYMBOL_PATH or _NT_ALT_SYMBOL_PATH environment variable, the symbol handler searches for symbol files in the following order:

  1. The current working directory of the application.
  2. The _NT_SYMBOL_PATH environment variable.
  3. The _NT_ALT_SYMBOL_PATH environment variable.
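
The order above can be sketched as a helper that joins the three locations into one semicolon-separated path. The function name and the choice to pass the variable values in as arguments (rather than reading the environment) are assumptions made for illustration; DbgHelp does this work internally.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins the current directory, the value of
   _NT_SYMBOL_PATH, and the value of _NT_ALT_SYMBOL_PATH, in that order,
   skipping any that are NULL or empty. Returns 0 on success, -1 if out
   is too small. */
int join_search_order(const char *cwd, const char *sym, const char *alt,
                      char *out, size_t outlen)
{
    const char *parts[3] = { cwd, sym, alt };
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < 3; i++) {
        if (parts[i] == NULL || parts[i][0] == '\0')
            continue;                       /* unset variable: skip it */
        int n = snprintf(out + used, outlen - used, "%s%s",
                         used ? ";" : "", parts[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;                      /* would not fit */
        used += (size_t)n;
    }
    return 0;
}
```

The result has the same form as the string that the UserSearchPath and SearchPath parameters accept.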

To retrieve the search paths, call the SymGetSearchPath function.

The search path for program database (.pdb) files is different than the path for debug (.dbg) files. The algorithm is determined by the functionality of the symbol library. By default, Microsoft Visual C/C++ creates Microsoft format symbols, strips them from the image, and places them in a separate .pdb file. Typically, the .pdb file will be located in the directory that contains the executable image. Visual C/C++ embeds the absolute path to the .pdb file in the executable image. If the symbol handler cannot find the .pdb file in that location or if the .pdb file was moved to another directory, the symbol handler will locate the .pdb file using the search path described for .dbg files.


--------------------------------------------------
Symbol Loading

The symbol handler will load symbols when you call the SymInitialize function with the fInvadeProcess parameter set to TRUE or when you call the SymLoadModule function to specify a module. In either case, the symbol handler either loads the symbols or defers symbol loading until symbols are requested, depending on the options set by the SymSetOptions function.

The symbol handler can be used to retrieve symbolic information for any module; it does not need to be associated with a process specified in the SymInitialize call. To use an arbitrary module, specify the full path to the module image in the ImageName parameter. You can use a path to any executable module that has debugging information (.exe, .dll, .drv, .sys, .scr, .cpl, or .com). Use the BaseOfDll parameter to specify any load address, then symbol addresses will be based from that address.

It may not be necessary to keep a symbol module loaded for the duration of an application. To release the symbol module from the symbol handler's list of modules, use the SymUnloadModule function. This function releases the memory allocated for the symbol module. To use symbols for that module again, you must call the SymLoadModule function even if the deferred symbol loading option is set.

----------------------------------------------------
Deferred Symbol Loading

To conserve time and memory when working with many symbol files, use the SymSetOptions function to set the deferred symbol loading (SYMOPT_DEFERRED_LOADS) option, then use the SymLoadModule or SymInitialize function to register the modules with loading deferred. The symbol handler will list the symbols that are available for the modules, but will not map the debug information into memory until it is requested. This is the preferred method to efficiently use debugging symbols. Deferred symbols are loaded when a function requests symbol information for the module; SymFromName, for example, attempts to load the symbol file for a module if it has not already been loaded.

----------------------------------------------------

Decorated Symbol Names

A decorated symbol name includes characters that distinguish how the symbol has been declared. For __stdcall functions, names include the "@" character and a decimal number that specifies the number of bytes in its function parameters. For example, the decorated name of the LoadLibrary function is LoadLibrary@4. For C++ functions the name decoration is more complex and varies from compiler to compiler.

To retrieve the undecorated symbol name, use the UnDecorateSymbolName function. Alternatively, you can call the SymSetOptions function to request that the symbol handler always present symbols with undecorated names. You must set this option before loading the symbols because the symbol handler creates the symbol name tables at load time.
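
To make the __stdcall pattern concrete, here is a minimal sketch that splits a name of the form Name@N into its two parts. It is not a substitute for UnDecorateSymbolName, it does not handle C++ decoration, and the function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "Name@N". Copies "Name" into name_out and
   returns N (the parameter byte count), or returns -1 if the input does
   not match the pattern or name_out is too small. */
long split_stdcall_name(const char *decorated, char *name_out, size_t outlen)
{
    const char *at = strrchr(decorated, '@');
    if (at == NULL || at == decorated || at[1] == '\0')
        return -1;

    char *end;
    long bytes = strtol(at + 1, &end, 10);
    if (*end != '\0' || bytes < 0)
        return -1;                      /* suffix is not a decimal count */

    size_t len = (size_t)(at - decorated);
    if (len >= outlen)
        return -1;
    memcpy(name_out, decorated, len);
    name_out[len] = '\0';
    return bytes;
}
```

For the name used in the text, LoadLibrary@4, the helper returns 4 and stores LoadLibrary in the output buffer.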

-----------------------------------------------------

Finding Symbols

After a symbol file has been loaded into the symbol handler, an application can use the symbol locator functions to return symbol information for a specified address. These functions can also find a source code file name and line number location for an address.

Enumerating Symbol Files

To retrieve a list of all symbol files loaded by module name, call the SymEnumerateModules function. To retrieve a list of symbols for a given module, call the SymEnumSymbols function.

Retrieving Symbols by Address

To retrieve symbolic information for a specific address, use the SymFromAddr function. This function retrieves information and stores it in a SYMBOL_INFO structure. Because symbol names are variable in length, you must provide additional buffer space following the SYMBOL_INFO structure declaration. The following is an example using SymFromAddr:

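DWORD64 dwAddress;        // address to look up
DWORD64 dwDisplacement;   // receives the offset from the start of the symbol
BYTE buffer[MAX_BUFFER_LENGTH];
PSYMBOL_INFO pSymbol = (PSYMBOL_INFO)buffer;

pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
// Space for the name is whatever follows the structure in the buffer.
pSymbol->MaxNameLen = sizeof(buffer) - sizeof(SYMBOL_INFO) + 1;
SymFromAddr(hProcess, dwAddress, &dwDisplacement, pSymbol);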

Note that hProcess is the handle to the process originally passed to the SymInitialize function, and dwAddress contains the address for which a symbol is to be located. The address does not need to be on a symbol boundary. If the address comes after the beginning of a symbol but before the end of the symbol (the beginning of the symbol plus the symbol size), the function will locate the symbol.
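
The range rule described above (an address matches a symbol when it lies between the symbol's start and its start plus its size) can be sketched with a plain table lookup. The record type and function name here are hypothetical and stand in for the symbol handler's internal tables; the sketch only illustrates how the match and the displacement relate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: name, start address, and size in bytes. */
struct sym_range {
    const char *name;
    uint64_t    start;
    uint64_t    size;
};

/* Returns the record whose range [start, start + size) contains addr and
   stores addr - start in *displacement; returns NULL if none matches. */
const struct sym_range *find_symbol_by_addr(const struct sym_range *table,
                                            size_t count, uint64_t addr,
                                            uint64_t *displacement)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= table[i].start && addr - table[i].start < table[i].size) {
            *displacement = addr - table[i].start;
            return &table[i];
        }
    }
    return NULL;
}
```

An address that falls eight bytes into a symbol therefore resolves to that symbol with a displacement of 8, which is the role the displacement output of SymFromAddr plays.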

Retrieving Symbols by Symbol Name

To retrieve symbolic information in a SYMBOL_INFO structure for a specific module and symbol name, use the SymFromName function. If deferred symbol loading is set, SymFromName will attempt to load the symbol file for a module if it has not already been loaded. To specify a module name along with a symbol name, use the syntax Module!SymName. The "!" character delimits the module name from the symbol name. The following is an example using SymFromName:

CHAR szSymbolName[MAX_SYMBOLNAME_LENGTH];   // null-terminated name to look up
BYTE buffer[MAX_BUFFER_LENGTH];
PSYMBOL_INFO pSymbol = (PSYMBOL_INFO)buffer;

pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
// Space for the name is whatever follows the structure in the buffer.
pSymbol->MaxNameLen = sizeof(buffer) - sizeof(SYMBOL_INFO) + 1;
SymFromName(hProcess, szSymbolName, pSymbol);

Note that hProcess is the handle to the process originally passed to SymInitialize, and szSymbolName is a null-terminated string that specifies the symbol name for which a symbol is to be located.
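
The Module!SymName syntax can be illustrated with a short sketch that separates the two parts at the "!" delimiter. SymFromName accepts the combined string directly, so this helper (whose name is hypothetical) is only meant to show how the syntax reads.

```c
#include <string.h>

/* Hypothetical helper: splits "Module!SymName" at the first '!'.
   *module_len receives the length of the module part (0 if no module
   was given); the return value points at the symbol name inside the
   original string. */
const char *split_module_symbol(const char *qualified, size_t *module_len)
{
    const char *bang = strchr(qualified, '!');
    if (bang == NULL) {
        *module_len = 0;
        return qualified;
    }
    *module_len = (size_t)(bang - qualified);
    return bang + 1;
}
```

For the string kernel32!LoadLibrary, the module part is the first 8 characters and the returned pointer refers to LoadLibrary.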

Retrieving Line Numbers by Address

To retrieve the source code location for a specific address, use the SymGetLineFromAddr function. This function fills an IMAGEHLP_LINE structure that includes the source file name and line number location referred to by the specified address. The following is an example using SymGetLineFromAddr:

DWORD dwAddress;
DWORD dwDisplacement;
IMAGEHLP_LINE line;

SymSetOptions(SYMOPT_LOAD_LINES);
...
line.SizeOfStruct = sizeof(IMAGEHLP_LINE);
SymGetLineFromAddr(hProcess, dwAddress, &dwDisplacement, &line);

Note that hProcess is the handle to the process originally passed to SymInitialize, and dwAddress contains the address for which the source file name and line number should be located.

Retrieving Line Numbers by Symbol Name

To retrieve source code location for a specific symbol name, use the SymGetLineFromName function. This function is similar to SymGetSymFromName, but retrieves an IMAGEHLP_LINE structure. To use SymGetLineFromAddr or SymGetLineFromName, you must set the load lines option (SYMOPT_LOAD_LINES) using the SymSetOptions function. The following is an example using SymGetLineFromName:

CHAR szModuleName[MAX_PATH];   // source module name; may be NULL in the call
CHAR szFileName[MAX_PATH];     // source file name
DWORD dwLineNumber;            // line number to resolve
LONG lDisplacement;
IMAGEHLP_LINE line;

SymSetOptions(SYMOPT_LOAD_LINES);
...
line.SizeOfStruct = sizeof(IMAGEHLP_LINE);

SymGetLineFromName(hProcess, szModuleName, szFileName, dwLineNumber, &lDisplacement, &line);

Note that hProcess is the handle to the process originally passed to SymInitialize. Also, szModuleName contains the source module name, which can be NULL. The szFileName parameter contains the source file name, and dwLineNumber contains the line number for which the virtual address should be retrieved.

-------------------------------------------------------

Symbol Handler Cleanup

To free all the memory used by the symbol handler for a process, use the SymCleanup function. This function enumerates all loaded modules, frees each module, and frees the memory allocated for the list of modules. After you call SymCleanup, you cannot use the process handle in the symbol handling functions until you call the SymInitialize function.


*************************************************************

Symbol Servers and Symbol Stores

Setting up symbols correctly for debugging can be a challenging task, particularly for kernel debugging. It often requires that you know the names and releases of all products on your computer. The debugger must be able to locate the symbol files that correspond to each product release and service pack. This can result in an extremely long symbol path consisting of a long list of directories.

To simplify these difficulties in coordinating symbol files, a symbol server can be used. A symbol server enables the debuggers to automatically retrieve the correct symbol files without product names, releases, or build numbers. Debugging Tools for Windows contains a symbol server called SymSrv.

The symbol server is activated by including a certain text string in the symbol path. Each time the debugger needs to load symbols for a newly loaded module, it calls the symbol server to locate the appropriate symbol files. The symbol server locates the files in a symbol store. This is a collection of symbol files, an index, and a tool that can be used by an administrator to add and delete files. The files are indexed according to unique parameters such as the time stamp and image size. Debugging Tools for Windows contains a symbol store tool called SymStore.


Using SymSrv

SymSrv (symsrv.dll) is a symbol server that is included in the Debugging Tools for Windows package.

SymSrv can deliver symbol files from a centralized symbol store. This store can contain any number of symbol files, corresponding to any number of programs or operating systems. The store can also contain binary files (this is useful when debugging minidump files).

The store can contain the actual symbol and binary files, or it can simply contain pointers to symbol files. If the store contains pointers, SymSrv will retrieve the actual files directly from their sources.

SymSrv can also be used to separate a large symbol store into a smaller subset that is appropriate for a specialized debugging task.

Finally, SymSrv can obtain symbol files from an HTTP, HTTPS, or FTP source using the logon information provided by the operating system. SymSrv supports HTTPS sites protected by smartcards, certificates, and regular logins and passwords. However, if you have one copy of SymSrv in your symbol path using an FTP session, you cannot have additional copies using HTTP, HTTPS, or other FTP sessions.

Setting the Symbol Path

To use this symbol server, symsrv.dll must be installed in the same directory as the debugger. The symbol path must be set in one of the following ways:

set _NT_SYMBOL_PATH=symsrv*ServerDLL*DownstreamStore*\\Server\Share

set _NT_SYMBOL_PATH=symsrv*ServerDLL*\\Server\Share

set _NT_SYMBOL_PATH=srv*DownstreamStore*\\Server\Share

set _NT_SYMBOL_PATH=srv*\\Server\Share

The following table describes elements of this syntax.

Field            Description
symsrv           This keyword must always appear first. It indicates to the debugger that this item is a symbol server, not just a normal symbol directory.
ServerDLL        Specifies the name of the symbol server DLL. If you are using the SymSrv symbol server, this will always be symsrv.dll.
srv              This is shorthand for symsrv*symsrv.dll.
DownstreamStore  Specifies a local directory or network share that will be used to cache individual symbol files. If DownstreamStore specifies a directory that does not exist, SymStore will attempt to create it.
\\Server\Share   Specifies the server and share of the symbol store.

The DownstreamStore parameter can be omitted, as in the forms symsrv*ServerDLL*\\Server\Share and srv*\\Server\Share above. This parameter is required if you are accessing symbols from an FTP, HTTP, or HTTPS site, or if you are using compressed files on your store.

If DownstreamStore is not included, the debugger will load all symbol files from the specified Server and Share.

If DownstreamStore is included, the debugger will first look for a symbol file in this location. If the symbol file is not found, the debugger will locate the symbol file from the specified Server and Share, and then cache a copy of this file in the downstream store. The file will be copied to a subdirectory in the tree under DownstreamStore which corresponds to its location in the tree under \\Server\Share.

The symbol server does not have to be the only entry in the symbol path. If the symbol path consists of multiple entries, the debugger checks each entry for the needed symbol files, in order (from left to right), regardless of whether a symbol server or an actual directory is named.

Here are some examples. To use SymSrv as the symbol server with a symbol store on \\mybuilds\mysymbols, set the following symbol path:

set _NT_SYMBOL_PATH=symsrv*symsrv.dll*\\mybuilds\mysymbols

To set the symbol path so that the debugger will copy symbol files from a symbol store on \\mybuilds\mysymbols to your local directory c:\localsymbols, use:

set _NT_SYMBOL_PATH=symsrv*symsrv.dll*c:\localsymbols*\\mybuilds\mysymbols

To set the symbol path so that the debugger will copy symbol files from the FTP site ftp.somecompany.com/manysymbols to a local network directory \\localserver\myshare\mycache, use:

set _NT_SYMBOL_PATH=symsrv*symsrv.dll*\\localserver\myshare\mycache*ftp://ftp.somecompany.com/manysymbols

This last example can also be abbreviated as:

set _NT_SYMBOL_PATH=srv*\\localserver\myshare\mycache*ftp://ftp.somecompany.com/manysymbols
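
The syntax used in these examples can be sketched as a small parser for a single symbol path entry. This only illustrates the field layout described above; it is not SymSrv's or the debugger's own parser, and the struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical parsed form of one symbol server entry. The pointers
   refer into the caller's buffer; downstream is NULL when omitted. */
struct srv_entry {
    const char *server_dll;
    const char *downstream;
    const char *store;
};

/* Parses "symsrv*DLL*[Downstream*]Store" or "srv*[Downstream*]Store"
   in place, replacing each '*' with a terminator. Returns 0 on success,
   -1 if the entry is not a well-formed symbol server entry. */
int parse_srv_entry(char *entry, struct srv_entry *out)
{
    char *fields[4];
    int n = 0;
    char *p = entry;

    for (;;) {
        if (n == 4)
            return -1;                  /* more fields than the syntax allows */
        fields[n++] = p;
        p = strchr(p, '*');
        if (p == NULL)
            break;
        *p++ = '\0';
    }

    int first;                          /* index of the first field after the DLL */
    if (strcmp(fields[0], "srv") == 0) {
        out->server_dll = "symsrv.dll"; /* srv is shorthand for symsrv*symsrv.dll */
        first = 1;
    } else if (strcmp(fields[0], "symsrv") == 0 && n >= 2) {
        out->server_dll = fields[1];
        first = 2;
    } else {
        return -1;                      /* an ordinary directory, not a server */
    }

    if (n - first == 1) {               /* store only */
        out->downstream = NULL;
        out->store = fields[first];
    } else if (n - first == 2) {        /* downstream store, then store */
        out->downstream = fields[first];
        out->store = fields[first + 1];
    } else {
        return -1;
    }
    return 0;
}
```

A full symbol path would first be split on semicolons, with each piece either parsed this way or treated as a plain directory.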

In addition, the symbol path may contain several directories or symbol servers, separated by semicolons. This allows you to locate symbols from multiple locations (or even multiple symbol servers). If a binary has a mismatched symbol file, the debugger cannot locate it using the symbol server because it checks only for the exact parameters. However, the debugger may find a mismatched symbol file with the correct name, using the traditional symbol path, and successfully load it. Even though the file is technically not the correct symbol file, it may provide useful information.

Compressed Files

SymSrv is compatible with symbol stores that contain compressed files, as long as this compression has been done with the compress.exe tool that is distributed with the Platform SDK. Compressed files should have an underscore as the last character in their file extensions (for example, module1.pd_ or module2.db_). For details, see Using SymStore.

If the files on the store are compressed, you must use a downstream store. SymSrv will uncompress all files before caching them on the downstream store.
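
The naming convention for compressed files can be sketched as follows. The helper only derives the expected file name (last character of the extension replaced by an underscore); it does not compress anything, and its name is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: rewrites name in place so that the last character
   of its extension is '_', for example "module1.pdb" -> "module1.pd_".
   Returns 0 on success, -1 if the name has no extension. */
int to_compressed_name(char *name)
{
    char *dot = strrchr(name, '.');
    size_t len = strlen(name);

    if (dot == NULL || dot[1] == '\0')
        return -1;
    name[len - 1] = '_';
    return 0;
}
```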

Deleting the Cache

If you are using a DownstreamStore as a cache, you can delete this directory at any time to save disk space.

It is possible to have a vast symbol store that includes symbol files for many different programs or Windows versions. If you upgrade the version of Windows used on your target computer, the cached symbol files will all match the earlier version. These cached files will not be of any further use, and therefore this might be a good time to delete the cache.

How SymSrv Locates Files

SymSrv creates a fully-qualified UNC path to the desired symbol file. This path begins with the path to the symbol store recorded in the _NT_SYMBOL_PATH environment variable. The SymbolServer routine is then used to identify the name of the desired file; this name is appended to the path as a directory name. Another directory name, consisting of the concatenation of the id, two, and three parameters passed to SymbolServer, is then appended; if any of these values are zero, they are omitted.

The resulting directory is searched for the symbol file, or a symbol store pointer file.

If this search is successful, the path is passed to the caller and TRUE is returned. If the file is not found, FALSE is returned.
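
The path construction described above can be sketched as follows. The function name is hypothetical, and formatting each nonzero value as lowercase hexadecimal is an assumption inferred from the sample transaction file lines later in this document (for example canon800.dbg\35d9fd51b000); the sketch also treats all three values as 32-bit numbers, which may not hold for every symbol file type.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: builds "<store>\<file>\<id><two><three>", leaving
   out any value that is zero. Hex formatting is an assumption; see the
   note above. Returns 0 on success, -1 if out is too small. */
int symsrv_lookup_dir(const char *store, const char *file,
                      uint32_t id, uint32_t two, uint32_t three,
                      char *out, size_t outlen)
{
    uint32_t parts[3] = { id, two, three };
    char key[32] = "";
    size_t used = 0;

    for (int i = 0; i < 3; i++) {
        if (parts[i] == 0)
            continue;                   /* zero values are omitted */
        used += (size_t)snprintf(key + used, sizeof key - used,
                                 "%" PRIx32, parts[i]);
    }
    int n = snprintf(out, outlen, "%s\\%s\\%s", store, file, key);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting directory is the one that would then be searched for the symbol file or a pointer file.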

-------------------------------------------------------------

Using Other Symbol Servers

If you wish to use a different method for your symbol search, you can provide your own symbol server DLL rather than using SymSrv.

Setting the Symbol Path

When implementing a symbol server other than SymSrv, the debugger's symbol path is set in the same way as with SymSrv. See Using SymSrv for an explanation of the symbol path syntax. The only change needed is to replace the string symsrv.dll with the name of your own symbol server DLL.

If you wish, you are free to use a different syntax within the parameters to indicate the use of different technologies such as UNC paths, SQL database identifiers, or Internet specifications.

Implementing Your Own Symbol Server

The central portion of the server is the code that communicates with DbgHelp to find the symbols. Every time DbgHelp needs to load symbols for a newly loaded module, it calls the symbol server to locate the appropriate symbol files. The symbol server locates each file according to unique parameters such as the time stamp or image size. The server returns a validated path to the requested file. To implement this, the server must export the SymbolServer function. The server should also support the SymbolServerSetOptions and SymbolServerGetOptions functions.

Furthermore, DbgHelp will call the SymbolServerClose function, if exported by the server.

You may not change the actual symbol file name returned by your symbol server. DbgHelp stores the name of a symbol file in multiple locations. Therefore, the server must return a file of the same name as that specified when the symbol was requested. This restriction is needed to assure that the symbol names displayed during symbol loading are the ones that the programmer will recognize.

Restrictions on Multiple Symbol Servers

DbgHelp supports the use of only one symbol server at a time, although it is possible to switch to another symbol server by changing the symbol path. This does not restrict the symbol server to a single source of symbol information: the server can support multiple source instances through the parameters passed to it when it is called. In other words, your symbol path can contain multiple instances of the same symbol server DLL, but not two different symbol server DLLs.

Installing Your Symbol Server

The details of your symbol server installation will depend on your situation. You may wish to set up an installation process that copies your symbol server DLL and sets the _NT_SYMBOL_PATH environment variable automatically.

Depending on the technology used in your server, you may also need to install or access the symbol data itself.

----------------------------------------------

Using SymStore

SymStore (symstore.exe) is a tool for creating symbol stores. It is included in the Debugging Tools for Windows package.

SymStore stores symbols in a format that enables the debugger to look up the symbols based on the time stamp and size of the image (for a .dbg or executable file), or signature and age (for a .pdb file). The advantage of the symbol store over the traditional symbol storage format is that all symbols can be stored or referenced on the same server and retrieved by the debugger without any prior knowledge of which product contains the corresponding symbol.

Note that multiple versions of .pdb symbol files (for example, public and private versions) cannot be stored on the same server, because they each contain the same signature and age.

SymStore Transactions

Every call to SymStore is recorded as a transaction. There are two types of transactions: add and delete.

When the symbol store is created, a directory, called "000admin", is created under the root of the server. The 000admin directory contains one file for each transaction, as well as the log files server.txt and history.txt. The server.txt file contains a list of all transactions that are currently on the server. The history.txt file contains a chronological history of all transactions.

Each time SymStore stores or removes symbol files, a new transaction number is created. Then, a file, whose name is this transaction number, is created in 000admin. This file contains a list of all the files or pointers that have been added to the symbol store during this transaction. If a transaction is deleted, SymStore will read through its transaction file to determine which files and pointers it should delete.

The add and del options specify whether an add or delete transaction is to be performed. Including the /p option with an add operation specifies that a pointer is to be added; omitting the /p option specifies that the actual symbol file is to be added.

It is also possible to create the symbol store in two separate stages. In the first stage, you use SymStore with the /x option to create an index file. In the second stage, you use SymStore with the /y option to create the actual store of files or pointers from the information in the index file.

This can be a useful technique for a variety of reasons. For instance, this allows the symbol store to be easily recreated if the store is somehow lost, as long as the index file still exists. Or perhaps the computer containing the symbol files has a slow network connection to the computer on which the symbol store will be created. In this case, you can create the index file on the same machine as the symbol files, transfer the index file to the second machine, and then create the store on the second machine.

For a full listing of all SymStore parameters, see SymStore Command-Line Options.

Note: SymStore does not support simultaneous transactions from multiple users. It is recommended that one user be designated "administrator" of the symbol store and be responsible for all add and del transactions.

Transaction Examples

Here are two examples of SymStore adding symbol pointers for build 2195 of Windows 2000 to \\foo\symsrv:

symstore add /r /p /f \\BuildServer\BuildShare\2195free\symbols\*.* /s \\foo\symsrv /t "Windows 2000" /v "Build 2195 x86 free" /c "Sample add"
symstore add /r /p /f \\BuildServer\BuildShare\2195free\symbols\*.* /s \\foo\symsrv /t "Windows 2000" /v "Build 2195 x86 checked" /c "Sample add"

In the following example, SymStore adds the actual symbol files for an application project in \\largeapp\appserver\bins to \\foo\symsrv:

symstore add /r /f \\largeapp\appserver\bins\*.* /s \\foo\symsrv /t "Large Application" /v "Build 432" /c "Sample add"

Here is an example of how an index file is used. First, SymStore creates an index file based on the collection of symbol files in \\largeapp\appserver\bins\. In this case, the index file is placed on a third computer, \\hubserver\hubshare. You use the /g option to specify that the file prefix "\\largeapp\appserver" might change in the future:

symstore add /r /p /g \\largeapp\appserver /f \\largeapp\appserver\bins\*.* /x \\hubserver\hubshare\myindex.txt

Now suppose you move all the symbol files off of the machine \\largeapp\appserver and put them on \\myarchive\appserver. You can then create the symbol store itself from the index file \\hubserver\hubshare\myindex.txt as follows:

symstore add /y \\hubserver\hubshare\myindex.txt /g \\myarchive\appserver /s \\foo\symsrv /p /t "Large Application" /v "Build 432" /c "Sample Add from Index"

Finally, here is an example of SymStore deleting a file added by a previous transaction. See the following section for an explanation of how to determine the transaction ID (in this case, 0000000096).

symstore del /i 0000000096 /s \\foo\symsrv

Compressed Files

SymStore can be used with compressed files in two different ways.

  1. Use SymStore with the /p option to store pointers to the symbol files. After SymStore finishes, compress the files that the pointers refer to.
  2. Use SymStore with the /x option to create an index file. After SymStore finishes, compress the files listed in the index file. Then use SymStore with the /y option (and, if you wish, the /p option) to store the files or pointers to the files in the symbol store. (SymStore will not need to uncompress the files to perform this operation.)

Your symbol server will be responsible for uncompressing the files when they are needed.

If you are using SymSrv as your symbol server, any compression should be done using the compress.exe tool that is distributed with the Platform SDK. Compressed files should have an underscore as the last character in their file extensions (for example, module1.pd_ or module2.db_). For details, see Using SymSrv.

The server.txt and history.txt Files

When a transaction is added, several items of information are added to server.txt and history.txt for future lookup capability. The following is an example of a line in server.txt and history.txt for an add transaction:

0000000096,add,ptr,10/09/99,00:08:32,Windows NT 4.0 SP 4,x86 fre 1.156c-RTM-2,Added from \\mybuilds\symbols,

This is a comma-separated line. The fields are defined as follows.

Field                          Description
0000000096                     Transaction ID number, as created by SymStore.
add                            Type of transaction. This field can be either add or del.
ptr                            Whether files or pointers were added. This field can be either file or ptr.
10/09/99                       Date when transaction occurred.
00:08:32                       Time when transaction started.
Windows NT 4.0 SP 4            Product.
x86 fre 1.156c-RTM-2           Version (optional).
Added from \\mybuilds\symbols  Comment (optional).
Unused                         (Reserved for later use.)
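Because the line is plain comma-separated text, it can be read with any CSV parser. The following Python sketch splits the sample add line into the fields listed above. The type name `AddTransaction` and its field names are hypothetical labels chosen for this example; only the field order comes from the description.

```python
import csv
from typing import NamedTuple


class AddTransaction(NamedTuple):
    transaction_id: str
    kind: str        # "add" or "del"
    stored_as: str   # "file" or "ptr"
    date: str
    time: str
    product: str
    version: str     # optional, may be empty
    comment: str     # optional, may be empty
    unused: str      # reserved; empty in the sample line


def parse_add_line(line: str) -> AddTransaction:
    """Split one add-transaction line from server.txt or history.txt."""
    fields = next(csv.reader([line]))
    if len(fields) != 9 or fields[1] != "add":
        raise ValueError(f"not an add transaction line: {line!r}")
    return AddTransaction(*fields)


sample = (r"0000000096,add,ptr,10/09/99,00:08:32,Windows NT 4.0 SP 4,"
          r"x86 fre 1.156c-RTM-2,Added from \\mybuilds\symbols,")
record = parse_add_line(sample)
print(record.transaction_id, record.stored_as, record.product)
```

The trailing comma in the sample produces the empty ninth field, which corresponds to the reserved "Unused" entry.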

Here are some sample lines from the transaction file 0000000096. Each line records the directory and the location of the file or pointer that was added to the directory.

canon800.dbg\35d9fd51b000,\\mybuilds\symbols\sp4\dll\canon800.dbg
canonlbp.dbg\35d9fd521c000,\\mybuilds\symbols\sp4\dll\canonlbp.dbg
certadm.dbg\352bf2f48000,\\mybuilds\symbols\sp4\dll\certadm.dbg
certcli.dbg\352bf2f1b000,\\mybuilds\symbols\sp4\dll\certcli.dbg
certcrpt.dbg\352bf04911000,\\mybuilds\symbols\sp4\dll\certcrpt.dbg
certenc.dbg\352bf2f7f000,\\mybuilds\symbols\sp4\dll\certenc.dbg
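Each of these lines has two parts separated by a comma: the directory inside the store, and the original location. A minimal Python sketch of splitting one line follows; the names `StoreEntry` and `parse_entry` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class StoreEntry(NamedTuple):
    file_name: str   # first-level directory in the symbol store
    index: str       # second-level directory under file_name
    source: str      # recorded location of the file or pointer target


def parse_entry(line: str) -> StoreEntry:
    """Split one line of a transaction file into its three components."""
    location, comma, source = line.strip().partition(",")
    file_name, slash, index = location.partition("\\")
    if not comma or not slash:
        raise ValueError(f"unexpected transaction line: {line!r}")
    return StoreEntry(file_name, index, source)


entry = parse_entry(
    r"canon800.dbg\35d9fd51b000,\\mybuilds\symbols\sp4\dll\canon800.dbg")
print(entry)
```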

If you use a del transaction to undo the original add transactions, these lines will be removed from server.txt, and the following line will be added to history.txt:

0000000105,del,0000000096

The fields for the delete transaction are defined as follows.

Field       Description
0000000105  Transaction ID number, as created by SymStore.
del         Type of transaction. This field can be either add or del.
0000000096  Transaction that was deleted.

Symbol Storage Format

SymStore uses the file system itself as a database. It creates a large tree of directories, with directory names based on such things as the symbol file time stamps, signatures, age, and other data.

For example, after several different acpi.dbg files have been added to the server, the directories could look like this:

Directory of \\mybuilds\symsrv\acpi.dbg
10/06/1999 05:46p <DIR> .
10/06/1999 05:46p <DIR> ..
10/04/1999 01:54p <DIR> 37cdb03962040
10/04/1999 01:49p <DIR> 37cdb04027740
10/04/1999 12:56p <DIR> 37e3eb1c62060
10/04/1999 12:51p <DIR> 37e3ebcc27760
10/04/1999 12:45p <DIR> 37ed151662060
10/04/1999 12:39p <DIR> 37ed15dd27760
10/04/1999 11:33a <DIR> 37f03ce962020
10/04/1999 11:21a <DIR> 37f03cf7277c0
10/06/1999 05:38p <DIR> 37fa7f00277e0
10/06/1999 05:46p <DIR> 37fa7f01620a0

In this example, the lookup path for the acpi.dbg symbol file might look something like this: \\mybuilds\symsrv\acpi.dbg\37cdb03962040.
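For .dbg files, the directory name in this example appears to be the image time stamp followed by the image size, both in hexadecimal. The Python sketch below builds a lookup path under that assumption. The function name `dbg_lookup_path` is hypothetical, and the exact formatting (eight hex digits for the time stamp, unpadded hex for the size) is inferred from the listing above rather than taken from a specification.

```python
def dbg_lookup_path(store_root: str, file_name: str,
                    time_date_stamp: int, size_of_image: int) -> str:
    """Build the store directory for a .dbg file.

    Assumption: the index is the time stamp as 8 hex digits followed by
    the image size in hex, as the sample directory names suggest.
    """
    index = f"{time_date_stamp:08x}{size_of_image:x}"
    return "\\".join([store_root.rstrip("\\"), file_name, index])


print(dbg_lookup_path(r"\\mybuilds\symsrv", "acpi.dbg", 0x37CDB039, 0x62040))
```

With these inputs the sketch reproduces the path \\mybuilds\symsrv\acpi.dbg\37cdb03962040 from the example.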

Three files may exist inside the lookup directory:

  1. If the file was stored, then acpi.dbg will exist there.
  2. If a pointer was stored, then a file called file.ptr will exist and contain the path to the actual symbol file.
  3. A file called refs.ptr, which contains a list of all the current locations for acpi.dbg with this timestamp and image size that are currently added to the symbol store.

Displaying the directory listing of \\mybuilds\symsrv\acpi.dbg\37cdb03962040 gives the following:

10/04/1999 01:54p 52 file.ptr
10/04/1999 01:54p 67 refs.ptr

The file file.ptr contains the text string "\\mybuilds\symbols\x86\2128.chk\symbols\sys\acpi.dbg". Since there is no file called acpi.dbg in this directory, the debugger will try to find the file at \\mybuilds\symbols\x86\2128.chk\symbols\sys\acpi.dbg.

The contents of refs.ptr are used only by SymStore, not the debugger. This file contains a record of all transactions that have taken place in this directory. A sample line from refs.ptr might be:

0000000026,ptr,\\mybuilds\symbols\x86\2128.chk\symbols\sys\acpi.dbg

This shows that a pointer to \\mybuilds\symbols\x86\2128.chk\symbols\sys\acpi.dbg was added with transaction "0000000026".

Some symbol files stay constant through various products or builds of a particular product. One example of this is the Windows 2000 file msvcrt.pdb. Doing a directory of \\mybuilds\symsrv\msvcrt.pdb shows that only two versions of msvcrt.pdb have been added to the symbol server:

Directory of \\mybuilds\symsrv\msvcrt.pdb
10/06/1999 05:37p <DIR> .
10/06/1999 05:37p <DIR> ..
10/04/1999 11:19a <DIR> 37a8f40e2
10/06/1999 05:37p <DIR> 37f2c2272

However, doing a directory of \\mybuilds\symsrv\msvcrt.pdb\37a8f40e2 shows that refs.ptr has several pointers in it.

Directory of \\mybuilds\symsrv\msvcrt.pdb\37a8f40e2
10/05/1999 02:50p 54 file.ptr
10/05/1999 02:50p 2,039 refs.ptr

The contents of \\mybuilds\symsrv\msvcrt.pdb\37a8f40e2\refs.ptr are the following:

0000000001,ptr,\\mybuilds\symbols\x86\2137\symbols\dll\msvcrt.pdb
0000000002,ptr,\\mybuilds\symbols\x86\2137.chk\symbols\dll\msvcrt.pdb
0000000003,ptr,\\mybuilds\symbols\x86\2138\symbols\dll\msvcrt.pdb
0000000004,ptr,\\mybuilds\symbols\x86\2138.chk\symbols\dll\msvcrt.pdb
0000000005,ptr,\\mybuilds\symbols\x86\2139\symbols\dll\msvcrt.pdb
0000000006,ptr,\\mybuilds\symbols\x86\2139.chk\symbols\dll\msvcrt.pdb
0000000007,ptr,\\mybuilds\symbols\x86\2140\symbols\dll\msvcrt.pdb
0000000008,ptr,\\mybuilds\symbols\x86\2140.chk\symbols\dll\msvcrt.pdb
0000000009,ptr,\\mybuilds\symbols\x86\2136\symbols\dll\msvcrt.pdb
0000000010,ptr,\\mybuilds\symbols\x86\2136.chk\symbols\dll\msvcrt.pdb
0000000011,ptr,\\mybuilds\symbols\x86\2135\symbols\dll\msvcrt.pdb
0000000012,ptr,\\mybuilds\symbols\x86\2135.chk\symbols\dll\msvcrt.pdb
0000000013,ptr,\\mybuilds\symbols\x86\2134\symbols\dll\msvcrt.pdb
0000000014,ptr,\\mybuilds\symbols\x86\2134.chk\symbols\dll\msvcrt.pdb
0000000015,ptr,\\mybuilds\symbols\x86\2133\symbols\dll\msvcrt.pdb
0000000016,ptr,\\mybuilds\symbols\x86\2133.chk\symbols\dll\msvcrt.pdb
0000000017,ptr,\\mybuilds\symbols\x86\2132\symbols\dll\msvcrt.pdb
0000000018,ptr,\\mybuilds\symbols\x86\2132.chk\symbols\dll\msvcrt.pdb
0000000019,ptr,\\mybuilds\symbols\x86\2131\symbols\dll\msvcrt.pdb
0000000020,ptr,\\mybuilds\symbols\x86\2131.chk\symbols\dll\msvcrt.pdb
0000000021,ptr,\\mybuilds\symbols\x86\2130\symbols\dll\msvcrt.pdb
0000000022,ptr,\\mybuilds\symbols\x86\2130.chk\symbols\dll\msvcrt.pdb
0000000023,ptr,\\mybuilds\symbols\x86\2129\symbols\dll\msvcrt.pdb
0000000024,ptr,\\mybuilds\symbols\x86\2129.chk\symbols\dll\msvcrt.pdb
0000000025,ptr,\\mybuilds\symbols\x86\2128\symbols\dll\msvcrt.pdb
0000000026,ptr,\\mybuilds\symbols\x86\2128.chk\symbols\dll\msvcrt.pdb
0000000027,ptr,\\mybuilds\symbols\x86\2141\symbols\dll\msvcrt.pdb
0000000028,ptr,\\mybuilds\symbols\x86\2141.chk\symbols\dll\msvcrt.pdb
0000000029,ptr,\\mybuilds\symbols\x86\2142\symbols\dll\msvcrt.pdb
0000000030,ptr,\\mybuilds\symbols\x86\2142.chk\symbols\dll\msvcrt.pdb

This shows that the same msvcrt.pdb was used for multiple builds of symbols for Windows 2000 stored on \\mybuilds\symsrv.

Here is an example of a directory that contains a mixture of file and pointer additions:

Directory of E:\symsrv\dbghelp.dbg\38039ff439000
10/12/1999 01:54p 141,232 dbghelp.dbg
10/13/1999 04:57p 49 file.ptr
10/13/1999 04:57p 306 refs.ptr

In this case, refs.ptr has the following contents:

0000000043,file,e:\binaries\symbols\retail\dll\dbghelp.dbg
0000000044,file,f:\binaries\symbols\retail\dll\dbghelp.dbg
0000000045,file,g:\binaries\symbols\retail\dll\dbghelp.dbg
0000000046,ptr,\\foo\bin\symbols\retail\dll\dbghelp.dbg
0000000047,ptr,\\foo2\bin\symbols\retail\dll\dbghelp.dbg

Thus, transactions 43, 44, and 45 added the same file to the server, and transactions 46 and 47 added pointers. If transactions 43, 44, and 45 are deleted, then the file dbghelp.dbg will be deleted from the directory. The directory will then have the following contents:

Directory of e:\symsrv\dbghelp.dbg\38039ff439000
10/13/1999 05:01p 49 file.ptr
10/13/1999 05:01p 130 refs.ptr

Now file.ptr contains "\\foo2\bin\symbols\retail\dll\dbghelp.dbg", and refs.ptr contains:

0000000046,ptr,\\foo\bin\symbols\retail\dll\dbghelp.dbg
0000000047,ptr,\\foo2\bin\symbols\retail\dll\dbghelp.dbg
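The bookkeeping in this example can be modeled with a short Python sketch. It is not SymStore's implementation. It assumes, based only on the example above, that the stored file remains as long as at least one "file" reference exists and that file.ptr holds the most recently added remaining pointer. The function name `state_after_delete` is hypothetical.

```python
REFS = [
    ("0000000043", "file", r"e:\binaries\symbols\retail\dll\dbghelp.dbg"),
    ("0000000044", "file", r"f:\binaries\symbols\retail\dll\dbghelp.dbg"),
    ("0000000045", "file", r"g:\binaries\symbols\retail\dll\dbghelp.dbg"),
    ("0000000046", "ptr", r"\\foo\bin\symbols\retail\dll\dbghelp.dbg"),
    ("0000000047", "ptr", r"\\foo2\bin\symbols\retail\dll\dbghelp.dbg"),
]


def state_after_delete(refs, deleted_ids):
    """Return (remaining refs, file still stored?, file.ptr contents)."""
    kept = [ref for ref in refs if ref[0] not in deleted_ids]
    file_stored = any(kind == "file" for _, kind, _ in kept)
    pointers = [path for _, kind, path in kept if kind == "ptr"]
    # Assumption: file.ptr names the last pointer that is still referenced.
    file_ptr = pointers[-1] if pointers else None
    return kept, file_stored, file_ptr


kept, file_stored, file_ptr = state_after_delete(
    REFS, {"0000000043", "0000000044", "0000000045"})
print([ref[0] for ref in kept], file_stored, file_ptr)
```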
----------------------------------------------------------------
Using Other Symbol Stores

It is possible to write your own symbol store creation program, rather than using SymStore.

Since SymStore transactions are all logged in CSV-format text files, you can leverage any existing SymStore log files for use in your own database program.

If you plan to use the SymSrv program provided with Debugging Tools for Windows, it is recommended that you use SymStore as well. Updates to these two programs will always be released together, and therefore their versions will always match.

-----------------------------------------------------------

SymStore Command-Line Options

The following syntax forms are supported for SymStore transactions. The first parameter must always be add or del. The order of the other parameters is immaterial.

symstore add [/r] [/p] [/l] /f File /s Store /t Product [/v Version] [/o] [/c Comment] [/d LogFile] 
symstore add [/r] [/p] [/l] /g Share /f File /x IndexFile [/a] [/o] [/d LogFile]
symstore add /y IndexFile /g Share /s Store [/p] /t Product [/v Version] [/o] [/c Comment] [/d LogFile]
symstore del /i ID /s Store [/o] [/d LogFile]
symstore /?

Parameter     Meaning
/f File       Specifies the network path of files or directories to add.
/g Share      Specifies the server and share where the symbol files were originally stored. When used with /f, Share should be identical to the beginning of the File specifier. When used with /y, Share should be the location of the original symbol files (not the index file). This allows you to later change this portion of the file path in case you move the symbol files to a different server and share.
/i ID         Specifies the transaction ID string.
/l            Allows the file to be in a local directory rather than a network path. (This option is only used with the /p option.)
/p            Causes SymStore to store a pointer to the file, rather than the file itself.
/r            Causes SymStore to add files or directories recursively.
/s Store      Specifies the root directory for the symbol store.
/t Product    Specifies the name of the product.
/v Version    Specifies the version of the product.
/c Comment    Specifies a comment for the transaction.
/d LogFile    Specifies a log file to be used for command output. If this is not included, transaction information and other output is sent to stdout.
/o            Causes SymStore to display verbose output.
/x IndexFile  Causes SymStore not to store the actual symbol files. Instead, SymStore records information in the IndexFile that will enable SymStore to access the symbol files at a later time.
/a            Causes SymStore to append new indexing information to an existing index file. (This option is only used with the /x option.)
/y IndexFile  Causes SymStore to read the data from a file created with /x.
/?            Displays help text for the SymStore command.
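When SymStore is driven from a build script, it can help to assemble the argument list in one place. The Python sketch below builds the arguments for the first syntax form only, using just the options documented above. The function `symstore_add_args` is a hypothetical helper; it does not run SymStore, and the paths in the usage line are the sample values from this document.

```python
from typing import List, Optional


def symstore_add_args(files: str, store: str, product: str, *,
                      version: Optional[str] = None,
                      comment: Optional[str] = None,
                      recursive: bool = False,
                      pointers: bool = False) -> List[str]:
    """Build the argument list for 'symstore add' (first syntax form)."""
    args = ["symstore", "add"]        # add or del must come first
    if recursive:
        args.append("/r")
    if pointers:
        args.append("/p")
    args += ["/f", files, "/s", store, "/t", product]
    if version:
        args += ["/v", version]
    if comment:
        args += ["/c", comment]
    return args


print(symstore_add_args(r"\\largeapp\appserver\bins\*.*", r"\\foo\symsrv",
                        "Large Application", version="Build 432",
                        recursive=True, pointers=True))
```

The resulting list could be handed to a process launcher on a machine where SymStore is installed. Because the order of parameters after add is immaterial, the fixed order used here is only a convenience.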

*******************************************************

Minidump Files

Applications can produce user-mode minidump files, which contain a useful subset of the information contained in a crash dump file. Applications can create minidump files very quickly and efficiently. Because minidump files are small, they can be easily sent over the internet to technical support for the application.

A minidump file does not contain as much information as a full crash dump file, but it contains enough information to perform basic debugging operations. To read a minidump file, you must have the binaries and symbol files available for the debugger.

Microsoft® Office® XP and Windows® XP create minidump files for the purpose of analyzing failures on customers' computers.

The following DbgHelp functions are used with minidump files.

MiniDumpCallback
MiniDumpReadDumpStream
MiniDumpWriteDump

***********************************************************

New 64-bit Support (if you are on a 32-bit machine, you can skip this section)

Where necessary, the DbgHelp library has been widened to support 64-bit Windows. The original function and structure definitions are still in DbgHelp.h, but there are also 64-bit versions of these definitions. For example, DbgHelp.h contains definitions for SymLoadModule and SymLoadModule64. These definitions are nearly identical, but use different types for the return value and the BaseOfDll parameter. (The original version uses the DWORD type for these values and the 64-bit version uses the new DWORD64 type.)

The 64-bit library is supported on all platforms. Therefore, you can use it to write a debugger that runs on 32-bit Windows or a debugger that runs on 64-bit Windows.

When writing new code, it is more efficient to use the 64-bit library. Internally, the original functions simply call the 64-bit functions to perform the work.

The following is a list of the 64-bit functions:

The following is a list of the 64-bit structures:

=====================================================================
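To understand what the symbol files described above are, you first need to understand the PE file format (see the article below and the PE file format PDF provided by Microsoft):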

http://msdn.microsoft.com/msdnmag/issues/02/02/PE/default.aspx
An In-Depth Look into the Win32 Portable Executable File Format

Alternatively, search MSDN directly for Matt Pietrek's articles.

SUMMARY A good understanding of the Portable Executable (PE) file format leads to a good understanding of the operating system. If you know what's in your DLLs and EXEs, you'll be a more knowledgeable programmer. This article, the first of a two-part series, looks at the changes to the PE format that have occurred over the last few years, along with an overview of the format itself.
After this update, the author discusses how the PE format fits into applications written for .NET, PE file sections, RVAs, the DataDirectory, and the importing of functions. An appendix includes lists of the relevant image header structures and their descriptions.


A long time ago, in a galaxy far away, I wrote one of my first articles for Microsoft Systems Journal (now MSDN® Magazine). The article, "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format," turned out to be more popular than I had expected. To this day, I still hear from people (even within Microsoft) who use that article, which is still available from the MSDN Library. Unfortunately, the problem with articles is that they're static. The world of Win32® has changed quite a bit in the intervening years, and the article is severely dated. I'll remedy that situation in a two-part article starting this month.
You might be wondering why you should care about the executable file format. The answer is the same now as it was then: an operating system's executable format and data structures reveal quite a bit about the underlying operating system. By understanding what's in your EXEs and DLLs, you'll find that you've become a better programmer all around.
Sure, you could learn a lot of what I'll tell you by reading the Microsoft specification. However, like most specs, it sacrifices readability for completeness. My focus in this article will be to explain the most relevant parts of the story, while filling in the hows and whys that don't fit neatly into a formal specification. In addition, I have some goodies in this article that don't seem to appear in any official Microsoft documentation.

Bridging the Gap
Let me give you just a few examples of what has changed since I wrote the article in 1994. Since 16-bit Windows® is history, there's no need to compare and contrast the format to the Win16 New Executable format. Another welcome departure from the scene is Win32s®. This was the abomination that ran Win32 binaries very shakily atop Windows 3.1.
Back then, Windows 95 (codenamed "Chicago" at the time) wasn't even released. Windows NT® was still at version 3.5, and the linker gurus at Microsoft hadn't yet started getting aggressive with their optimizations. However, there were MIPS and DEC Alpha implementations of Windows NT that added to the story.
And what about all the new things that have come along since that article? 64-bit Windows introduces its own variation of the Portable Executable (PE) format. Windows CE adds all sorts of new processor types. Optimizations such as delay loading of DLLs, section merging, and binding were still over the horizon. There are many new things to shoehorn into the story.
And let's not forget about Microsoft® .NET. Where does it fit in? To the operating system, .NET executables are just plain old Win32 executable files. However, the .NET runtime recognizes data within these executable files as the metadata and intermediate language that are so central to .NET. In this article, I'll knock on the door of the .NET metadata format, but save a thorough survey of its full splendor for a subsequent article.
And if all these additions and subtractions to the world of Win32 weren't enough justification to remake the article with modern day special effects, there are also errors in the original piece that make me cringe. For example, my description of Thread Local Storage (TLS) support was way out in left field. Likewise, my description of the date/time stamp DWORD used throughout the file format is accurate only if you live in the Pacific time zone!
In addition, many things that were true then are incorrect now. I had stated that the .rdata section wasn't really used for anything important. Today, it certainly is. I also said that the .idata section is a read/write section, which has been found to be most untrue by people trying to do API interception today.
Along with a complete update of the PE format story in this article, I've also overhauled the PEDUMP program, which displays the contents of PE files. PEDUMP can be compiled and run on both the x86 and IA-64 platforms, and can dump both 32 and 64-bit PE files. Most importantly, full source code for PEDUMP is available for download from the link at the top of this article, so you have a working example of the concepts and data structures described here.

Overview of the PE File Format
Microsoft introduced the PE File format, more commonly known as the PE format, as part of the original Win32 specifications. However, PE files are derived from the earlier Common Object File Format (COFF) found on VAX/VMS. This makes sense since much of the original Windows NT team came from Digital Equipment Corporation. It was natural for these developers to use existing code to quickly bootstrap the new Windows NT platform.
The term "Portable Executable" was chosen because the intent was to have a common file format for all flavors of Windows, on all supported CPUs. To a large extent, this goal has been achieved with the same format used on Windows NT and descendants, Windows 95 and descendants, and Windows CE.
OBJ files emitted by Microsoft compilers use the COFF format. You can get an idea of how old the COFF format is by looking at some of its fields, which use octal encoding! COFF OBJ files have many data structures and enumerations in common with PE files, and I'll mention some of them as I go along.
The addition of 64-bit Windows required just a few modifications to the PE format. This new format is called PE32+. No new fields were added, and only one field in the PE format was deleted. The remaining changes are simply the widening of certain fields from 32 bits to 64 bits. In most of these cases, you can write code that simply works with both 32 and 64-bit PE files. The Windows header files have the magic pixie dust to make the differences invisible to most C++-based code.
The distinction between EXE and DLL files is entirely one of semantics. They both use the exact same PE format. The only difference is a single bit that indicates if the file should be treated as an EXE or as a DLL. Even the DLL file extension is artificial. You can have DLLs with entirely different extensions—for instance .OCX controls and Control Panel applets (.CPL files) are DLLs.
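As a rough illustration of that single bit, the Python sketch below reads the Characteristics field from a PE header and tests the IMAGE_FILE_DLL flag (0x2000 in WINNT.H). The helper `fake_image` is hypothetical and builds only the minimal bytes needed for the demonstration, not a loadable file. The offsets used (e_lfanew at 0x3C, Characteristics as the last WORD of the 20-byte IMAGE_FILE_HEADER) follow the WINNT.H structure layouts.

```python
import struct

IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000


def read_characteristics(image: bytes) -> int:
    """Return IMAGE_FILE_HEADER.Characteristics from raw PE bytes."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)   # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The file header follows the 4-byte signature; Characteristics is at +18.
    (flags,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return flags


def is_dll(image: bytes) -> bool:
    return bool(read_characteristics(image) & IMAGE_FILE_DLL)


def fake_image(characteristics: int) -> bytes:
    """Hypothetical helper: a DOS stub plus PE signature and file header."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    file_header = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0,
                              characteristics)
    return bytes(dos) + b"PE\0\0" + file_header


print(is_dll(fake_image(IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_DLL)))
print(is_dll(fake_image(IMAGE_FILE_EXECUTABLE_IMAGE)))
```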
A very handy aspect of PE files is that the data structures on disk are the same data structures used in memory. Loading an executable into memory (for example, by calling LoadLibrary) is primarily a matter of mapping certain ranges of a PE file into the address space. Thus, a data structure like the IMAGE_NT_HEADERS (which I'll examine later) is identical on disk and in memory. The key point is that if you know how to find something in a PE file, you can almost certainly find the same information when the file is loaded in memory.
It's important to note that PE files are not just mapped into memory as a single memory-mapped file. Instead, the Windows loader looks at the PE file and decides what portions of the file to map in. This mapping is consistent in that higher offsets in the file correspond to higher memory addresses when mapped into memory. The offset of an item in the disk file may differ from its offset once loaded into memory. However, all the information is present to allow you to make the translation from disk offset to memory offset (see Figure 1).

Figure 1 Offsets

When PE files are loaded into memory via the Windows loader, the in-memory version is known as a module. The starting address where the file mapping begins is called an HMODULE. This is a point worth remembering: given an HMODULE, you know what data structure to expect at that address, and you can use that knowledge to find all the other data structures in memory. This powerful capability can be exploited for other purposes such as API interception. (To be completely accurate, an HMODULE isn't the same as the load address under Windows CE, but that's a story for yet another day.)
A module in memory represents all the code, data, and resources from an executable file that is needed by a process. Other parts of a PE file may be read, but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, when debug information is placed at the end of the file. A field in the PE header tells the system how much memory needs to be set aside for mapping the executable into memory. Data that won't be mapped in is placed at the end of the file, past any parts that will be mapped in.
The central location where the PE format (as well as COFF files) is described is WINNT.H. Within this header file, you'll find nearly every structure definition, enumeration, and #define needed to work with PE files or the equivalent structures in memory. Sure, there is documentation elsewhere. MSDN has the "Microsoft Portable Executable and Common Object File Format Specification," for instance (see the October 2001 MSDN CD under Specifications). But WINNT.H is the final word on what PE files look like.
There are many tools for examining PE files. Among them are Dumpbin from Visual Studio, and Depends from the Platform SDK. I particularly like Depends because it has a very succinct way of examining a file's imports and exports. A great free PE viewer is PEBrowse Professional, from Smidgeonsoft (http://www.smidgeonsoft.com). The PEDUMP program included with this article is also very comprehensive, and does almost everything Dumpbin does.
From an API standpoint, the primary mechanism provided by Microsoft for reading and modifying PE files is IMAGEHLP.DLL.
Before I start looking at the specifics of PE files, it's worthwhile to first review a few basic concepts that thread their way through the entire subject of PE files. In the following sections, I will discuss PE file sections, relative virtual addresses (RVAs), the data directory, and how functions are imported.

PE File Sections
A PE file section represents code or data of some sort. While code is just code, there are multiple types of data. Besides read/write program data (such as global variables), other types of data in sections include API import and export tables, resources, and relocations. Each section has its own set of in-memory attributes, including whether the section contains code, whether it's read-only or read/write, and whether the data in the section is shared between all processes using the executable.
Generally speaking, all the code or data in a section is logically related in some way. At a minimum, there are usually at least two sections in a PE file: one for code, the other for data. Commonly, there's at least one other type of data section in a PE file. I'll look at the various kinds of sections in Part 2 of this article next month.
Each section has a distinct name. This name is intended to convey the purpose of the section. For example, a section called .rdata indicates a read-only data section. Section names are used solely for the benefit of humans, and are insignificant to the operating system. A section named FOOBAR is just as valid as a section called .text. Microsoft typically prefixes their section names with a period, but it's not a requirement. For years, the Borland linker used section names like CODE and DATA.
While compilers have a standard set of sections that they generate, there's nothing magical about them. You can create and name your own sections, and the linker happily includes them in the executable. In Visual C++, you can tell the compiler to insert code or data into a section that you name with #pragma statements. For instance, the statement

#pragma data_seg( "MY_DATA" )

causes all data emitted by Visual C++ to go into a section called MY_DATA, rather than the default .data section. Most programs are fine using the default sections emitted by the compiler, but occasionally you may have funky requirements which necessitate putting code or data into a separate section.
Sections don't spring fully formed from the linker; rather, they start out in OBJ files, usually placed there by the compiler. The linker's job is to combine all the required sections from OBJ files and libraries into the appropriate final section in the PE file. For example, each OBJ file in your project probably has at least a .text section, which contains code. The linker takes all the sections named .text from the various OBJ files and combines them into a single .text section in the PE file. Likewise, all the sections named .data from the various OBJs are combined into a single .data section in the PE file. Code and data from .LIB files are also typically included in an executable, but that subject is outside the scope of this article.
There is a rather complete set of rules that linkers follow to decide which sections to combine and how. I gave an introduction to the linker algorithms in the July 1997 Under The Hood column in MSJ. A section in an OBJ file may be intended for the linker's use, and not make it into the final executable. A section like this would be intended for the compiler to pass information to the linker.
Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value. For instance, in the PE file, a typical alignment would be 0x200. Thus, every section begins at a file offset that's a multiple of 0x200.
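The alignment rule amounts to rounding a size or offset up to the next multiple of the alignment value. A minimal Python sketch follows, assuming the alignment is a power of two (as 0x200 is); the function name `align_up` and the sample size 0x74658 are chosen for illustration.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# A block of 0x74658 bytes occupies 0x74800 bytes at a file alignment of 0x200.
print(hex(align_up(0x74658, 0x200)))
```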
Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to a memory page. On x86 CPUs, pages are 4KB aligned, while on the IA-64, they're 8KB aligned. The following code shows a snippet of PEDUMP output for the .text and .data section of the Windows XP KERNEL32.DLL.

Section Table
  01 .text     VirtSize: 00074658  VirtAddr:  00001000
    raw data offs:   00000400  raw data size: 00074800
  •••
  02 .data     VirtSize: 000028CA  VirtAddr:  00076000
    raw data offs:   00074C00  raw data size: 00002400

The .text section is at offset 0x400 in the PE file and will be 0x1000 bytes above the load address of KERNEL32 in memory. Likewise, the .data section is at file offset 0x74C00 and will be 0x76000 bytes above KERNEL32's load address in memory.
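Given section table values like these, translating an in-memory offset back to a file offset is a table lookup. The Python sketch below uses the two KERNEL32 sections shown above. The `Section` type and the function `rva_to_file_offset` are hypothetical names, and the sketch treats only bytes covered by both the virtual size and the raw data size as having a file offset.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int   # offset from the load address in memory
    virtual_size: int
    raw_offset: int        # offset in the file on disk
    raw_size: int


SECTIONS = [
    Section(".text", 0x00001000, 0x00074658, 0x00000400, 0x00074800),
    Section(".data", 0x00076000, 0x000028CA, 0x00074C00, 0x00002400),
]


def rva_to_file_offset(rva: int, sections: List[Section]) -> Optional[int]:
    """Map an in-memory offset to a file offset, or None if not in the file."""
    for section in sections:
        delta = rva - section.virtual_address
        if 0 <= delta < min(section.virtual_size, section.raw_size):
            return section.raw_offset + delta
    return None


print(hex(rva_to_file_offset(0x1000, SECTIONS)))
print(hex(rva_to_file_offset(0x76010, SECTIONS)))
```

Note that .data has a virtual size (0x28CA) larger than its raw data size (0x2400), so the tail of that section exists only in memory and the lookup returns None for it.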
It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me. The default /OPT:WIN98 linker option (introduced in Visual Studio 6.0) causes PE files to be created this way. In Visual Studio® .NET, the linker may or may not use /OPT:NOWIN98, depending on whether the file is small enough.
An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can usually be combined into a single section at link time. This is done via the linker /merge switch. For instance, the following linker option combines the .rdata and .text sections into a single section called .text:

/MERGE:.rdata=.text

The advantage to merging sections is that it saves space, both on disk and in memory. At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page.
Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed. For example, it's OK to merge .rdata into .text, but you shouldn't merge .rsrc, .reloc, or .pdata into other sections. Prior to Visual Studio .NET, you could merge .idata into other sections. In Visual Studio .NET, this is not allowed, but the linker often merges parts of the .idata into other sections, such as .rdata, when doing a release build.
Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section. This situation works because at load time the system can temporarily set the attributes of the pages containing the imports data to read/write. Once the imports table is initialized, the pages are then set back to their original protection attributes.

Relative Virtual Addresses
In an executable file, there are many places where an in-memory address needs to be specified. For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the executable file actually loading there. For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads.
To avoid having hardcoded memory addresses in PE files, RVAs are used. An RVA is simply an offset in memory, relative to where the PE file was loaded. For instance, consider an EXE file loaded at address 0x400000, with its code section at address 0x401000. The RVA of the code section would be:

(target address) 0x401000 - (load address) 0x400000 = (RVA) 0x1000.

To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address. Incidentally, the actual memory address is called a Virtual Address (VA) in PE parlance. Another way to think of a VA is that it's an RVA with the preferred load address added in. Don't forget the earlier point I made that a load address is the same as the HMODULE.
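The two conversions are plain subtraction and addition, as this small Python sketch shows. The function names are hypothetical, and the second load address (0x10000000) is an invented value used to show that the same RVA stays valid when the file loads somewhere other than its preferred address.

```python
def va_to_rva(virtual_address: int, load_address: int) -> int:
    """Offset of an in-memory address relative to the module's load address."""
    if virtual_address < load_address:
        raise ValueError("address lies below the load address")
    return virtual_address - load_address


def rva_to_va(rva: int, load_address: int) -> int:
    """Actual memory address for an RVA at the given load address."""
    return load_address + rva


rva = va_to_rva(0x401000, 0x400000)
print(hex(rva))
# The same RVA resolves correctly at a different (hypothetical) load address.
print(hex(rva_to_va(rva, 0x10000000)))
```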
Want to go spelunking through some arbitrary DLL's data structures in memory? Here's how. Call GetModuleHandle with the name of the DLL. The HMODULE that's returned is just a load address; you can apply your knowledge of the PE file structures to find anything you want within the module.

The Data Directory
There are many data structures within executable files that need to be quickly located. Some obvious examples are the imports, exports, resources, and base relocations. All of these well-known data structures are found in a consistent manner, and the location is known as the DataDirectory.
The DataDirectory is an array of 16 structures. Each array entry has a predefined meaning for what it refers to. The IMAGE_DIRECTORY_ENTRY_xxx #defines are array indexes into the DataDirectory (from 0 to 15). Figure 2 describes what each of the IMAGE_DIRECTORY_ENTRY_xxx values refers to. A more detailed description of many of the pointed-to data structures will be included in Part 2 of this article.
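For reference, the index values can be written out as a Python enumeration. The names mirror the IMAGE_DIRECTORY_ENTRY_xxx #defines in WINNT.H with the prefix dropped. The function `read_directory_entry` is a hypothetical helper that assumes each entry is an IMAGE_DATA_DIRECTORY of two DWORDs (VirtualAddress and Size), and the sample values written into the array are invented.

```python
import struct
from enum import IntEnum


class DirectoryEntry(IntEnum):
    EXPORT = 0
    IMPORT = 1
    RESOURCE = 2
    EXCEPTION = 3
    SECURITY = 4
    BASERELOC = 5
    DEBUG = 6
    ARCHITECTURE = 7
    GLOBALPTR = 8
    TLS = 9
    LOAD_CONFIG = 10
    BOUND_IMPORT = 11
    IAT = 12
    DELAY_IMPORT = 13
    COM_DESCRIPTOR = 14


NUMBER_OF_DIRECTORY_ENTRIES = 16   # index 15 has no assigned meaning


def read_directory_entry(data_directory: bytes, entry: DirectoryEntry):
    """Return (rva, size) for one 8-byte entry of the DataDirectory array."""
    return struct.unpack_from("<II", data_directory, int(entry) * 8)


# Invented sample: an import directory at RVA 0x2000, 0x50 bytes long.
directory = bytearray(NUMBER_OF_DIRECTORY_ENTRIES * 8)
struct.pack_into("<II", directory, DirectoryEntry.IMPORT * 8, 0x2000, 0x50)
print(read_directory_entry(bytes(directory), DirectoryEntry.IMPORT))
```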

Importing Functions
When you use code or data from another DLL, you're importing it. When any PE file loads, one of the jobs of the Windows loader is to locate all the imported functions and data and make those addresses available to the file being loaded. I'll save the detailed discussion of data structures used to accomplish this for Part 2 of this article, but it's worth going over the concepts here at a high level.
When you link directly against the code and data of another DLL, you're implicitly linking against the DLL. You don't have to do anything to make the addresses of the imported APIs available to your code. The loader takes care of it all. The alternative is explicit linking. This means explicitly making sure that the target DLL is loaded and then looking up the address of the APIs. This is almost always done via the LoadLibrary and GetProcAddress APIs.
When you implicitly link against an API, LoadLibrary and GetProcAddress-like code still executes, but the loader does it for you automatically. The loader also ensures that any additional DLLs needed by the PE file being loaded are also loaded. For instance, every normal program created with Visual C++® links against KERNEL32.DLL. KERNEL32.DLL in turn imports functions from NTDLL.DLL. Likewise, if you import from GDI32.DLL, it will have dependencies on the USER32, ADVAPI32, NTDLL, and KERNEL32 DLLs, which the loader makes sure are loaded and all imports resolved. (Visual Basic 6.0 and the Microsoft .NET executables directly link against a different DLL than KERNEL32, but the same principles apply.)
When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs when the program first starts. If there are any problems (for example, a referenced DLL that can't be found), the process is aborted.
Visual C++ 6.0 added the delayload feature, which is a hybrid between implicit linking and explicit linking. When you delayload against a DLL, the linker emits something that looks very similar to the data for a regular imported DLL. However, the operating system ignores this data. Instead, the first time a call to one of the delayloaded APIs occurs, special stubs added by the linker cause the DLL to be loaded (if it's not already in memory), followed by a call to GetProcAddress to locate the called API. Additional magic makes it so that subsequent calls to the API are just as efficient as if the API had been imported normally.
Within a PE file, there's an array of data structures, one per imported DLL. Each of these structures gives the name of the imported DLL and points to an array of function pointers. The array of function pointers is known as the import address table (IAT). Each imported API has its own reserved spot in the IAT where the address of the imported function is written by the Windows loader. This last point is particularly important: once a module is loaded, the IAT contains the address that is invoked when calling imported APIs.
The beauty of the IAT is that there's just one place in a PE file where an imported API's address is stored. No matter how many source files you scatter calls to a given API through, all the calls go through the same function pointer in the IAT.
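The single-slot idea can be mimicked in plain C with a table of function pointers. The sketch below is only an analogy, not how the Windows loader is implemented: the one-entry iat array, resolve_imports, and the two callers are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*api_fn)(int);

/* Two possible targets for the same imported API. */
int real_api(int x)   { return x + 1; }
int hooked_api(int x) { return x * 100; }

/* Hypothetical one-entry "IAT": the only place the API's address is stored. */
api_fn iat[1] = { NULL };

/* Stand-in for the loader writing the resolved address into the slot. */
void resolve_imports(void) { iat[0] = real_api; }

/* Two unrelated call sites. Both call through the same slot, which must
   have been filled in before the first call. */
int caller_a(int x) { assert(iat[0] != NULL); return iat[0](x); }
int caller_b(int x) { assert(iat[0] != NULL); return iat[0](x) + 1; }
```

After resolve_imports(), both callers reach real_api. Overwriting iat[0] with hooked_api redirects both callers at once, which is the property that makes the IAT a single point of control.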
Let's examine what the call to an imported API looks like. There are two cases to consider: the efficient way and inefficient way. In the best case, a call to an imported API looks like this:

CALL DWORD PTR [0x00405030]

If you're not familiar with x86 assembly language, this is a call through a function pointer. Whatever DWORD-sized value is at 0x405030 is where the CALL instruction will send control. In the previous example, address 0x405030 lies within the IAT.
The less efficient call to an imported API looks like this:

CALL 0x0040100C
•••
0x0040100C:
JMP       DWORD PTR [0x00405030]

In this situation, the CALL transfers control to a small stub. The stub is a JMP to the address whose value is at 0x405030. Again, remember that 0x405030 is an entry within the IAT. In a nutshell, the less efficient imported API call uses five bytes of additional code, and takes longer to execute because of the extra JMP.
You're probably wondering why the less efficient method would ever be used. There's a good explanation. Left to its own devices, the compiler can't distinguish between imported API calls and ordinary functions within the same module. As such, the compiler emits a CALL instruction of the form

CALL XXXXXXXX

where XXXXXXXX is an actual code address that will be filled in by the linker later. Note that this last CALL instruction isn't through a function pointer. Rather, it's an actual code address. To keep the cosmic karma in balance, the linker needs to have a chunk of code to substitute for XXXXXXXX. The simplest way to do this is to make the call point to a JMP stub, like you just saw.
Where does the JMP stub come from? Surprisingly, it comes from the import library for the imported function. If you were to examine an import library, and examine the code associated with the imported API name, you'd see that it's a JMP stub like the one just shown. What this means is that by default, in the absence of any intervention, imported API calls will use the less efficient form.
Logically, the next question to ask is how to get the optimized form. The answer comes in the form of a hint you give to the compiler. The __declspec(dllimport) function modifier tells the compiler that the function resides in another DLL and that the compiler should generate this instruction

CALL DWORD PTR [XXXXXXXX]

rather than this one:

CALL XXXXXXXX

In addition, the compiler emits information telling the linker to resolve the function pointer portion of the instruction to a symbol named __imp_ followed by the function's decorated name. On x86, where C function names are decorated with a leading underscore, a call to MyFunction would reference __imp__MyFunction (plus any calling-convention suffix). Looking in an import library, you'll see that in addition to the regular symbol name, there's also a symbol with the __imp_ prefix on it. This __imp_ symbol resolves directly to the IAT entry, rather than to the JMP stub.
So what does this mean in your everyday life? If you're writing exported functions and providing a .H file for them, remember to use the __declspec(dllimport) modifier with the function:

__declspec(dllimport) void Foo(void);

If you look at the Windows system header files, you'll find that they use __declspec(dllimport) for the Windows APIs. It's not easy to see this, but if you search for the DECLSPEC_IMPORT macro, which is defined in WINNT.H and used in files such as WinBase.H, you'll see how __declspec(dllimport) is prepended to the system API declarations.
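A common way to apply this advice to your own DLL is a single header whose macro expands to dllexport while building the DLL and to dllimport for its clients. The sketch below shows that idiom under stated assumptions: MYLIB_API, MYLIB_EXPORTS, and the body of Foo are hypothetical names chosen for the example, and the #define of MYLIB_EXPORTS would normally come from the DLL project's compiler options rather than from the source.

```c
/* mylib.h (hypothetical): one header shared by the DLL and its clients. */
#define MYLIB_EXPORTS 1   /* normally set only by the DLL's own build */

#if defined(_WIN32) && defined(MYLIB_EXPORTS)
#  define MYLIB_API __declspec(dllexport)   /* building the DLL */
#elif defined(_WIN32)
#  define MYLIB_API __declspec(dllimport)   /* calling into the DLL */
#else
#  define MYLIB_API                         /* no import tables to optimize */
#endif

MYLIB_API int Foo(int x);

/* mylib.c (hypothetical): the DLL's implementation of the exported function. */
MYLIB_API int Foo(int x)
{
    return x * 2;
}
```

Client code that includes the header without MYLIB_EXPORTS defined sees the dllimport form, so its calls to Foo go through the IAT entry directly instead of through the JMP stub.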

PE File Structure
Now let's dig into the actual format of PE files. I'll start from the beginning of the file, and describe the data structures that are present in every PE file. Afterwards, I'll describe the more specialized data structures (such as imports or resources) that reside within a PE's sections. All of the data structures that I'll discuss below are defined in WINNT.H, unless otherwise noted.
In many cases, there are matching 32 and 64-bit data structures—for example, IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64. These structures are almost always identical, except for some widened fields in the 64-bit versions. If you're trying to write portable code, there are #defines in WINNT.H which select the appropriate 32 or 64-bit structures and alias them to a size-agnostic name (in the previous example, it would be IMAGE_NT_HEADERS). The structure selected depends on which mode you're compiling for (specifically, whether _WIN64 is defined or not). You should only need to use the 32 or 64-bit specific versions of the structures if you're working with a PE file with size characteristics that are different from those of the platform you're compiling for.

The MS-DOS Header
Every PE file begins with a small MS-DOS® executable. The need for this stub executable arose in the early days of Windows, before a significant number of consumers were running it. When executed on a machine without Windows, the program could at least print out a message saying that Windows was required to run the executable.
The first bytes of a PE file begin with the traditional MS-DOS header, called an IMAGE_DOS_HEADER. The only two values of any importance are e_magic and e_lfanew. The e_lfanew field contains the file offset of the PE header. The e_magic field (a WORD) needs to be set to the value 0x5A4D. There's a #define for this value, named IMAGE_DOS_SIGNATURE. In ASCII representation, 0x5A4D is MZ, the initials of Mark Zbikowski, one of the original architects of MS-DOS.
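As a rough sketch of how those two fields are used, the function below checks e_magic and reads e_lfanew from a raw file buffer. It assumes the standard 64-byte IMAGE_DOS_HEADER layout, in which e_lfanew sits at offset 0x3C; the function and macro names are invented for the example, and the bytes are read one at a time so the code does not depend on WINNT.H or on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_SIGNATURE   0x5A4Du  /* "MZ", same value as IMAGE_DOS_SIGNATURE */
#define DOS_HEADER_SIZE 0x40u    /* size of IMAGE_DOS_HEADER */
#define E_LFANEW_OFFSET 0x3Cu    /* offset of e_lfanew within the header */

/* Read little-endian values from a byte buffer. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores the file offset of the PE header in *pe_offset if
   the buffer starts with a plausible MS-DOS header, otherwise returns 0. */
int find_pe_header(const uint8_t *file, size_t size, uint32_t *pe_offset)
{
    uint32_t offset;

    assert(file != NULL && pe_offset != NULL);
    if (size < DOS_HEADER_SIZE)
        return 0;
    if (read_u16(file) != DOS_SIGNATURE)
        return 0;
    offset = read_u32(file + E_LFANEW_OFFSET);
    if ((size_t)offset > size - 4)   /* room for the 4-byte PE signature */
        return 0;
    *pe_offset = offset;
    return 1;
}
```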

The IMAGE_NT_HEADERS Header
The IMAGE_NT_HEADERS structure is the primary location where specifics of the PE file are stored. Its offset is given by the e_lfanew field in the IMAGE_DOS_HEADER at the beginning of the file. There are actually two versions of the IMAGE_NT_HEADERS structure, one for 32-bit executables and the other for 64-bit executables. The differences are so minor that I'll consider them to be the same for the purposes of this discussion. The only correct, Microsoft-approved way of differentiating between the two formats is via the value of the Magic field in the IMAGE_OPTIONAL_HEADER (described shortly).
An IMAGE_NT_HEADERS structure consists of three fields (the 32-bit version is shown here):

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

In a valid PE file, the Signature field is set to the value 0x00004550, which in ASCII is "PE\0\0" (the letters P and E followed by two zero bytes). A #define, IMAGE_NT_SIGNATURE, is defined for this value. The second field, a struct of type IMAGE_FILE_HEADER, predates PE files. It contains some basic information about the file; most importantly, a field describing the size of the optional data that follows it. In PE files, this optional data is very much required, but is still called the IMAGE_OPTIONAL_HEADER.
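The signature check, together with the Magic test mentioned earlier for telling 32-bit and 64-bit files apart, might look like the sketch below. It assumes the usual layout (a 4-byte signature, a 20-byte IMAGE_FILE_HEADER, then the optional header whose first WORD is Magic) and the standard Magic values 0x10B for PE32 and 0x20B for PE32+. The enum and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NT_SIGNATURE  0x00004550u  /* "PE\0\0", same as IMAGE_NT_SIGNATURE */
#define MAGIC_OFFSET  24u          /* 4-byte signature + 20-byte file header */
#define MAGIC_PE32    0x10Bu       /* 32-bit optional header */
#define MAGIC_PE32P   0x20Bu       /* 64-bit (PE32+) optional header */

enum pe_kind { PE_INVALID = 0, PE_32BIT = 32, PE_64BIT = 64 };

/* Little-endian readers, so the code works on any host byte order. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* nt points at the start of the IMAGE_NT_HEADERS (file offset e_lfanew);
   size is the number of bytes available from that point. */
enum pe_kind classify_nt_headers(const uint8_t *nt, size_t size)
{
    assert(nt != NULL);
    if (size < MAGIC_OFFSET + 2)
        return PE_INVALID;
    if (le32(nt) != NT_SIGNATURE)
        return PE_INVALID;
    switch (le16(nt + MAGIC_OFFSET)) {
    case MAGIC_PE32:  return PE_32BIT;
    case MAGIC_PE32P: return PE_64BIT;
    default:          return PE_INVALID;
    }
}
```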
Figure 3 shows the fields of the IMAGE_FILE_HEADER structure, with additional notes for the fields. This structure can also be found at the very beginning of COFF OBJ files. Figure 4 lists the common values of IMAGE_FILE_xxx. Figure 5 shows the members of the IMAGE_OPTIONAL_HEADER structure.
The DataDirectory array at the end of the IMAGE_OPTIONAL_HEADERs is the address book for important locations within the executable. Each DataDirectory entry looks like this:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;     // RVA of the data
    DWORD   Size;               // Size of the data
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
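To show how such an entry is typically consumed, here is a small sketch that looks up one directory entry in a module already mapped into memory. It uses a local data_directory type that mirrors the two fields above so that it compiles without WINNT.H; the function name and the DIR_ENTRY_xxx constants are stand-ins for the example, with the index values 0 and 1 matching IMAGE_DIRECTORY_ENTRY_EXPORT and IMAGE_DIRECTORY_ENTRY_IMPORT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DIRECTORY_ENTRIES 16  /* length of the DataDirectory array */
#define DIR_ENTRY_EXPORT      0   /* IMAGE_DIRECTORY_ENTRY_EXPORT */
#define DIR_ENTRY_IMPORT      1   /* IMAGE_DIRECTORY_ENTRY_IMPORT */

/* Local mirror of IMAGE_DATA_DIRECTORY. */
typedef struct {
    uint32_t VirtualAddress;  /* RVA of the data */
    uint32_t Size;            /* size of the data in bytes */
} data_directory;

/* Returns a pointer to the directory's data inside the loaded image, or
   NULL if the index is out of range or the entry is empty. When size_out
   is not NULL it receives the size of the data. */
const uint8_t *directory_data(const uint8_t *image_base,
                              const data_directory *dirs,
                              unsigned index,
                              uint32_t *size_out)
{
    assert(image_base != NULL && dirs != NULL);
    if (index >= NUM_DIRECTORY_ENTRIES)
        return NULL;
    if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
        return NULL;
    if (size_out != NULL)
        *size_out = dirs[index].Size;
    /* The entry holds an RVA, so add the load address to get a pointer. */
    return image_base + dirs[index].VirtualAddress;
}
```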

The Section Table
Immediately following the IMAGE_NT_HEADERS is the section table. The section table is an array of IMAGE_SECTION_HEADER structures. An IMAGE_SECTION_HEADER provides information about its associated section, including location, length, and characteristics. Figure 6 contains a description of the IMAGE_SECTION_HEADER fields. The number of IMAGE_SECTION_HEADER structures is given by the IMAGE_NT_HEADERS.FileHeader.NumberOfSections field.
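One frequent use of the section table is translating an RVA into an offset within the file on disk. The sketch below walks a simplified section array to do that. The section_info type keeps only four of the IMAGE_SECTION_HEADER fields (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData), and the function name is invented for the example, so treat it as an outline rather than a complete parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced view of an IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t VirtualAddress;    /* RVA where the section starts in memory */
    uint32_t VirtualSize;       /* size of the section in memory */
    uint32_t PointerToRawData;  /* file offset of the section's data */
    uint32_t SizeOfRawData;     /* size of the section's data in the file */
} section_info;

/* Returns 1 and stores the file offset in *offset_out if the RVA falls
   within the raw data of one of the sections, otherwise returns 0. */
int rva_to_file_offset(const section_info *sections, size_t count,
                       uint32_t rva, uint32_t *offset_out)
{
    size_t i;

    assert(sections != NULL && offset_out != NULL);
    for (i = 0; i < count; i++) {
        const section_info *s = &sections[i];
        if (rva >= s->VirtualAddress &&
            rva - s->VirtualAddress < s->SizeOfRawData) {
            *offset_out = s->PointerToRawData + (rva - s->VirtualAddress);
            return 1;
        }
    }
    return 0;
}
```

In a real file, count would come from FileHeader.NumberOfSections. An RVA that lies in a section's memory range but beyond its raw data (for example, uninitialized data) has no file offset, which is why the check uses SizeOfRawData rather than VirtualSize.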
The file alignment of sections in the executable file can have a significant impact on the resulting file size. In Visual Studio 6.0, the linker defaulted to a section alignment of 4KB, unless /OPT:NOWIN98 or the /ALIGN switch was used. The Visual Studio .NET linker, while still defaulting to /OPT:WIN98, determines if the executable is below a certain size and if that is the case uses 0x200-byte alignment.
Another interesting alignment comes from the .NET file specification. It says that .NET executables should have an in-memory alignment of 8KB, rather than the expected 4KB for x86 binaries. This is to ensure that .NET executables built with x86 entry point code can still run under IA-64. If the in-memory section alignment were 4KB, the IA-64 loader wouldn't be able to load the file, since pages are 8KB on IA-64 Windows.
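The effect of these alignment values is easy to see with a little arithmetic. The helper below rounds a size up to the next multiple of a power-of-two alignment; align_up is a name made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
   The alignment must be a nonzero power of two. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

A section holding 0x1234 bytes of data occupies 0x1400 bytes in the file with 0x200-byte alignment, but 0x2000 bytes with 4KB alignment. That difference, repeated for each section, is why the choice of file alignment matters for small executables.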

Wrap-up
That's it for the headers of PE files. In Part 2 of this article I'll continue the tour of portable executable files by looking at commonly encountered sections. Then I'll describe the major data structures within those sections, including imports, exports, and resources. And finally, I'll go over the source for the updated and vastly improved PEDUMP.


For background information see:
The Common Object File Format (COFF)


Matt Pietrek is an independent writer, consultant, and trainer. He was the lead architect for Compuware/NuMega's Bounds Checker product line for eight years and has authored three books on Windows system programming. His Web site, at http://www.wheaty.net, has a FAQ page and information on previous columns and articles.
=========================================================

posted @ 2006-10-10 14:43