From: http://www.debuginfo.com/articles/debuginfomatch.html
Introduction
Debug
information in PE executables
DebugDir
example
Debug information in a separate
file
PDB files
DBG
files
Matching debug information - the
details
When matching is not desirable
ChkMatch tool
Introduction
We know that debug information (symbols) helps debuggers to analyze the internal layout of the debugged application. In particular, it helps the debugger to locate addresses of variables and functions, display values of variables (including complex structures and classes with nontrivial binary layout), and map raw addresses in the executable to the lines of the source code. (See this article for more information about debug information and its contents).
When we modify the source code and rebuild the executable, its internal layout changes. Some functions and variables can move to other locations, structures and classes can be extended with new members while some old members can be removed, and so on. These changes should be properly reflected in the debug information, which also must be updated to correctly describe the new layout of the executable.
We also know that debug information is often stored separately from the executable, usually in a PDB or DBG file. Now lets imagine what can happen if the debugger picks up a wrong (or outdated) debug information file and tries to use it to debug the application. In the best case, the user will see that some variables have incorrect values. In the worst case, the debugger will not be able to display variables and step through the source code at all. As a result, effective debugging is not possible, and the reason is that debug information does not match the executable.
It is clear that debuggers should do something to prevent such situations. It is achieved through the concept of “matching debug information”. At the time when the executable is built and debug information file is generated, the build tool (linker, for example) assigns a unique identifier to the debug information file. Then this unique identifier is stored in two places – in the executable and in the debug information file. When the debugger starts debugging the executable, it refuses to load a debug information file if its unique identifier is not the same as the identifier stored in the executable. At every subsequent rebuild, the build tool changes the unique identifier, so that an old debug information file cannot be used to debug the new executable, and vice versa.
This is how the process of matching works, in brief. In the remainder of this article, we will explore the details of debug information matching. We will see what kinds of unique identifiers are used, where and how they are stored. We will also discuss situations when matching is not desirable, and see what we can do to disable it if needed.
Debug information in PE executables
As usual, lets start with some theory and explore how debug information is stored in a typical PE executable. Fortunately, PE format itself is well documented, so I don’t have to talk too much about it here (PE specification and Matt Pietrek’s articles are good sources of information ). In brief, a typical PE file starts with a set of headers that contain various important information about the layout and characteristics of the executable. Headers are followed by a set of contiguous data blocks, called “sections”, which contain the actual code and data of the executable. At the end of the file, after the sections, other arbitrary data can be placed.
When an executable is built with debug information, the debug information has to be stored somewhere. Some debug information formats (COFF and CodeView) assume that the debug information is stored in the executable. Other formats (Program Database, and also CodeView) allow storing debug information in a separate file. But even in the latter case, the executable still contains a small piece of debug information that tells the debugger that a separate file exists, and helps to find that file.
There is no common agreement between various build tools on the exact place in PE file where debug information should be stored. Some tools put debug information into one of the sections, others append it to the end of the file after all sections. But debuggers do not complain, because every executable contains a “roadmap” that helps to find the place where debug information is stored.
The road to debug information starts in the file’s optional header (IMAGE_OPTIONAL_HEADER, see WINNT.H). Needless to say that this header, while called “optional”, is always present in PE executables. At the end of the optional header, there is DataDirectory member which serves as the address book of the executable, pointing to various important locations in it. DataDirectory is actually an array of IMAGE_DATA_DIRECTORY structures.
typedef struct _IMAGE_DATA_DIRECTORY { DWORD VirtualAddress; DWORD Size; } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY; typedef struct _IMAGE_OPTIONAL_HEADER { WORD Magic; … // Many other fields IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]; } IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
The entry at index 6 (IMAGE_DIRECTORY_ENTRY_DEBUG) contains the address and size of the executable’s debug directory, which is the place where to look for the real location of debug information in the executable file. Debug directory is stored in one of the PE sections, and consists of an array of IMAGE_DEBUG_DIRECTORY structures.
typedef struct _IMAGE_DEBUG_DIRECTORY { DWORD Characteristics; DWORD TimeDateStamp; WORD MajorVersion; WORD MinorVersion; DWORD Type; DWORD SizeOfData; DWORD AddressOfRawData; DWORD PointerToRawData; } IMAGE_DEBUG_DIRECTORY, *PIMAGE_DEBUG_DIRECTORY;
The number of entries in the debug directory can be obtained by dividing the size of the debug directory (as specified in the optional header’s data directory entry) by the size of IMAGE_DEBUG_DIRECTORY structure.
The fact that the debug directory is an array clearly shows that an executable can contain more than one kind of debug information at the same time. For example, executables built with Visual C++ 6.0 contain both COFF and CodeView debug information when /debugtype:both linker option is used.
The kind of debug information described by a particular debug directory entry is specified in Type field of IMAGE_DEBUG_DIRECTORY structure. It can have one of the following values (defined in WINNT.H):
#define IMAGE_DEBUG_TYPE_UNKNOWN 0 #define IMAGE_DEBUG_TYPE_COFF 1 #define IMAGE_DEBUG_TYPE_CODEVIEW 2 #define IMAGE_DEBUG_TYPE_FPO 3 #define IMAGE_DEBUG_TYPE_MISC 4 #define IMAGE_DEBUG_TYPE_EXCEPTION 5 #define IMAGE_DEBUG_TYPE_FIXUP 6 #define IMAGE_DEBUG_TYPE_OMAP_TO_SRC 7 #define IMAGE_DEBUG_TYPE_OMAP_FROM_SRC 8 #define IMAGE_DEBUG_TYPE_BORLAND 9 #define IMAGE_DEBUG_TYPE_RESERVED10 10 #define IMAGE_DEBUG_TYPE_CLSID 11
When working with PE executables built by Microsoft tools, we usually have to deal with only a subset of types:
Type | Description |
---|---|
IMAGE_DEBUG_TYPE_COFF | COFF debug information (stored in the executable) |
IMAGE_DEBUG_TYPE_CODEVIEW | CodeView debug information (stored in the executable) or Program Database debug information (stored in PDB file) |
IMAGE_DEBUG_TYPE_MISC | CodeView debug information (stored in DBG file) |
IMAGE_DEBUG_TYPE_FPO | Frame pointer omission information, which helps debug optimised executables |
FileOffset and Size members of IMAGE_DEBUG_DIRECTORY structure specify the actual location of the debug information of the given type in the executable file.
To summarize, when a debugger wants to find debug information for an executable, it performs the following steps:
1. Read the optional header’s data directory entry which describes the debug information (IMAGE_OPTIONAL_HEADER.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG]) and determine the location and size of the executable’s debug directory.
2. Read debug directory entries and pick up the ones the debugger is interested in. Use FileOffset and Size members of the corresponding IMAGE_DEBUG_DIRECTORY structure to determine the actual location and size of the debug information.
3. Read the debug information. (If the main part of debug information is stored in a separate file, read the file name from the debug information stored in the executable, and load that file).
DebugDir example
DebugDir example shows the process of finding the location of debug information in an executable in details.
Debug information in a separate file
Now I have to remind myself that I was actually going to discuss matching debug information. Thus, while it could be interesting to talk about all possible formats of debug information that we can find in a PE file, there is no sense in doing it here. This is because if the whole debug information is stored in the executable, it is always matched. So lets focus only on the cases where debug information is stored in a separate file. At the time being, there are only two such cases:
- Debug information stored in PDB file (with two existing formats – PDB 2.0 and PDB 7.0)
- Debug information stored in DBG file
PDB files
When debug information for an executable is stored in PDB file, the executable’s debug directory contains an entry of type IMAGE_DEBUG_TYPE_CODEVIEW. This entry points to a small data block, which tells the debugger where to look for the PDB file. But before we proceed to the details of the data stored in this block, a word about CodeView debug information in general should be said.
If we look at CodeView format specification (available in older versions of MSDN), we can notice that several kinds of CodeView information exist. Since all of them are called “CodeView” and use the same type of debug directory entry (IMAGE_DEBUG_TYPE_CODEVIEW), debuggers must be given a way to determine which CodeView format is actually used. This is achieved with the help of a DWORD-sized signature, which is always placed at the beginning of CodeView debug information. The most known signatures for CodeView debug information stored in the executable are “NB09” (CodeView 4.10) and “NB11” (CodeView 5.0). When CodeView information refers to a PDB file, the signature can be “NB10” (which is used with PDB 2.0 files) or “RSDS” (for PDB 7.0 files).
In most kinds of CodeView information, the signature is followed by another DWORD-sized value, Offset, which specifies the offset to the start of the actual debug information from the beginning of the CodeView data. CodeView signature and offset together are sometimes described as CV_HEADER structure:
struct CV_HEADER { DWORD Signature; DWORD Offset; };
Now lets discuss the exact format of CodeView data block that refers to a PDB 2.0 file. It can be represented with the following structure:
struct CV_INFO_PDB20 { CV_HEADER CvHeader; DWORD Signature; DWORD Age; BYTE PdbFileName[]; };
Members of this structure are described in the following table:
Member | Description |
---|---|
CvHeader.Signature | CodeView signature, equal to “NB10” |
CvHeader.Offset | CodeView offset. Set to 0, because debug information is stored in a separate file. |
Signature | The time when debug information was created (in seconds since 01.01.1970) |
Age | Ever-incrementing value, which is initially set to 1 and incremented every time when a part of the PDB file is updated without rewriting the whole file. |
PdbFileName | Null-terminated name of the PDB file. It can also contain full or partial path to the file. |
If the CodeView data block refers to a PDB 7.0 file, a different format is used:
struct CV_INFO_PDB70 { DWORD CvSignature; GUID Signature; DWORD Age; BYTE PdbFileName[]; } ;
Note that the structure does not include Offset field (and thus does not start with CV_HEADER structure), while CodeView signature is still present. The absence of Offset field makes this structure an unusual member of CodeView family.
The members of the structure are described in the following table:
Member | Description |
---|---|
CvSignature | CodeView signature, equal to “RSDS” |
Signature | A unique identifier, which changes with every rebuild of the executable and PDB file. |
Age | Ever-incrementing value, which is initially set to 1 and incremented every time when a part of the PDB file is updated without rewriting the whole file. |
PdbFileName | Null-terminated name of the PDB file. It can also contain full or partial path to the file. |
DBG files
When debug information for an executable is stored in a DBG file, the executable’s debug directory contains an entry of type IMAGE_DEBUG_TYPE_MISC. This entry points to a small block of data, which tells the debugger where to look for the DBG file. This data block has the following format (defined in WINNT.H):
#define IMAGE_DEBUG_MISC_EXENAME 1 typedef struct _IMAGE_DEBUG_MISC { DWORD DataType; DWORD Length; BOOLEAN Unicode; BYTE Reserved[ 3 ]; BYTE Data[ 1 ]; } IMAGE_DEBUG_MISC, *PIMAGE_DEBUG_MISC;
The members of this structure are described in the following table:
Member | Description |
---|---|
DataType | Type of the data. Always set to 1 (IMAGE_DEBUG_MISC_EXENAME) |
Length | Total length of the data block, multiple of four. |
Unicode | If TRUE, subsequent data is Unicode string; if FALSE, the data is ANSI string. |
Reserved | Reserved and unused. |
Data | The name of the DBG file. |
In addition to IMAGE_DEBUG_MISC structure, the executable whose debug information is stored in DBG file also contains IMAGE_FILE_DEBUG_STRIPPED flag set in Characteristics field of the executable’s file header (IMAGE_FILE_HEADER.Characteristics).
Matching debug information - the details
Now we know enough theory to proceed to the details of debug information matching. Lets recall that the executable and the debug information file are considered matched only when they both contain the same unique identifier. So, what kinds of unique identifiers are used?
In the case of PDB 2.0 debug information, the unique identifier consists of two values – signature and age, which are stored in CV_INFO_PDB20 structure in the executable (CV_INFO_PDB20.Signature and CV_INFO_PDB20.Age fields) and in a special data stream in the PDB file. When the debugger checks whether a PDB file matches the executable, it reads the signature and age from the PDB file and compares them with the values stored in CV_INFO_PDB20 structure in the executable. If the values are not the same, the PDB file is considered unmatched, and the debugger refuses to load it.
PDB 7.0 debug information also uses signature and age to check for the match (CV_INFO_PDB70.Signature and CV_INFO_PDB700.Age, respectively). But the fact that CV_INFO_PDB70.Signature is a GUID makes the identifier much more unique than in the case of timestamp-based PDB 2.0 signature.
DBG files use a similar approach, where the role of the unique identifier is assigned to the executable’s timestamp (which is stored in the executable’s file header, IMAGE_FILE_HEADER.TimeDateStamp). The same timestamp is stored in the header of the DBG file (IMAGE_SEPARATE_DEBUG_HEADER.TimeDateStamp). When a debugger checks whether a DBG file matches the executable, it reads the timestamp from the DBG file and compares it with the timestamp stored in the executable. If timestamps are not equal, the DBG file is considered unmatched. In addition, Visual Studio debuggers also check for presence of IMAGE_FILE_DEBUG_STRIPPED flag in Characteristics field of the executable’s file header (IMAGE_FILE_HEADER.Characteristics), and refuse to load the DBG file if the flag is not set (actually, they check the flag first and do not look for DBG file at all if the flag is not set). WinDbg debugger does not check this flag in the default configuration, and uses only timestamp to verify that the DBG file is matched.
When matching is not desirable
While it is usually good that debuggers verify that debug information files and executables are matched (thus saving us from loading a wrong debug information file by mistake), there are situations when such a pedantic approach is not desirable. Consider the situation when our application crashes on the customer’s system, and we have to debug a crash dump. Suddenly we realize that something failed in our established CM process, and debug information file for the application is lost. What to do? Can we rebuild the application and produce a new debug information file? Or can we use debug information file from an older build? Of course, we understand that debug information from an older or newer build may not be 100 percent accurate, but it is still better than nothing. We try to load the debug information file and notice that the debugger refuses to load it because…it is unmatched! Yes, the unique identifier, which is used to check for matching, changes with every build. What can we do? How can we ask the debugger to load an unmatched debug information file?
It turns out that with Visual Studio debugger we cannot do much. No Visual Studio 6.0, no Visual Studio.NET debugger allows to load unmatched debug information files. Fortunately, the situation is much better with WinDbg. While by default it also does not allow to load unmatched debug information, .symopt debugger command can change the default behaviour. After we have issued “.symopt+0x40” command, the debugger will happily accept and load unmatched PDB and DBG files.
ChkMatch tool
I like WinDbg, but I also like Visual Studio debuggers. And sometimes I need them to load unmatched debug information files. While VS debuggers themselves do not offer a workaround, it is possible to make an executable and debug information file match by reading the unique identifier from the executable and writing it to the proper place in the debug information file. This is exactly what ChkMatch tool does.
When started as “chkmatch –m myapp.exe myapp.pdb”, the tool reads the identifier from the executable and writes it to the proper place in the debug information file, thus enforcing the match and allowing VS debugger to load the previously unmatched file. (At the time being, only signature mismatch can be handled for PDB files; age mismatch cannot be handled yet – this is a subject for future research).
Another option allows to check whether an executable and debug information file are matched: “chkmatch –c myapp.exe myapp.pdb”. This is accomplished by reading the identifier from the debug information file and comparing it with the identifier stored in the executable.
Contact
Have questions or comments? Free free to contact Oleg Starodumov at firstname@debuginfo.com.