rebase

/BASE (Base Address)

https://msdn.microsoft.com/en-us/library/f7f5138s.aspx

 

Need for Rebasing a DLL (good)

https://www.codeproject.com/Articles/9426/Need-for-Rebasing-a-DLL

Introduction

This article explains why rebasing DLLs improves startup performance for an application that uses more than one DLL. It covers how to change the build settings to rebase a DLL, and how the /FIXED switch interacts with rebasing.

Every executable and DLL module has a preferred base address, which identifies the ideal memory address where the module should get mapped into a process' address space. When you build an executable module, the linker sets the module's preferred base address to 0x00400000. For a DLL module, the linker sets a preferred base address of 0x10000000. Using Visual Studio's DumpBin utility (with the /headers switch), you can see an image's preferred base address.

Open a command prompt and run dumpbin /headers exename.exe.

Alternatively, open the EXE in the Depends (Dependency Walker) utility that ships with Visual Studio; it lists all the DLLs the EXE depends on, together with their base addresses.
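To make the "preferred base address" concrete, here is a minimal sketch in C of where that value lives in the file. It assumes the image has already been read into a byte buffer; the function name pe_preferred_base and the helper names are hypothetical. The offsets follow the published PE layout (e_lfanew at 0x3C, optional header 24 bytes after the "PE" signature, ImageBase at offset 28 for PE32 and offset 24 for PE32+).

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers from a byte buffer. */
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const unsigned char *p) {
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

/* Hypothetical helper: stores the preferred base address in *base and
   returns 0, or returns -1 if the buffer does not look like a PE image. */
int pe_preferred_base(const unsigned char *img, size_t len, uint64_t *base) {
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return -1;
    uint32_t pe = rd32(img + 0x3C);                  /* e_lfanew */
    if (pe > len || len - pe < 24 + 32) return -1;   /* room for the fields read below */
    if (rd32(img + pe) != 0x00004550u) return -1;    /* "PE\0\0" signature */
    const unsigned char *opt = img + pe + 24;        /* start of optional header */
    unsigned magic = opt[0] | (opt[1] << 8);
    if (magic == 0x10B) { *base = rd32(opt + 28); return 0; }  /* PE32 */
    if (magic == 0x20B) { *base = rd64(opt + 24); return 0; }  /* PE32+ */
    return -1;
}
```

For a DLL linked with the default settings described above, this field is the same "image base" value that dumpbin /headers reports.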

When this executable module is invoked, the operating system loader creates a virtual address space for the new process. Then the loader maps the executable module at memory address 0x00400000 and the DLL module at 0x10000000. Why is this preferred base address so important? Let's look at this code:

Using the code

Let's look at a simple piece of code. A function assigns a value to the global integer i.

int i;
void Func()
{
    i = 5; // This is the important line.
}

When the compiler processes the Func function, the compiler and linker produce machine code that looks something like this:

MOV   [0x10014540], 5

In other words, the compiler and linker have created machine code that has the address of the "i" variable, 0x10014540, hard-coded into it. This memory address is absolutely correct as long as the DLL does in fact load at its preferred base address.

OK, now let's say that you're designing an application that requires two DLLs. By default, the linker sets the .exe module's preferred base address to 0x00400000 and the preferred base address for both DLLs to 0x10000000. If you attempt to run the .exe, the loader creates the virtual address space and maps the .exe module at the 0x00400000 memory address. Then the loader maps the first DLL to the 0x10000000 memory address. But now, when the loader attempts to map the second DLL into the process' address space, it can't possibly map it at the module's preferred base address. It must relocate the DLL module, placing it somewhere else.

Below are the dependencies for a test EXE using DLL1 and DLL2 without rebasing. As seen, both DLLs have the same base address, so only one will be loaded at that address and the other needs to be relocated.

Relocating an executable (or DLL) module is an absolutely horrible process, and you should take measures to avoid it. Let's see why. Suppose that the loader relocates the second DLL to address 0x20000000. In that case, the code that changes the "i" variable to 5 should be:

MOV   [0x20014540], 5

But the code in the file's image looks like this:

MOV   [0x10014540], 5

If the code from the file's image were allowed to execute without the address being changed, some 4-byte value in the first DLL module would be overwritten with the value 5. This can't possibly be allowed. The loader must somehow fix this code. When the linker builds your module, it embeds a relocation section in the resulting file. If the loader can map a module at its preferred base address, the module's relocation section is never accessed by the system. This is certainly what we want: you never want the relocation section to be used, for the reasons below.

If the module cannot be mapped at its preferred base address, the loader opens the module's relocation section and iterates through all the entries. For each entry found, the loader goes to the page of storage that contains the machine code instruction to be modified. It then grabs the memory address that the machine instruction is currently using and adds to the address the difference between the module's preferred base address and the address where the module actually got mapped.

So, in the example above, the second DLL was mapped at 0x20000000, but its preferred base address is 0x10000000. This yields a difference of 0x10000000, which is then added to the address in the machine code instruction, giving us this:

MOV   [0x20014540], 5
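The arithmetic of this fix-up can be written down in a couple of lines of C. This is only a sketch of the calculation described above; the function name rebase_address is hypothetical.

```c
#include <stdint.h>

/* Returns the address an instruction should use after the module has been
   mapped at actual_base instead of preferred_base. Unsigned arithmetic
   wraps, so this also works when the module moves to a lower address. */
uint32_t rebase_address(uint32_t addr, uint32_t preferred_base,
                        uint32_t actual_base) {
    uint32_t delta = actual_base - preferred_base;
    return addr + delta;
}
```

With the numbers from the example, rebase_address(0x10014540, 0x10000000, 0x20000000) yields 0x20014540.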

To avoid this, give each DLL a different preferred base address at build time by changing the project's linker settings (the /BASE option).

The figure below shows how to achieve it:

Below are the dependencies for a test EXE using DLL1 and DLL2 with rebasing. As seen, the DLLs now have different base addresses (DLL1: 0x10000000 and DLL2: 0x20000000) and both will be loaded at their preferred addresses. If for some reason a DLL still cannot be loaded at the address specified, the loader has to relocate it and the process described above is carried out.

Either way, the code in the second DLL now references its "i" variable at the correct address.

There are two major drawbacks when a module cannot load at its preferred base address:

  • The loader has to iterate through the relocation section and modify a lot of the module's code. This produces a major performance hit and can really hurt an application's initialization time.
  • As the loader writes to the module's code pages, the system's copy-on-write mechanism forces these pages to be backed by the system's paging file.

The second point above is truly bad. It means that the module's code pages can no longer be discarded and reloaded from the module's file image on disk. Instead, the pages are swapped to and from the system's paging file as necessary. This hurts performance too. But wait, it gets worse. Since the paging file backs all of the module's code pages, the system has less storage available for all processes running in the system.

By the way, you can create an executable or DLL module that doesn't have a relocation section in it. You do this by passing the /FIXED switch to the linker when you build the module. Using this switch makes the module smaller in bytes but it means that the module cannot be relocated. If the module cannot load at its preferred base address, it cannot load at all. If the loader must relocate a module but no relocation section exists for the module, the loader kills the entire process and displays an "Abnormal Process Termination" message to the user.
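Whether a module has had its relocation information removed is recorded in the PE file header. The sketch below checks the Characteristics field for the IMAGE_FILE_RELOCS_STRIPPED flag (value 0x0001 in winnt.h), which, to my understanding, is what a /FIXED build sets. It assumes the image is already in a byte buffer; the function name pe_relocs_stripped is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RELOCS_STRIPPED 0x0001u  /* IMAGE_FILE_RELOCS_STRIPPED in winnt.h */

/* Hypothetical helper: returns 1 if the image has no relocation information,
   0 if it can be relocated, -1 if the buffer does not look like a PE image. */
int pe_relocs_stripped(const unsigned char *img, size_t len) {
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return -1;
    uint32_t pe = (uint32_t)img[0x3C] | ((uint32_t)img[0x3D] << 8) |
                  ((uint32_t)img[0x3E] << 16) | ((uint32_t)img[0x3F] << 24);
    if (pe > len || len - pe < 24) return -1;        /* signature + file header */
    if (img[pe] != 'P' || img[pe + 1] != 'E' || img[pe + 2] || img[pe + 3])
        return -1;
    /* Characteristics: last 2 bytes of the 20-byte file header (offset 18),
       which itself starts 4 bytes after the signature. */
    unsigned characteristics = img[pe + 22] | (img[pe + 23] << 8);
    return (characteristics & RELOCS_STRIPPED) ? 1 : 0;
}
```

A build script could use a check like this to make sure no DLL in a product was accidentally linked with /FIXED.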

In this case, I gave both DLLs the same base address (2000000) and linked them with the /FIXED switch. Only one DLL can be loaded at that address; the other cannot be relocated, so you get an error as shown:

Points of Interest

In this example, I have tried to cover all aspects: loading without rebasing, how to rebase, how rebasing behaves with the /FIXED switch, why rebasing is needed, and the drawbacks of using /FIXED. Suggestions for improvement are most welcome.

This was my first article, hope all of you liked it.

Acknowledgement and References

I would like to acknowledge author Mr. Jeffrey Richter and his book on the Windows OS, which is one of the best books for learning about Windows operating system internals. Parts of this article are taken from the book, and examples were added to simplify things.

 

 

Rebasing Win32 DLLs

http://www.drdobbs.com/rebasing-win32-dlls/184416272?pgno=1

In this article, I discuss why “basing” a DLL is desirable and what it involves. Then I present a post-link utility, called Libase (for “library base”) to automate the procedure. Libase differs from the Platform SDK utility Rebase in that it chooses the new base address for the DLL based on a hash of the filename, instead of asking you to provide a base address explicitly.

 

Base Addresses and Rebasing

 

  Every Win32 application loads in a private memory address space. The operating system makes it appear that each process has a linear address range that starts from zero. When one process reads from memory address 0x12345, it reads from an entirely different physical memory address than another process that reads from the same address. The operating system keeps the various logical address spaces apart by giving each process its own page tables, which map logical memory addresses to physical memory addresses. This relies on the paging support of "protected mode" on the Intel processors.

  In a process, the application (.exe) and all loadable components (mostly DLLs, but also “in-process” ActiveX servers — which in fact are DLLs) share the logical address space. If one DLL reads from address 0x12345, it reads from the same memory location as another DLL in the same process.

  Applications and DLLs have functions and variables. Code in an application calls a function by “jumping” to the address at which the function starts. The starting address of the function was determined by the linker when it built the executable. The linker cannot just choose any address; it has to take into account areas of memory that the operating system has reserved. A function may not start at address 0x00000000, for example, and in Windows 9x it may also not exceed address 0x80000000.

  This raises a problem for DLLs. When you link the DLL, the linker cannot know what linear address the DLL will have to actually load at for any given application (or even application invocation). That’s the “Dynamic Link” part of the DLL acronym. It is entirely possible, and even very likely these days, that an application loads two DLLs that both are “based” by the linker to load at the same starting address. (The base address set by the linker is the “preferred load address.”)

  If your application needs to load a DLL whose preferred load address conflicts with memory that’s already in use (such as by a previously-loaded DLL that had the same preferred load address), the operating system “rebases” the conflicting DLL by loading it at a different address that does not overlap and then by adjusting all addresses. The physical format of a .dll file includes relocation information that points to, for example, the target addresses of CALL and JMP instructions, and addresses that reference global/static variables (such as literal strings). All these addresses have to get revised if the operating system cannot load the DLL at its preferred load address.
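As an illustration of what "adjusting all addresses" involves, here is a sketch in C that applies one block of the relocation table to an image held in memory. The block layout follows the published PE format: a 4-byte page RVA, a 4-byte block size, then 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page. Only the 32-bit HIGHLOW type (3) and padding entries (0) are handled. The function name apply_reloc_block is hypothetical, and the real loader does considerably more than this.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)v;         p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16); p[3] = (unsigned char)(v >> 24);
}

/* Adds delta (actual base minus preferred base) to every 32-bit address
   listed in one relocation block. Returns the number of fix-ups applied,
   or -1 if the block is malformed or uses a type this sketch ignores. */
int apply_reloc_block(unsigned char *image, size_t image_len,
                      const unsigned char *block, size_t block_len,
                      uint32_t delta) {
    if (block_len < 8) return -1;
    uint32_t page_rva = rd32(block);      /* page the entries refer to */
    uint32_t size = rd32(block + 4);      /* size of header plus entries */
    if (size < 8 || size > block_len) return -1;
    int applied = 0;
    for (uint32_t off = 8; off + 2 <= size; off += 2) {
        unsigned entry = block[off] | (block[off + 1] << 8);
        unsigned type = entry >> 12;
        uint32_t rva = page_rva + (entry & 0x0FFFu);
        if (type == 0) continue;          /* padding entry */
        if (type != 3) return -1;         /* only HIGHLOW is handled here */
        if (rva > image_len || image_len - rva < 4) return -1;
        wr32(image + rva, rd32(image + rva) + delta);
        applied++;
    }
    return applied;
}
```

Every page touched by wr32 in this sketch corresponds to a page that the real loader has to modify, which is why a rebased DLL no longer matches its image on disk.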

  This procedure, done at load time, is time consuming, of course, but it also increases the memory footprint that the DLL takes. For every loaded module, Windows creates a “section object,” a memory mapped file for the DLL. Whenever your application accesses memory that was swapped out, Windows reloads it from the section object. When the executable module was loaded at a different base address than its preferred base address, the image of the module in memory no longer matches the image of the module on disk, and, therefore, those portions of the module that contain relocations are swapped out to the system pagefile. In summary, if a module loads at its preferred base address, it is not copied to the pagefile; if a module is rebased, nearly all of the code section and some of the data section of the DLL is copied to the pagefile at load time.

  Basing a DLL means explicitly selecting a preferred load address when you link it, ideally one that avoids the memory locations used by the application and other DLLs. Two reasons why this is desirable are, as discussed above, to make the DLLs load faster and to reduce pagefile usage. A third reason, brought forward in John Robbins’ book Debugging Applications, is to be able to determine the module and the source code line (with the help of a .map file) when given a crash address. (If the DLL had to be rebased when it was loaded, the DLL’s .map file addresses will no longer reflect reality for that invocation of the DLL.)

  By the way, on the “faster load time” issue, I should mention that when an executable module is unloaded, Windows puts its pages on a “standby” list, a kind of cache from where the module’s pages can be retrieved very efficiently when it is loaded again. So if you load a DLL for a second time, and its pages are still in the standby list, it will load a lot quicker than the first time.

  Slow load times are most irritating for applications that you start frequently, such as compilers, and Windows’ standby list already deals adequately with this category. Still, some utilities should always start in a snap, even when run only occasionally. For example, when a screen saver launched while a colleague and I were intensively watching a simulation developing, we were both disturbed. But had it not taken three to five seconds to load (in the heat of the moment, we forgot to time it), I would not have disabled it immediately. If you produce screen savers, it is worth taking care of such trivial matters. There are many other examples of applications that you will want to launch quickly from the first time on, from right-click context menu extensions for Windows Explorer to optional macro/script engines in applications.

  Base address conflicts are not the only cause of slow DLL load times, or even the most important one. Ruediger Asche (see the bibliography) gives a detailed report of load times for a set of DLLs before and after base address conflicts were resolved. However, resolving base address conflicts is so easy that there is hardly any reason not to do it.

To end this overview, I checked the base addresses of executable modules built by the compilers that I have:

 

  • Applications (.exe files) start at 0x00400000 for all compilers that I tested. These executable images are loaded first in a process, and they will never need to be relocated. (In fact, they sometimes do not even contain a .RELOC section, the part of a .exe or .dll that contains detailed relocation information for rebasing.)
  • Microsoft Visual C/C++ places DLLs at address 0x10000000; this is the address that you will encounter most.
  • Microsoft Visual Basic places DLLs at address 0x11000000.
  • Borland C++, Watcom C/C++, and LCC-Win32 place DLLs at address 0x00400000, thereby guaranteeing a conflicting base address with the application.

 

Manual Rebasing (the manual way; actually quite simple)

 

The address range for an application that is not reserved by any version of Windows is from 0x00400000 to 0x80000000. The system DLLs for Windows are currently based in memory from 0x70000000 to 0x78000000 on the Intel processors and from 0x68000000 to 0x78000000 on the MIPS processors. Other standard DLLs (for OLE support) are apparently in the range 0x50000000 to 0x5f000000. When selecting base addresses for DLLs, Microsoft suggests that you select them from the top of the allowed address range downwards, in order to avoid conflicts with memory allocated dynamically by the application (which is allocated from the bottom up).
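The "top of the range downwards" suggestion can be sketched as a small allocator in C. Given the image sizes of a set of DLLs, it hands out non-overlapping base addresses starting just below a chosen upper limit, rounding each size up to 64 KB because base addresses must be multiples of 64 KB. The function name assign_bases_top_down and the range limits in the usage note are illustrative assumptions, not values taken from a Microsoft tool.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULARITY 0x10000u  /* 64 KB: base addresses are multiples of this */

/* Fills bases[i] with a base address for a DLL of sizes[i] bytes, working
   from top downwards and never going below bottom. Returns 0 on success,
   -1 if a size is zero or the range is too small. */
int assign_bases_top_down(const uint32_t *sizes, size_t n,
                          uint32_t bottom, uint32_t top, uint32_t *bases) {
    uint32_t next = top & ~(GRANULARITY - 1);     /* align the starting point */
    if (next < bottom) return -1;
    for (size_t i = 0; i < n; i++) {
        /* Round the image size up to the next 64 KB boundary. */
        uint32_t span = (sizes[i] + GRANULARITY - 1) & ~(GRANULARITY - 1);
        if (span == 0 || span > next - bottom) return -1;
        next -= span;
        bases[i] = next;
    }
    return 0;
}
```

For example, with bottom = 0x60000000 and top = 0x68000000, three DLLs of 0x25000, 0x10000 and 0x100001 bytes would be placed at 0x67FD0000, 0x67FC0000 and 0x67EB0000. The resulting addresses would then be passed to the linker or to a post-link tool.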

In conclusion, the most suitable address range for DLLs is from 0x60000000 through 0x6f000000. Microsoft, seeking portability where it cannot be achieved, proposes to reduce the range further to 0x60000000 through 0x68000000 in order to accommodate both Intel and MIPS processors. (Also note that Microsoft’s upper limit overlaps the reserved range of the MIPS processor.) Microsoft’s proposal continues with a “first letter” scheme for the selection of the base address, which I have summarized in Table 1. In other words, you select a base address for your DLL based on the first letter of the DLL’s name and the addresses in Table 1.

After selecting a load address for a DLL, you have to tell it to the linker. Note again that applications (.exe files) do not need a base adjustment; they are the first executable module that the loader will load, and, therefore, they always load at the address that the linker has fixed them at. The linker options are:

 

  • With Watcom C/C++, add the “OP OFFSET=address” to the linker line (WLINK), where you replace “address” with the desired base address. You can use decimal or hexadecimal notation for this address. (Hexadecimal is in the same format as C/C++ literals, for example, “OP OFFSET=0x62000000”.)
  • With Borland C++, use the “-B:address” option (TLINK32); the value is in hexadecimal.
  • With Microsoft C/C++, use the option “-base:address”; the value is in hexadecimal.
  • Alternatively, you can use a post-link utility. For the Rebase utility, which comes with the Platform SDK, use the “-b address” option; the value is in hexadecimal.

 

Automatic Rebasing: Libase (the automatic way)

 

The drawbacks of the manual rebasing scheme are that the table is difficult to memorize, and that choosing a base address only on the first letter is too simplistic. When I tried it on several somewhat larger projects that I take part in, conflicts arose so quickly that a “rolling the dice” scheme produced better results than the “first letter” proposal. Initially, I extended the scheme to take the first two letters into account (with the added rule that, if many filenames start with the same prefix, the “second letter” to select is the first letter in the filename behind that prefix). This worked in the sense that it resolved nearly all of the conflicts, but the procedure became even harder to know by heart, now requiring two tables instead of one. This called for an automatic solution. And while I was at it, why stop at considering only two letters of the filename?

Libase is a little post-link utility that I wrote that chooses a base address of a DLL (considering all letters in the filename) and rebases the DLL to that address. You do not need to add linker flags to use it; instead, Libase must run after the linker has finished. Libase is configurable via a .ini file; by default it uses an address range of 0x60000000 to 0x6ff00000 (larger than the one proposed by Microsoft) with a step size of 0x00100000. The chosen range and step size allow for 256 different base addresses (instead of just nine with Microsoft’s proposal). The hash is adapted from the well-known hash function published in Compilers: Principles, Techniques and Tools by Aho, Sethi, and Ullman (page 435) as P. J. Weinberger’s algorithm for computing hash values. The source code for Libase is in libase.c (Listing 1), and libase.ini (Listing 2) contains a sample .ini file to control it.
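Listing 1 is not reproduced in these notes, so the following is only a sketch of the idea in C, not the actual Libase code. It uses the textbook form of P. J. Weinberger's hash and maps the result onto the 256 slots between 0x60000000 and 0x6ff00000. The XOR-fold used to reduce the hash to a slot number, and the names hash_pjw and base_for_name, are my own assumptions.

```c
#include <ctype.h>
#include <stdint.h>

#define RANGE_START 0x60000000u
#define STEP_SIZE   0x00100000u   /* 1 MB per slot, 256 slots in total */

/* P. J. Weinberger's string hash (textbook form, without the final
   modulo). If ignore_case is nonzero, letters are folded to lower case
   first, so "mylib.dll" and "MYLIB.DLL" hash alike. */
uint32_t hash_pjw(const char *s, int ignore_case) {
    uint32_t h = 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (ignore_case) c = (unsigned char)tolower(c);
        h = (h << 4) + c;
        uint32_t g = h & 0xF0000000u;
        if (g) { h ^= g >> 24; h ^= g; }
    }
    return h;
}

/* Assumed mapping: fold the 32-bit hash into 8 bits so that every character
   of the name influences the slot, then scale by the step size. */
uint32_t base_for_name(const char *filename, int ignore_case) {
    uint32_t h = hash_pjw(filename, ignore_case);
    uint32_t slot = (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)) & 0xFFu;
    return RANGE_START + slot * STEP_SIZE;
}
```

Folding the hash rather than taking its low byte matters here: most DLL names end in the same ".dll" suffix, and the low bits of this hash are dominated by the last characters of the string.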

In its default configuration, Libase disregards the case of characters in a filename. That is, the files mylib.dll and MYLIB.DLL are rebased to the same address. By setting “IgnoreCase” to “0” in the .ini file, Libase uses the case of the filename as stored on disk.

To use Libase, simply run it with the path to a DLL on the command line. Libase can rebase multiple DLLs in one invocation, but unlike the Platform SDK utility Rebase, it does not choose consecutive, non-overlapping addresses; Libase chooses the base address for each DLL from a hash of its filename. One added feature of Libase is that it keeps the addresses to which it has rebased all modules that it has seen in its .ini file. This allows you to check whether a collision has occurred and to which DLLs that collision applies.

The workhorse function of Libase is ReBaseImage(), which is exported by Microsoft’s imagehlp.dll. The implementation of Libase is trivial for the remainder (except for the frustration of the SDK documentation for the ReBaseImage() function mismatching the prototype in imagehlp.h and conflicting with a comment in that header file).

Libase does not guarantee that a DLL gets a unique “preferred load” address; a base address collision may still occur; it is just less likely. The default “step size” assumes that no DLL is bigger than 1MB. You will get a warning for a DLL whose size exceeds the step size, because the report in the .ini file is then no longer accurate.

In closing, I would like to mention that to have a DLL load quickly, the first step is to make sure that Windows can locate it quickly. My advice is to keep implicitly loaded DLLs in the same directory as the application that uses them and to use a full path for DLLs that the application loads explicitly.

 

Bibliography

 Ruediger R. Asche. “Rebasing Win32 DLLs: The Whole Story,” MSDN library, September 1995. This article does exhaustive tests on the load-time degradation of DLLs that must be rebased by the operating system at load time.

John Robbins. Debugging Applications (Microsoft Press, 2000), ISBN 0-7356-0886-5.

 

Modify the Base Addresses for a DLL Files Series

https://www.codeproject.com/articles/35829/modify-the-base-addresses-for-a-dll-files-series

  The address range for an application that is not reserved by any version of Windows is from 0x00400000 to 0x80000000. The system DLLs for Windows are currently based in memory from 0x70000000 to 0x78000000 on the Intel processors and from 0x68000000 to 0x78000000 on the MIPS processors. Other standard DLLs (for OLE support) are apparently in the range 0x50000000 to 0x5f000000. When selecting base addresses for DLLs, Microsoft suggests that you select them from the top of the allowed address range downwards, in order to avoid conflicts with memory allocated dynamically by the application (which is allocated from the bottom up).

  In conclusion, the most suitable address range for DLLs is from 0x60000000 through 0x6f000000. Microsoft proposes to reduce the range further, to 0x60000000 through 0x68000000, in order to accommodate MIPS processors too.

DLL base address?

http://stackoverflow.com/questions/7395447/dll-base-address

At runtime I use VMMap (a tool that shows the address space of a running process; Dependency Walker can also be used, but it only gives a static view) to monitor the virtual address space, and it reveals that the ngen'd DLLs are sitting within a consistent range of virtual memory.

 

posted @ 2017-01-12 17:07  醉游