Part 1. Memory protection mechanisms in Windows

A very thorough, and accurate, write-up of the current state of our mitigations as they apply to programmable, extensible apps. -- Microsoft SWIScience team

This section provides an overview of the memory protection mechanisms available on the Windows platform. Most of the discussion in this paper will focus on Windows Vista SP1, but it is important to be aware of the differences in the protection mechanisms available in different versions of Windows. The following table provides a summary of these differences:

GS

Stack cookies

The /GS option of the Visual C++ compiler enables run-time detection of stack buffer overflows. If the option is enabled, the compiler stores a random value on the stack between the local variables and return address of a function. This value is known as a stack cookie. If an attacker exploits a buffer overflow to overwrite the return address of a function, they will also overwrite the cookie, changing its value. This is detected in the epilogue of the function and the program aborts before the modified return address is used.

A typical prologue and epilogue of a function protected by /GS is shown below:

; prologue
push ebp
mov ebp, esp
sub esp, 214h
mov eax, ___security_cookie ; random value, initialized at module startup
xor eax, ebp ; XOR it with the current base pointer
mov [ebp+var_4], eax ; store the cookie

...

; epilogue
mov ecx, [ebp+var_4] ; get the cookie from the stack
xor ecx, ebp ; XOR the cookie with the current base pointer
call __security_check_cookie ; check the cookie
leave
retn 0Ch

; __fastcall __security_check_cookie(x)
cmp ecx, ___security_cookie
jnz ___report_gsfailure ; terminate the process
rep retn
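
The cookie handling in the listing above can be modelled in a few lines of C. This is only a sketch: the cookie value and the frame pointer are made-up constants, and the function names gs_prologue, gs_epilogue and gs_demo are hypothetical, not part of the compiler runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ___security_cookie (the real value is random). */
static const uint32_t security_cookie = 0x5A17C0DEu;

/* Prologue: XOR the global cookie with the frame pointer and store the result. */
uint32_t gs_prologue(uint32_t frame_pointer)
{
    return security_cookie ^ frame_pointer;
}

/* Epilogue: undo the XOR and compare with the global cookie.
   Returns 1 if the frame looks intact, 0 where the real code would
   jump to ___report_gsfailure. */
int gs_epilogue(uint32_t stored_cookie, uint32_t frame_pointer)
{
    return (stored_cookie ^ frame_pointer) == security_cookie;
}

/* Usage: an untouched frame passes, an overwritten cookie slot fails. */
void gs_demo(void)
{
    uint32_t ebp = 0x0012FF70u;          /* made-up frame pointer */
    uint32_t slot = gs_prologue(ebp);
    assert(gs_epilogue(slot, ebp));
    slot = 0x41414141u;                  /* overflow clobbers the cookie slot */
    assert(!gs_epilogue(slot, ebp));
}
```

Because the stored value depends on the frame pointer, a cookie copied from one frame does not validate in a frame with a different base pointer.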

#pragma strict_gs_check

The extra prologue and epilogue code can add a significant overhead to small functions. The gs-perf test program in Appendix A shows a worst case slowdown of 42%. To minimize the performance impact of the /GS option, the compiler adds the stack cookie only to functions that contain string buffers or allocate memory on the stack with _alloca.

Since the C language has no native string type, the compiler defines a string buffer as an array of 1 or 2 byte elements with a total size of at least 5 bytes. The GS protection is applied to all functions with arrays that match this description. For example, the following variables will cause the functions containing them to be protected by GS:

char a[5]; // protected, 5 byte array of elements of size 1
short b[3]; // protected, 6 byte array of elements of size 2
struct {
	char a;
} c[5]; // protected, 5 byte array of elements of size 1
struct {
	char a[5];
} d; // protected because the structure contains a string buffer

Functions that don't use _alloca and don't contain variables considered to be string buffers are not protected by GS. For example, the variables below will not trigger the GS heuristic:

char e[4]; // not protected, total size is less than 5 bytes
int f[10]; // not protected, array element size greater than 2
char* g[10]; // not protected, array element size greater than 2
struct {
	char a;
	short b;
} h[5]; // not protected, array element size greater than 2
struct {
	char a1;
	char a2;
	char a3;
	char a4;
	char a5;
} i; // not protected, the structure does not contain a string buffer
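
The string buffer rule described above can be written as a one-line predicate. The sketch below is an illustration of the rule as stated in the text, not the compiler's actual implementation; the function name gs_is_string_buffer is hypothetical, and it is applied to one array at a time (for a structure, to each array member it contains).

```c
#include <assert.h>
#include <stddef.h>

/* An array counts as a string buffer if its elements are 1 or 2 bytes
   wide and its total size is at least 5 bytes. */
int gs_is_string_buffer(size_t element_size, size_t element_count)
{
    return element_size >= 1 && element_size <= 2 &&
           element_size * element_count >= 5;
}

/* Usage with two of the structure examples from the text. */
void gs_heuristic_demo(void)
{
    typedef struct { char a; } one_byte_element;            /* like c[5] */
    typedef struct { char a; short b; } padded_element;     /* like h[5] */

    assert(gs_is_string_buffer(sizeof(one_byte_element), 5));
    assert(!gs_is_string_buffer(sizeof(padded_element), 5));
}
```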

Visual Studio 2005 SP1 introduced a new compiler directive that enables more aggressive GS heuristics. If the strict_gs_check pragma is turned on, the compiler adds a GS cookie to all functions that use the address of a local variable. This includes array dereferences, pointer arithmetic and passing the address of a local variable to other functions. This results in much more complete protection at the expense of runtime performance.

Variable reordering

The main limitation of the GS protection is that it detects buffer overflows only when the function with the overwritten stack cookie returns. If any other overwritten data on the stack is used by the function, the attacker might be able to take control of the execution before the GS cookie is checked.

To prevent the attacker from overwriting local variables or arguments used by the function, the compiler modifies the layout of the stack frame. It reorders the local variables, placing string buffers at higher addresses than all other variables. This ensures that a string buffer overflow cannot overwrite any other local variables. Function arguments that contain pointers or string buffers (called vulnerable arguments in the compiler documentation) are protected by allocating extra space on the stack and copying their values below the local variables. The original argument values located after the return address are not used in the rest of the code.

The following diagram shows the stack frame layout of a vulnerable function with and without GS protection:

vuln.c:

void vuln(char* arg)
{
	char buf[100];
	int i;
	strcpy(buf, arg);
	...
}

standard stack frame (lowest address first):

	buf
	i
	return address
	arg

stack frame with /GS (lowest address first):

	copy of arg
	i
	buf
	stack cookie
	return address
	arg

Without GS a buffer overflow of the buf variable will allow the attacker to overwrite i, the return address and the arg argument. Enabling GS adds a stack cookie, moves i out of the way and creates a copy of arg. The original argument can still be overwritten, but it is no longer used by the function. The attacker has no way of taking control of the execution before the cookie check detects the overflow and terminates the program.

SafeSEH

SEH handler validation

The SafeSEH protection mechanism is designed to prevent attackers from taking control of the program execution by overwriting an exception handler record on the stack. If a binary is linked with the /SafeSEH linker option, its header will contain a table of all valid exception handlers within that module. When an exception occurs, the exception dispatcher code in NTDLL.DLL verifies that the exception handler record on the stack points to one of the valid handlers in the table. If the attacker overwrites the exception handler record and points it somewhere else, the exception dispatcher will detect this and terminate the program.

The validation of the exception handler record begins in the RtlDispatchException function. Its first task is to make sure that the exception record is located on the stack of the current thread and is 4-byte aligned. This prevents the attacker from overwriting the Next field of a record and pointing it to a fake record on the heap. The function also verifies that the exception handler address does not point to the stack. This check prevents the attacker from jumping directly to shellcode on the stack.

void RtlDispatchException(...)
{
	if (exception record is not on the stack)
		goto corruption;
	if (handler is on the stack)
		goto corruption;
	if (RtlIsValidHandler(handler, process_flags) == FALSE)
		goto corruption;
	// execute handler
	RtlpExecuteHandlerForException(handler, ...)
	...
}

The exception handler address is validated further by the RtlIsValidHandler function. The pseudocode of this function in Vista SP1 is shown below:

BOOL RtlIsValidHandler(handler)
{
	if (handler is in an image) {
		if (image has the IMAGE_DLLCHARACTERISTICS_NO_SEH flag set)
			return FALSE;
		if (image has a SafeSEH table)
			if (handler found in the table)
				return TRUE;
			else
				return FALSE;
		if (image is a .NET assembly with the ILonly flag set)
			return FALSE;
		// fall through
	}
	if (handler is on a non-executable page) {
		if (ExecuteDispatchEnable bit set in the process flags)
			return TRUE;
		else
			raise ACCESS_VIOLATION; // enforce DEP even if we have no hardware NX
	}
	if (handler is not in an image) {
		if (ImageDispatchEnable bit set in the process flags)
			return TRUE;
		else
			return FALSE; // don't allow handlers outside of images
	}
	// everything else is allowed
	return TRUE;
}

The ExecuteDispatchEnable and ImageDispatchEnable bits are part of the process execution flags in the kernel KPROCESS structure. These two bits control whether the exception dispatcher will call handlers located in non-executable memory or outside of an image. The two bits can be changed at runtime, but by default they are both set for processes with DEP disabled and cleared for processes with DEP enabled.

In processes with DEP enabled there are two types of exception handlers that are considered valid by the exception dispatcher:

  1. handler found in the SafeSEH table of an image without the NO_SEH flag
  2. handler on an executable page in an image without the NO_SEH flag, without a SafeSEH table and without the .NET ILonly flag

In processes with DEP disabled there are three valid cases:

  1. handler found in the SafeSEH table of an image without the NO_SEH flag
  2. handler in an image without the NO_SEH flag, without a SafeSEH table and without the .NET ILonly flag
  3. handler on a non-image page, but not on the stack of the current thread
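
The decision logic of RtlIsValidHandler and the resulting cases can be checked with a small model. This is a sketch under assumptions: the handler_facts structure, the verdict enum and the helper names are hypothetical, and the bit values 0x10 and 0x20 are inferred from the order of the fields in the KEXECUTE_OPTIONS dump and the 0x40 mask used for DisableExceptionChainValidation.

```c
#include <assert.h>
#include <stddef.h>

enum handler_verdict { HANDLER_REJECTED, HANDLER_ALLOWED, HANDLER_ACCESS_VIOLATION };

/* Facts about a handler address that the real code derives from memory. */
struct handler_facts {
    int in_image;
    int image_no_seh;
    int image_has_safeseh_table;
    int found_in_table;
    int image_is_ilonly;
    int on_executable_page;
};

#define EXECUTE_DISPATCH_ENABLE 0x10u   /* assumed bit position */
#define IMAGE_DISPATCH_ENABLE   0x20u   /* assumed bit position */

enum handler_verdict classify_handler(const struct handler_facts *h,
                                      unsigned process_flags)
{
    assert(h != NULL);
    if (h->in_image) {
        if (h->image_no_seh)
            return HANDLER_REJECTED;
        if (h->image_has_safeseh_table)
            return h->found_in_table ? HANDLER_ALLOWED : HANDLER_REJECTED;
        if (h->image_is_ilonly)
            return HANDLER_REJECTED;
        /* fall through */
    }
    if (!h->on_executable_page)
        return (process_flags & EXECUTE_DISPATCH_ENABLE)
                   ? HANDLER_ALLOWED : HANDLER_ACCESS_VIOLATION;
    if (!h->in_image)
        return (process_flags & IMAGE_DISPATCH_ENABLE)
                   ? HANDLER_ALLOWED : HANDLER_REJECTED;
    return HANDLER_ALLOWED;
}

/* Handler on a non-executable page outside of any image (e.g. the heap). */
enum handler_verdict classify_heap_handler(unsigned process_flags)
{
    struct handler_facts heap = { 0 };
    return classify_handler(&heap, process_flags);
}

/* Handler on an executable page of an image without a SafeSEH table. */
enum handler_verdict classify_plain_image_handler(unsigned process_flags)
{
    struct handler_facts h = { 0 };
    h.in_image = 1;
    h.on_executable_page = 1;
    return classify_handler(&h, process_flags);
}
```

With both dispatch bits cleared (DEP enabled) the heap handler triggers an access violation, while with both bits set (DEP disabled) it is accepted, matching the third case in the list above.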

SEH chain validation

Windows Server 2008 introduced a new SEH protection mechanism that detects exception handler record overwrites by validating the SEH linked list. The idea for this SEH protection was first described in the Uninformed article Preventing the Exploitation of SEH Overwrites by Matt Miller and adopted later by Microsoft. This protection mechanism is enabled by default on Windows Server 2008. It is also available on Vista SP1, but is not turned on by default. It can be enabled by setting the undocumented registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation to 0.

When this protection mechanism is enabled, the FinalExceptionHandler function in NTDLL.DLL is registered as the first exception handler in all threads. As additional exception handlers are registered, they form a linked list with the last record always pointing to FinalExceptionHandler. The exception dispatcher walks this linked list and verifies that the last record still points to that function. If an attacker overwrites the Next field of an exception handler record, the validation loop will not reach the last record and the SEH chain corruption will be detected.

One potential way to bypass this protection is to point the overwritten Next pointer to a fake SEH record that points to the FinalExceptionHandler function. However, the ASLR implementation in Vista randomizes the address of the function and makes it impossible for an attacker to terminate the SEH chain unless they have a way to bypass ASLR.

The SEH chain validation is implemented in the RtlDispatchException function. The following pseudocode is from Vista SP1:

// Skip the chain validation if the DisableExceptionChainValidation bit is set
if ((process_flags & 0x40) == 0) {
	// Skip the validation if there are no SEH records on the linked list
	if (record != 0xFFFFFFFF) {
		// Walk the SEH linked list
		do {
			// The record must be on the stack
			if (record < stack_bottom || record > stack_top)
				goto corruption;
			// The end of the record must be on the stack
			if ((char*)record + sizeof(EXCEPTION_REGISTRATION) > stack_top)
				goto corruption;
			// The record must be 4 byte aligned
			if ((record & 3) != 0)
				goto corruption;
			handler = record->handler;
			// The handler must not be on the stack
			if (handler >= stack_bottom && handler < stack_top)
				goto corruption;
			record = record->next;
		} while (record != 0xFFFFFFFF);
		// End of chain reached
		// Is bit 9 set in the TEB->SameTebFlags field? This bit is set in
		// ntdll!RtlInitializeExceptionChain, which registers
		// FinalExceptionHandler as an SEH handler when a new thread starts.
		if ((TEB->word_at_offset_0xFCA & 0x200) != 0) {
			// The final handler must be ntdll!FinalExceptionHandler
			if (handler != &FinalExceptionHandler)
				goto corruption;
		}
	}
}
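
The walk above can be exercised on a simulated stack. The following is a sketch only: the stack range, the handler addresses and all function names are made up, the SameTebFlags bit is assumed to be set, and the step limit is an addition of this sketch to guard against cyclic chains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BOTTOM  0x00120000u
#define STACK_WORDS   64u
#define STACK_TOP     (STACK_BOTTOM + 4u * STACK_WORDS)
#define END_OF_CHAIN  0xFFFFFFFFu
#define FINAL_HANDLER 0x77A01234u   /* made-up address of FinalExceptionHandler */
#define OTHER_HANDLER 0x00401000u   /* made-up handler inside an image */

static uint32_t stack_words[STACK_WORDS];   /* simulated thread stack */

/* Write an SEH record (Next, Handler) at a simulated stack address. */
static void put_record(uint32_t addr, uint32_t next, uint32_t handler)
{
    size_t i = (addr - STACK_BOTTOM) / 4;
    assert(addr >= STACK_BOTTOM && addr % 4 == 0 && i + 1 < STACK_WORDS);
    stack_words[i] = next;
    stack_words[i + 1] = handler;
}

/* Returns 1 if the chain starting at record passes validation, 0 otherwise. */
int chain_is_valid(uint32_t record)
{
    uint32_t handler = 0;
    unsigned steps = 0;

    if (record == END_OF_CHAIN)
        return 1;
    do {
        if (record < STACK_BOTTOM || record > STACK_TOP - 8u)
            return 0;                       /* record not fully on the stack */
        if ((record & 3u) != 0)
            return 0;                       /* record not 4-byte aligned */
        if (++steps > STACK_WORDS)
            return 0;                       /* sketch-only guard against cycles */
        size_t i = (record - STACK_BOTTOM) / 4;
        handler = stack_words[i + 1];
        if (handler >= STACK_BOTTOM && handler < STACK_TOP)
            return 0;                       /* handler points to the stack */
        record = stack_words[i];
    } while (record != END_OF_CHAIN);
    return handler == FINAL_HANDLER;        /* last record must be the final handler */
}

/* Two records: an ordinary handler followed by the final handler. */
static uint32_t build_chain(void)
{
    memset(stack_words, 0, sizeof stack_words);
    put_record(STACK_BOTTOM + 0x80, END_OF_CHAIN, FINAL_HANDLER);
    put_record(STACK_BOTTOM + 0x40, STACK_BOTTOM + 0x80, OTHER_HANDLER);
    return STACK_BOTTOM + 0x40;
}

int demo_intact_chain(void)
{
    return chain_is_valid(build_chain());
}

/* Attacker overwrites both fields of the first record with arbitrary data. */
int demo_overwritten_record(void)
{
    uint32_t head = build_chain();
    put_record(head, 0x41414141u, 0x0A0A0A0Au);
    return chain_is_valid(head);
}

/* Attacker terminates the chain early without knowing the final handler. */
int demo_truncated_chain(void)
{
    uint32_t head = build_chain();
    put_record(head, END_OF_CHAIN, 0x0A0A0A0Au);
    return chain_is_valid(head);
}
```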

SEH chain validation is disabled for executables with MajorLinkerVersion and MinorLinkerVersion in the PE header set to 0x53 and 0x52 respectively, indicating an Armadillo protected binary. This check is performed in the LdrpIsImageSEHValidationCompatible function during process initialization. When a new DLL is loaded, a similar check in LdrpCheckNXCompatibility disables SEH chain validation if the DLL being loaded has that same incompatible linker version.

Heap protection

The standard exploitation method for heap overflows in older versions of Windows is to overwrite the header of a heap chunk and create a fake free block with flink and blink pointers controlled by the attacker. When this free block is allocated or coalesced with other free blocks, the memory allocator will write the value of the flink pointer at the address specified in the blink pointer. This allows the attacker to perform an arbitrary 4-byte write anywhere in memory, which can easily lead to shellcode execution.

The heap protection mechanisms in Windows XP SP2 and Windows Vista are designed to stop this exploitation technique.

Safe unlinking

Starting in Windows XP SP2 the heap allocator implements safe unlinking when removing chunks from the free list. Before using the flink and blink pointers, it verifies that both flink->blink and blink->flink point to the current heap block. This prevents the attacker from pointing flink or blink to arbitrary memory locations and using the unlink operation to do an arbitrary 4-byte write.
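
A minimal sketch of the safe unlinking check on a generic doubly linked list is shown below. The list_entry structure and the function names are hypothetical and only mirror the flink and blink fields mentioned in the text; the real allocator operates on heap chunk headers.

```c
#include <assert.h>
#include <stddef.h>

struct list_entry {
    struct list_entry *flink;   /* forward link */
    struct list_entry *blink;   /* backward link */
};

void list_init(struct list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

void list_insert_tail(struct list_entry *head, struct list_entry *entry)
{
    assert(head != NULL && entry != NULL);
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

/* Unlinks entry only if both neighbours still point back at it.
   Returns 1 on success, 0 if the links look corrupted. */
int safe_unlink(struct list_entry *entry)
{
    struct list_entry *f = entry->flink;
    struct list_entry *b = entry->blink;

    if (f->blink != entry || b->flink != entry)
        return 0;
    b->flink = f;
    f->blink = b;
    return 1;
}

int demo_unlink_clean(void)
{
    struct list_entry head, a, b;
    list_init(&head);
    list_insert_tail(&head, &a);
    list_insert_tail(&head, &b);
    return safe_unlink(&a) && head.flink == &b && b.blink == &head;
}

/* flink and blink are replaced with pointers chosen by the attacker. */
int demo_unlink_corrupted(void)
{
    struct list_entry head, a, fake_value, fake_target;
    list_init(&head);
    list_insert_tail(&head, &a);
    list_init(&fake_value);
    list_init(&fake_target);
    a.flink = &fake_value;
    a.blink = &fake_target;
    return safe_unlink(&a);
}
```

In the corrupted case the check fails before either of the two pointer writes takes place, so the arbitrary write is never performed.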

Heap metadata cookies and encryption

In addition to the safe unlinking, the allocator in XP SP2 stores a single byte cookie in the header of each heap chunk. This cookie is checked when the chunk is removed from the free list. If the heap chunk header has been overwritten, the cookie will not match and the heap allocator will detect this as heap corruption.

In Windows Vista the cookie is supplemented by heap metadata encryption. All important fields in the heap header are XORed with a random 32-bit value and are decrypted before being used.

The cookies and the metadata encryption are very effective at preventing the attacker from abusing overwritten heap chunk headers or creating fake chunks on the heap.
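
The combined effect of the cookie and the XOR encoding can be illustrated with a toy header. This is a sketch under loud assumptions: the 32-bit layout (16-bit size, 8-bit cookie, 8-bit flags), the key and the cookie value are all made up and do not reflect the real heap header format.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_KEY     0x3C6EF372u   /* made-up per-heap random value */
#define CHUNK_COOKIE 0xA7u         /* made-up single byte cookie */

/* Toy layout: bits 0-15 size, bits 16-23 cookie, bits 24-31 flags. */
uint32_t header_pack(uint16_t size, uint8_t flags)
{
    return (uint32_t)size | ((uint32_t)CHUNK_COOKIE << 16) | ((uint32_t)flags << 24);
}

uint32_t header_encode(uint32_t plain)  { return plain ^ HEAP_KEY; }
uint32_t header_decode(uint32_t stored) { return stored ^ HEAP_KEY; }

/* Decode the stored header and verify the cookie byte. */
int header_is_intact(uint32_t stored)
{
    return ((header_decode(stored) >> 16) & 0xFFu) == CHUNK_COOKIE;
}

void header_demo(void)
{
    uint32_t stored = header_encode(header_pack(0x20, 0x01));
    assert(header_is_intact(stored));
    assert((header_decode(stored) & 0xFFFFu) == 0x20);

    stored = 0x41414141u;              /* overflow overwrites the header */
    assert(!header_is_intact(stored));
}
```

An attacker who does not know the key cannot choose what the overwritten header decodes to, so the fields the allocator sees are effectively random.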

DEP

Data Execution Prevention (DEP) is a protection mechanism that prevents the execution of code in memory pages marked non-executable. By default, the only executable pages in a Windows process are the ones that contain the text sections of the executable and the loaded DLL files. Enabling DEP prevents the attacker from executing shellcode on the stack, heap or in data sections.

If DEP is enabled and the program attempts to execute code on a non-executable page, an access violation exception will be raised. The program gets a chance to handle this exception, but most programs that expect all memory to be executable will simply crash. If a program needs to execute code on the heap or the stack, it needs to use the VirtualAlloc or VirtualProtect functions to explicitly allocate executable memory or mark existing pages executable.

Hardware support for NX

Even though the Windows memory manager code always keeps track of which pages are supposed to be non-executable, the traditional x86 architecture supports non-executable memory only when segmentation is used to enforce memory protection. Like all other modern operating systems, Windows uses a flat memory model with page-level protection instead of segmentation. The page table entries on x86 have only a single bit that describes the page protection. If the bit is set, the page is writable, otherwise it is read-only. Since there is no bit to control execution, all pages on the system are considered executable by the CPU.

This oversight in the x86 architecture was corrected in CPUs released after 2004 by adding a second protection bit in the page table entries. This bit is known as the NX bit (No eXecute) and using it requires support by the operating system. Windows has been able to take advantage of the NX bit since the release of Windows XP SP2.

If the CPU does not support hardware NX, Windows uses a very limited form of DEP called Software DEP. It is implemented as an extra check in the exception dispatcher which ensures that the SEH handler is located on an executable page. This is the extent of Software DEP. Since all modern CPUs have support for hardware NX and the Software DEP feature is trivially bypassable anyway, we will focus only on the hardware-enforced DEP protection.

DEP policies

Due to the large number of application compatibility problems with DEP, this protection is not enabled by default for all processes on the system. The administrator can choose between four possible DEP policies, which are set in the boot.ini file on Windows XP or in the boot configuration on Vista:

  • OptIn

This is the default setting on Windows XP and Vista. In this mode DEP protection is enabled only for system processes and applications that explicitly opt-in. All other processes get no DEP protection. DEP can be turned off at runtime by the application, or by the loader if an incompatible DLL is loaded.

To opt-in an application on Windows XP, the administrator needs to create an entry in the system application compatibility database and apply the AddProcessParametersFlags compatibility fix as described in the documentation by Microsoft. On Vista all applications that are compiled with the /NXcompat linker option are automatically opted-in.

  • OptOut

All processes are protected by DEP, except for the ones that the administrator adds to an exception list or are listed in the application compatibility database as not compatible with DEP. This is the default setting on Windows Server 2003 and Windows Server 2008. DEP can be turned off at runtime by the application, or by the loader if an incompatible DLL is loaded.

  • AlwaysOn

All processes are protected by DEP, no exceptions. Turning off DEP at runtime is not possible.

  • AlwaysOff

No processes are protected by DEP. Turning on DEP at runtime is not possible.

On 64-bit versions of Windows, DEP is always turned on for 64-bit processes and cannot be disabled. However, Internet Explorer on Vista x64 is still a 32-bit process and is subject to the policies described above.

Enabling or disabling DEP at runtime

The DEP settings for a process are stored in the Flags bitfield of the KPROCESS structure in the kernel. This value can be queried and set with NtQueryInformationProcess and NtSetInformationProcess, information class ProcessExecuteFlags (0x22), or with a kernel debugger. The output below shows the process flags of an Internet Explorer process on Vista SP1:

lkd> !process 0 0 iexplore.exe
PROCESS 83d29470 SessionId: 1 Cid: 0fec Peb: 7ffd9000 ParentCid: 06dc
	DirBase: 1f105440 ObjectTable: 91b69b28 HandleCount: 376.
	Image: iexplore.exe
lkd> dt nt!_KPROCESS 83d29470 -r
	+0x06b Flags : _KEXECUTE_OPTIONS
		+0x000 ExecuteDisable : 0y0
		+0x000 ExecuteEnable : 0y1
		+0x000 DisableThunkEmulation : 0y0
		+0x000 Permanent : 0y0
		+0x000 ExecuteDispatchEnable : 0y1
		+0x000 ImageDispatchEnable : 0y1
		+0x000 DisableExceptionChainValidation : 0y1
		+0x000 Spare : 0y0

Of these flags, only the first four are relevant to DEP. The first flag, ExecuteDisable is set if DEP is enabled. This might seem counterintuitive, but the flag's meaning really is "disable execution from non-executable memory". Conversely, the ExecuteEnable flag is set when DEP is disabled. It should be noted that in OptOut mode both ExecuteEnable and ExecuteDisable are set to 0, but DEP is still enabled. DisableThunkEmulation controls the ATL thunk emulation mode that will be discussed in the next section. Finally, the Permanent flag indicates that the execute options are final and cannot be further changed. This is used to prevent exploits from calling NtSetInformationProcess to disable DEP before jumping to shellcode on the stack. Such an attack was presented by skape and Skywing in Uninformed vol.2. On Vista, the permanent flag is automatically set for all executables linked with the /NXcompat linker option immediately after the loader enables DEP.
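
The flag semantics described above can be captured in a short decoder. This is a sketch: the bit positions assume that the fields in the debugger output are listed in ascending bit order (consistent with the 0x40 mask used for DisableExceptionChainValidation in the SEH pseudocode), and the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    EXECUTE_DISABLE                    = 0x01,
    EXECUTE_ENABLE                     = 0x02,
    DISABLE_THUNK_EMULATION            = 0x04,
    PERMANENT                          = 0x08,
    EXECUTE_DISPATCH_ENABLE            = 0x10,
    IMAGE_DISPATCH_ENABLE              = 0x20,
    DISABLE_EXCEPTION_CHAIN_VALIDATION = 0x40
};

/* DEP is off only when ExecuteEnable is set; in OptOut mode both
   ExecuteEnable and ExecuteDisable are 0 and DEP is still on. */
int dep_is_enabled(uint8_t flags)
{
    return (flags & EXECUTE_ENABLE) == 0;
}

/* The Permanent bit locks the execute options for the process. */
int dep_can_change(uint8_t flags)
{
    return (flags & PERMANENT) == 0;
}

/* Flags of the iexplore.exe process from the debugger output. */
uint8_t flags_from_dump(void)
{
    return EXECUTE_ENABLE | EXECUTE_DISPATCH_ENABLE |
           IMAGE_DISPATCH_ENABLE | DISABLE_EXCEPTION_CHAIN_VALIDATION;
}

void flags_demo(void)
{
    uint8_t ie = flags_from_dump();
    assert(!dep_is_enabled(ie));     /* DEP is off for this process */
    assert(dep_can_change(ie));      /* and the setting is not permanent */
}
```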

Windows XP SP3 and Vista SP1 introduced a new API for querying and setting the DEP policy of a process. The SetProcessDEPPolicy, GetProcessDEPPolicy and GetSystemDEPPolicy functions should be used instead of the undocumented NtQueryInformationProcess and NtSetInformationProcess where they are available.

When a new DLL is loaded into a process that does not have the Permanent flag set, the loader performs a series of checks to determine if the DLL is compatible with DEP. If the DLL is determined to be incompatible, DEP protection is disabled for this process. The checks are performed by the LdrpCheckNXCompatibility function which looks for three types of DLLs that are known to be incompatible with DEP:

  1. DLLs that have secserv.dll as the name in the export directory table, and have 2 sections named .txt and .txt2 . These DLLs are protected by the SafeDisc copy-protection system which is not compatible with DEP.
  2. DLLs that are listed in the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\DllNXOptions registry key. This key contains a list of DLLs that are known to be incompatible.
  3. DLLs with a section named .aspack , .pcle or .sforce . These section names indicate packers or software protectors that are known to be incompatible.

If the DLL being loaded was linked with the /NXcompat linker option and has the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag set, the checks described above are skipped and DEP is not disabled. This allows vendors of DLLs incompatible with DEP to mark new versions of their software as compatible and get the benefits of DEP protection.
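
The checks described above can be sketched as a predicate over a few facts about the DLL. The dll_facts structure and all function names are hypothetical, the registry lookup is reduced to a boolean, the comparisons are assumed to be exact and case-sensitive, and the SafeDisc rule is interpreted as requiring that both sections are present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dll_facts {
    const char *export_name;          /* name in the export directory table */
    const char *const *sections;      /* section names */
    size_t section_count;
    int nx_compat;                    /* linked with /NXcompat */
    int in_dllnxoptions;              /* listed under DllNXOptions */
};

static int has_section(const struct dll_facts *dll, const char *name)
{
    for (size_t i = 0; i < dll->section_count; i++)
        if (strcmp(dll->sections[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if loading this DLL would turn DEP off for the process. */
int dll_disables_dep(const struct dll_facts *dll)
{
    assert(dll != NULL);
    if (dll->nx_compat)
        return 0;                                     /* checks are skipped */
    if (dll->export_name != NULL &&
        strcmp(dll->export_name, "secserv.dll") == 0 &&
        has_section(dll, ".txt") && has_section(dll, ".txt2"))
        return 1;                                     /* SafeDisc */
    if (dll->in_dllnxoptions)
        return 1;                                     /* registry list */
    if (has_section(dll, ".aspack") || has_section(dll, ".pcle") ||
        has_section(dll, ".sforce"))
        return 1;                                     /* known packers */
    return 0;
}

int demo_packed_dll(int nx_compat)
{
    static const char *const sections[] = { ".text", ".data", ".aspack" };
    struct dll_facts dll = { "packed.dll", sections, 3, nx_compat, 0 };
    return dll_disables_dep(&dll);
}

int demo_safedisc_dll(void)
{
    static const char *const sections[] = { ".txt", ".txt2" };
    struct dll_facts dll = { "secserv.dll", sections, 2, 0, 0 };
    return dll_disables_dep(&dll);
}

int demo_plain_dll(void)
{
    static const char *const sections[] = { ".text", ".data" };
    struct dll_facts dll = { "plain.dll", sections, 2, 0, 0 };
    return dll_disables_dep(&dll);
}
```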

Thunk Emulation

One of the biggest problems with enabling DEP is that some applications will simply not work, since they rely on some code being executed from writable memory. It turns out that many applications that behave this way do so because older versions of the ATL library shipped by Microsoft use small code thunks on the heap. Since the ATL libraries are used extensively by third party vendors, Microsoft decided to provide a "cheat" to enable ATL code to function in DEP environments. When a program attempts to execute code on a non-executable page, the kernel calls KiEmulateAtlThunk() to check if this is a result of a well known instruction sequence used as an ATL thunk. The function proceeds as follows:

  1. If the bytes that the program is trying to execute don't match one of the five known thunks, allow the system to raise the access violation exception.
  2. If an ATL thunk is identified, verify whether it appears to be valid or not. The most important aspect of this is checking that the address being executed is not part of an image, and that the target IP of the branch instruction in the thunk is inside a valid image. If the thunk is invalid, continue with the DEP exception as normal.
  3. If the thunk is valid, "manually" emulate the thunk and continue the process as if nothing happened. Since the target of the branch is a valid image, the execution will continue without any danger of executing code on a non-executable page.

The known ATL thunks that get emulated are listed below:

; thunk 1
C7 44 24 04 XX XX XX XX    mov [esp+4], imm32
E9 YY YY YY YY             jmp imm32

; thunk 2
B9 XX XX XX XX             mov ecx, imm32
E9 YY YY YY YY             jmp imm32

; thunk 3
BA XX XX XX XX             mov edx, imm32
B9 YY YY YY YY             mov ecx, imm32
FF E1                      jmp ecx

; thunk 4
B9 XX XX XX XX             mov ecx, imm32
B8 YY YY YY YY             mov eax, imm32
FF E0                      jmp eax

; thunk 5
59                         pop ecx
58                         pop eax
51                         push ecx
FF 60 04                   jmp [eax+4]
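
The first step of the emulation, recognizing one of the five byte sequences, can be sketched as a pattern match with wildcards for the immediate operands. The function and table names are hypothetical; this covers only the byte comparison and not the validity checks on the thunk address and the branch target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY (-1)   /* wildcard for an immediate byte */

static const int16_t thunk1[] = { 0xC7, 0x44, 0x24, 0x04, ANY, ANY, ANY, ANY,
                                  0xE9, ANY, ANY, ANY, ANY };
static const int16_t thunk2[] = { 0xB9, ANY, ANY, ANY, ANY,
                                  0xE9, ANY, ANY, ANY, ANY };
static const int16_t thunk3[] = { 0xBA, ANY, ANY, ANY, ANY,
                                  0xB9, ANY, ANY, ANY, ANY, 0xFF, 0xE1 };
static const int16_t thunk4[] = { 0xB9, ANY, ANY, ANY, ANY,
                                  0xB8, ANY, ANY, ANY, ANY, 0xFF, 0xE0 };
static const int16_t thunk5[] = { 0x59, 0x58, 0x51, 0xFF, 0x60, 0x04 };

static const struct {
    const int16_t *bytes;
    size_t length;
} thunks[] = {
    { thunk1, sizeof thunk1 / sizeof thunk1[0] },
    { thunk2, sizeof thunk2 / sizeof thunk2[0] },
    { thunk3, sizeof thunk3 / sizeof thunk3[0] },
    { thunk4, sizeof thunk4 / sizeof thunk4[0] },
    { thunk5, sizeof thunk5 / sizeof thunk5[0] },
};

/* Returns the thunk number (1 to 5) matched at code, or 0 if none matches. */
int match_atl_thunk(const uint8_t *code, size_t size)
{
    assert(code != NULL);
    for (size_t t = 0; t < sizeof thunks / sizeof thunks[0]; t++) {
        size_t i;
        if (size < thunks[t].length)
            continue;
        for (i = 0; i < thunks[t].length; i++)
            if (thunks[t].bytes[i] != ANY && thunks[t].bytes[i] != code[i])
                break;
        if (i == thunks[t].length)
            return (int)t + 1;
    }
    return 0;
}

/* mov ecx, 0x12345678 ; jmp +0x10 */
int demo_match_mov_jmp(void)
{
    static const uint8_t code[] = { 0xB9, 0x78, 0x56, 0x34, 0x12,
                                    0xE9, 0x10, 0x00, 0x00, 0x00 };
    return match_atl_thunk(code, sizeof code);
}

/* pop ecx ; pop eax ; push ecx ; jmp [eax+4] */
int demo_match_pop_thunk(void)
{
    static const uint8_t code[] = { 0x59, 0x58, 0x51, 0xFF, 0x60, 0x04 };
    return match_atl_thunk(code, sizeof code);
}

/* Arbitrary bytes that are not an ATL thunk. */
int demo_match_other_code(void)
{
    static const uint8_t code[] = { 0x90, 0x90, 0xCC, 0x31, 0xC0, 0x50,
                                    0x68, 0x2F, 0x2F, 0x73, 0x68, 0x89, 0xE3 };
    return match_atl_thunk(code, sizeof code);
}
```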

ASLR

Address Space Layout Randomization (ASLR) is a security feature that randomizes the addresses where objects are mapped in the virtual address space of a given process. When implemented correctly, ASLR provides a significant hurdle to a would-be attacker, since they will not know the precise location of an interesting address to overwrite. Furthermore, even if an attacker is able to overwrite a useful pointer in memory (such as a saved instruction pointer on the stack), pointing it to something of value will also be difficult.

Although the concept of ASLR is not new, it is a relatively recent addition to the Windows platform. Vista and Windows Server 2008 are the first operating systems in the Windows family to provide ASLR natively. Prior to these releases, there were a number of third party solutions available that provided ASLR functionality to varying degrees. This paper will focus on Vista's native implementation.

Vista's ASLR randomizes the location of images (PE files mapped into memory), heaps, stacks, the PEB and TEBs. The details of the randomization of each of these components are presented in the following sections.

Image randomization

Image positioning randomization is designed to place images at a random location in the virtual address space of each process. Vista's ASLR has the capability to randomly position both executables and DLLs. Note that in order for a library or an executable to be randomly rebased, there are several conditions that need to be met; these will be discussed shortly. Before talking about the specifics, it is worth mentioning that there is a system-wide configuration parameter that determines the behaviour of Vista's image randomization. This parameter can be set in the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages, which by default does not exist. This key has three possible settings:

  • If the value is set to 0, never randomize image bases in memory, always honour the base address specified in the PE header.
  • If set to -1, randomize all relocatable images regardless of whether they have the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag or not.
  • If set to any other value, randomize only images that have relocation information and are explicitly marked as compatible with ASLR by setting the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x40) flag in the DllCharacteristics field of the PE header. This is the default behaviour.

Executable randomization

When a new address is being selected as an image base for an executable, a random delta value is added to or subtracted from the ImageBase value in the executable's PE header. This delta value is calculated by taking a random 8-bit value from the RDTSC counter and multiplying it by 64KB, which is the required image alignment on Windows. The result is that the image is loaded at a random 64KB aligned address within 16 MB of the preferred image base. It is important to note that the delta is never 0, which means that the executable is never loaded at the image base specified in the PE header.

On Vista SP0, there are 255 possible deltas ranging from 0x010000 to 0xFF0000. Due to a bug in the way the delta is calculated, the value 0x010000 has a probability of 2/256 while all other values have a probability of 1/256. This is fixed on Vista SP1, where the values range from 0x010000 to 0xFE0000 and each one has an equal probability (1/254) of being selected. The following pseudocode shows the details of the image base calculation in the MiSelectImageBase function:

if ((nt_header->Characteristics & IMAGE_FILE_DLL) == 0)
{

RelocateExe:
	// Get the RDTSC counter and calculate the random offset

#ifdef VISTA_SP0

	// Delta calculation on Vista SP0

	unsigned int Delta = (RDTSC & 0xFF) * 0x10000;

	// We don't allow offset 0, replace it with offset 0x10000

	if (Delta == 0)
		Delta = 0x10000;
	// Delta ranges from 0x010000 to 0xFF0000

#else

	// Delta calculation on Vista SP1

	unsigned int Delta = (((RDTSC >> 4) % 0xFE) + 1) * 0x10000;
	
	// Delta ranges from 0x010000 to 0xFE0000

#endif

	// Validate the original image base and image size

	dwImageSize = image size rounded up to 64KB
	dwImageEnd = dwImageBase + dwImageSize;

	if (dwImageBase >= MmHighestUserAddress || dwImageSize > MmHighestUserAddress || dwImageEnd <= dwImageBase || dwImageEnd > MmHighestUserAddress)
		return 0;

	// When the last reference to an image section goes away, it doesn't get
	// discarded immediately and may be reactivated if the image is loaded
	// again soon after. If that happens, then we apply a further delta to the
	// existing delta (stored in arg0->dwOffset14) and this check ensures that
	// we don't end up double-relocating back to the on-disk base address.

	if (arg0->dwOffset14 + Delta == 0)
		return dwImageBase;

	// To get the new base, we subtract Delta from the old image base. If the
	// old image base is too low, we add Delta instead

	if (dwImageBase > Delta) {
		dwNewBase = dwImageBase - Delta; // subtract Delta
	}
	else {
		dwNewBase = dwImageBase + Delta; // add Delta

		// Validate the new image base

		if (dwNewBase < dwImageBase ||
		    dwNewBase + dwImageSize > MmHighestUserAddress ||
		    dwNewBase + dwImageSize < dwImageBase + dwImageSize)
			return 0;
	}

	...

	// relocate the image to the new base

	return dwNewBase;
}
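
The two delta formulas from the pseudocode can be compared directly. In this sketch the time stamp counter is passed in as a parameter so that the distribution can be enumerated; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Delta calculation on Vista SP0: offset 0 is replaced with 0x10000. */
uint32_t delta_sp0(uint64_t tsc)
{
    uint32_t delta = (uint32_t)(tsc & 0xFF) * 0x10000u;
    if (delta == 0)
        delta = 0x10000u;
    return delta;
}

/* Delta calculation on Vista SP1. */
uint32_t delta_sp1(uint64_t tsc)
{
    return (uint32_t)(((tsc >> 4) % 0xFE) + 1) * 0x10000u;
}

/* Number of the 256 possible low bytes that produce the given SP0 delta. */
int count_sp0_hits(uint32_t delta)
{
    int hits = 0;
    for (uint64_t tsc = 0; tsc < 256; tsc++)
        if (delta_sp0(tsc) == delta)
            hits++;
    return hits;
}

/* Number of distinct SP1 deltas seen over a range of counter values. */
int count_distinct_sp1(void)
{
    unsigned char seen[256] = { 0 };
    int distinct = 0;
    for (uint64_t tsc = 0; tsc < 0x10000; tsc++) {
        uint32_t slot = delta_sp1(tsc) >> 16;
        assert(slot >= 1 && slot <= 0xFE);
        if (!seen[slot]) {
            seen[slot] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

Enumerating the low byte on SP0 shows two inputs (0 and 1) mapping to 0x010000, which is the 2/256 bias mentioned above.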

DLL randomization

The randomization of base addresses for DLLs is slightly different from the one for executables. Since Windows relies on relocations instead of position independent code, a DLL must be loaded at the same address in each process that uses it to allow the physical memory used by the DLL to be shared. To facilitate this behaviour, a global bitmap called _MiImageBitMap is used to represent the address space from 0x50000000 to 0x78000000. The bitmap is 0x2800 bits in length with each bit representing 64KB of memory. As each DLL is loaded, its position is recorded by setting the appropriate bits in the bitmap to mark the memory where the DLL is being mapped. When the same DLL is loaded in another process, its section object is reused and it is mapped at the same virtual addresses.

The following pseudocode from the MiSelectImageBase function shows the details of selecting a random image base for a DLL. It is called both for DLLs that have the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag and for DLLs that need to be rebased because their preferred image base is not available:

if ((nt_header->Characteristics & IMAGE_FILE_DLL) == 0)
{
RelocateExe:
	...
}
else
{
	// Relocate DLLs

	usImageSizeIn64kbBlocks = ImageSize / 64KB

	// Find the required number of bits in the bitmap and set them

	dwStartIndex = RtlFindClearBitsAndSet(
		MiImageBitMap, // bitmap
		usImageSizeIn64kbBlocks, // number of bits
		MiImageBias); // where to start looking

	// If we cannot find enough empty bits, relocate the DLL within 16MB of the
	// image base specified in the PE header

	if(dwStartIndex == 0xFFFFFFFF)
		goto RelocateExe;

	// Calculate the new image base

	dwEndIndex = dwStartIndex + usImageSizeIn64kbBlocks;
	dwNewBase = MiImageBitMapHighVa - dwEndIndex * 64KB;

	if (dwNewBase == dwImageBase)
	{
		// If the new image base is the same as the image base in the PE
		// header, we need to repeat the search in the bitmap. Since the bits
		// for the current DLL position are already set, we're guaranteed to
		// get a new position

		dwNewStartIndex = RtlFindClearBitsAndSet(
			MiImageBitMap, // bitmap
			usImageSizeIn64kbBlocks, // number of bits
			dwEndIndex); // hint

		// If the search was successful, clear the bits from the first search

		if (dwNewStartIndex != 0xFFFFFFFF)
			RtlClearBits(MiImageBitMap, dwStartIndex, usImageSizeIn64kbBlocks);

		// Calculate the new image base

		dwEndIndex = dwNewStartIndex + usImageSizeIn64kbBlocks;
		dwNewBase = MiImageBitMapHighVa - dwEndIndex * 64KB;
	}

	...

	return dwNewBase;
}

The MiImageBias value used by MiSelectImageBase is an 8-bit random value initialized with the RDTSC instruction once per boot, in the MiInitializeRelocations function. It is used as a random offset from the beginning of the MiImageBitMap bitmap and specifies where the search for the new DLL image base starts. The bitmap is laid out backwards: it starts at MiImageBitMapHighVa and extends towards lower addresses, so increasing bit indexes map to decreasing addresses. In effect, the first DLL loaded into the address space will end at 0x78000000 - MiImageBias*64KB, and additional DLLs will be placed one after the other following the first one. The MiSelectImageBase function ensures that a DLL is never loaded at the image base specified in the PE header.

Since MiImageBias has only 256 possible values, there are only 256 possible locations for the first DLL loaded on the system (NTDLL.DLL). However, the exact location of each subsequent DLL depends both on the address of NTDLL.DLL and on the order in which the DLLs are loaded. To further randomize the addresses of the known system DLLs, they are loaded in random order by the SmpRandomizeDllList function in the SMSS system process early in the boot process.
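The dependence on load order can be seen in a small C sketch that stacks images one after another in an otherwise empty bitmap. The function and constant names are hypothetical. Only the 0x78000000 upper bound, the 64KB granularity, and the 8-bit bias are taken from the text, and the sketch assumes every search succeeds at the first free index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIGH_VA    0x78000000u  /* upper bound of the DLL region, per the text */
#define BLOCK_SIZE 0x10000u     /* one bitmap bit covers 64KB */

/* Place n images one after another, starting bias blocks below HIGH_VA.
   sizes[i] is the size of the i-th image loaded, in 64KB blocks.
   bases[i] receives its base address, computed the same way as dwNewBase. */
static void place_images(uint8_t bias, const uint32_t *sizes, size_t n, uint32_t *bases)
{
    uint32_t next_index = bias;  /* first free index in the empty bitmap */
    for (size_t i = 0; i < n; i++) {
        uint32_t end_index = next_index + sizes[i];
        assert(sizes[i] > 0 && end_index <= HIGH_VA / BLOCK_SIZE);
        bases[i] = HIGH_VA - end_index * BLOCK_SIZE;
        next_index = end_index;
    }
}
```

With a bias of 0, loading a 2-block image before a 5-block image places the 2-block image at 0x77FE0000. Reversing the order moves it to 0x77F90000, even though the bias is unchanged.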

Heap randomization

Part of Microsoft's ASLR strategy involves randomizing where a heap created with the RtlCreateHeap function begins in memory. In the past, a new heap (including the default process heap) was allocated with the NtAllocateVirtualMemory function, which performs a linear search of the address space starting at a point chosen by the caller. The heap begins with a sizeable data structure, several fields of which have been abused to exploit heap overflows. Allocating a heap with NtAllocateVirtualMemory does not guarantee a fixed position, but in practice the heap nearly always resided at a predictable location. In Vista, some randomness has been added to the allocation process to make things harder for a would-be attacker. This randomization takes place during the early stages of RtlCreateHeap. Essentially, a 5-bit random value is generated and multiplied by 64K. The result is used as an offset from the base address returned by NtAllocateVirtualMemory, and the heap data structure begins at that offset. The memory in the block before this offset is subsequently freed. The following pseudocode demonstrates this process.

LPVOID lpAllocationBase = NULL, lpHeapBase = NULL;
DWORD dwRandomSize = (_RtlpHeapGenerateRandomValue64() & 0x1F) << 16;

// Integer overflow check, however this allocation would fail anyway
if(dwRegionSize + dwRandomSize < dwRegionSize)
	dwRandomSize = 0;

dwRegionSize += dwRandomSize;

if(NtAllocateVirtualMemory(NtCurrentProcess(),&lpAllocationBase, 0, &dwRegionSize, MEM_RESERVE, dwProtectionMask) < 0)
	return NULL;

lpHeapBase = lpAllocationBase;

if(dwRandomSize && _RtlpSecMemFreeVirtualMemory(INVALID_HANDLE_VALUE, &lpAllocationBase, &dwRandomSize, MEM_RELEASE) >= 0)
{
	lpHeapBase = (LPBYTE)lpAllocationBase + dwRandomSize;
	dwRegionSize -= dwRandomSize;
}

The idea is that even if NtAllocateVirtualMemory returns a predictable location, this random offset will give the attacker only a 1/32 chance of guessing the correct location of the base heap structure. Additionally, since the memory before the random offset is released, there is a good chance that an invalid guess will result in an immediate access violation. Note that since the random value is multiplied by 64K, offsets for the start of the heap range from 0 to 0x1F0000 in 64K increments (making the maximum offset from the returned base address close to 2MB).
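The range of offsets can be checked with a short C sketch. The function names are hypothetical. Only the 5-bit mask and the 64K multiplier are taken from the pseudocode above, and the random source is replaced by a plain parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the heap base from the allocation base: the low 5 bits of the
   random value, shifted left by 16 (that is, multiplied by 64K). */
static uint32_t heap_random_offset(uint64_t random_value)
{
    uint32_t offset = (uint32_t)(random_value & 0x1F) << 16;
    /* 64K aligned and never larger than 0x1F0000 */
    assert(offset <= 0x1F0000u && (offset & 0xFFFFu) == 0);
    return offset;
}

/* Fill out[] with the offset produced by every 5-bit value, in order.
   Returns the number of entries written. */
static int enumerate_heap_offsets(uint32_t out[32])
{
    int count = 0;
    for (uint64_t r = 0; r < 32; r++)
        out[count++] = heap_random_offset(r);
    return count;
}
```

Enumerating all inputs yields 32 distinct offsets from 0 to 0x1F0000, which is where the 1/32 figure comes from.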

Stack randomization

Vista also adds some entropy to the location of the stacks of all threads within a given process. The stack randomization is twofold: the base of the stack is chosen randomly, and the offset into the initial page where the stack starts being used is also chosen at random, so targeting precise values on the stack will often not be a viable option. The stack base is chosen by searching the virtual address space for a suitably sized hole, where a hole is defined as a consecutive series of pages not mapped into memory. Entropy is added to this process by generating a random 5-bit value x based on the time stamp counter and then searching the address space for the x-th hole of the required size. Once a hole has been found, it is passed as the suggested base address to NtAllocateVirtualMemory. After that, the offset within the initial page where the stack starts is adjusted randomly in the PspSetupUserStack function. Again, a random value is derived from the time stamp counter, this time 9 bits wide. This 9-bit value is multiplied by 4 (guaranteeing DWORD alignment) and subtracted from the stack base. This results in a maximum offset of 0x7FC bytes, just under half a page.
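The second step, the adjustment within the initial page, reduces to a few lines of arithmetic. The C sketch below is an illustration and not the code of PspSetupUserStack. The function name is hypothetical, and taking the low 9 bits of the counter value is an assumption, since the text does not say which bits are used.

```c
#include <assert.h>
#include <stdint.h>

/* Return the address where the stack starts being used, given the stack base
   and a value derived from the time stamp counter.
   Assumption: the 9 random bits are the low 9 bits of tsc_value. */
static uint32_t randomized_stack_start(uint32_t stack_base, uint32_t tsc_value)
{
    uint32_t offset = (tsc_value & 0x1FF) * 4;  /* 0 to 0x7FC, DWORD aligned */
    assert(offset <= 0x7FC && offset % 4 == 0);
    return stack_base - offset;
}
```

For an example base of 0x00130000, the possible starting addresses run from 0x0012F804 to 0x00130000 in steps of 4 bytes, giving 512 candidates.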