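A Summary of Windows Memory Management

While working on a 32-bit Windows program that kept running out of virtual memory, I did some digging into how Windows manages memory. This article summarizes what I learned from Windows Internals and the MSDN documentation; the specific links are listed at the end for reference.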


Before looking at Windows memory specifically, it helps to be clear about how virtual memory and physical memory relate.





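  • A process allocates virtual memory only; it cannot use physical memory directly
  • The MMU (Memory Management Unit) translates a virtual address into a physical address to locate the corresponding physical page
  • A contiguous range of virtual memory need not be backed by contiguous physical memory, so at the process level fragmentation of physical memory is largely not a concern
  • Page hit: the corresponding physical page is present in physical memory
  • Page fault: no corresponding physical page is found in physical memory
  • Swapping (or paging): a physical page that is not currently needed (the victim page) is written to disk, and the virtual page that is needed is mapped to the freed physical page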


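As shown in the figure, suppose physical memory has only 4 pages. At first, process A allocates and uses 4 pages, which fills physical memory. If process B then allocates and accesses 2 pages, VP3 and VP4, page faults are triggered. The operating system, following its replacement policy, swaps out to disk the data that is unlikely to be needed soon, for example process A's VP3 and VP4. Only then can process B's VP3 and VP4 be used.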
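The scenario above can be replayed with a small simulation. This is only a sketch: it assumes a plain LRU replacement policy (Windows' real working-set management is more involved), and the function name `simulate` and the access sequence are made up for illustration. Process A reuses VP1 and VP2 before process B runs, so A's VP3 and VP4 become the least recently used pages.

```python
from collections import OrderedDict

def simulate(accesses, physical_pages=4):
    """Replay (process, virtual_page) accesses against a tiny physical memory.

    Returns a list of events: ("hit", page), ("fault", page), or
    ("fault", page, victim) when a victim page had to be swapped out.
    """
    resident = OrderedDict()  # resident pages, least recently used first
    events = []
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # page hit: refresh recency
            events.append(("hit", page))
            continue
        if len(resident) >= physical_pages:
            victim, _ = resident.popitem(last=False)  # swap out the LRU page
            events.append(("fault", page, victim))
        else:
            events.append(("fault", page))
        resident[page] = True
    return events

# Process A touches VP1..VP4, then reuses VP1 and VP2; process B then needs VP3 and VP4.
accesses = [("A", n) for n in (1, 2, 3, 4, 1, 2)] + [("B", 3), ("B", 4)]
events = simulate(accesses)
for event in events:
    print(event)
```

The last two events show process B's faults evicting process A's VP3 and VP4, matching the example in the text.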



The two states of virtual memory in Windows: reserved and committed


Reserving and Committing Pages

Pages in a process virtual address space are free, reserved, committed, or shareable.
Committed and shareable pages are pages that, when accessed, ultimately translate to valid pages in physical memory.

Committed pages are also referred to as private pages.
This reflects the fact that committed pages cannot be shared with other processes, whereas shareable pages can be (but, of course, might be in use by only one process).

Private pages are allocated through the Windows VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma functions.
These functions allow a thread to reserve address space and then commit portions of the reserved space.
The intermediate "reserved" state allows the thread to set aside a range of contiguous virtual addresses for possible future use (such as an array), while consuming negligible system resources, and then commit portions of the reserved space as needed as the application runs.
Or, if the size requirements are known in advance, a thread can reserve and commit in the same function call.
In either case, the resulting committed pages can then be accessed by the thread.
Attempting to access free or reserved memory results in an exception because the page isn't mapped to any storage that can resolve the reference.

If committed (private) pages have never been accessed before, they are created at the time of first access as zero-initialized pages (or demand zero).
Private committed pages may later be automatically written to the paging file by the operating system if required by demand for physical memory.
"Private" refers to the fact that these pages are normally inaccessible to any other process.

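  • reserved: a range of virtual addresses has been set aside, but nothing backs it yet; it must be committed before it can be accessed, otherwise the access raises an exception
  • committed: the virtual pages are backed by storage, either physical memory or the paging file; a physical page is assigned on first access
  • committed pages are also called private pages, meaning they cannot be shared with other processes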
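The reserve-then-commit sequence can be tried from Python through ctypes. This is a minimal sketch rather than a recommended pattern: the function name `reserve_then_commit` is hypothetical, the 4096-byte page size is an assumption (GetSystemInfo reports the real value), and on anything other than Windows the function simply returns None. The constants are the documented values of the VirtualAlloc flags.

```python
import ctypes
import sys

MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_NOACCESS = 0x01
PAGE_READWRITE = 0x04
PAGE_SIZE = 4096  # assumed page size, for illustration only

def reserve_then_commit(reserve_bytes=1 << 20, commit_bytes=PAGE_SIZE):
    """Reserve address space, commit its first page, write to it, release it."""
    if sys.platform != "win32":
        return None  # VirtualAlloc only exists on Windows
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.VirtualAlloc.restype = ctypes.c_void_p
    kernel32.VirtualAlloc.argtypes = [
        ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong, ctypes.c_ulong]
    kernel32.VirtualFree.restype = ctypes.c_int
    kernel32.VirtualFree.argtypes = [
        ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong]

    # Step 1: reserve. Address space is set aside, nothing backs it yet.
    base = kernel32.VirtualAlloc(None, reserve_bytes, MEM_RESERVE, PAGE_NOACCESS)
    if not base:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Step 2: commit only the part that is needed right now.
        page = kernel32.VirtualAlloc(base, commit_bytes, MEM_COMMIT, PAGE_READWRITE)
        if not page:
            raise ctypes.WinError(ctypes.get_last_error())
        ctypes.memset(page, 0xAB, commit_bytes)  # committed pages are accessible
        first_byte = ctypes.c_ubyte.from_address(page).value
        return base, first_byte
    finally:
        kernel32.VirtualFree(base, 0, MEM_RELEASE)  # release the whole reservation

print(reserve_then_commit())
```

Touching an address beyond the committed page (but still inside the reservation) would raise an access violation, which is the behavior the quoted passage describes for reserved memory.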

Why does virtual memory need a reserved state instead of always committing directly?

This is the answer I found on Stack Overflow that I agree with most:

Why would I want to reserve? Why not just get committed memory? There are several reasons I have in mind:

  1. Some application needs a specific address range, say from 0x400000 to 0x600000, but does not need the memory for storing anything. It is used to trap memory access. E.g., if some code accesses such area, it will be caught. (Useful for some reason.)
  2. Some thread needs to store progressively expanding data. And the data needs to be in one contiguous chunk of memory. It is preferred not to commit large physical memory at one go because it is not needed and would be such a waste. The memory can be utilized by some other threads first. The physical memory is committed only on demand.


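  1. Some applications need a specific address range purely to trap memory accesses: as soon as any code touches that range, the access is caught
  2. Reserve a contiguous range now and use it later. For example, when a thread is created, 1 MB is reserved for its stack rather than committed up front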



Q&A on 32-bit programs and 32-bit CPUs

Q1. Why does a Win32 program still hit OOM on a laptop with 8 GB or even 16 GB of physical memory?


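Here is an analogy for how a 32-bit program's virtual memory relates to physical memory. Think of the school as the virtual address space and the dormitory as physical memory.


  • The bigger the school, the more students it can enroll: the bigger the virtual address space, the more a program can allocate.

  • If the school is small and the dormitory is big, the dormitory will always have empty beds, because even a fully enrolled school cannot fill it (this is the single-process case where virtual memory is smaller than physical memory, ignoring PAE)

  • If the school is big and the dormitory is small, the dormitory fills up. A policy is then needed so that students who need a bed most get one, while those who need it less move out to make room (this is the case where virtual memory exceeds physical memory: once physical memory is full, data that is not needed is written to disk)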

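Q2. Why is virtual memory the bottleneck for a 32-bit program?

A: A 32-bit process has a 4 GB virtual address space. On Windows the kernel takes 2 GB of it by default, leaving only 2 GB for user space.

A pointer in a 32-bit program or operating system can only express addresses within 2^32 bytes = 4 GB, so any virtual memory we allocate has to fall within that 4 GB.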
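A quick way to see the pointer width of a running process, and the address range that width implies, is sketched below. The helper names are made up for illustration; `struct.calcsize("P")` returns the size of a native pointer in bytes.

```python
import struct

GIB = 2 ** 30

def pointer_bits():
    """Pointer width of the running process, in bits."""
    return struct.calcsize("P") * 8

def address_space_bytes(bits):
    """Number of distinct byte addresses a pointer of the given width can express."""
    return 2 ** bits

print(pointer_bits())                   # 32 in a 32-bit process, 64 in a 64-bit one
print(address_space_bytes(32) // GIB)   # 4
```

Running this inside a 32-bit interpreter on a 64-bit machine still reports 32: the limit comes from the process, not from the installed RAM.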

What a process's address space looks like, and why only 2 GB of it is usable, is covered in the section on Windows process memory layout.

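Q3. How do 32-bit CPUs relate to 32-bit operating systems?


  • A 32-bit CPU cannot run a 64-bit operating system, because a 64-bit OS is built on the 64-bit instruction set (64-bit registers and addressing), which a 32-bit CPU does not implement

  • Conversely, a 64-bit CPU can run a 32-bit operating system, but it cannot use the CPU's full capability, a bit like a big horse pulling a small cart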

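Q4. Can a 32-bit CPU only use 4 GB of physical memory? Is a CPU's addressing range tied to its word size?


  • The addressable range depends on the number of address lines, not on the CPU's word size. Intel's 32-bit CPUs have supported 36 address lines since 1995 (the Pentium Pro), so a 32-bit CPU can use up to 64 GB of physical memory

  • How can it reach beyond 4 GB? See PAE (Physical Address Extension) for the details

  • PAE exists so that several 32-bit processes together can use more physical memory (beyond 4 GB); each individual process is still limited to its 32-bit virtual address space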
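The 4 GB and 64 GB figures follow directly from the number of address lines. A small worked calculation (the function name is made up for illustration):

```python
GIB = 2 ** 30

def addressable_physical_gib(address_lines):
    """Physical memory reachable with the given number of address lines, in GiB."""
    return 2 ** address_lines // GIB

without_pae = addressable_physical_gib(32)
with_pae = addressable_physical_gib(36)
print(without_pae)  # 4
print(with_pae)     # 64
```

The extra 4 address lines multiply the reachable physical memory by 2^4 = 16, while the virtual address space of each 32-bit process stays at 4 GB.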

Windows memory layout (Windows Process Virtual Space)

User address space (User Address Space Layout)


The figure below is from Windows Internals, 6th Edition



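The figure above shows the memory layout of an x86 (32-bit) process:

  • It is split into 3 GB of user space and 1 GB of kernel space. This is not the normal layout of a Win32 program, though, but that of a program with large address space mode enabled (LARGE_ADDRESS_AWARE)
  • A normal Win32 program has only 2 GB of user space, and kernel space takes the other 2 GB
  • In that default layout, user space occupies the low addresses (00000000 ~ 7FFFFFFF) and kernel space the high addresses (80000000 ~ FFFFFFFF)
  • User space holds code, global variables, thread stacks, DLLs, and so on
  • The figure labels in detail what kernel space contains; this article won't go into it, and interested readers can explore on their own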


How ASLR Protects Linux Systems from Buffer Overflow Attacks - Zhihu

ASLR - Jianshu


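  • The lowest addresses hold the .exe
  • Next come the .dll files
  • Next is the Heap, which holds heap memory allocated through APIs such as HeapAlloc
  • Next are the Thread Stacks, which hold thread stack memory; each new thread gets its own stack region

The figure also mentions ASLR; what that is will be covered below.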




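The user space in the figure looks very fragmented, which is probably also related to ASLR. If you need to analyze your application's virtual memory layout, don't take the figure's layout as definitive; go by what your program actually does at run time.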




User Address Space Layout


Just as address space in the kernel is dynamic, the user address space is also built dynamically: the addresses of the thread stacks, process heaps, and loaded images (such as DLLs and an application's executable) are dynamically computed (if the application and its images support it) through a mechanism known as Address Space Layout Randomization, or ASLR.



At the operating system level, user address space is divided into a few well-defined regions of memory, shown in Figure 10-14.


The executable and DLLs themselves are present as memory mapped image files, followed by the heap(s) of the process and the stack(s) of its thread(s).


Apart from these regions (and some reserved system structures such as the TEBs and PEB), all other memory allocations are run-time dependent and generated.


ASLR is involved with the location of all these run-time-dependent regions and, combined with DEP, provides a mechanism for making remote exploitation of a system through memory manipulation harder to achieve.


Since Windows code and data are placed at dynamic locations, an attacker cannot typically hardcode a meaningful offset into either a program or a system-supplied DLL.


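  1. The addresses of thread stacks, process heaps, and loaded images (exe, dll) are computed dynamically

  2. For the exe and dlls, this requires that the application support ASLR (randomized address selection)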

How to configure DEP (Data Execution Prevention) - Baidu Jingyan

Data Execution Prevention - Baidu Baike

What is ASLR?

Let's look at what ASLR actually is.


The difference between the Windows XP result and the Windows 7 result is caused by the more random nature of address space layout introduced in Windows Vista Address Space Load Randomization (ASLR), that leads to some fragmentation.


Randomization of DLL loading, thread stack and heap placement, helps defend against malware code injection.


As you can see from this VMMap output, there's 357MB of address space still available, but the largest free block is only 128K in size, which is smaller than the 1MB required for a 32-bit stack:


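  • ASLR stands for Address Space Layout Randomization
  • Its purpose is to defend against code-injection attacks by malware, since fixed addresses are easier for an attacker to exploit
  • The accompanying downside is that it more easily fragments the address space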
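The VMMap observation quoted above (plenty of free address space, but no single block large enough for a 1 MB stack) can be reproduced with a toy model. The layout below is hypothetical: a 64 KB allocation is placed every 192 KB across a 16 MB range, which leaves a 128 KB gap after each one. The helper `free_blocks` is made up for illustration.

```python
KB = 1024
MB = 1024 * KB

def free_blocks(space_size, allocations):
    """Return the sizes of the free gaps left between allocations.

    allocations is a list of (start, size) regions inside [0, space_size).
    """
    gaps = []
    cursor = 0
    for start, size in sorted(allocations):
        if start > cursor:
            gaps.append(start - cursor)
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append(space_size - cursor)
    return gaps

space = 16 * MB
scattered = [(offset, 64 * KB) for offset in range(0, space, 192 * KB)]
gaps = free_blocks(space, scattered)

total_free = sum(gaps)
largest_free = max(gaps)
print(total_free // KB, largest_free // KB)  # 10880 128
print(largest_free >= 1 * MB)                # False: a 1 MB reservation cannot fit
```

More than 10 MB is free in total, yet a request for 1 MB of contiguous address space fails, which is exactly how a 32-bit process can run out of room for a new thread stack while still showing free virtual memory.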

How do you disable ASLR?






In Windows, the memory manager provides two stacks for each thread: a user stack and a kernel stack





User Stacks

When a thread is created, the memory manager automatically reserves a predetermined amount of virtual memory, which by default is 1 MB. This amount can be configured in the call to the CreateThread or CreateRemoteThread function or when compiling the application by using the /STACK:reserve switch in the Microsoft C/C++ compiler, which will store the information in the image header.


Although 1 MB is reserved, only the first page of the stack will be committed (unless the PE header of the image specifies otherwise), along with a guard page.


When a thread's stack grows large enough to touch the guard page, an exception will occur, causing an attempt to allocate another guard.

Through this mechanism, a user stack doesn't immediately consume all 1 MB of committed memory but instead grows with demand.


(However, it will never shrink back.)

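  • When a thread is created, 1 MB of virtual memory is reserved for its stack by default

  • The /STACK:reserve switch writes the reserve size into the PE header (changing the stack size)

  • Although 1 MB of virtual memory is reserved, only the first page (plus a guard page) is committed, that is, actually backed by storage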
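For comparison, Python exposes a per-thread stack size through `threading.stack_size`, which plays a role similar to the stack size argument of CreateThread. The sketch below is a Python-level analogue only: how the value maps onto reserve versus commit is up to the interpreter and the platform, and the helper name `run_with_stack` is made up for illustration.

```python
import threading

def run_with_stack(stack_bytes, work):
    """Run work() on a new thread created with the given stack size; return its result."""
    previous = threading.stack_size(stack_bytes)  # applies to threads created afterwards
    result = []
    try:
        worker = threading.Thread(target=lambda: result.append(work()))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result[0]

answer = run_with_stack(256 * 1024, lambda: sum(range(1000)))
print(answer)  # 499500
```

A smaller stack lets more threads fit into the same address space, at the cost of less headroom for deep recursion or large stack frames.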


EXPERIMENT: Creating the Maximum Number of Threads


With only 2 GB of user address space available to each 32-bit process, the relatively large memory that is reserved for each thread's stack allows for an easy calculation of the maximum number of threads that a process can support: a little less than 2,048, for a total of nearly 2 GB of memory (unless the increaseuserva BCD option is used and the image is large address space aware).


By forcing each new thread to use the smallest possible stack reservation size, 64 KB, the limit can grow to about 30,400 threads, which you can test for yourself by using the TestLimit utility from Sysinternals.


Here is some sample output:




If you attempt this experiment on a 64-bit Windows installation (with 8 TB of user address space available), you would expect to see potentially hundreds of thousands of threads created (as long as sufficient memory were available).


Interestingly, however, TestLimit will actually create fewer threads than on a 32-bit machine, which has to do with the fact that Testlimit.exe is a 32-bit application and thus runs under the Wow64 environment.


(See Chapter 3 in Part 1 for more information on Wow64.)


Each thread will therefore have not only its 32-bit Wow64 stack but also its 64-bit stack, thus consuming more than twice the memory, while still keeping only 2 GB of address space.


To properly test the thread-creation limit on 64-bit Windows, use the Testlimit64.exe binary instead.



Note that you will need to terminate TestLimit with Process Explorer or Task Manager; using Ctrl+C to break the application will not function because this operation itself creates a new thread, which will not be possible once memory is exhausted.


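  • A 32-bit program on a 64-bit system can create fewer threads than the same program on a 32-bit machine
  • The reason is that on a 64-bit machine each thread of a 32-bit program also gets a 64-bit stack; the process still has only 2 GB of virtual address space, but every thread now consumes two stacks' worth of it
  • In my measurements the 64-bit stack takes 256 KB, so each thread's stacks take 1.25 MB in total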
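The thread limits discussed above are simple divisions of the user address space by the per-thread stack reservation. A worked calculation, assuming stacks are the only consumer of address space (in practice images, heaps, and fragmentation take their share, which is why the book reports about 30,400 rather than the theoretical figure for 64 KB stacks):

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def max_threads(user_space_bytes, stack_reserve_bytes):
    """Upper bound on thread count if stacks were the only consumer of address space."""
    return user_space_bytes // stack_reserve_bytes

print(max_threads(2 * GB, 1 * MB))             # 2048  (default 1 MB reservation)
print(max_threads(2 * GB, 64 * KB))            # 32768 (smallest reservation)
print(max_threads(2 * GB, 1 * MB + 256 * KB))  # 1638  (32-bit stack plus 64-bit WoW64 stack)
```

With the measured 1.25 MB per thread under WoW64, the ceiling drops from 2048 to about 1638 threads.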

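In summary, running a 32-bit program on a 64-bit system carries extra overhead. A 32-bit program already has only 2 GB of usable virtual memory, and running on a 64-bit system exposes that weakness sooner. Readers who want to know more can look into WoW64 (Windows on Windows64).

VMMap, a powerful tool for analyzing Windows virtual memory


Microsoft provides a tool for this, VMMap (part of Sysinternals)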




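  • Total: all virtual memory that has been allocated

  • Free: virtual address space that is still available

  • Image: virtual memory occupied by the exe and dlls

  • Private data: private memory allocated through VirtualAlloc that is not managed by the heap manager or assigned to the stack category

  • Stack: virtual memory occupied by thread stacks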



You can also open VMMap and click Help to see what each region type means




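Besides the GUI, VMMap also provides a CLI that can be used from scripts

How can the virtual memory bottleneck of a Win32 program be resolved?


Upgrade the 32-bit program to 64-bit

Virtual memory will not be a bottleneck for a 64-bit program, but porting an existing program to 64-bit is not easy; what that involves is beyond the scope of this article.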


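  • Reduce the stack reservation per thread. As concluded above, by default a 32-bit program on a 64-bit system uses 1.25 MB of address space per thread, so the stack size can be reduced where appropriate. For a Java program, the JVM option -Xss shrinks the stack
  • Reduce large up-front heap reservations. For example, a Java program reserves -Xmx worth of address space when the JVM starts; at 1 GB that already takes half of the available space.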



By default, the virtual size of a process on 32-bit Windows is 2 GB.

If the image is marked specifically as large address space aware, and the system is booted with a special option (described later in this chapter), a 32-bit process can grow to be 3 GB on 32-bit Windows and to 4 GB on 64-bit Windows.


The process virtual address space size on 64-bit Windows is 7,152 GB on IA64 systems and 8,192 GB on x64 systems.


(This value could be increased in future releases.)


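  • By default, a process's virtual memory size is 2 GB

  • If the exe is marked as large address space aware and the system is booted with a special option, the process's virtual memory size can grow to 3 GB on 32-bit Windows (4 GB on 64-bit Windows)


  1. When building the exe, set the linker option /LARGEADDRESSAWARE (in Visual Studio: Linker > System > Enable Large Addresses = Yes)
  2. Open cmd as administrator and run bcdedit /set increaseuserva 3072, where 3072 is the user space size in MB, that is, 3 GB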


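  1. To verify, first check whether the Windows system has the option set through bcdedit: open cmd as administrator, run bcdedit, and look for increaseuserva 3072 in the list. If it is there, go on to the next step
  2. Use dumpbin /headers to check whether the exe has large address space mode enabled; the file header characteristics should include "Application can handle large (>2GB) addresses"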
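Where dumpbin is not at hand, the same flag can be read straight from the PE header. The sketch below relies on the documented PE layout: the offset of the PE signature is stored at 0x3C, the 20-byte COFF header follows the 4-byte signature, and its last field, Characteristics, carries the IMAGE_FILE_LARGE_ADDRESS_AWARE bit (0x0020). The function name and the file path in the usage comment are placeholders.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF header's Characteristics field

def is_large_address_aware(image: bytes) -> bool:
    """Report whether a PE image has the large-address-aware flag set."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2-byte field of the 20-byte COFF header,
    # which starts right after the 4-byte PE signature.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

# Usage (the path is a placeholder):
# with open("app.exe", "rb") as f:
#     print(is_large_address_aware(f.read()))
```

This only checks the image flag; on 32-bit Windows the increaseuserva boot option from step 1 is still needed for the extra address space to be available.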




Windows Internals, 6th Edition; Windows Internals, 7th Edition; 《程序员的自我修养》

A Summary of Windows Memory Management - Zhihu

Pushing the Limits of Windows: Virtual Memory | Microsoft Learn

Pushing the Limits of Windows: Paged and Nonpaged Pool - Microsoft Community Hub

