------ Analyzing a "double fault" blue screen caused by a kernel stack overflow ------

——————————————————————————————————————————————————————————————————————————

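The previous article showed that tail_recursivef_factorial() calls itself recursively to compute the
factorial of a positive integer. When the target number is too large, the repeated calls eventually
exhaust the available kernel stack and raise a page fault exception. Before control reaches the fault
handler, a "trap frame" has to be pushed onto the stack, again at an invalid memory address, and this
second failure escalates an otherwise recoverable exception to a "double fault" that crashes the
system. This article triggers a "double fault" by attempting to compute 685! and then analyzes it.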
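
The exhaustion described above can be approximated with simple arithmetic. The sketch below is only a
model, not the driver's code: the 12 KiB stack size is the figure commonly cited for 32-bit x86
Windows kernel stacks and is treated here as an assumption, the 24-byte frame size is a made-up
value (the real one depends on the compiler and build settings), and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rough model of stack exhaustion by recursion (illustration only).
 * stack_bytes: total size of the thread's kernel stack (assumed value).
 * used_bytes:  stack already consumed before the first recursive call.
 * frame_bytes: stack consumed by each nested call (hypothetical; depends
 *              on the compiler, build flags and calling convention).
 * Returns the number of nested calls that fit before the stack runs out.
 */
static size_t max_call_depth(size_t stack_bytes, size_t used_bytes,
                             size_t frame_bytes)
{
    assert(frame_bytes > 0);
    if (used_bytes >= stack_bytes)
        return 0;
    return (stack_bytes - used_bytes) / frame_bytes;
}

/* Nonzero when `calls` nested calls would not fit in the stack. */
static int would_overflow(size_t calls, size_t stack_bytes,
                          size_t used_bytes, size_t frame_bytes)
{
    return calls > max_call_depth(stack_bytes, used_bytes, frame_bytes);
}
```

Under these assumed numbers, max_call_depth(12288, 0, 24) gives 512, so 685 nested calls would not
fit. The real threshold differs, because part of the stack is already in use by the time
DriverEntry() runs.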

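Copy the compiled driver to the machine being debugged and load it into kernel space with sc.exe.
The initial breakpoint set in the source code (see the previous article) fires and breaks into
WinDbg.exe on the debugging machine. Inspecting the local variables of the driver entry point
"DriverEntry()" shows that "Number" holds 0x2ad, which is exactly 685, the number whose factorial
is to be computed: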

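Press "g" to resume execution. The system crashes shortly afterwards, as expected. Without a
debugger attached from the host machine, the target system would go straight to a blue screen
showing the "bug check" code 0000007F: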

Searching MSDN for this error code shows that it corresponds to "UNEXPECTED_KERNEL_MODE_TRAP". The official explanation reads:

The UNEXPECTED_KERNEL_MODE_TRAP bug check has a value of 0x0000007F.
This bug check indicates that the Intel CPU generated a trap and the kernel failed to catch this trap.

This trap could be a bound trap (a trap the kernel is not permitted to catch) or a double fault
(a fault that occurred while processing an earlier fault, which always results in a system failure).

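In other words, the CPU raised a trap that the kernel could not catch. That trap is either a bound
trap, which the kernel is not permitted to catch, or a "double fault", a fault raised while an
earlier fault was still being handled, which always brings the system down.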

The second case in that description (another fault occurring while a fault is being handled) is exactly the situation here.

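UNEXPECTED_KERNEL_MODE_TRAP takes four parameters. As the previous screenshot shows, the first
parameter is "0x00000008" (the trap number), which the official documentation explains as follows: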

0x00000008, or Double Fault, indicates that an exception occurs during a call to the handler for a prior exception.
Typically, the two exceptions are handled serially.
However, there are several exceptions that cannot be handled serially,
and in this situation the processor signals a double fault. There are two common causes of a double fault:

A kernel stack overflow. This overflow occurs when a guard page is hit, and the kernel tries to push a trap frame.
Because there is no stack left, a stack overflow results, causing the double fault.
If you think this overview has occurred, use !thread to determine the stack limits, and then use kb
(Display Stack Backtrace) with a large parameter (for example, kb 100) to display the full stack.

A hardware problem.

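To summarize: "Double Fault" means that another exception occurred while the handler for a previous
exception was being invoked. Two exceptions are normally handled one after the other, but some
combinations cannot be handled serially, and in that case the processor signals a "double fault".
The two common causes are:

1. A kernel stack overflow. The overflow happens when a guard page is hit and the kernel then tries
to push a trap frame onto it. Since no stack is left, a second stack overflow results and causes the
"double fault". If you suspect this kind of overflow, use the "!thread" debugger command to find the
stack limits, then use "kb" (display stack backtrace) with a large argument (for example, kb 100) to
show the full stack.

2. A hardware problem.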
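
The diagnostic reasoning above can be written down as two small checks. This is a hedged sketch
rather than anything taken from the driver or from Windows headers: the macro and function names are
hypothetical, the 32-bit address type assumes an x86 target, and the sample addresses in the usage
note are made up. Only the bug check code 0x7F and the trap number 0x8 come from the documentation
quoted earlier.

```c
#include <stdint.h>

/* Values taken from the quoted documentation; the macro names are made up. */
#define BUGCHECK_UNEXPECTED_KERNEL_MODE_TRAP 0x0000007Fu
#define TRAP_DOUBLE_FAULT                    0x00000008u

/* Nonzero when the bug check code and its first parameter describe a
 * double fault. */
static int is_double_fault(uint32_t bugcheck_code, uint32_t param1)
{
    return bugcheck_code == BUGCHECK_UNEXPECTED_KERNEL_MODE_TRAP &&
           param1 == TRAP_DOUBLE_FAULT;
}

/*
 * The stack grows downward, so the limit reported by !thread is the lowest
 * usable address. A stack pointer that has reached or passed that limit
 * suggests the stack was exhausted (assumption: 32-bit addresses).
 */
static int stack_exhausted(uint32_t stack_pointer, uint32_t stack_limit)
{
    return stack_pointer <= stack_limit;
}
```

For example, with a made-up limit of 0xF8A58000, a stack pointer of 0xF8A57FF0 is reported as
exhausted while 0xF8A59000 is not. In a real session the limit comes from "!thread" and the stack
pointer from the saved register context.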