如何跟踪高CPU在用户模式应用程序-现场调试!
我想谈谈我们经常处理的一个常见问题。我们的任务通常是在用户模式进程/应用程序中查找哪些函数正在使用CPU。通常,用户会发现一个应用程序使用的CPU比他们预期的要多,这可能会影响整个系统的性能和响应能力。
对于这个练习,我编写了一些人为的示例代码,称为EATCPU。它包含在博客文章的底部。任务管理器的下图显示EATCPU消耗41%的CPU时间。客户或用户可能会告诉您,这种情况“通常”不会发生。问“正常”是什么总是好的。在这种情况下,我们认为正常值为~10%。
最好的情况是对运行在高CPU级别的进程进行实时调试。如果您有幸拥有一个客户/用户,该客户/用户允许您进行远程调试,并且问题在需要时重现,那么您可以采取以下操作。
您需要安装Windows调试工具,并设置符号路径。如果有可能,请获取正在调试的应用程序的符号。我们假设你是支持上述计划的专家。如果是内部编写的,从开发者那里获取符号。如果是来自第三方的,供应商可能愿意为您提供他们产品的符号。
下一件事是附用windbg.exe加进程。
在debuggers目录中,输入TLIST,这将列出您的进程。获取进程id,然后运行WinDBG.EXE –p PROCESSID,或者如果您调试的是eatcup之类的程序,则可以运行WINDBG C:\program\EATCPU.EXE.
附加调试器或在调试器中启动进程后,重现问题。
Microsoft (R) Windows Debugger Version 6.8.0001.0 Copyright (c) Microsoft Corporation. All rights reserved. ***** WARNING: Your debugger is probably out-of-date. ***** Check http://dbg for updates. CommandLine: eatcpu.exe Symbol search path is: srv*C:\symbols*\\symbols\symbols Executable search path is: ModLoad: 00400000 0041a000 eatcpu.exe ModLoad: 779b0000 77b00000 ntdll.dll ModLoad: 76780000 76890000 C:\Windows\syswow64\kernel32.dll ModLoad: 62bb0000 62cd1000 C:\Windows\WinSxS\x86_microsoft.vc80.debugcrt_1fc8b3b9a1e18e3b_8.0.50727.762_none_24c8a196583ff03b\MSVCR80D.dll ModLoad: 75cb0000 75d5a000 C:\Windows\syswow64\msvcrt.dll (1090.164): Break instruction exception - code 80000003 (first chance) eax=00000000 ebx=00000000 ecx=712b0000 edx=00000000 esi=fffffffe edi=77a90094 eip=779c0004 esp=0017faf8 ebp=0017fb28 iopl=0 nv up ei pl zr na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 ntdll!DbgBreakPoint: 779c0004 cc int 3 0:000> g (1090.11d4): Break instruction exception - code 80000003 (first chance) eax=7efa3000 ebx=00000000 ecx=00000000 edx=77a1d894 esi=00000000 edi=00000000 eip=779c0004 esp=0109ff74 ebp=0109ffa0 iopl=0 nv up ei pl zr na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 ntdll!DbgBreakPoint: 779c0004 cc int 3 0:007> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols 0:007> g (1090.17d4): Break instruction exception - code 80000003 (first chance) eax=7efa3000 ebx=00000000 ecx=00000000 edx=77a1d894 esi=00000000 edi=00000000 eip=779c0004 esp=0109ff74 ebp=0109ffa0 iopl=0 nv up ei pl zr na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 ntdll!DbgBreakPoint: 779c0004 cc int 3
一旦问题开始,就运行!runaway命令。
0:007> !runaway User Mode Time Thread Time 2:c04 0 days 0:01:08.827 ß Note this thread, thread 2:c04 is using more CPU than any other. 7:17d4 0 days 0:00:00.000 ß Note the other threads are using very little if any CPU. 6:1a4c 0 days 0:00:00.000 5:d20 0 days 0:00:00.000 4:157c 0 days 0:00:00.000 3:1b28 0 days 0:00:00.000 1:1134 0 days 0:00:00.000 0:164 0 days 0:00:00.000 0:007> ~2s ß Using the thread number 2, set the thread context with the ~Ns command. *** WARNING: Unable to verify checksum for eatcpu.exe eax=cccccccc ebx=00b93c48 ecx=0000002b edx=00b937a8 esi=00000000 edi=00d9fcf0 eip=0041169c esp=00d9fcd0 ebp=00d9fd9c iopl=0 nv up ei pl nz na po nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202 eatcpu!checkSomething+0x1c: 0041169c f3ab rep stos dword ptr es:[edi] es:002b:00d9fcf0=cccccccc 0:002> k ß Dump the call stack using k.
如果您看一下下面的调用堆栈,这个线程中的应用程序代码从您看到EATCPU的地方开始,前面的代码是begin thread的C运行时代码。我想跟踪being thread下运行的所有代码。这里的假设是,我会发现有东西在循环并吞噬CPU。为此,我将使用WT命令。不过,首先我需要指定WT开始跟踪的起始地址。
让我们使用UF来分解启动代码的函数,方法是获取eatcpu!myThreadFunction的返回地址。
ChildEBP RetAddr 00d9fd9c 00411657 eatcpu!checkSomething+0x1c [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 49] 00d9fe74 004115a8 eatcpu!trySomething+0x27 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 45] 00d9ff58 62bb4601 eatcpu!myThreadFunction+0x38 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 35] 00d9ff94 62bb459c MSVCR80D!_beginthread+0x221 00d9ffa0 768019f1 MSVCR80D!_beginthread+0x1bc 00d9ffac 77a2d109 kernel32!BaseThreadInitThunk+0xe 00d9ffec 00000000 ntdll!_RtlUserThreadStart+0x23 [d:\vistartm\base\ntos\rtl\rtlexec.c @ 2695]
我们将使用WT来监视跟踪这些代码。我已经选择了myThreadFunction函数的起始地址。我还指定了–or 打印每个函数的返回值。Wt产生非常直观的输出。它允许您快速识别代码执行方式中的模式,而仅使用T(TRACE)命令将更加困难。
8 0 [ 0] ntdll!RtlSetLastWin32Error eax = 0 >> No match on ret 8 0 [ 0] ntdll!RtlSetLastWin32Error 2 0 [ 0] eatcpu!checkSomething 1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0 9 3 [ 0] eatcpu!checkSomething 1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0 12 6 [ 0] eatcpu!checkSomething eax = 0 >> No match on ret 12 6 [ 0] eatcpu!checkSomething 7 0 [ 0] eatcpu!trySomething 1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0 10 3 [ 0] eatcpu!trySomething eax = 0 >> No match on ret 10 3 [ 0] eatcpu!trySomething 9 0 [ 0] eatcpu!myThreadFunction ß I see a pattern, a loop. Beginning of loop. 1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ) 60 0 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ) 62 0 [ 2] eatcpu!checkSomething 5 0 [ 3] kernel32!SetLastError 16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0 64 21 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 71 24 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 74 27 [ 2] eatcpu!checkSomething eax = 0 67 102 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0 70 105 [ 1] eatcpu!trySomething eax = 0 18 176 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop 1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ) 60 0 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ) 62 0 [ 2] eatcpu!checkSomething 5 0 [ 3] kernel32!SetLastError ßAlways look for what might be going wrong! Last error can give you a clue. We are setting last error at the low level of the loop 16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0 64 21 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 71 24 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 74 27 [ 2] eatcpu!checkSomething eax = 0 ßAlso note checkSomething is returning ZERO, this might indicate a problem. You need to look at the code or assembler. 67 102 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0 70 105 [ 1] eatcpu!trySomething eax = 0 27 352 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop 1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ) 60 0 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ) 62 0 [ 2] eatcpu!checkSomething 5 0 [ 3] kernel32!SetLastError 16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0 64 21 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 71 24 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 74 27 [ 2] eatcpu!checkSomething eax = 0 67 102 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0 70 105 [ 1] eatcpu!trySomething eax = 0 36 528 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop 1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ) 60 0 [ 1] eatcpu!trySomething 1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ) 62 0 [ 2] eatcpu!checkSomething 5 0 [ 3] kernel32!SetLastError 16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0 64 21 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 71 24 [ 2] eatcpu!checkSomething 1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp) 2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0 74 27 [ 2] eatcpu!checkSomething eax = 0 67 102 [ 1] eatcpu!trySomething 12930 instructions were executed in 12929 events (0 from other threads) Function Name Invocations MinInst MaxInst AvgInst eatcpu!ILT+180(?checkSomethingYAHHZ) 69 1 1 1 eatcpu!ILT+345(__RTC_CheckEsp) 210 1 1 1 eatcpu!ILT+35(?trySomethingYAHHZ) 70 1 1 1 eatcpu!_RTC_CheckEsp 210 2 2 2 eatcpu!checkSomething 70 60 74 73 eatcpu!myThreadFunction 1 630 630 630 eatcpu!trySomething 71 10 70 68 kernel32!SetLastError 70 5 5 5 ntdll!RtlSetLastWin32Error 70 16 16 16 0 system calls were executed eax=cccccccc ebx=00b93c48 ecx=00000002 edx=00b937a8 esi=00000000 edi=00d9fe6c eip=0041164c esp=00d9fda8 ebp=00d9fe74 iopl=0 nv up ei pl nz na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206 eatcpu!trySomething+0x1c: 0041164c f3ab rep stos dword ptr es:[edi] es:002b:00d9fe6c=cccccccc 0:002> !gle ßNow that we have broken in let's check and see what the last error is using !GLE (Get Last Error) This dumps out the last error from the TEB. LastErrorValue: (Win32) 0x57 (87) - The parameter is incorrect. LastStatusValue: (NTSTATUS) 0xc000000d - An invalid parameter was passed to a service or function. 0:007> bp kernel32!SetLastError ßLets set a breakpoint on last error to examine what is going on in the function calling it. 0:007> g Breakpoint 1 hit eax=cccccccc ebx=00b93c48 ecx=00000000 edx=00b937a8 esi=00d9fcd0 edi=00d9fd9c eip=767913dd esp=00d9fcc8 ebp=00d9fd9c iopl=0 nv up ei pl zr na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 kernel32!SetLastError: 767913dd 8bff mov edi,edi 0:002> kv ßGet the call stack ChildEBP RetAddr 0:002> kv ChildEBP RetAddr Args to Child 00d9fcc4 004116cb 00000057 00d9fe74 00000000 kernel32!SetLastError (FPO: [Non-Fpo]) ß0x57 Invalid parameter error, Why? 00d9fd9c 00411657 00000000 00d9ff58 00000000 eatcpu!checkSomething+0x4b (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 57] 00d9fe74 004115a8 00000000 00000000 00000000 eatcpu!trySomething+0x27 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 45] 00d9ff58 62bb4601 0017ff34 4f9f12e9 00000000 eatcpu!myThreadFunction+0x38 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 35] 00d9ff94 62bb459c 00b937a8 00d9ffac 768019f1 MSVCR80D!_beginthread+0x221 (FPO: [Non-Fpo]) 00d9ffa0 768019f1 00b937a8 00d9ffec 77a2d109 MSVCR80D!_beginthread+0x1bc (FPO: [Non-Fpo]) 00d9ffac 77a2d109 00b93c48 00d926a6 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo]) 00d9ffec 00000000 62bb4520 00b93c48 00000000 ntdll!_RtlUserThreadStart+0x23 (FPO: [Non-Fpo]) (CONV: stdcall) [d:\vistartm\base\ntos\rtl\rtlexec.c @ 2695] 0:002> !error 00000057 ßdouble check, using !error, this will decode the error into a human readable string. Error code: (Win32) 0x57 (87) - The parameter is incorrect. 0:002> uf checkSomething ßlets disassemble the function calling SetLastError. eatcpu!checkSomething [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 49]: 49 00411680 55 push ebp 49 00411681 8bec mov ebp,esp 49 00411683 81ecc0000000 sub esp,0C0h 49 00411689 53 push ebx 49 0041168a 56 push esi 49 0041168b 57 push edi 49 0041168c 8dbd40ffffff lea edi,[ebp-0C0h] 49 00411692 b930000000 mov ecx,30h 49 00411697 b8cccccccc mov eax,0CCCCCCCCh 49 0041169c f3ab rep stos dword ptr es:[edi] 50 0041169e 837d0800 cmp dword ptr [ebp+8],0 ßCheck what our first parameter is on the stack, EBP+8 remember PLUS == Parameters. Note looking at the stack it’s 00000000 50 004116a2 741d je eatcpu!checkSomething+0x41 (004116c1) ßif parameter 1 ones 0 we are going to jump to this addres, else we execute the following code. (WE JUMP) eatcpu!checkSomething+0x24 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 52]: 52 004116a4 8bf4 mov esi,esp ßThe green would have been the good code path, non errant. 52 004116a6 68fa000000 push 0FAh 52 004116ab ff15a8814100 call dword ptr [eatcpu!_imp__Sleep (004181a8)] ßNote we sleep or do some work other then eat CPU here if we are passed non ZERO 52 004116b1 3bf4 cmp esi,esp 52 004116b3 e8a6faffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e) 53 004116b8 b801000000 mov eax,1 ßWe are setting EAX to 1, this means we have succeded 53 004116bd eb15 jmp eatcpu!checkSomething+0x54 (004116d4) ßNow we jump to the clean up code for the fucntionn eatcpu!checkSomething+0x41 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 57]: 57 004116c1 8bf4 mov esi,esp ßThis appears to be a failure case. We did not get an expected parameter so we report an error and return zero. 57 004116c3 6a57 push 57h ßPusing error 0x57 on the stack, invalid parameter. 57 004116c5 ff15a4814100 call dword ptr [eatcpu!_imp__SetLastError (004181a4)] ßOur call to setlasterror 57 004116cb 3bf4 cmp esi,esp 57 004116cd e88cfaffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e) 58 004116d2 33c0 xor eax,eax ßXORing eax with eax will make EAX Zero. This is an error condition. eatcpu!checkSomething+0x54 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 60]: 60 004116d4 5f pop edi 60 004116d5 5e pop esi 60 004116d6 5b pop ebx 60 004116d7 81c4c0000000 add esp,0C0h 60 004116dd 3bec cmp ebp,esp 60 004116df e87afaffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e) 60 004116e4 8be5 mov esp,ebp 60 004116e6 5d pop ebp 60 004116e7 c3 ret
这个场景需要注意的一点是,在处理高CPU条件时,在某些循环条件的较低级别经常会出现一些问题,从而阻止代码的正确执行。如果幸运的话,问题是由操作系统或应用程序中的某个错误工具报告的。无论哪种情况,您都可以使用上述技术进行隔离。
下面是EATCPU的示例代码。
// eatcpu.cpp : Defines the entry point for the console application. // #include "stdafx.h" #include <windows.h> #include <process.h> void myThreadFunction(void *); int doSomething(int); int trySomething(int); int checkSomething(int); int _tmain(int argc, _TCHAR* argv[]) { int truevalue = 1; int falsevalue = 0; // let's start some threads. These should represent worker threads in a process. Some of them will do “valid work” one will fail to do so. _beginthread(myThreadFunction,12000,(void *)&truevalue); _beginthread(myThreadFunction,12000,(void *)&truevalue); _beginthread(myThreadFunction,12000,(void *)&truevalue); _beginthread(myThreadFunction,12000,(void *)&falsevalue ); _beginthread(myThreadFunction,12000,(void *)&truevalue); _beginthread(myThreadFunction,12000,(void *)&truevalue); Sleep(10*60000); return 0; } void myThreadFunction(void *value) { int *localvalue = (int *)value; while(1) { trySomething(*localvalue); } } int doSomething(int value) { return trySomething(value); } int trySomething(int value) { return checkSomething(value); } int checkSomething(int value) { if(value) { // Make sure we have have some valid input parameter. //We will pretend we are doing work, this could be anything , file I/O etc. Sleep(250); return TRUE; } else { // This is an error condition, this function expects an NON Zero parameter and will report ZERO as an invalid parameter. SetLastError(ERROR_INVALID_PARAMETER); return FALSE; } }