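Tracking down a memory leak with WinDbg
I. Background
One of our projects recently ran into trouble at runtime. The program started fine, but after watching it for a while I noticed its total memory usage creeping up very slowly.
It dipped now and then, but the overall trend was upward: after 6 hours of running, memory had grown from 33 MB to 80 MB.
There was no doubt the program was leaking memory, and I had a rough idea where: we had just added a new acquisition protocol (BACnet, MSTP_DLL).
Pinning down the exact leak was the hard part. The protocol is wrapped in a dynamic library written in C, so single-stepping through it did not look feasible,
and I did not have the source code either.
I could have stopped there and handed the problem to a colleague, but joint debugging is slow, and my colleagues wear several hats and were unlikely to have time for it,
so the project would have stalled. There was nothing for it but to dig in myself. After some reading online, the usual approach for this kind of leak turned out to be
capturing memory snapshots and then analysing the heap commit figures and the allocation call stacks. That is why I picked the WinDbg debugger.
Let's analyse first and see; we might just find the problem.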
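II. WinDbg notes
1. Install the right build: the WinDbg bitness (32-bit or 64-bit) must match that of your program, otherwise you will get errors.
2. On 64-bit Windows, do not capture the dump of a 32-bit process by right-clicking "Create dump file" in the default (64-bit) Task Manager; run the 32-bit Task Manager instead (C:\Windows\SysWOW64\taskmgr.exe).
3. Alternatively, save the dump from WinDbg itself: attach to the process, then run (.dump /ma c:\xxx.dmp), which writes the snapshot to drive C.
4. Most important, make sure your machine can reach the external internet: WinDbg downloads symbol files online, and that address happens to be blocked by the national firewall.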
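III. Required WinDbg setup
1. First I captured two memory snapshots some time apart: 33.dmp (memory at 33 MB) and 80.dmp (memory at 80 MB).
2. Open WinDbg and set the symbol download path.
Drag 33.dmp straight into the workspace, then open the menu File -> Symbol File Path
and enter: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols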
IV. Analysing the dumps
1. Open both dmp files and run !dumpheap -stat to see how memory is distributed across the managed types
33.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f87928 2292 34012 System.RuntimeType[]
5d2dbe74 267 34176 System.Data.DataColumn
61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo
61f8426c 702 48976 System.Int32[]
5d2dcc24 70 72520 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo
61f8839c 2045 89980 System.Signature
0a7566bc 596 92976 HG.MacamUnit.Entity.TSubSysNodes
61f82788 723 117736 System.Object[]
61f89850 8 131696 System.Int64[]
61fd8938 2792 167520 System.Reflection.RuntimeMethodInfo
007988d0 220 434392 Free
61f824e4 12187 738904 System.String
61f85c40 2138 743067 System.Byte[]
61f82c60 294 6629796 System.Char[]
Total 55014 objects
80.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f83698 876 24528 System.RuntimeType
61f84ec0 159 26472 System.Collections.Hashtable+bucket[]
61fc9020 631 27764 System.Reflection.RtFieldInfo
61f95be8 46 28392 System.Reflection.Emit.__FixupData[]
61f87928 2292 34012 System.RuntimeType[]
61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo
5d2dcc24 42 43512 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f8426c 595 45868 System.Int32[]
61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo
61f8839c 2045 89980 System.Signature
61f82788 622 113684 System.Object[]
61f89850 8 131696 System.Int64[]
61fd8938 2769 166140 System.Reflection.RuntimeMethodInfo
61f824e4 9800 676596 System.String
61f85c40 2064 705655 System.Byte[]
61f82c60 195 2369402 System.Char[]
007988d0 114 3338792 Free
Total 47306 objects
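Comparing the memory distribution in the two files, the difference looks small; nothing here explains where the 80 - 33 = nearly 50 MB went.
On reflection that is not surprising: the System.*** types belong to the C# (managed) environment, and the C# side is known not to leak, so it is normal for nothing to show up here.
So how do we track down the problem inside the C library?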
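Eyeballing two long listings is error-prone, so here is a minimal sketch of how the comparison could be scripted. The rows are copied from the two listings above; the function names parse_dumpheap_stat and size_deltas are made up for this illustration and are not part of WinDbg or SOS.

```python
# Rows copied from the '!dumpheap -stat' listings above: MT, Count, TotalSize, Class Name.
DUMP_33 = """
007988d0 220 434392 Free
61f824e4 12187 738904 System.String
61f85c40 2138 743067 System.Byte[]
61f82c60 294 6629796 System.Char[]
Total 55014 objects
"""

DUMP_80 = """
61f824e4 9800 676596 System.String
61f85c40 2064 705655 System.Byte[]
61f82c60 195 2369402 System.Char[]
007988d0 114 3338792 Free
Total 47306 objects
"""


def parse_dumpheap_stat(text):
    """Map class name -> (count, total_size) for every data row in the listing."""
    rows = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # keep class names that contain spaces intact
        if len(parts) != 4 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # skips blank lines, '.....' and the 'Total' line
        _mt, count, total, name = parts
        rows[name.strip()] = (int(count), int(total))
    return rows


def size_deltas(before, after):
    """Return (class name, change in TotalSize) pairs, largest growth first."""
    names = set(before) | set(after)
    deltas = {n: after.get(n, (0, 0))[1] - before.get(n, (0, 0))[1] for n in names}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


deltas = size_deltas(parse_dumpheap_stat(DUMP_33), parse_dumpheap_stat(DUMP_80))
for name, delta in deltas:
    print(f"{delta:>10} {name}")
```

In this sample the only entry that grows is Free, while the String, Byte[] and Char[] totals all shrink, which matches the conclusion that the managed heap is not where the extra memory went.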
2. Next, try !heap -s to see how much memory each native heap has reserved and committed
33.dmp
0:047> !heap -s
LFH Key : 0x343fce0b
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002 8192 4636 8192 209 2484 4 0 e LFH
002e0000 00001002 256 4 256 2 1 1 0 0
00280000 00001002 1088 72 1088 5 2 2 0 0
00c70000 00041002 256 4 256 2 1 1 0 0
002d0000 00001002 1088 132 1088 8 23 2 0 0
00450000 00001002 256 4 256 0 1 1 0 0
07230000 00041002 256 4 256 2 1 1 0 0
00c10000 00001002 256 216 256 3 39 1 0 0 LFH
09b50000 00001002 256 80 256 39 28 1 0 0
09d00000 00001002 64 4 64 2 1 1 0 0
09ef0000 00001002 1088 72 1088 6 2 2 0 0
004c0000 00001002 1088 192 1088 15 140 2 0 0
09760000 00041002 256 28 256 4 4 1 0 0
09ed0000 00001002 64 12 64 1 1 1 0 0
0b210000 00001002 3136 1456 3136 52 84 3 0 0 LFH
0a700000 00001002 256 212 256 2 1 1 0 0
0e1e0000 00011002 256 4 256 0 1 1 0 0
0d030000 00001002 256 16 256 3 1 1 0 0
11b30000 00001002 1088 388 1088 0 1 2 0 0
-----------------------------------------------------------------------------
80.dmp
0:051> !heap -s
LFH Key : 0x343fce0b
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH
002e0000 00001002 256 4 256 2 1 1 0 0
00280000 00001002 1088 132 1088 4 6 2 0 0
00c70000 00041002 256 4 256 2 1 1 0 0
002d0000 00001002 1088 168 1088 12 26 2 0 0
00450000 00001002 256 4 256 0 1 1 0 0
07230000 00041002 256 4 256 2 1 1 0 0
00c10000 00001002 256 228 256 26 69 1 0 0 LFH
09b50000 00001002 256 80 256 39 25 1 0 0
09d00000 00001002 64 4 64 2 1 1 0 0
09ef0000 00001002 1088 132 1088 6 5 2 0 0
004c0000 00001002 1088 220 1088 26 173 2 0 0
09760000 00041002 256 28 256 4 8 1 0 0
09ed0000 00001002 64 12 64 1 1 1 0 0
0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH
0a700000 00001002 256 212 256 2 1 1 0 0
0e1e0000 00011002 256 4 256 0 1 1 0 0
0d030000 00001002 256 16 256 1 1 1 0 0
11b30000 00001002 47808 46068 47808 396 6836 7 0 0
-----------------------------------------------------------------------------
This time something stands out: the 11b30000 row changed dramatically. Its reserved size grew by 47808 - 1088 = 46720 KB, and its committed size by 46068 - 388 = 45680 KB.
So the problem is almost certainly inside this heap.
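The same comparison can be done mechanically. The sketch below takes three rows from each !heap -s listing above and ranks heaps by how much their committed size grew. The helper names parse_heap_summary and commit_growth are hypothetical, and the column positions (Reserv third, Commit fourth) are assumed from the header shown in the output.

```python
# Three rows from each '!heap -s' listing above (heap, flags, reserved KB, committed KB, ...).
HEAPS_33 = """
00780000 00000002 8192 4636 8192 209 2484 4 0 e LFH
0b210000 00001002 3136 1456 3136 52 84 3 0 0 LFH
11b30000 00001002 1088 388 1088 0 1 2 0 0
"""

HEAPS_80 = """
00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH
0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH
11b30000 00001002 47808 46068 47808 396 6836 7 0 0
"""


def parse_heap_summary(text):
    """Map heap address -> (reserved_kb, committed_kb) for each data row."""
    heaps = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with an 8-digit hex heap address followed by the flags.
        if len(parts) < 5 or len(parts[0]) != 8:
            continue
        try:
            int(parts[0], 16)
            heaps[parts[0]] = (int(parts[2]), int(parts[3]))
        except ValueError:
            continue  # header or separator line
    return heaps


def commit_growth(before, after):
    """Return (heap, committed KB growth) pairs for heaps in both dumps, largest first."""
    pairs = ((h, after[h][1] - before[h][1]) for h in after if h in before)
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)


growth = commit_growth(parse_heap_summary(HEAPS_33), parse_heap_summary(HEAPS_80))
for heap, delta_kb in growth:
    print(f"{heap}: {delta_kb:+d} KB committed")
```

Heap 11b30000 comes out on top by a wide margin, while the other heaps in the sample barely move.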
3. Let's look inside 11b30000 with: !heap -stat -h 11b30000
80.dmp
0:051> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 102d9 - 1f58470 (92.48)
18 102b0 - 184080 (4.47)
10 102ae - 102ae0 (2.98)
214 13 - 277c (0.03)
1000 2 - 2000 (0.02)
800 2 - 1000 (0.01)
220 1 - 220 (0.00)
1d7 1 - 1d7 (0.00)
80 3 - 180 (0.00)
a4 1 - a4 (0.00)
24 4 - 90 (0.00)
14 4 - 50 (0.00)
4a 1 - 4a (0.00)
25 2 - 4a (0.00)
48 1 - 48 (0.00)
46 1 - 46 (0.00)
41 1 - 41 (0.00)
3e 1 - 3e (0.00)
3c 1 - 3c (0.00)
37 1 - 37 (0.00)
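The top three entries account for over 99% of the busy bytes in this heap, and blocks of size 1f0 use by far the most memory.
So far we have a few useful facts: three kinds of blocks, of sizes 1f0, 18 and 10 (hex), keep being allocated.
That is still not enough. A block size alone cannot pinpoint the problem: is it a struct? A string? An array?
We cannot tell, so we need to look further and find out where these blocks are used.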
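All the numbers in this listing are hexadecimal, which makes them easy to misread. As a rough sketch, the snippet below converts the top three rows to decimal and checks that size times block count really equals the reported total. parse_heap_stat is a hypothetical helper written for this post, not a WinDbg feature.

```python
# Top three rows from the '!heap -stat -h 11b30000' output above (80.dmp).
HEAP_STAT_80 = """
size #blocks total ( %) (percent of total busy bytes)
1f0 102d9 - 1f58470 (92.48)
18 102b0 - 184080 (4.47)
10 102ae - 102ae0 (2.98)
"""


def parse_heap_stat(text):
    """Return (size, blocks, total_bytes, percent) tuples, with the hex fields converted."""
    rows = []
    for line in text.splitlines():
        parts = line.replace("(", " ").replace(")", " ").split()
        # Expected row shape: <size> <#blocks> - <total> <percent>
        if len(parts) != 5 or parts[2] != "-":
            continue
        try:
            rows.append((int(parts[0], 16), int(parts[1], 16),
                         int(parts[3], 16), float(parts[4])))
        except ValueError:
            continue  # not a data row
    return rows


rows = parse_heap_stat(HEAP_STAT_80)
for size, blocks, total, percent in rows:
    assert size * blocks == total  # the 'total' column is size * #blocks
    print(f"{size:>4} bytes x {blocks} blocks = {total} bytes ({percent}%)")

top_three_share = sum(percent for _, _, _, percent in rows)
print(f"top three sizes: {top_three_share:.2f}% of busy bytes")
```

In decimal, the 1f0 blocks are 496 bytes each and there are 66265 of them, roughly 31 MiB of user data in this one heap.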
4. List the blocks of size 1f0 with: !heap -flt s [size]
80.dmp
0:051> !heap -flt s 1f0
_DPH_HEAP_ROOT @ 5a1000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
    _HEAP @ 780000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        0078e5b8 0045 0000 [00] 0078e5e0 001f0 - (busy)
_DPH_HEAP_ROOT @ 9d11000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
    _HEAP @ 4c0000
_DPH_HEAP_ROOT @ af41000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
    _HEAP @ b210000
        0cf61680 0045 0045 [00] 0cf616a8 001f0 - (busy)
_DPH_HEAP_ROOT @ d871000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
    _HEAP @ d030000
_DPH_HEAP_ROOT @ 11631000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
    _HEAP @ 11b30000
        11b312e8 0045 0045 [00] 11b31310 001f0 - (busy)
        11b315a8 0045 0045 [00] 11b315d0 001f0 - (busy)
        11b356f8 0045 0045 [00] 11b35720 001f0 - (busy)
        11b35920 0045 0045 [00] 11b35948 001f0 - (busy)
        11b36f30 0045 0045 [00] 11b36f58 001f0 - (busy)
        11b37b58 0045 0045 [00] 11b37b80 001f0 - (busy)
        11b37e18 0045 0045 [00] 11b37e40 001f0 - (busy)
        11b3e4f0 0045 0045 [00] 11b3e518 001f0 - (busy)
        11b3f570 0045 0045 [00] 11b3f598 001f0 - (busy)
        11b3f830 0045 0045 [00] 11b3f858 001f0 - (busy)
        11b3faf0 0045 0045 [00] 11b3fb18 001f0 - (busy)
        11b3fdb0 0046 0045 [00] 11b3fdd8 001f0 - (busy)
        12890578 0045 0046 [00] 128905a0 001f0 - (busy)
        ......
Several heaps contain blocks of size 1f0, but only the last one (_DPH_HEAP_ROOT @ 11631000, _HEAP @ 11b30000)
has a large number of entries. They fill the whole screen, so the listing is cut short here and only a sample is shown.
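When the listing runs to thousands of lines, a per-heap count is easier to read than the raw output. A minimal sketch, assuming the output has been saved as text and that every matching block appears on a line ending in "(busy)" under its "_HEAP @ ..." header, as in the excerpt above. The sample is a shortened copy of that excerpt and busy_blocks_per_heap is a hypothetical helper.

```python
from collections import Counter

# Shortened excerpt of the '!heap -flt s 1f0' output above.
FLT_OUTPUT = """
_DPH_HEAP_ROOT @ 5a1000
_HEAP @ 780000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0078e5b8 0045 0000 [00] 0078e5e0 001f0 - (busy)
_DPH_HEAP_ROOT @ 9d11000
_HEAP @ 4c0000
_DPH_HEAP_ROOT @ 11631000
_HEAP @ 11b30000
11b312e8 0045 0045 [00] 11b31310 001f0 - (busy)
11b315a8 0045 0045 [00] 11b315d0 001f0 - (busy)
11b356f8 0045 0045 [00] 11b35720 001f0 - (busy)
"""


def busy_blocks_per_heap(text):
    """Count the '(busy)' rows listed under each '_HEAP @ <address>' header."""
    counts = Counter()
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "_HEAP" and parts[1] == "@":
            current = parts[2]
            counts[current] += 0  # register the heap even if it has no matches
        elif current and parts and parts[-1] == "(busy)":
            counts[current] += 1
    return counts


counts = busy_blocks_per_heap(FLT_OUTPUT)
for heap, n in counts.most_common():
    print(f"_HEAP @ {heap}: {n} busy block(s) of this size")
```

Run against the full output, a count like this would make the dominance of heap 11b30000 obvious without scrolling.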
5. Inspect the allocation call stack with: !heap -p -a [address]
80.dmp
0:051> !heap -p -a 11b3fdd8
    address 11b3fdd8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        11b3fdb0 0046 0000 [00] 11b3fdd8 001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
0:051> !heap -p -a 11b3fb18
    address 11b3fb18 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        11b3faf0 0045 0000 [00] 11b3fb18 001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
I picked a few at random, but the call stacks show nothing distinctive: verifier, ntdll and msvcr90 are all system or C runtime library functions, and the trace stops at calloc without showing who called it.
That gives no hint of where the 1f0 blocks are used, which makes things awkward. Is this a dead end? Without a useful stack, locating
the memory leak by single-stepping would be very painful...
No rush. Let's look at what these blocks contain; maybe they hold something distinctive.
Command: db [UserPtr]
80.dmp
0:051> db 11b3fb18
11b3fb18  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb58  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb68  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb78  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb88  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
0:051> db 11b3fdd8
11b3fdd8  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fde8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fdf8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe08  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe18  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
The result was disappointing.
These blocks are almost entirely zeroed memory, with no useful information left in them.
Another dead end. Is this as far as we get?
Not giving up yet: let's check whether these addresses have any reference roots; if they do, we can print stack information from them.
Command: !gcroot [UserPtr]
80.dmp
0:051> !gcroot 11b3fb18
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).
Wishful thinking. The 1f0 block was allocated 0x102d9 times, and according to !gcroot these all appear to be orphaned data with no references.
Next, let's look at the creation stack of the heap _DPH_HEAP_ROOT @ 11631000
80.dmp
0:051> dt ntdll!_DPH_HEAP_ROOT CreateStackTrace 11631000
   +0x0b8 CreateStackTrace : 0x04d54f8c _RTL_TRACE_BLOCK
0:051> dds 0x04d54f8c
04d54f8c 04d1b714
04d54f90 0000f801
04d54f94 000f0000
04d54f98 74058969 verifier!AVrfDebugPageHeapCreate+0x439
04d54f9c 77cbcea2 ntdll!RtlCreateHeap+0x41
04d54fa0 757356bc KERNELBASE!HeapCreate+0x50
04d54fa4 66463a4a msvcr90!_heap_init+0x1b
04d54fa8 66422bb4 msvcr90!__p__tzname+0x2a
04d54fac 66422d5e msvcr90!_CRTDLL_INIT+0x1e
04d54fb0 77c79264 ntdll!LdrpCallInitRoutine+0x14
04d54fb4 77c7fe97 ntdll!LdrpRunInitializeRoutines+0x26f
04d54fb8 77c7ea4e ntdll!LdrpLoadDll+0x472
04d54fbc 77cbd3df ntdll!LdrLoadDll+0xc7
04d54fc0 75732e6a KERNELBASE!LoadLibraryExW+0x233
04d54fc4 7562483c kernel32!LoadLibraryW+0x11
*** WARNING: Unable to verify checksum for Win32Project1.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for Win32Project1.dll -
04d54fc8 6d3d18de Win32Project1+0x18de
04d54fcc 6d3d28fc Win32Project1!BACNet::Init+0x5c
04d54fd0 6d3d5925 Win32Project1!Init+0x25
*** WARNING: Unable to verify checksum for SMDB.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for SMDB.dll -
04d54fd4 66639972 SMDB!LogPop+0x12
04d54fd8 66639452 SMDB!CreateSharedMemory+0x12
04d54fdc 6d8e47bd clrjit!Compiler::impImportBlockCode+0x2aac [f:\dd\ndp\clr\src\jit32\importer.cpp @ 10258]
04d54fe0 6d8c2e6b clrjit!Compiler::impImportBlock+0x5f [f:\dd\ndp\clr\src\jit32\importer.cpp @ 13246]
04d54fe4 6d8c306a clrjit!Compiler::impImport+0x235 [f:\dd\ndp\clr\src\jit32\importer.cpp @ 14195]
04d54fe8 6d8c364f clrjit!Compiler::compCompile+0x62 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 2491]
04d54fec 6d8c4276 clrjit!Compiler::compCompileHelper+0x32f [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3615]
04d54ff0 6d8c43fc clrjit!Compiler::compCompile+0x2ab [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3086]
04d54ff4 6d8c45c8 clrjit!jitNativeCode+0x1f6 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 4057]
04d54ff8 6d8c377d clrjit!CILJit::compileMethod+0x7d [f:\dd\ndp\clr\src\jit32\ee_il_dll.cpp @ 180]
04d54ffc 633b39b3 clr!invokeCompileMethodHelper+0x10b
04d55000 633b3a8b clr!invokeCompileMethod+0x3d
04d55004 633b3ae8 clr!CallCompileMethodWithSEHWrapper+0x39
04d55008 633b3d97 clr!UnsafeJitFunction+0x431
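The Win32Project1.dll library is a second-level wrapper around MSTP_DLL, and we can be sure it does not leak.
The stack shows that this heap was created by the C runtime (msvcr90!_heap_init) when that runtime was loaded through LoadLibraryW from Win32Project1!BACNet::Init, that is, during initialization of the communication with the hardware devices, on a thread running under the CLR.
Knowing this does not help much, though, since we already knew the BACnet communication library was at fault.
It only tells us that the leak happens after initialization.
But why were these rootless pointers not garbage collected?
On second thought, that is normal too: these blocks were clearly allocated inside a library written in C, so they are unmanaged memory.
The C# garbage collector only reclaims managed memory, so unless this data is freed explicitly it stays there forever.
For now, though, we seem to be stuck with no leads, so let's set this aside and look at how the other two block sizes are allocated.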
6. !heap -flt s 18
80.dmp
> !heap -flt s 18
...
16f45098 000a 000a [00] 16f450c0 00018 - (busy)
16f45358 000a 000a [00] 16f45380 00018 - (busy)
16f45618 000a 000a [00] 16f45640 00018 - (busy)
16f458d8 000a 000a [00] 16f45900 00018 - (busy)
16f45b98 000a 000a [00] 16f45bc0 00018 - (busy)
16f46080 000a 000a [00] 16f460a8 00018 - (busy)
16f46118 000a 000a [00] 16f46140 00018 - (busy)
16f461b0 000a 000a [00] 16f461d8 00018 - (busy)
16f46248 000a 000a [00] 16f46270 00018 - (busy)
16f462e0 000a 000a [00] 16f46308 00018 - (busy)
16f46378 000a 000a [00] 16f463a0 00018 - (busy)
16f46410 000a 000a [00] 16f46438 00018 - (busy)
16f464a8 000b 000a [00] 16f464d0 00018 - (busy)
16f46548 000a 000b [00] 16f46570 00018 - (busy)
16f46808 000a 000a [00] 16f46830 00018 - (busy)
16f46ac8 000a 000a [00] 16f46af0 00018 - (busy)
16f46d88 000a 000a [00] 16f46db0 00018 - (busy)
16f47048 000a 000a [00] 16f47070 00018 - (busy)
16f47308 000a 000a [00] 16f47330 00018 - (busy)
...
7. Pick a few and inspect them with: !heap -p -a [UserPtr]
80.dmp
0:051> !heap -p -a
invalid address passed to `-p -a'
0:051> !heap -p -a 16f460a8
    address 16f460a8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        16f46080 000a 0000 [00] 16f460a8 00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
*** ERROR: Symbol file could not be found. Defaulted to export symbols for MSTP_DLL.dll -
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
0:051> !heap -p -a 16f46570
    address 16f46570 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        16f46548 000a 0000 [00] 16f46570 00018 - (busy)
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
0:051> !heap -p -a 16f46308
    address 16f46308 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        16f462e0 000a 0000 [00] 16f46308 00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
This time it went smoothly: the memory is allocated inside MSTP_Get_RPM_ACK_Data in MSTP_DLL. This is the leak site we were looking for.
Now repeat the same steps for the blocks of size 10
80.dmp
> !heap -flt s 10
...
1535​9fa0 0009 0009 [00] 15359fc8 00010 - (busy)
1535a2a0 0009 0009 [00] 1535a2c8 00010 - (busy)
1535a560 0009 0009 [00] 1535a588 00010 - (busy)
1535aee8 0009 0009 [00] 1535af10 00010 - (busy)
1535af80 0009 0009 [00] 1535afa8 00010 - (busy)
1535b018 0009 0009 [00] 1535b040 00010 - (busy)
1535b360 0009 0009 [00] 1535b388 00010 - (busy)
1535b620 0009 0009 [00] 1535b648 00010 - (busy)
1535c420 0009 0009 [00] 1535c448 00010 - (busy)
1535d220 0009 0009 [00] 1535d248 00010 - (busy)
1535d4e0 0009 0009 [00] 1535d508 00010 - (busy)
1535d7a0 0009 0009 [00] 1535d7c8 00010 - (busy)
1535da60 0009 0009 [00] 1535da88 00010 - (busy)
1535dd20 0009 0009 [00] 1535dd48 00010 - (busy)
1535dfe0 0009 0009 [00] 1535e008 00010 - (busy)
1535e2a0 0009 0009 [00] 1535e2c8 00010 - (busy)
1535e560 0009 0009 [00] 1535e588 00010 - (busy)
1535e820 0009 0009 [00] 1535e848 00010 - (busy)
1535eae0 0009 0009 [00] 1535eb08 00010 - (busy)
1535eda0 0009 0009 [00] 1535edc8 00010 - (busy)
1535f060 0009 0009 [00] 1535f088 00010 - (busy)
1535f320 0009 0009 [00] 1535f348 00010 - (busy)
1535f5e0 0009 0009 [00] 1535f608 00010 - (busy)
...
80.dmp
0:051> !heap -p -a 1535eb08
    address 1535eb08 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        1535eae0 0009 0000 [00] 1535eb08 00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
0:051> !heap -p -a 1535f088
    address 1535f088 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        1535f060 0009 0000 [00] 1535f088 00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
0:051> !heap -p -a 1535f348
    address 1535f348 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
        1535f320 0009 0000 [00] 1535f348 00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
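This also gave us the second leak site, inside MSTP_Get_RP_ACK_Data in MSTP_DLL.
MSTP_Get_RP_ACK_Data
MSTP_Get_RPM_ACK_Data
These two functions each return a data pointer when the program reads a module's point values or collects module information.
Clearly the pointers they return are suspect, and it is very likely that this is where the memory leaks.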
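What I did by eye in each trace was skip the allocator frames and read off the first frame that belongs to someone else. A small sketch of that rule, assuming the traces have been saved as text. The frames are copied from the three kinds of trace shown in this post; allocation_site and the SYSTEM_MODULES set are hypothetical names chosen for this example.

```python
# Modules that only contain allocator plumbing in the traces shown in this post.
SYSTEM_MODULES = {"verifier", "ntdll", "msvcr90"}

COMMON_FRAMES = """
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
"""

TRACE_1F0 = "Trace: 083a" + COMMON_FRAMES
TRACE_18 = "Trace: 074b" + COMMON_FRAMES + "669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091\n"
TRACE_10 = "Trace: 0817" + COMMON_FRAMES + "669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b\n"


def allocation_site(trace_text):
    """Return the first 'module!function' frame outside SYSTEM_MODULES, or None."""
    for line in trace_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or "!" not in parts[1]:
            continue  # not an '<address> module!function+offset' frame
        module, _, rest = parts[1].partition("!")
        if module.lower() in SYSTEM_MODULES:
            continue
        return module + "!" + rest.split("+")[0]
    return None


for label, trace in (("1f0", TRACE_1F0), ("18", TRACE_18), ("10", TRACE_10)):
    print(f"size {label}: {allocation_site(trace)}")
```

For the 1f0 blocks the function returns None, because the recorded trace ends at calloc and never reaches a frame inside MSTP_DLL.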
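8. Verification
Together with a colleague I tracked down the original MSTP_DLL source and located the two functions above.
It is clear that whoever designed these functions made two mistakes:
1) the memory behind the returned pointer is allocated but never freed;
2) the data pointer should not be passed out through the return value. It should be a function parameter: the caller declares the buffer, passes it in to be filled, uses it, and then frees it itself.
Both functions have exactly the same problems.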
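V. Summary
1) We know there are three leaks, with block sizes 1f0, 18 and 10
2) Together they account for 99% of the newly allocated memory that is never freed
3) We have found two of the leak sites; one remains
4) 1f0 is the one that matters most: it accounts for 92% of the memory, so until this bug is fixed the problem is essentially unsolved
5) We cannot get a useful call stack for 1f0, the blocks have no distinctive content, and there are no reference roots
6) Hmm?
It seems we overlooked something.
Remember this output from earlier?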
80.dmp
0:051> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 102d9 - 1f58470 (92.48)
18 102b0 - 184080 (4.47)
10 102ae - 102ae0 (2.98)
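The block counts in the #blocks column are very close, which means these three block sizes were allocated almost the same number of times...
With that in mind, let's check what these counts were when memory was at 33 MB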
33.dmp
0:047> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 1f2 - 3c4e0 (86.13)
18 1c9 - 2ad8 (3.82)
1000 2 - 2000 (2.86)
10 1c7 - 1c70 (2.54)
214 c - 18f0 (2.23)
800 2 - 1000 (1.43)
220 1 - 220 (0.19)
1d7 1 - 1d7 (0.16)
80 3 - 180 (0.13)
a4 1 - a4 (0.06)
24 4 - 90 (0.05)
14 4 - 50 (0.03)
4a 1 - 4a (0.03)
25 2 - 4a (0.03)
48 1 - 48 (0.03)
46 1 - 46 (0.02)
41 1 - 41 (0.02)
3e 1 - 3e (0.02)
3c 1 - 3c (0.02)
37 1 - 37 (0.02)
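At 33 MB the counts were 1f2, 1c9 and 1c7. The increase between the two dumps (hex operands, decimal result):
1f0: 102d9 - 1f2 = 65767
18: 102b0 - 1c9 = 65767
10: 102ae - 1c7 = 65767
The number of new allocations is exactly the same for all three sizes!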
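The subtraction mixes hexadecimal inputs with a decimal result, so it is worth double-checking. A minimal sketch, using only the block counts from the two !heap -stat -h listings:

```python
# Block counts per block size, copied from '!heap -stat -h 11b30000' in each dump.
COUNTS_33 = {0x1F0: 0x1F2, 0x18: 0x1C9, 0x10: 0x1C7}
COUNTS_80 = {0x1F0: 0x102D9, 0x18: 0x102B0, 0x10: 0x102AE}

# New allocations of each size between the two snapshots.
growth = {size: COUNTS_80[size] - COUNTS_33[size] for size in COUNTS_80}

for size, delta in growth.items():
    print(f"size {size:#x}: {COUNTS_80[size]} - {COUNTS_33[size]} = {delta}")

# True when every block size grew by the same number of allocations.
same_growth = len(set(growth.values())) == 1
print("identical growth:", same_growth)
```

All three differences come out to 65767, which is what ties the 1f0 blocks to the other two sizes.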
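Got it! The 1f0 allocation must be closely tied to the other two. The first suspects are
MSTP_Get_RP_ACK_Data
MSTP_Get_RPM_ACK_Data
1) Does any helper function called from these two functions allocate memory?
2) Is the allocated size exactly 1f0?
With these guesses in mind, I read the two functions again.
It turned out that the BACNET_APPLICATION_DATA_VALUE struct is exactly 1f0 bytes in size.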
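As a rough sanity check on the numbers, the snippet below estimates how much user data the three leaks add up to. It assumes that each leaking round allocates one block of each size, which the identical counts suggest but the dumps do not prove.

```python
BLOCK_SIZES = (0x1F0, 0x18, 0x10)  # block sizes in bytes, from '!heap -stat -h'
NEW_ALLOCATIONS = 65767            # growth of each block count between the dumps

# Assumption: one block of each size is leaked together per round.
bytes_per_round = sum(BLOCK_SIZES)  # 496 + 24 + 16
leaked_bytes = bytes_per_round * NEW_ALLOCATIONS
leaked_mib = leaked_bytes / (1024 * 1024)

print(f"0x1f0 = {0x1F0} bytes (size of BACNET_APPLICATION_DATA_VALUE)")
print(f"{bytes_per_round} bytes per round x {NEW_ALLOCATIONS} rounds "
      f"= {leaked_bytes} bytes, about {leaked_mib:.1f} MiB")
```

That is about 33.6 MiB of user data, the same order of magnitude as the 45680 KB growth in committed size seen for heap 11b30000. The remainder is presumably per-block heap overhead, though I did not verify that.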
And that's it: problem solved.