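Notes on Analyzing a Memory Leak Dump
Since joining a startup I have been worked like a dog with little to show for it. My technical skills rarely get exercised, and work and life both revolve around business logic.
A few days ago the production service started running short of memory, so I asked ops to pull a dump and analyzed it to console myself.
First, gather statistics on the large objects (85,000 bytes or more)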
0:000> !dumpheap -min 85000 -stat
Statistics:
              MT    Count    TotalSize Class Name
000007feec34c168        7     57734750 System.Char[]
000007feec34aee0       14    115469904 System.String
00000000013032d0      101    621925414      Free
Total 122 objects
Fragmented blocks larger than 0.5 MB:
            Addr     Size      Followed by
000000010d382018    2.8MB 000000010d645e90 System.String
000000010d971aa8    1.8MB 000000010db43530 System.Random
000000010db70bd0    1.1MB 000000010dc8e238 System.String
000000010dd2f6a8    0.7MB 000000010ddd9160 System.Random
000000010ddd92e8    1.1MB 000000010dee8d38 System.Security.Cryptography.SafeHashHandle
000000010e223090    3.0MB 000000010e51dcc8 System.Random
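To put the statistics above in proportion, the three rows can be summed to see how much of the space in entries of 85,000 bytes or more is free rather than live data. The following is a minimal sketch in Python, not part of the original debugging session; the rows are copied from the output above, and the helper name parse_stat_rows is made up for illustration.

```python
# Rows copied from the "!dumpheap -min 85000 -stat" output above.
STAT_ROWS = """\
000007feec34c168 7 57734750 System.Char[]
000007feec34aee0 14 115469904 System.String
00000000013032d0 101 621925414 Free
"""


def parse_stat_rows(text):
    """Return {class_name: (count, total_size)} from MT/Count/TotalSize rows.

    Hypothetical helper for this note; it is not an SOS or WinDbg API.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # MT, Count, TotalSize, Class Name
        if len(parts) != 4:
            continue
        _mt, count, total_size, name = parts
        result[name] = (int(count), int(total_size))
    return result


stats = parse_stat_rows(STAT_ROWS)
total_bytes = sum(size for _count, size in stats.values())
free_bytes = stats["Free"][1]
free_share = free_bytes / total_bytes

print(f"bytes in entries >= 85000: {total_bytes}")
print(f"free bytes: {free_bytes} ({free_share:.1%} of the total)")
```

A high free share next to a short list of live strings and char arrays is consistent with the fragmented blocks listed in the same output, although the numbers alone do not say what caused the fragmentation.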
Take a look at the strings
0:000> !dumpheap -type System.String -min 85000
         Address               MT     Size
00000004ffed5250 000007feec34aee0 12721650
0000000501f4aec0 000007feec34aee0  1322018
000000050208dae8 000007feec34aee0  1322022
00000005021d0710 000007feec34aee0 12726120
0000000502df3678 000007feec34aee0 12726124
00000005121b3168 000007feec34aee0 12726120
000000052001c2b0 000007feec34aee0 12721654
0000000521053930 000007feec34aee0   732168
00000005211b9120 000007feec34aee0   732168
00000005216efa08 000007feec34aee0 12726124
0000000522312978 000007feec34aee0 12726124
00000005307564d8 000007feec34aee0  4780744
0000000531074a50 000007feec34aee0  4780748
0000000531503d20 000007feec34aee0 12726120
Statistics:
              MT    Count    TotalSize Class Name
000007feec34aee0       14    115469904 System.String
Inspect the details of individual strings
0:000> !DumpObj /d 0000000501f4aec0
Name:        System.String
MethodTable: 000007feec34aee0
EEClass:     000007feebcb3720
Size:        1322018(0x142c22) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      {"SalePriceStrategyList":[{"SaleRuleID":5178,"StrategyTypeID":5,"StrategyTypeName":"标准售......
0:000> dc 00000004ffed5250 L1000
00000004`ffed5250  ec34aee0 000007fe 00610eec 0022007b  ..4.......a.{.".
00000004`ffed5260  00720050 0064006f 00630075 00490074  P.r.o.d.u.c.t.I.
00000004`ffed5270  00220064 0034003a 00390037 002c0031  d.".:.4.7.9.1.,.
00000004`ffed5280  00500022 006f0072 00750064 00740063  ".P.r.o.d.u.c.t.
00000004`ffed5290  0061004e 0065006d 003a0022 4e3d0022  N.a.m.e.".:.".=N
00000004`ffed52a0  002d6c5f 683c9999 62c991cc 901a76f4  _l-...<h...b.v..
00000004`ffed52b0  00228f66 0022002c 00750053 004e0062  f.".,.".S.u.b.N.
00000004`ffed52c0  006d0061 00220065 0022003a 002c0022  a.m.e.".:.".".,.
00000004`ffed52d0  00440022 00700065 00610052 0067006e  ".D.e.p.R.a.n.g.
00000004`ffed52e0  004c0065 00730069 00220074 006e003a  e.L.i.s.t.".:.n.
00000004`ffed52f0  006c0075 002c006c 00440022 00730065  u.l.l.,.".D.e.s.
00000004`ffed5300  00520074 006e0061 00650067 0069004c  t.R.a.n.g.e.L.i.
00000004`ffed5310  00740073 003a0022 007b005b 00520022  s.t.".:.[.{.".R.
00000004`ffed5320  006e0061 00650067 00640049 003a0022  a.n.g.e.I.d.".:.
00000004`ffed5330  00330022 00220037 0022002c 00610052  ".3.7.".,.".R.a.
00000004`ffed5340  0067006e 004e0065 006d0061 00220065  n.g.e.N.a.m.e.".
00000004`ffed5350  0022003a 6c5f4e3d 002c0022 00450022  :.".=N_l".,.".E.
00000004`ffed5360  004e006e 006d0061 00220065 0022003a  n.N.a.m.e.".:.".
00000004`ffed5370  0069004c 0069006a 006e0061 00220067  L.i.j.i.a.n.g.".
00000004`ffed5380  0022002c 00610052 0067006e 00540065  ,.".R.a.n.g.e.T.
00000004`ffed5390  00700079 00220065 0031003a 002c0036  y.p.e.".:.1.6.,.
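The raw dc rows above are little-endian 32-bit words, so the ASCII column only hints at the UTF-16 text. As a rough sketch, the Python snippet below rebuilds the bytes from the first three rows and decodes them. It assumes the usual 64-bit .NET string layout (an 8-byte method table pointer, a 4-byte character count, then UTF-16 code units), which matches what the dump shows; dwords_to_bytes is a made-up helper name.

```python
import struct

# First three rows of the "dc 00000004ffed5250" output above (ASCII column dropped).
DC_ROWS = """\
00000004`ffed5250 ec34aee0 000007fe 00610eec 0022007b
00000004`ffed5260 00720050 0064006f 00630075 00490074
00000004`ffed5270 00220064 0034003a 00390037 002c0031
"""


def dwords_to_bytes(rows):
    """Rebuild the raw memory bytes from dc rows (address + four dwords each)."""
    data = bytearray()
    for line in rows.splitlines():
        fields = line.split()
        for dword in fields[1:5]:
            # dc prints each 32-bit value as a number; memory holds it little-endian.
            data += struct.pack("<I", int(dword, 16))
    return bytes(data)


raw = dwords_to_bytes(DC_ROWS)

# Assumed layout: 8-byte method table pointer, 4-byte length, then the characters.
method_table, length = struct.unpack_from("<QI", raw, 0)
text = raw[12:].decode("utf-16-le")

print(f"method table: {method_table:016x}")
print(f"length field: {length} characters")
print(f"text starts with: {text}")

# Size reported by !dumpheap for this address, minus the character payload.
reported_size = 12721650
print(f"size minus 2 bytes per character: {reported_size - 2 * length} bytes")
```

The method table value equals the System.String MT from the earlier output, and the length field lines up with the size !dumpheap reports for this address, which is a quick sanity check that the object really is a single string of several million characters.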
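These strings turn out to be produced by deserializing the fingerprint or sales-control cache.
Inspecting the character arrays the same way gives similar results.
Next, analyze the threads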
0:000> !threads
ThreadCount:      1710
UnstartedThread:  1
BackgroundThread: 122
PendingThread:    0
DeadThread:       1587
Hosted Runtime:   no
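The counters above are easier to judge as a ratio. This small Python sketch, added here as an illustration only, parses the header fields as printed and computes the share of dead threads; parse_thread_counts is a made-up helper, not an SOS command.

```python
import re

# Header fields copied from the "!threads" output above.
THREADS_HEADER = (
    "ThreadCount: 1710 UnstartedThread: 1 BackgroundThread: 122 "
    "PendingThread: 0 DeadThread: 1587 Hosted Runtime: no"
)


def parse_thread_counts(header):
    """Return {field_name: int} for every 'Name: <number>' pair in the header."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s+(\d+)", header)}


counts = parse_thread_counts(THREADS_HEADER)
dead_share = counts["DeadThread"] / counts["ThreadCount"]

print(counts)
print(f"dead threads: {dead_share:.1%} of all managed thread objects")
```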
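There are a lot of dead threads. Using a similar approach, I found that the addresses of these threads are all held in the finalizer queue's GCHandles. My guess is that AMQ fired in bulk within a short period, the existing thread-pool threads could not be shared at that scale, and a large number of extra threads were created as a result. The call stack confirms this: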
0:000> !clrstack
OS Thread Id: 0x5718 (42)
Child SP         IP               Call Site
0000000013e8e6c8 000000007706df6a [GCFrame: 0000000013e8e6c8]
0000000013e8e798 000000007706df6a [HelperMethodFrame_1OBJ: 0000000013e8e798] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
0000000013e8e8b0 000007fe9302c6da Apache.NMS.ActiveMQ.Threads.DedicatedTaskRunner.Run()
0000000013e8e930 000007feec19f8a5 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000000013e8ea90 000007feec19f609 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000000013e8eac0 000007feec19f5c7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000000013e8eb10 000007feec1b2d21 System.Threading.ThreadHelper.ThreadStart()
0000000013e8ee28 000007feed27f713 [GCFrame: 0000000013e8ee28]
0000000013e8f158 000007feed27f713 [DebuggerU2MCatchHandlerFrame: 0000000013e8f158]
0000000013e8f338 000007feed27f713 [ContextTransitionFrame: 0000000013e8f338]
0000000013e8f528 000007feed27f713 [DebuggerU2MCatchHandlerFrame: 0000000013e8f528]
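My conclusion is that the problem is caused by AMQ triggering fingerprint and sales-control cache updates in bulk within a short period.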
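One general way to keep a burst of messages from turning into a burst of threads is to hand the work to a bounded pool. The sketch below illustrates that idea in Python only; it is not the Apache.NMS API and not the fix applied to this service. The names handle_cache_update and process_burst, the cap of 8 workers, and the burst of 200 messages are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8    # hypothetical cap on concurrent handler threads
BURST_SIZE = 200   # hypothetical number of messages arriving at once


def handle_cache_update(message_id, seen_threads, lock):
    """Stand-in for a cache-update handler; records which thread ran it."""
    with lock:
        seen_threads.add(threading.get_ident())
    return message_id


def process_burst(burst_size=BURST_SIZE, max_workers=MAX_WORKERS):
    """Run one handler per message on a bounded pool.

    Returns the handled message ids and the number of distinct worker threads used.
    """
    seen_threads = set()
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(handle_cache_update, i, seen_threads, lock)
            for i in range(burst_size)
        ]
        handled = [future.result() for future in futures]
    return handled, len(seen_threads)


handled, worker_count = process_burst()
print(f"handled {len(handled)} messages on {worker_count} worker threads")
```

With a dedicated thread per task, the thread count follows the size of the burst. With a bounded pool, the extra messages wait in the queue instead, which trades some latency for a fixed ceiling on threads.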
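I had planned to tackle the large objects produced by cache deserialization, the excess threads created when AMQ fires in bulk, and the question of whether any subscriptions were never unsubscribed. Then the product manager came over and asked: is the business code done yet???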