Creating a dump file with createdump in Docker
Run the Docker image
docker run --name Gateway --privileged=true -p 888:8912 -d jackframework/jmsgateway
Open a shell inside the running container
docker exec -it Gateway bash
Find where createdump is located
find / -name createdump
Generate the dump file with createdump 1 (1 is the PID of the dotnet process inside the container)
/usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.11/createdump 1
It prints the following:
[createdump] Gathering state for process 1 dotnet
[createdump] Writing minidump with heap to file /tmp/coredump.1
[createdump] Written 285315072 bytes (69657 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written
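The version directory in the createdump path (6.0.11 above) differs from image to image, so the path can be looked up instead of typed by hand. The following is a minimal sketch, not a definitive recipe: find_createdump is a hypothetical helper name, /usr/share/dotnet is assumed to be the install root (as in the path above), and sort -V is assumed to be available (GNU coreutils).

```shell
# find_createdump [ROOT]: print the path of createdump under ROOT
# (default: /usr/share/dotnet, an assumption taken from the path above).
# If several runtime versions are installed, the highest version is printed;
# this relies on GNU "sort -V" (version sort).
find_createdump() {
  root="${1:-/usr/share/dotnet}"
  find "$root" -type f -name createdump 2>/dev/null | sort -V | tail -n 1
}
```

Inside the container it could then be used as "$(find_createdump)" 1 in place of the full path. If more than one runtime is installed, check that the printed version is the one the target process actually runs on.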
Exit the container shell
exit
Copy the dump file from the Gateway container to F:/dumpfiles on the host
docker cp Gateway:/tmp/coredump.1 f:/dumpfiles/coredump.1
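If dotnet-dump is installed on the host, the file can be analyzed right away. If not, create a temporary container that has the .NET SDK installed.
First pull a .NET SDK image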
docker pull mcr.microsoft.com/dotnet/sdk:7.0
Then start a temporary container with the dump directory mounted
docker run --rm -it -v f:/dumpfiles:/tmp/coredump mcr.microsoft.com/dotnet/sdk:7.0
root@07ef0f7b35df:/# cd /tmp/coredump
root@07ef0f7b35df:/tmp/coredump# ls
coredump.1
root@07ef0f7b35df:/tmp/coredump#
Install dotnet-dump
dotnet tool install -g dotnet-dump
Add the tools directory to PATH
export PATH="$PATH:/root/.dotnet/tools"
Analyze the dump file
cd /tmp/coredump/
dotnet-dump analyze coredump.1
> clrthreads
Analyzing a deadlock
Use the clrstack command to print the call stacks.
> clrstack -all
OS Thread Id: 0x311
        Child SP               IP Call Site
00007FCDE06DF530 00007fd1521e600c [GCFrame: 00007fcde06df530]
00007FCDE06DF620 00007fd1521e600c [GCFrame: 00007fcde06df620]
00007FCDE06DF680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcde06df680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDE06DF7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDE06DF800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDE06DF820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDE06DF870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDE06DFBD0 00007fd1512f849f [GCFrame: 00007fcde06dfbd0]
00007FCDE06DFCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcde06dfca0]
OS Thread Id: 0x312
        Child SP               IP Call Site
00007FCDDFEDE530 00007fd1521e600c [GCFrame: 00007fcddfede530]
00007FCDDFEDE620 00007fd1521e600c [GCFrame: 00007fcddfede620]
00007FCDDFEDE680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddfede680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDFEDE7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDFEDE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDFEDE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDFEDE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDFEDEBD0 00007fd1512f849f [GCFrame: 00007fcddfedebd0]
00007FCDDFEDECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddfedeca0]
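Of the 300-plus threads, most share one common call stack.
That call stack appears to show that the request entered the deadlock method, which in turn called Monitor.ReliableEnter. This means the threads are trying to enter a lock, but the object is most likely already locked exclusively by another thread.
The call stack alone makes the problem hard to spot. Combined with the source code, however, it is easy to see that DeadlockFunc had already driven the two objects o1 and o2 into a cross-locking deadlock. The 300 threads started afterwards all try to acquire one of those locks, so they wait forever.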
00007FCDDEEDC530 00007fd1521e600c [GCFrame: 00007fcddeedc530]
00007FCDDEEDC620 00007fd1521e600c [GCFrame: 00007fcddeedc620]
00007FCDDEEDC680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddeedc680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDEEDC7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDEEDC800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDEEDC820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDEEDC870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDEEDCBD0 00007fd1512f849f [GCFrame: 00007fcddeedcbd0]
00007FCDDEEDCCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddeedcca0]
Use the syncblk command to find the threads that actually hold the exclusive locks.
> syncblk
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
   25 0000000000E59BB8          603         1 00007FCE8C00FB30 1e8  14   00007fcea81ffb98 System.Object
   26 0000000000E59C00            3         1 00007FD0C8001FA0 1e9  15   00007fcea81ffbb0 System.Object
-----------------------------
Total           326
Free            307
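The Owning Thread Info column has three sub-columns: the first is the thread object's address, the second is the operating system thread ID, and the third is the thread index. The thread index can also be obtained with the threads command.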
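The owner columns can also be pulled out of saved syncblk output with a few lines of awk. This is only a sketch: syncblk_owners is a hypothetical helper, and the waiter count relies on the commonly documented reading of MonitorHeld (1 for the owner plus 2 for each waiting thread), which is an assumption here rather than something the output itself states.

```shell
# syncblk_owners: read syncblk output on stdin and print one line per lock.
# Data rows are recognised by a numeric first field and at least 9 fields:
#   Index SyncBlock MonitorHeld Recursion ThreadAddr OsTid ThreadIndex OwnerAddr Type
# waiters = (MonitorHeld - 1) / 2 assumes owner counts 1 and each waiter counts 2.
syncblk_owners() {
  awk 'NF >= 9 && $1 ~ /^[0-9]+$/ {
         printf "index=%s os_tid=0x%s thread=%s waiters=%d\n", $1, $6, $7, ($3 - 1) / 2
       }'
}
```

Fed with the two data rows above, it reports thread 14 (0x1e8) as the owner of sync block 25 and thread 15 (0x1e9) as the owner of sync block 26. Under the stated assumption, the MonitorHeld values 603 and 3 correspond to 301 waiters and 1 waiter respectively, which fits the picture of about 300 blocked threads.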
Use the setthread command to switch to thread 0x1e8 (thread index 14), then use clrstack to view its call stack.
> setthread 14
> clrstack
OS Thread Id: 0x1e8 (14)
        Child SP               IP Call Site
00007FCE77FFE500 00007fd1521e600c [GCFrame: 00007fce77ffe500]
00007FCE77FFE5F0 00007fd1521e600c [GCFrame: 00007fce77ffe5f0]
00007FCE77FFE650 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fce77ffe650] System.Threading.Monitor.Enter(System.Object)
00007FCE77FFE7A0 00007FD0DB4F5D0C testwebapi.Controllers.DiagScenarioController.DeadlockFunc()
00007FCE77FFE7E0 00007FD0DB4F5B27 testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_0()
00007FCE77FFE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCE77FFE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCE77FFE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCE77FFEBD0 00007fd1512f849f [GCFrame: 00007fce77ffebd0]
00007FCE77FFECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fce77ffeca0]
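Compared with the 300-plus call stacks above, this one carries more information: the thread is inside DiagScenarioController.DeadlockFunc and is sitting in Monitor.Enter. In other words, it is waiting for the exclusive lock on one object while, according to the syncblk output, it already owns the lock of sync block 25.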
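Analyzing a memory leak
SOS commands: see "dotnet-dump diagnostic tool - .NET CLI | Microsoft Docs"
Check the statistics of all managed types currently on the heap. The optional -min [byte] parameter narrows the statistics to objects of at least that size.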
> dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007fd0dc2b3ab0        1           24 System.Collections.Generic.ObjectEqualityComparer`1[[System.Threading.ThreadPoolWorkQueue+WorkStealingQueue, System.Private.CoreLib]]
00007fd0dc29ca28        1           24 System.Collections.Generic.ObjectEqualityComparer`1[[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadata, Microsoft.AspNetCore.Mvc.Core]]
...
00007fd0d7f85510      482       160864 System.Object[]
0000000000da9d80    19191      7512096 Free
00007fd0dc2b3718        2      8388656 testwebapi.Controllers.Customer[]
00007fd0dadbeb58  1010240     24245760 testwebapi.Controllers.Customer
00007fd0d7f90f90  1012406     95190854 System.String
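Here you can see that most objects are String or Customer instances. The String objects alone take up about 95 MB (95,190,854 bytes, roughly 91 MiB).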
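As a quick sanity check on the table, dividing TotalSize by Count gives the average object size per type. A minimal shell sketch, where avg_size is a hypothetical helper name and the numbers are copied from the dumpheap -stat output above:

```shell
# avg_size TOTAL_BYTES COUNT: integer average object size in bytes.
# Hypothetical helper for illustration; not part of dotnet-dump.
avg_size() {
  echo $(( $1 / $2 ))
}

avg_size 95190854 1012406   # System.String
avg_size 24245760 1010240   # testwebapi.Controllers.Customer
```

The first call prints 94 and the second prints 24. The Customer objects themselves are small on average, and most of the memory sits in the strings.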
Use the method table (MT) address to look into System.String, the type that takes up the most space.
> dumpheap -mt 00007fd0d7f90f90
00007fcfabc376f8 00007fd0d7f90f90       94
00007fcfabc37770 00007fd0d7f90f90       94
00007fcfabc377e8 00007fd0d7f90f90       94
00007fcfabc37860 00007fd0d7f90f90       94
00007fcfabc378d8 00007fd0d7f90f90       94
...
Statistics:
              MT    Count    TotalSize Class Name
00007fd0d7f90f90  1012406     95190854 System.String
Total 1012406 objects
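That is about one million objects, and the ones listed are all 94 bytes. Use the gcroot command to look at the roots of a few of them, picked at random.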
> gcroot -all 00007fcfabc378d8
HandleTable:
    00007FD1508815F8 (pinned handle)
    -> 00007FD0A7FFF038 System.Object[]
    -> 00007FCEA830A230 testwebapi.Controllers.Processor
    -> 00007FCEA830A248 testwebapi.Controllers.CustomerCache
    -> 00007FCEA830A260 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]]
    -> 00007FD0B821F0A8 testwebapi.Controllers.Customer[]
    -> 00007FCFABC378C0 testwebapi.Controllers.Customer
    -> 00007FCFABC378D8 System.String
Found 1 roots.
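In short, the high memory usage comes from the Customer, CustomerCache, and Processor objects: the strings belong to Customer objects held in the CustomerCache list that Processor references.
High CPU usage: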
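dotnet tool install --global dotnet-counters # used together with dotnet-trace to collect information about a dotnet program; see https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-highcpu?tabs=windows
dotnet-trace ps # get the PID
dotnet-counters monitor --refresh-interval 1 -p 22884 # monitor CPU usage
dotnet-trace collect -p 22884 --providers Microsoft-DotNETCore-SampleProfiler
# --providers selects the event providers to enable; Microsoft-DotNETCore-SampleProfiler is the sample profiler provider supplied by Microsoft. The option can be omitted.
# The previous step writes a .nettrace file, which can be opened directly in VS 2022 or in PerfView (VS 2022 is recommended).
# Once opened, it shows the CPU share per call path. Ignore system paths such as System, find the path through your own code, then narrow it down to the specific method.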
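If dotnet-trace is installed but still cannot be run, the environment variables are not set up correctly.
To make the environment variables permanent:
1 cd # go back to the current user's home directory
2 vim .bash_profile (note: on Ubuntu the file is ~/.profile)
3 Append the following lines to the end of .bash_profile and save:
export DOTNET_ROOT=$HOME/.dotnet # note: no spaces around the equals sign
export PATH=$PATH:$HOME/.dotnet:$HOME/.dotnet/tools # note: no spaces around the equals sign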
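After opening a new login shell, it is worth checking that the tools directory really ended up on PATH. A small sketch, with path_has_dir as a hypothetical helper name:

```shell
# path_has_dir DIR: succeed if DIR is one of the colon-separated entries in $PATH.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has_dir "$HOME/.dotnet/tools"; then
  echo "dotnet tools directory is on PATH"
else
  echo "dotnet tools directory is missing from PATH"
fi
```

Wrapping PATH in colons on both sides lets the pattern match whole entries only, so a directory such as $HOME/.dotnet/tools-old would not count as a match.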
References:
Debug a memory leak tutorial | Microsoft Docs https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak
Analyzing a memory leak in a Docker container with dotnet-dump https://blog.csdn.net/gavinoldmen/article/details/121940948