Creating a dump file with createdump in Docker

Run the Docker image

docker run --name Gateway --privileged=true -p 888:8912 -d jackframework/jmsgateway

Open a shell in the running container

docker exec -it Gateway bash

Find where createdump is located

find / -name createdump

Generate the dump file with createdump 1 (1 is the PID of the dotnet process inside the container)

/usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.11/createdump 1

The output looks like this:

[createdump] Gathering state for process 1 dotnet
[createdump] Writing minidump with heap to file /tmp/coredump.1
[createdump] Written 285315072 bytes (69657 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written
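The version directory in the createdump path (6.0.11 here) depends on the runtime shipped in the image, so the path changes whenever the image is updated. A minimal sketch of looking the path up instead of hardcoding it follows; find_createdump is a hypothetical helper name, and the default root /usr/share/dotnet is taken from the path used above. It assumes GNU sort (for -V).

```shell
# Hypothetical helper: print the createdump binary with the highest
# version directory under a dotnet install root.
find_createdump() {
    local root="${1:-/usr/share/dotnet}"
    find "$root/shared/Microsoft.NETCore.App" -type f -name createdump 2>/dev/null \
        | sort -V | tail -n 1
}

# Example use inside the container (PID 1 is the dotnet process there):
#   "$(find_createdump)" 1
```

If several runtimes are installed, this picks the highest version, which may not be the one the process is actually running on; in that case pass the path explicitly.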

Exit the container shell

exit

Copy the dump file from the Gateway container to F:/dumpfiles on the host

docker cp Gateway:/tmp/coredump.1 f:/dumpfiles/coredump.1

If the host has dotnet-dump installed, you can analyze the file directly. If not, start a temporary container that has the .NET SDK installed.

First pull a .NET SDK image

docker pull mcr.microsoft.com/dotnet/sdk:7.0

Then start a temporary container

docker run --rm -it -v f:/dumpfiles:/tmp/coredump mcr.microsoft.com/dotnet/sdk:7.0

root@07ef0f7b35df:/# cd /tmp/coredump
root@07ef0f7b35df:/tmp/coredump# ls
coredump.1
root@07ef0f7b35df:/tmp/coredump#
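Before analyzing, it can be worth confirming that the copied file is complete. createdump reported 285315072 bytes (69657 pages), which is consistent with 4096-byte pages. The sketch below compares a file's size against an expected byte count; dump_size_matches is a hypothetical helper name, not part of any tool used here.

```shell
# Hypothetical helper: succeeds only if the file has exactly the
# expected number of bytes.
dump_size_matches() {
    local file="$1" expected="$2" actual
    actual=$(wc -c < "$file" | tr -d '[:space:]')
    [ "$actual" = "$expected" ]
}

# 69657 pages * 4096 bytes per page gives the byte count createdump printed.
echo $((69657 * 4096))

# Example use in the temporary container:
#   dump_size_matches /tmp/coredump/coredump.1 285315072 && echo "size ok"
```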

Install dotnet-dump

dotnet tool install -g dotnet-dump

Add the tools directory to PATH

export PATH="$PATH:/root/.dotnet/tools"

Analyze the dump file

cd /tmp/coredump/

dotnet-dump analyze coredump.1

>clrthreads

Analyzing a deadlock

Use the clrstack command to print the call stacks.

> clrstack -all
OS Thread Id: 0x311
        Child SP               IP Call Site
00007FCDE06DF530 00007fd1521e600c [GCFrame: 00007fcde06df530] 
00007FCDE06DF620 00007fd1521e600c [GCFrame: 00007fcde06df620] 
00007FCDE06DF680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcde06df680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDE06DF7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDE06DF800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDE06DF820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDE06DF870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDE06DFBD0 00007fd1512f849f [GCFrame: 00007fcde06dfbd0] 
00007FCDE06DFCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcde06dfca0] 
OS Thread Id: 0x312
        Child SP               IP Call Site
00007FCDDFEDE530 00007fd1521e600c [GCFrame: 00007fcddfede530] 
00007FCDDFEDE620 00007fd1521e600c [GCFrame: 00007fcddfede620] 
00007FCDDFEDE680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddfede680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDFEDE7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDFEDE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDFEDE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDFEDE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDFEDEBD0 00007fd1512f849f [GCFrame: 00007fcddfedebd0] 
00007FCDDFEDECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddfedeca0] 

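Most of the 300-plus threads share one common call stack.
That stack suggests the request entered the deadlock method, which in turn called Monitor.ReliableEnter. This means the threads are trying to enter a lock, but the object is probably already locked exclusively by another thread.
The call stack alone makes the problem hard to see, but with the source code it becomes clear: DeadlockFunc has already put the two objects o1 and o2 into a cross deadlock, and the 300 threads started afterwards all try to acquire a lock, so they wait forever.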

00007FCDDEEDC530 00007fd1521e600c [GCFrame: 00007fcddeedc530] 
00007FCDDEEDC620 00007fd1521e600c [GCFrame: 00007fcddeedc620] 
00007FCDDEEDC680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddeedc680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDEEDC7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDEEDC800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDEEDC820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDEEDC870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDEEDCBD0 00007fd1512f849f [GCFrame: 00007fcddeedcbd0] 
00007FCDDEEDCCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddeedcca0]
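With hundreds of near-identical stacks, counting is easier than reading. The sketch below counts how many thread stacks in a saved clrstack -all listing contain a given frame. count_threads_with_frame and the file name stacks.txt are hypothetical. It assumes the listing was saved to a file first, for example with something like dotnet-dump analyze coredump.1 -c "clrstack -all" -c "exit" > stacks.txt.

```shell
# Hypothetical helper: count thread stacks that contain a given frame.
# Each stack starts at an "OS Thread Id:" line; a stack is counted once
# no matter how many of its lines mention the frame.
count_threads_with_frame() {
    local frame="$1" file="$2"
    awk -v frame="$frame" '
        /^OS Thread Id:/ { if (hit) n++; hit = 0 }
        index($0, frame) { hit = 1 }
        END { if (hit) n++; print n + 0 }
    ' "$file"
}

# Example use:
#   count_threads_with_frame 'Monitor.ReliableEnter' stacks.txt
```

The frame is matched as a plain substring rather than a regular expression, so characters such as parentheses or angle brackets in method names need no escaping.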

Use the syncblk command to find the threads that actually hold the exclusive locks.

> syncblk                                                                                                                                                                                                                             
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
   25 0000000000E59BB8          603         1 00007FCE8C00FB30 1e8  14   00007fcea81ffb98 System.Object
   26 0000000000E59C00            3         1 00007FD0C8001FA0 1e9  15   00007fcea81ffbb0 System.Object
-----------------------------
Total           326
Free            307

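The Owning Thread Info column has three sub-columns: the first is the thread object address, the second is the operating system thread ID, and the third is the thread index, which can also be obtained with the threads command.
Use the setthread command to switch to thread 0x1e8 (thread index 14), then use clrstack to view its call stack.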

> setthread 14                                                                                                                                                                                                                        
> clrstack                                                                                                                                                                                                                            
OS Thread Id: 0x1e8 (14)
        Child SP               IP Call Site
00007FCE77FFE500 00007fd1521e600c [GCFrame: 00007fce77ffe500] 
00007FCE77FFE5F0 00007fd1521e600c [GCFrame: 00007fce77ffe5f0] 
00007FCE77FFE650 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fce77ffe650] System.Threading.Monitor.Enter(System.Object)
00007FCE77FFE7A0 00007FD0DB4F5D0C testwebapi.Controllers.DiagScenarioController.DeadlockFunc()
00007FCE77FFE7E0 00007FD0DB4F5B27 testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_0()
00007FCE77FFE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCE77FFE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCE77FFE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCE77FFEBD0 00007fd1512f849f [GCFrame: 00007fce77ffebd0] 
00007FCE77FFECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fce77ffeca0] 

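Compared with the 300-plus stacks above, this thread's stack has extra frames: it is inside Monitor.Enter, called from DiagScenarioController.DeadlockFunc. Together with the syncblk output, this indicates that the thread already holds the exclusive lock on one object and is blocked trying to take the lock on the other.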
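The owner columns of the syncblk output can also be pulled out mechanically. The sketch below reads a syncblk listing on standard input and prints, for each held lock, the owning thread index, its OS thread ID, and an estimated waiter count. syncblk_owners is a hypothetical helper. The waiter estimate assumes the commonly cited SOS convention that the owner adds 1 to MonitorHeld and each waiting thread adds 2; treat it as a rough guide rather than a guarantee.

```shell
# Hypothetical helper: summarize lock owners from `syncblk` output on stdin.
# Columns: Index SyncBlock MonitorHeld Recursion ThreadAddr OsId ThreadIdx Owner Type
syncblk_owners() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 9 {
        # Assumption: MonitorHeld = 1 (owner) + 2 per waiting thread.
        printf "thread=%s os_id=0x%s waiters=%d\n", $7, $6, ($3 - 1) / 2
    }'
}
```

Applied to the listing above, MonitorHeld 603 would correspond to 301 waiters and MonitorHeld 3 to a single waiter, which fits the picture of 300 blocked worker threads plus the two threads that block each other.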
 

Analyzing a memory leak

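SOS command reference: dotnet-dump diagnostic tool - .NET CLI | Microsoft Docs
Show statistics for all managed types currently on the heap. The optional -min [bytes] argument restricts the statistics to objects of at least that size.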

> dumpheap -stat                                                                                                                                                                                                                      
Statistics:
              MT    Count    TotalSize Class Name
00007fd0dc2b3ab0        1           24 System.Collections.Generic.ObjectEqualityComparer`1[[System.Threading.ThreadPoolWorkQueue+WorkStealingQueue, System.Private.CoreLib]]
00007fd0dc29ca28        1           24 System.Collections.Generic.ObjectEqualityComparer`1[[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadata, Microsoft.AspNetCore.Mvc.Core]]
...
00007fd0d7f85510      482       160864 System.Object[]
0000000000da9d80    19191      7512096      Free
00007fd0dc2b3718        2      8388656 testwebapi.Controllers.Customer[]
00007fd0dadbeb58  1010240     24245760 testwebapi.Controllers.Customer
00007fd0d7f90f90  1012406     95190854 System.String

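Most of the objects here are String or Customer instances. The String objects alone take up about 90 MiB (95190854 bytes).
Use the method table (MT) address to look at System.String, the type that takes the most space.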

 
>dumpheap -mt 00007fd0d7f90f90
00007fcfabc376f8 00007fd0d7f90f90       94     
00007fcfabc37770 00007fd0d7f90f90       94     
00007fcfabc377e8 00007fd0d7f90f90       94     
00007fcfabc37860 00007fd0d7f90f90       94     
00007fcfabc378d8 00007fd0d7f90f90       94  
...
 
Statistics:
              MT    Count    TotalSize Class Name
00007fd0d7f90f90  1012406     95190854 System.String
Total 1012406 objects

About one million objects, each around 94 bytes. Use the gcroot command to inspect the roots of a few of them.

> gcroot -all 00007fcfabc378d8                                                                                                                                                                                                        
HandleTable:
    00007FD1508815F8 (pinned handle)
    -> 00007FD0A7FFF038 System.Object[]
    -> 00007FCEA830A230 testwebapi.Controllers.Processor
    -> 00007FCEA830A248 testwebapi.Controllers.CustomerCache
    -> 00007FCEA830A260 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]]
    -> 00007FD0B821F0A8 testwebapi.Controllers.Customer[]
    -> 00007FCFABC378C0 testwebapi.Controllers.Customer
    -> 00007FCFABC378D8 System.String
 
Found 1 roots.
>

In short, the high memory usage comes from the Customer objects held in the list inside CustomerCache, which is in turn kept alive by Processor.
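The dumpheap -stat listing is sorted by total size, but the raw byte counts are hard to compare at a glance. The sketch below prints the largest types from a saved listing with sizes in MiB. top_heap_types is a hypothetical helper and the listing is assumed to have been saved to a file beforehand.

```shell
# Hypothetical helper: largest types in a saved `dumpheap -stat` listing.
# Rows look like: <MT hex> <Count> <TotalSize> <Class Name...>
top_heap_types() {
    local file="$1" limit="${2:-5}"
    awk '$1 ~ /^[0-9a-fA-F]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
        # Class names may contain spaces, so rejoin fields 4..NF.
        name = $4
        for (i = 5; i <= NF; i++) name = name " " $i
        printf "%.1f MiB\t%d objects\t%s\n", $3 / 1048576, $2, name
    }' "$file" | sort -rn | head -n "$limit"
}

# Example use:
#   top_heap_types heapstat.txt 3
```

Header lines, the "..." elision, and the "Total" line are skipped because their first field is not a hexadecimal method table address.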

High CPU usage:

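dotnet tool install --global dotnet-counters
dotnet tool install --global dotnet-trace
# dotnet-counters is used together with dotnet-trace to collect data from a .NET process; see https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-highcpu?tabs=windows
dotnet-trace ps # get the PID
dotnet-counters monitor --refresh-interval 1 -p 22884 # monitor CPU usage
dotnet-trace collect -p 22884 --providers Microsoft-DotNETCore-SampleProfiler  # --providers selects the event provider to collect from; it can be omitted
# The previous step writes a .nettrace file, which can be opened in VS 2022 or PerfView (VS 2022 recommended)
# The view shows the CPU share per code path; skip the System and other framework frames, find your own code, then locate the specific method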

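If dotnet-trace is installed but still cannot be run, the environment variables are probably not set up correctly.

To make the environment variables permanent:
1 cd #go to the current user's home directory
2 vim .bash_profile (note: on Ubuntu the file is ~/.profile)
3 Append the following commands to the end of .bash_profile and save:

export DOTNET_ROOT=$HOME/.dotnet       #note: no spaces around the equals sign

export PATH=$PATH:$HOME/.dotnet:$HOME/.dotnet/tools           #note: no spaces around the equals sign

 4 source .bash_profile #apply immediately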
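The manual edit can also be scripted. The sketch below appends a line to a profile file only when that exact line is not already present, so running it twice does not duplicate the exports. append_once is a hypothetical helper name, and the target file is whichever profile applies on your system.

```shell
# Hypothetical helper: append a line to a file unless the exact line
# is already there (creates the file if it does not exist).
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use (single quotes keep $HOME and $PATH unexpanded in the file):
#   append_once 'export DOTNET_ROOT=$HOME/.dotnet' "$HOME/.bash_profile"
#   append_once 'export PATH=$PATH:$HOME/.dotnet:$HOME/.dotnet/tools' "$HOME/.bash_profile"
```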

 

 

 

References:
Tutorial: Debug a memory leak | Microsoft Docs https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak

Analyzing a memory leak in a Docker container with dotnet-dump https://blog.csdn.net/gavinoldmen/article/details/121940948

posted @ 2022-11-30 19:19  IWing