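Advanced Application Debugging: Analyzing a Hung Request
During routine development we ran into requests hanging in a gateway service. Using that incident as a starting point, this article briefly introduces the WinDbg debugger and shows how to use it to analyze such a problem.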
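I. Background
1. Business background
We have recently been developing a new business system on a microservice framework with separate front end and back end: the back end exposes SG services, and the front end builds its pages with Vue. The back-end SG services are written in C# and use Redis and SQL Server for storage. During development, debugging and unit testing the back-end services in VS went smoothly, and after deployment to the development integration environment, front-end and back-end integration also proceeded in an orderly way.
2. Technical architecture
I drew a simple diagram of the current microservice technical architecture.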
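3. The problem
One day, during integration testing in the development integration environment, every call to the back-end SG service started timing out. Response times exceeded 10 s and the service threw no exception, which badly hurt development efficiency. Restarting the SG server did not fix it.
So we decided to capture a dump and analyze the cause of the timeouts. The rest of this article first introduces the dump analysis tool, then outlines the approach to this problem, and finally walks through in detail how the fault was located in the dump file.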
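II. Introduction to WinDbg
1. WinDbg is a very powerful debugger with a rich feature set that supports many kinds of debugging. The table below compares several commonly used modes.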
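Supported type | Description | Typical scenarios |
User-mode debugging | Debugging by attaching to a process. The debugger establishes a connection with the target program and can suspend and resume its execution. | Similar to single-stepping in VS: set breakpoints and step through the code |
Kernel-mode debugging | Debugs the kernel on a local or remote computer | 1. Debugging processes early in system startup or late in shutdown, when no interactive console exists; 2. Analyzing inter-process communication problems |
Dump file debugging | A dump file is a snapshot that shows the process being executed and the modules loaded for the application at a point in time. A dump with heap information also includes a snapshot of the application's memory at that point. | 1. Performance analysis, memory leaks, blocked threads; 2. Troubleshooting failures and exceptions; 3. Process crash analysis |
Remote debugging | Debugging remotely through the DbgSrv debugging server | 1. The program needs to run full screen; 2. The program crashes on a customer's machine |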
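2. WinDbg is a typical windowed (GUI) program, but most of its debugging functions are still driven by typed commands, which are generally not case-sensitive. The commands fall into three categories: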
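Command category | Purpose | Count | Examples | Notes |
Standard commands | The most basic debugging functions, applicable to all debug targets: inspecting, quitting, help, and so on | 20+ | k (display the stack), ~ (display threads), the vertical bar (display processes), q (end the debugging session), ? (display help for standard commands) | 1. Usually one or two characters or symbols; exception: version. 2. Some commands stand for a family of two-character commands that start with the same character |
Meta-commands | Debugging functions that standard commands do not provide | 140+ | Load extension modules: .loadby, .load; list loaded extension modules: .chain | Built into the debugger engine or the debugger program files and usable directly. They all begin with a dot (.), so they are also called dot commands |
Extension commands | 1. Extend debugging capability in a specific area; 2. Users can also write their own extension modules and commands | Too many to count | List threads: !threads; inspect an object: !do | Extension commands begin with !. Full format: ![module name.]<command name> [arguments]; the module name may be omitted |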
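The prefix rules in the table can be sketched in a few lines of Python. This is only an illustration of the naming convention, not part of WinDbg itself; the function names `classify_command` and `split_extension` are hypothetical, and the parsing assumes the module name contains no path.

```python
def classify_command(command: str) -> str:
    """Classify a WinDbg command line by its leading character."""
    text = command.strip()
    if not text:
        raise ValueError("empty command")
    if text.startswith("!"):
        return "extension"   # e.g. !threads, !mex.do2
    if text.startswith("."):
        return "meta"        # e.g. .load, .loadby, .chain
    return "standard"        # e.g. k, ~, q, ?


def split_extension(command: str):
    """Split '!module.command args' into (module, command, args).

    The module part is optional, so it is returned as None when absent.
    """
    text = command.strip()
    if not text.startswith("!"):
        raise ValueError("not an extension command")
    head, _, args = text[1:].partition(" ")
    module, dot, name = head.rpartition(".")
    return (module if dot else None, name, args.strip())
```

For example, `split_extension("!mex.do2 000001d7c8452950")` separates the module `mex` from the command `do2`, while `split_extension("!threads")` reports no module.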
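3. Ways to load an extension:
1) Use the .load command with the full path of the extension module.
2) Use the .loadby command with the extension module's name and the name of an already loaded module; the debugger loads the extension from that module's directory (for example, .loadby sos clr).
3) Use the !<module name>.<command name> form, which automatically searches for and loads the specified module.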
4. Default symbol path setting:
srv*c:\symcache*http://msdl.microsoft.com/download/symbols
Here c:\symcache is the local directory where downloaded symbols are cached.
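The symbol path has the form srv*<local cache>*<symbol server>. As a small sketch of that layout, the Python helpers below build and split such a string; the function names are hypothetical and the code only handles this simple three-part form.

```python
def build_symbol_path(cache_dir: str, server_url: str) -> str:
    """Compose a symbol-server path of the form srv*<cache>*<server>."""
    return f"srv*{cache_dir}*{server_url}"


def parse_symbol_path(path: str):
    """Split a simple srv*<cache>*<server> path into (cache_dir, server_url)."""
    parts = path.split("*")
    if len(parts) != 3 or parts[0].lower() != "srv":
        raise ValueError("expected srv*<cache>*<server>")
    return parts[1], parts[2]


DEFAULT_PATH = build_symbol_path(
    r"c:\symcache", "http://msdl.microsoft.com/download/symbols"
)
```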
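III. Analyzing the hung gateway service requests
First, the root cause: Redis had hung, so the service's connections to it timed out. Restarting Redis normally resolves the problem; to prevent a recurrence, we also changed the Redis default setting involved.
• Detailed analysis steps:
1. Open the dump file and load the SOS and MEX extensions: .loadby sos clr ; .load c:\symcache\mex.dll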
0:000> .loadby sos clr
0:000> .load c:\symcache\mex.dll
Mex External 3.0.0.7172 Loaded!
2. List all managed threads with !threads. Thread 46 stands out: its Lock Count is 3, the highest of any thread.
0:000> !threads PDB symbol for clr.dll not loaded ThreadCount: 40 UnstartedThread: 0 BackgroundThread: 27 PendingThread: 0 DeadThread: 11 Hosted Runtime: no Lock ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 14 1 121e4 000001d447d948e0 28220 Preemptive 000001D5C85CC5A8:000001D5C85CE008 000001d447d8a0f0 0 Ukn 32 2 107d4 000001d447efa1e0 2b220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 MTA (Finalizer) 34 3 11318 000001d8d3ace3e0 102a220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 MTA (Threadpool Worker) 35 4 134dc 000001d8d3adf920 21220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn 37 6 101d8 000001d8d3bc89b0 1020220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn (Threadpool Worker) 39 7 b808 000001d8d3c68870 202b220 Preemptive 000001D6C81F6560:000001D6C81F7FD0 000001d8d3ade8e0 1 MTA 40 8 102c0 000001d8d3c67840 8029220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 MTA (Threadpool Completion Port) 41 10 10994 000001d8d3c70ae0 8029220 Preemptive 000001D7481920A0:000001D748193FD0 000001d447d8a0f0 0 MTA (Threadpool Completion Port) 22 9 145a8 000001d8d3c70310 8028220 Preemptive 000001D4486DA060:000001D4486DBFD0 000001d447d8a0f0 0 MTA (Threadpool Completion Port) 42 11 1425c 000001d8d3c712b0 202b220 Preemptive 000001D5C86C91B0:000001D5C86C9FD0 000001d8d3ade8e0 0 MTA 44 12 11a5c 000001d8d4ec22c0 3029220 Preemptive 000001D548294B00:000001D548295FD0 000001d8d3ade8e0 0 MTA (Threadpool Worker) 45 13 7ca0 000001d8d4ec2a90 3029220 Preemptive 000001D6484E75D0:000001D6484E8FD0 000001d8d3ade8e0 0 MTA (Threadpool Worker) 46 14 13320 000001d8d4ec6ca0 3029220 Preemptive 000001D7C8454AE8:000001D7C8455FD0 000001d8d3ade8e0 3 MTA (Threadpool Worker) 47 15 11098 000001d8d4ec7470 1029220 Preemptive 000001D6C8201300:000001D6C8201FD0 000001d447d8a0f0 0 MTA (Threadpool Worker) 48 16 a670 000001d8d4eca730 202b220 Preemptive 000001D4C8365CF8:000001D4C8365FD0 
000001d8d3ade8e0 0 MTA 49 17 12b2c 000001d8d4ed4ca0 202b220 Preemptive 000001D6484E9330:000001D6484EAFD0 000001d8d3ade8e0 0 MTA 50 18 11dc8 000001d8d4eec8e0 202b220 Preemptive 000001D4486D01C8:000001D4486D1FD0 000001d8d3ade8e0 0 MTA 51 19 13da0 000001d8d3e58f60 2b020 Preemptive 000001D6C81FC508:000001D6C81FDFD0 000001d8d3ade8e0 1 MTA 52 20 cd10 000001d8d3e5ed20 202b220 Preemptive 0000000000000000:0000000000000000 000001d8d3ade8e0 0 MTA 53 21 101e8 000001d8d3e59730 202b220 Preemptive 000001D4C85C4CB8:000001D4C85C5FD0 000001d8d3ade8e0 0 MTA 54 22 2d90 000001d8d3e5fcc0 202b220 Preemptive 000001D4486E6178:000001D4486E7FD0 000001d8d3ade8e0 0 MTA 55 23 13a74 000001d8d4f16d90 202b220 Preemptive 000001D7C8413638:000001D7C8413FD0 000001d8d3ade8e0 0 MTA 56 24 1364c 000001d8d4f15df0 2b020 Preemptive 000001D6484EF488:000001D6484F0FD0 000001d8d3ade8e0 1 MTA 57 25 13890 000001d8d4f165c0 202b220 Preemptive 000001D6C81FE0A0:000001D6C81FFFD0 000001d8d3ade8e0 0 MTA 58 26 119d8 000001d8d4f18500 1029220 Preemptive 000001D4486E9C58:000001D4486E9FD0 000001d447d8a0f0 0 MTA (Threadpool Worker) 59 27 14614 000001d4475e59f0 1029220 Preemptive 000001D5485ACAD8:000001D5485ADFD0 000001d447d8a0f0 0 MTA (Threadpool Worker) XXXX 28 0 000001d4475e61c0 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 29 0 000001d4475e2b10 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 30 0 000001d4475e1b70 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 31 0 000001d4475e6990 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 32 0 000001d4475e32e0 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 33 0 000001d4475e13a0 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 34 0 000001d4475e3ab0 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 35 0 000001d4475e2340 839820 Preemptive 
0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 36 0 000001d4475e5220 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn XXXX 37 0 000001d4475e4280 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn 60 38 46f4 000001d4475e4a50 1029220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 MTA (Threadpool Worker) 61 39 13d24 000001d4475e8100 1029220 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 MTA (Threadpool Worker) XXXX 5 0 000001d8d4f93b70 839820 Preemptive 0000000000000000:0000000000000000 000001d447d8a0f0 0 Ukn 62 40 10980 000001d8d4f9a8d0 2b220 Preemptive 000001D5485D63B8:000001D5485D7FD0 000001d8d3ade8e0 0 MTA
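Scanning a long !threads listing by eye is error-prone, so the filter can be sketched in Python. The snippet assumes the output has been saved as text with one thread per line, in the column order shown in the header above (the Lock Count is the ninth whitespace-separated field); the function name and the sample rows copied from the dump are illustrative only.

```python
SAMPLE_THREADS = """\
  39    7  b808 000001d8d3c68870  202b220 Preemptive  000001D6C81F6560:000001D6C81F7FD0 000001d8d3ade8e0 1     MTA
  46   14 13320 000001d8d4ec6ca0  3029220 Preemptive  000001D7C8454AE8:000001D7C8455FD0 000001d8d3ade8e0 3     MTA (Threadpool Worker)
  47   15 11098 000001d8d4ec7470  1029220 Preemptive  000001D6C8201300:000001D6C8201FD0 000001d447d8a0f0 0     MTA (Threadpool Worker)
XXXX   28     0 000001d4475e61c0   839820 Preemptive  0000000000000000:0000000000000000 000001d447d8a0f0 0     Ukn
"""


def threads_holding_locks(output: str):
    """Return (debugger id, OS thread id, lock count) for threads with locks.

    Rows are recognised by the GC Mode column; header lines are skipped.
    The result is sorted with the highest lock count first.
    """
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 10 or tokens[5] not in ("Preemptive", "Cooperative"):
            continue
        lock_count = int(tokens[8])
        if lock_count > 0:
            rows.append((tokens[0], tokens[2], lock_count))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Applied to the sample rows, the first entry returned is thread 46, which matches the observation in step 2.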
3. Check each thread's user-mode CPU time with !runaway. Thread 46 is again near the top of the list (second, at 0.156 s).
0:000> !runaway User Mode Time Thread Time 14:121e4 0 days 0:00:00.625 46:13320 0 days 0:00:00.156 42:1425c 0 days 0:00:00.062 53:101e8 0 days 0:00:00.046 48:a670 0 days 0:00:00.031 55:13a74 0 days 0:00:00.015 32:107d4 0 days 0:00:00.015 0:116dc 0 days 0:00:00.015 62:10980 0 days 0:00:00.000 61:13d24 0 days 0:00:00.000 60:46f4 0 days 0:00:00.000 59:14614 0 days 0:00:00.000 58:119d8 0 days 0:00:00.000 57:13890 0 days 0:00:00.000 56:1364c 0 days 0:00:00.000 54:2d90 0 days 0:00:00.000 52:cd10 0 days 0:00:00.000 51:13da0 0 days 0:00:00.000 50:11dc8 0 days 0:00:00.000 49:12b2c 0 days 0:00:00.000 47:11098 0 days 0:00:00.000 45:7ca0 0 days 0:00:00.000 44:11a5c 0 days 0:00:00.000 43:14518 0 days 0:00:00.000 41:10994 0 days 0:00:00.000 40:102c0 0 days 0:00:00.000 39:b808 0 days 0:00:00.000 38:fee4 0 days 0:00:00.000 37:101d8 0 days 0:00:00.000 36:334 0 days 0:00:00.000 35:134dc 0 days 0:00:00.000 34:11318 0 days 0:00:00.000 33:eaf0 0 days 0:00:00.000 31:1858 0 days 0:00:00.000 30:11a98 0 days 0:00:00.000 29:e30 0 days 0:00:00.000 28:10840 0 days 0:00:00.000 27:117c0 0 days 0:00:00.000 26:b0f8 0 days 0:00:00.000 25:baf0 0 days 0:00:00.000 24:ffb8 0 days 0:00:00.000 23:11c14 0 days 0:00:00.000 22:145a8 0 days 0:00:00.000 21:145d4 0 days 0:00:00.000 20:12660 0 days 0:00:00.000 19:d7f0 0 days 0:00:00.000 18:230c 0 days 0:00:00.000 17:12378 0 days 0:00:00.000 16:8134 0 days 0:00:00.000 15:12914 0 days 0:00:00.000 13:11fd8 0 days 0:00:00.000 12:13044 0 days 0:00:00.000 11:3ef8 0 days 0:00:00.000 10:1160c 0 days 0:00:00.000 9:13a5c 0 days 0:00:00.000 8:11924 0 days 0:00:00.000 7:12068 0 days 0:00:00.000 6:1386c 0 days 0:00:00.000 5:e73c 0 days 0:00:00.000 4:13b34 0 days 0:00:00.000 3:f4a0 0 days 0:00:00.000 2:10cdc 0 days 0:00:00.000 1:14688 0 days 0:00:00.000
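The same ranking can be reproduced from saved !runaway text. The following is a minimal sketch that assumes one "id:osid  N days h:mm:ss.mmm" entry per line, as the debugger normally prints it; the regular expression and function name are my own and only cover this user-mode-time layout.

```python
import re

SAMPLE_RUNAWAY = """\
 User Mode Time
  Thread       Time
  14:121e4     0 days 0:00:00.625
  46:13320     0 days 0:00:00.156
  42:1425c     0 days 0:00:00.062
  62:10980     0 days 0:00:00.000
"""

ROW = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d{2}):(\d{2})\.(\d{3})\s*$"
)


def parse_runaway(output: str):
    """Return (debugger id, OS thread id, seconds) sorted by time, descending."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip the two header lines
        dbg_id, os_id, days, hours, minutes, seconds, millis = match.groups()
        total = (int(days) * 86400 + int(hours) * 3600
                 + int(minutes) * 60 + int(seconds) + int(millis) / 1000)
        rows.append((int(dbg_id), os_id, total))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```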
4. Switch to thread 46 with ~46s
0:000> ~46s
ntdll!NtWaitForMultipleObjects+0x14:
00007ff8`d8ac67c4 c3 ret
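5. View the thread's managed call stack with !clrstack. The thread is fetching configuration from Redis, and the top frames show it waiting inside RedisNativeClient.Connect(), that is, it hit a problem and is trying to reconnect to Redis.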
0:046> !clrstack
PDB symbol for clr.dll not loaded
OS Thread Id: 0x13320 (46)
Child SP IP Call Site
0000002f001fb3a8 00007ff8d8ac67c4 [HelperMethodFrame_1OBJ: 0000002f001fb3a8] System.Threading.Thread.JoinInternal(Int32)
0000002f001fb4b0 00007ff86780b33a ServiceStack.Redis.RedisNativeClient.Connect()
0000002f001fb580 00007ff86780b1d4 ServiceStack.Redis.RedisNativeClient.TryConnectIfNeeded()
0000002f001fb5c0 00007ff86781590a ServiceStack.Redis.RedisNativeClient.SendReceive[[System.__Canon, mscorlib]](Byte[][], System.Func`1<System.__Canon>, System.Action`1<System.Func`1<System.__Canon>>, Boolean)
0000002f001fb620 00007ff867815cb4 ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][])
0000002f001fb690 00007ff867817a22 ServiceStack.Redis.RedisNativeClient.get_Info()
0000002f001fb700 00007ff86780ae9e ServiceStack.Redis.RedisClient.GetServerRole()
0000002f001fb730 00007ff867809ec3 ServiceStack.Redis.RedisResolver.CreateRedisClient(ServiceStack.Redis.RedisEndpoint, Boolean)
0000002f001fb7e0 00007ff867809bed ServiceStack.Redis.PooledRedisClientManager.CreateRedisClient()
0000002f001fb850 00007ff867805d2c ServiceStack.Redis.PooledRedisClientManager.GetClient()
0000002f001fb8f0 00007ff867663e75 ****.****.****.RedisPoolManager.GetClient(System.String)
0000002f001fb9a0 00007ff867663bc4 ****.****.****.CacheService.GetClient()
0000002f001fb9e0 00007ff867662ccd ****.****.****.ServiceConfigCacheService.GetAvailableConfigByCache(Boolean)
0000002f001fbc30 00007ff8676619a1 ****.****.****.ServiceDAC.GetAvailableConfig()
0000002f001fbc70 00007ff867661646 ****.****.****.ServiceConfigCache.GetConfigFromCache()
0000002f001fbcb0 00007ff86766157c ****.****.****.ServiceConfigCache..ctor()
0000002f001fbd10 00007ff86765e778 ****.****.****.ServiceConfigCache.get_Current()
0000002f001fbd80 00007ff86765fadb ****.****.****.TCPRounter.Load()
0000002f001fbee0 00007ff86765ebce ****.****.****.TCPRounter..ctor()
0000002f001fbfc0 00007ff86765d7dd ****.****.****.RounterService.GetService(****.****.****.RounterContext, ****.****.****.ProxyModel, System.Collections.Generic.List`1<System.String>)
0000002f001fc090 00007ff86765de1b ****.****.****.TCPProxy.GetService[[System.__Canon, mscorlib]](System.Collections.Generic.List`1<****.****.****.SPI.ServiceConfig>, Boolean, System.Collections.Generic.List`1<System.String>)
…………
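6. Use !dso to list the managed objects on the thread's stack. It contains RedisRetryableException objects alongside RedisClient instances, which indicates that a Redis connection failed and the client is retrying.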
0:046> !dso OS Thread Id: 0x13320 (46) RSP/REG Object Name 0000002F001FAF50 000001d5482fd000 System.Web.Http.Filters.IExceptionFilter[] 0000002F001FB0D8 000001d5482a32c0 System.Web.AspNetSynchronizationContext 0000002F001FB1F8 000001d7c84548e0 System.Threading.ThreadStart 0000002F001FB248 000001d7c8454858 System.Threading.Thread 0000002F001FB290 000001d5482fd000 System.Web.Http.Filters.IExceptionFilter[] 0000002F001FB2C0 000001d7c8454858 System.Threading.Thread 0000002F001FB348 000001d7c8454818 System.Threading.ThreadStart 0000002F001FB368 000001d7c8454858 System.Threading.Thread 0000002F001FB3D8 000001d5482aa550 System.Security.Principal.GenericPrincipal 0000002F001FB3F0 000001d7c8454818 System.Threading.ThreadStart 0000002F001FB410 000001d5482fd000 System.Web.Http.Filters.IExceptionFilter[] 0000002F001FB420 000001d5482fd2f0 System.Web.Http.Controllers.ExceptionFilterResult 0000002F001FB428 000001d7c8452838 System.Random 0000002F001FB430 000001d54834dd30 System.Threading.ExecutionContext 0000002F001FB440 000001d7c8454858 System.Threading.Thread 0000002F001FB460 000001d5482fd2f0 System.Web.Http.Controllers.ExceptionFilterResult 0000002F001FB488 000001d5482aa550 System.Security.Principal.GenericPrincipal 0000002F001FB490 000001d5482fd2f0 System.Web.Http.Controllers.ExceptionFilterResult 0000002F001FB498 000001d7c8452838 System.Random 0000002F001FB4B0 000001d7c8453d48 System.Func`1[[System.Byte[], mscorlib]] 0000002F001FB4C0 000001d7c8454818 System.Threading.ThreadStart 0000002F001FB4C8 000001d7c8454858 System.Threading.Thread 0000002F001FB550 000001d7c8453d48 System.Func`1[[System.Byte[], mscorlib]] 0000002F001FB558 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB560 000001d7c8453cd8 System.Byte[][] 0000002F001FB568 000001d7c8452838 System.Random 0000002F001FB580 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB5A8 000001d7c8454618 ServiceStack.Redis.RedisRetryableException 0000002F001FB5B0 000001d7c8453cd8 System.Byte[][] 0000002F001FB5D0 
000001d7c8454618 ServiceStack.Redis.RedisRetryableException 0000002F001FB5D8 000001d7c8453cd8 System.Byte[][] 0000002F001FB5E8 000001d7c8454148 ServiceStack.Redis.RedisRetryableException 0000002F001FB600 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB608 000001d7c8453cd8 System.Byte[][] 0000002F001FB620 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB628 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB630 000001d7c8453cd8 System.Byte[][] 0000002F001FB638 000001d7c8453d48 System.Func`1[[System.Byte[], mscorlib]] 0000002F001FB650 000001d7c8452838 System.Random 0000002F001FB658 000001d7c8453cf8 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.String, mscorlib]] 0000002F001FB668 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB670 000001d7c8453cd8 System.Byte[][] 0000002F001FB678 000001d7c8452838 System.Random 0000002F001FB680 000001d5482fd2f0 System.Web.Http.Controllers.ExceptionFilterResult 0000002F001FB690 000001d7c8453cd8 System.Byte[][] 0000002F001FB6A0 000001d5485ae110 System.Byte[] 0000002F001FB6B0 000001d7c8452838 System.Random 0000002F001FB6B8 000001d54858ec78 ServiceStack.Redis.RedisResolver 0000002F001FB6C8 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB6E0 000001d5482fd000 System.Web.Http.Filters.IExceptionFilter[] 0000002F001FB6E8 000001d7c8452838 System.Random 0000002F001FB6F0 000001d5482fd2f0 System.Web.Http.Controllers.ExceptionFilterResult 0000002F001FB700 000001d5485a6c78 ServiceStack.Redis.RedisEndpoint 0000002F001FB718 000001d5485a6c78 ServiceStack.Redis.RedisEndpoint 0000002F001FB720 000001d5485a6c78 ServiceStack.Redis.RedisEndpoint 0000002F001FB788 000001d7c8453140 ServiceStack.Redis.RedisClient 0000002F001FB790 000001d7c8452950 ServiceStack.Redis.RedisClient 0000002F001FB7B0 000001d54858ec78 ServiceStack.Redis.RedisResolver 0000002F001FB7B8 000001d54858e7d8 ServiceStack.Redis.PooledRedisClientManager 0000002F001FB7C0 000001d7c8452818 
ServiceStack.Redis.PooledRedisClientManager+<>c__DisplayClass77_0 0000002F001FB7C8 000001d7c8452838 System.Random 0000002F001FB7E0 000001d54858ec78 ServiceStack.Redis.RedisResolver 0000002F001FB810 000001d54858ebe8 System.Object 0000002F001FB828 000001d54858ebc8 System.Collections.Concurrent.ConcurrentStack`1[[ServiceStack.Redis.RedisClient, ServiceStack.Redis]] 0000002F001FB838 000001d5482fd1d0 System.Web.Http.ExceptionHandling.CompositeExceptionLogger 0000002F001FB868 000001d548334ad8 System.Object[] (System.Object[]) 0000002F001FB870 000001d7c84527f0 System.Diagnostics.Stopwatch 0000002F001FB8C8 000001d548334ad8 System.Object[] (System.Object[]) 0000002F001FB8D0 000001d7c84527f0 System.Diagnostics.Stopwatch
………………
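7. Inspect the Redis client object with !mex.do2. The Host and Port fields show that it connects to Redis on the local server (127.0.0.1:6379).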
0:046> !mex.do2 000001d7c8452950 0x000001d7c8452950 ServiceStack.Redis.RedisClient [statics] 0000 endData : 000001d7c8452ad0 (System.Byte[]) [Length: 2] 0008 lastCommand : NULL 0010 lastSocketException : NULL 0018 socket : NULL 0020 Bstream : NULL 0028 sslStream : NULL 0030 transaction : 000001d7c8452988 (System.Void) 0038 pipeline : NULL 0040 <ClientManager>k__BackingField : NULL 0048 <Host>k__BackingField : 000001d5485a6c20 "127.0.0.1" [9] (System.String) 0050 <NamespacePrefix>k__BackingField : NULL 0058 <Password>k__BackingField : 000001d5485a6b60 "***********" [16] (System.String) 0060 <Client>k__BackingField : NULL 0068 <ConnectionFilter>k__BackingField : NULL 0070 <SendCmdFilter>k__BackingField : NULL 0078 cmdBuffer : 000001d7c8452af0 (System.Collections.Generic.List<System.ArraySegment<System.Byte>>) [Length: 0] 0080 currentBuffer : 000001d7c8452b18 (System.Byte[]) [Length: 1450] 0088 <OnBeforeFlush>k__BackingField : NULL 0090 deactivatedAtTicks : 0 (System.Int64) 0098 LastConnectedAtTimestamp : 0 (System.Int64) 00a0 <Id>k__BackingField : 0 (System.Int64) 00a8 db : 0 (System.Int64) 00b0 clientPort : 0 (System.Int32) 00b4 active : 0 (System.Int32) 00b8 <Port>k__BackingField : 6379 (System.Int32) 00bc <ConnectTimeout>k__BackingField : -1 (System.Int32) 00c0 <RetryCount>k__BackingField : 0 (System.Int32) 00c4 <SendTimeout>k__BackingField : -1 (System.Int32) 00c8 <ReceiveTimeout>k__BackingField : -1 (System.Int32) 00cc <IdleTimeOutSecs>k__BackingField : 240 (System.Int32) 00d0 currentBufferIndex : 0 (System.Int32) 00d4 <Ssl>k__BackingField : False (System.Boolean) 00d5 <IsDisposed>k__BackingField : False (System.Boolean) 00d8 <SslProtocols>k__BackingField : 000001d7c8452a30 (System.Nullable<System.Security.Authentication.SslProtocols>) 00e0 retryTimeout : 000001d7c8452a38 00:00:03 (System.TimeSpan) 00e8 <Name>k__BackingField : NULL 00f0 OnDispose : NULL 00f8 registeredTypeIdsWithinPipelineMap : 000001d7c8452a80 
(System.Collections.Generic.Dictionary<System.String,System.Collections.Generic.HashSet<System.String>>) 0100 <Hashes>k__BackingField : 000001d7c8453128 (ServiceStack.Redis.RedisClient+RedisClientHashes) 0108 <Lists>k__BackingField : 000001d7c84530e0 (ServiceStack.Redis.RedisClient+RedisClientLists) 0110 <Sets>k__BackingField : 000001d7c84530f8 (ServiceStack.Redis.RedisClient+RedisClientSets) 0118 <SortedSets>k__BackingField : 000001d7c8453110 (ServiceStack.Redis.RedisClient+RedisClientSortedSets)
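The object dump shows ConnectTimeout, SendTimeout and ReceiveTimeout all set to -1. Assuming that -1 means no explicit timeout is configured, a hung Redis can leave callers waiting far longer than expected. As an illustration only (this is not the service's code, and `can_connect` is a hypothetical helper), a quick reachability probe with an explicit timeout can be written with Python's standard library; the 3-second default is arbitrary.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds within the timeout.

    This only checks that the port accepts connections; a process that
    accepts the connection but never answers commands would still pass.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Calling `can_connect("127.0.0.1", 6379)` on the affected server is one quick way to check the endpoint seen in the dump before capturing another one.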
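8. Inspect the exception thrown while connecting to Redis with !mex.do2. Its message, "Socket is not connected", confirms that the Redis connection failed.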
0:046> !mex.do2 000001d7c8454618
0x000001d7c8454618 ServiceStack.Redis.RedisRetryableException
0000 _className : NULL
0008 _exceptionMethod : NULL
0010 _exceptionMethodString : NULL
0018 _message : 000001d5485afca8 "Socket is not connected" [23] (System.String)
0020 _data : NULL
0028 _innerException : NULL
0030 _helpURL : NULL
0038 _stackTrace : 000001d7c84546f8 (System.SByte[]) [Length: 48]
0040 _watsonBuckets : NULL
0048 _stackTraceString : NULL
0050 _remoteStackTraceString : NULL
0058 _dynamicMethods : NULL
0060 _source : NULL
0068 _safeSerializationManager : 000001d7c84546c0 (System.Runtime.Serialization.SafeSerializationManager)
0070 _xptrs : 0000000000000000 (System.IntPtr)
0078 _ipForWatsonBuckets : 00007ff867815a33 (System.UIntPtr)
0080 _remoteStackIndex : 0 (System.Int32)
0084 _HResult : -2146233088 (System.Int32)
0088 _xcode : -532462766 (System.Int32)
0090 <Code>k__BackingField : NULL
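The _HResult and _xcode fields are printed as signed 32-bit integers, which makes them hard to recognise. Reinterpreting them as unsigned hexadecimal values gives the familiar codes. The sketch below does that conversion; the descriptions in the lookup table are my own annotations, not something recorded in the dump.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Annotations added for illustration; they are not part of the dump output.
KNOWN_CODES = {
    0x80131500: "COR_E_EXCEPTION, the generic HRESULT of a managed exception",
    0xE0434352: "exception code the CLR uses when raising managed exceptions",
}


def describe(value: int):
    """Return the hex form of a signed code and a description if known."""
    code = to_unsigned32(value)
    return f"0x{code:08X}", KNOWN_CODES.get(code, "unknown")
```

Here `describe(-2146233088)` yields the hex form 0x80131500 and `describe(-532462766)` yields 0xE0434352. Neither code is specific to Redis, so the _message field remains the useful clue.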
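9. Root cause: checking the Redis console window showed that Redis had not started properly. After Redis was restarted, requests to the service returned to normal.
When we inspected the Redis window, Redis was in a hung state, as shown in the figure below:
After restarting Redis, it ran normally, as shown in the figure below:
This pins down the root cause: with the QuickEdit Mode option enabled in the properties of the Redis console window, clicking inside the window during startup suspends the startup, leaving Redis unavailable.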
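IV. Summary of the analysis method:
1. Look for factors related to the problem (locks, run time, and so on).
2. Identify the threads that look suspicious and examine their stacks.
3. Examine the managed objects on the stack to get the detailed exception information.
4. Confirm the root cause.
5. Fix and verify.
That concludes the analysis of the hung gateway service requests. With the WinDbg debugger we analyzed the dump file, examined what the business logic was doing internally, and found the root cause. I am sharing the tool and the analysis process in the hope that they are useful to others.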
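The first three steps can be run in one pass with the console debugger cdb, which accepts a dump file via -z and an initial command string via -c. The Python sketch below only assembles that command line; the dump path is a made-up placeholder, the function name is hypothetical, and it assumes cdb is installed and on the PATH.

```python
def build_cdb_command(dump_path: str, thread_id: int):
    """Assemble a cdb invocation that replays the triage commands used above."""
    commands = [
        ".loadby sos clr",   # load SOS from the directory of clr.dll
        "!threads",          # managed threads and their lock counts
        "!runaway",          # user-mode CPU time per thread
        f"~{thread_id}s",    # switch to the suspicious thread
        "!clrstack",         # managed call stack
        "!dso",              # managed objects on the stack
        "q",                 # end the session
    ]
    return ["cdb", "-z", dump_path, "-c", "; ".join(commands)]


# Hypothetical dump path, used only to show the resulting argument list.
EXAMPLE = build_cdb_command(r"c:\dumps\sg-service.dmp", 46)
```

The returned list can be passed to `subprocess.run` on a machine with the debugging tools installed; the thread number still has to be chosen by a person after reading the !threads and !runaway output.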
张宁涛
Updated 2022/05/05