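Web server hang
The web server hung, so I logged in to the backend to investigate; running ps aux hung as well.
I logged in again over ssh and ran strace ps, which gave the following result:
Attaching gdb to debug it also hung!
Running top -b to list all processes showed that the earlier ps process was in state D (uninterruptible sleep), and that some of the web server's threads/processes were in state D as well.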
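Since ps itself was stuck, one way to find D-state tasks without it is to read the state field of /proc/<pid>/stat directly. The following is a minimal sketch, not what top does internally: the helper names stat_state and pid_state are hypothetical, and it assumes the stat file can still be read on the hung system (top -b still worked here, which suggests so, but that is not guaranteed).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the one-letter state field from the text of /proc/<pid>/stat,
 * or '\0' if the line is malformed.
 * Format: "<pid> (<comm>) <state> ...". comm may itself contain spaces
 * or ')', so the state is located after the LAST ')'. */
char stat_state(const char *stat_line)
{
    assert(stat_line != NULL);
    const char *p = strrchr(stat_line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Read /proc/<pid>/stat and return the task state ('D', 'S', 'R', ...),
 * or '\0' if the file cannot be opened or read. */
char pid_state(int pid)
{
    char path[64];
    char line[1024];

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? stat_state(line) : '\0';
}
```

Calling pid_state() for each numeric directory under /proc and keeping the entries that return 'D' gives roughly the list that top -b showed above.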
dmesg showed:
[20761.085669] INFO: task apache2:7135 blocked for more than 120 seconds.
[20761.085675] Tainted: G W O #4
[20761.085677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085679] apache2 D ffffffc000086ef8 0 7135 4035 0x00000000
[20761.085683] Call trace:
[20761.085736] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085767] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085769] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085781] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085783] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085795] [<ffffffc0002188d4>] SyS_mprotect+0xb0/0x204
[20761.085797] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085799] INFO: task apache2:7138 blocked for more than 120 seconds.
[20761.085800] Tainted: G W O YUN #4
[20761.085801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085802] apache2 D ffffffc000086ef8 0 7138 4035 0x00000000
[20761.085805] Call trace:
[20761.085807] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085809] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085811] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085813] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085815] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085818] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
[20761.085819] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
[20761.085821] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
[20761.085823] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
[20761.085825] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
[20761.085827] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085828] INFO: task apache2:7158 blocked for more than 120 seconds.
[20761.085830] Tainted: G W O server.YUN #4
[20761.085831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085832] apache2 D ffffffc000086ef8 0 7158 4035 0x00000008
[20761.085834] Call trace:
[20761.085836] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085838] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085840] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085842] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085843] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085845] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
[20761.085847] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
[20761.085849] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
[20761.085850] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
[20761.085852] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
[20761.085854] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085856] INFO: task ps:17403 blocked for more than 120 seconds.
[20761.085857] Tainted: G W O #4
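Reading the kernel source, the direct cause is this: the proc_pid_cmdline_read function in fs/proc/base.c executes the code below, fails to acquire the semaphore (mm->mmap_sem), and goes to sleep.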
down_read(&mm->mmap_sem);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
up_read(&mm->mmap_sem);

void __sched down_read(struct rw_semaphore *sem)
{
	might_sleep();
	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
}

/*
 * lock for reading
 */
static inline void __down_read(struct rw_semaphore *sem)
{
	if (unlikely(atomic_long_inc_return_acquire((atomic_long_t *)&sem->count) <= 0))
		rwsem_down_read_failed(sem);
}
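So which process acquired this sem and never released it?
How can that be checked right now? The first step is to capture the kernel stacks.
Meanwhile, a Google search shows that the kernel already has a related patch that changes this code; see the kernel patch.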
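One way to capture a task's kernel stack from user space is to read /proc/<pid>/stack. The sketch below assumes that file exists on the running kernel (it typically requires CONFIG_STACKTRACE and root privileges), which was not verified on this server; the function name dump_kernel_stack is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Copy the contents of /proc/<pid>/stack to `out`.
 * Returns the number of chunks read (normally one per stack frame
 * line), or -1 if the file cannot be opened, for example because the
 * pid does not exist, the caller lacks permission, or the kernel does
 * not provide the file. */
int dump_kernel_stack(int pid, FILE *out)
{
    char path[64];
    char line[256];
    int lines = 0;

    assert(out != NULL);
    snprintf(path, sizeof path, "/proc/%d/stack", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        fputs(line, out);
        lines++;
    }
    fclose(f);
    return lines;
}
```

For example, dump_kernel_stack(7135, stdout) would print the stack of one of the blocked apache2 tasks if the file is available. Where it is not, writing w to /proc/sysrq-trigger (if sysrq is enabled) asks the kernel to log the stacks of all blocked tasks to dmesg instead.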