Web server hang issue

The web server hung, so I logged in to the backend to investigate. Running ps aux hung as well.

I logged in again over ssh and ran strace ps, which showed the following:

Attaching gdb to debug it hung too!

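Using top -b to list all processes showed that the earlier ps process was in D state (uninterruptible sleep), and some of the web server's threads were in D state as well.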

dmesg shows:

[20761.085669] INFO: task apache2:7135 blocked for more than 120 seconds.
[20761.085675]       Tainted: G        W  O    #4
[20761.085677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085679] apache2         D ffffffc000086ef8     0  7135   4035 0x00000000
[20761.085683] Call trace:
[20761.085736] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085767] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085769] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085781] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085783] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085795] [<ffffffc0002188d4>] SyS_mprotect+0xb0/0x204
[20761.085797] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085799] INFO: task apache2:7138 blocked for more than 120 seconds.
[20761.085800]       Tainted: G        W  O    YUN #4
[20761.085801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085802] apache2         D ffffffc000086ef8     0  7138   4035 0x00000000
[20761.085805] Call trace:
[20761.085807] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085809] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085811] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085813] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085815] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085818] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
[20761.085819] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
[20761.085821] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
[20761.085823] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
[20761.085825] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
[20761.085827] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085828] INFO: task apache2:7158 blocked for more than 120 seconds.
[20761.085830]       Tainted: G        W  O    server.YUN #4
[20761.085831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20761.085832] apache2         D ffffffc000086ef8     0  7158   4035 0x00000008
[20761.085834] Call trace:
[20761.085836] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
[20761.085838] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
[20761.085840] [<ffffffc0009ff5fc>] schedule+0x38/0x90
[20761.085842] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
[20761.085843] [<ffffffc000a00928>] down_write+0x5c/0x74
[20761.085845] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
[20761.085847] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
[20761.085849] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
[20761.085850] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
[20761.085852] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
[20761.085854] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
[20761.085856] INFO: task ps:17403 blocked for more than 120 seconds.
[20761.085857]       Tainted: G        W  O     #4

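Reading the kernel source, the direct cause is proc_pid_cmdline_read in fs/proc/base.c. This function handles reads of /proc/<pid>/cmdline, which ps uses to display command lines. It fails to acquire the semaphore in the following code and goes to sleep: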

    down_read(&mm->mmap_sem);   /* sleeps here if mmap_sem cannot be taken for reading */
    arg_start = mm->arg_start;
    arg_end = mm->arg_end;
    env_start = mm->env_start;
    env_end = mm->env_end;
    up_read(&mm->mmap_sem);
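/* kernel/locking/rwsem.c */
void __sched down_read(struct rw_semaphore *sem)
{
    might_sleep();
    rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

    LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
}

/* include/asm-generic/rwsem.h */
/*
 * lock for reading
 */
static inline void __down_read(struct rw_semaphore *sem)
{
    /* A result <= 0 means a writer holds the lock or tasks are queued on it,
     * so take the slow path, which may put the caller to sleep. */
    if (unlikely(atomic_long_inc_return_acquire((atomic_long_t *)&sem->count) <= 0))
        rwsem_down_read_failed(sem);
}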

So which process acquired this semaphore and never released it?

How can we find out? The first step is to capture the kernel stacks.

Meanwhile, a Google search turned up related kernel patches that change this code; see the kernel patch.

 

posted @ codestacklinuxer