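Linux kdump
The SDS logs provided show no hardware errors. The PMC RAID controller driver under the OS is fairly old and needs to be upgraded to the latest version. As for the machine that reported memory errors: it was in the reboot initialization phase at the time, and memory was still being initialized, which is why the errors were reported.
The SDS logs newly collected on 9.26 are faulty; please collect them again.
We recommend enabling kdump to capture log information for the abnormal reboots. To enable it:
1. Confirm that the kdump-related packages are installed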
[root@server01 ~]# rpm -qa | grep kdump
system-config-kdump-2.0.5-18.el6.noarch
[root@server01 ~]# rpm -qa | grep kexec
kexec-tools-2.0.0-286.el6.x86_64
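The package check in step 1 can also be done with exact package names instead of grep. The sketch below is a minimal example; the function name `check_kdump_packages` is made up for illustration, and it assumes an RPM-based system where `kexec-tools` is the package that provides the kdump service.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named package is installed.
# With no arguments it checks kexec-tools only.
# Usage: check_kdump_packages [package ...]
check_kdump_packages() {
    local missing=0 pkg
    [ "$#" -eq 0 ] && set -- kexec-tools
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed.
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "installed: $(rpm -q "$pkg")"
        else
            echo "missing:   $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_kdump_packages kexec-tools system-config-kdump` covers both packages listed above; the return status is non-zero if any of them is missing.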
2. Modify the boot parameters: append crashkernel=128M (shown in red in the original) to the end of the kernel line
vim /etc/grub.conf
kernel /vmlinuz-2.6.32-573.el6.x86_64 …rhgb quiet crashkernel=128M
# Note: the kernel entry above is a single line
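The edit in step 2 can also be scripted. The sketch below is a minimal example, assuming a legacy GRUB configuration in which kernel entries begin with optional whitespace followed by the word `kernel`. The function name `add_crashkernel` is made up for illustration, and it is worth trying it on a copy of the file first.

```shell
#!/bin/bash
# Hypothetical helper: append crashkernel=128M to every "kernel" line in a
# legacy GRUB config that does not already carry a crashkernel= argument.
# Usage: add_crashkernel /path/to/grub.conf
add_crashkernel() {
    local conf="$1"
    # Keep a backup so the original boot entry can be restored.
    cp -p "$conf" "$conf.bak" || return 1
    # For lines whose first word is "kernel" and which lack crashkernel=,
    # strip trailing blanks and append the argument (GNU sed, in place).
    sed -i -e '/^[[:space:]]*kernel[[:space:]]/{/crashkernel=/!s/[[:space:]]*$/ crashkernel=128M/;}' "$conf"
}
```

Running it twice is harmless, because lines that already contain `crashkernel=` are skipped.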
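3. Edit /etc/kdump.conf to configure where the dump file is saved. The default location is /var/crash, which can be left unchanged, but make sure /var/crash has enough free disk space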
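One way to check the disk space requirement in step 3 is to compare the free space in the dump directory with the amount of RAM. The sketch below is a minimal example; the function name `check_dump_space` is made up for illustration, and using the full RAM size as the threshold is a conservative assumption, not a requirement stated above.

```shell
#!/bin/bash
# Hypothetical helper: succeed when the filesystem holding the dump directory
# has at least as much free space as the machine has RAM.
# Usage: check_dump_space [dump_dir] [meminfo_file]
check_dump_space() {
    local dir="${1:-/var/crash}"
    local meminfo="${2:-/proc/meminfo}"
    local mem_kb free_kb
    # MemTotal is reported in kB.
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # df -Pk prints one POSIX-format line per filesystem; column 4 is free kB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "RAM: ${mem_kb} kB, free in ${dir}: ${free_kb} kB"
    [ "$free_kb" -ge "$mem_kb" ]
}
```

The second argument exists only so the check can be exercised against a saved copy of /proc/meminfo.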
4. Set the kdump service to start at boot
chkconfig kdump on
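5. Reboot the server for the configuration to take effect: reboot
6. Verify that it has taken effect
The 128 MB of memory is not used by the normal system; it is reserved for the capture kernel. The output of free -m will show 128 MB less memory than without the parameter.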
[root@server01 ~]# service kdump status
Kdump is operational
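The verification in step 6 can be complemented by two checks that do not depend on the service script. The sketch below is a minimal example, assuming the kernel exposes /proc/cmdline and /sys/kernel/kexec_crash_loaded; the function name `kdump_ready` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: confirm that crashkernel= is on the running kernel's
# command line and that a capture kernel is currently loaded.
# Usage: kdump_ready [cmdline_file] [crash_loaded_file]
kdump_ready() {
    local cmdline="${1:-/proc/cmdline}"
    local loaded="${2:-/sys/kernel/kexec_crash_loaded}"
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "crashkernel= is missing from the kernel command line"
        return 1
    fi
    # The kernel writes 1 here once a crash (capture) kernel has been loaded.
    if [ "$(cat "$loaded")" != "1" ]; then
        echo "no capture kernel is loaded"
        return 1
    fi
    echo "crashkernel reserved and capture kernel loaded"
}
```

A full end-to-end test is to trigger a deliberate panic with `echo c > /proc/sysrq-trigger`. That crashes the machine immediately, so it only belongs in a maintenance window.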
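7. Go to the /boot directory, delete initrd-2.6.32-573.el6.x86_64kdump.img, and run service kdump restart
The existing kdump image was built with the old driver version, so after updating the driver you should regenerate initrd-2.6.32-573.el6.x86_64kdump.img to make sure it uses the new driver version.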
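Step 7 can be sketched as a small function. This is a minimal example that follows the naming pattern of the image shown above and moves the old image aside instead of deleting it; the function name `rebuild_kdump_initrd` and the `.old` suffix are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: move the existing kdump initrd out of the way and
# restart the kdump service so that it builds a fresh image.
# Usage: rebuild_kdump_initrd [boot_dir] [kernel_release]
rebuild_kdump_initrd() {
    local boot="${1:-/boot}"
    local rel="${2:-$(uname -r)}"
    local img="$boot/initrd-${rel}kdump.img"
    if [ -e "$img" ]; then
        # Move aside rather than delete, so the old image can be restored.
        mv "$img" "$img.old" || return 1
    fi
    service kdump restart
}
```

With no arguments it targets the running kernel under /boot. Once the new image is confirmed to work, the `.old` copy can be removed.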
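Any progress?
At one site, 4 of the 20-plus servers rebooted, and the hardware logs show no errors. The customer's monitoring logs show memory errors, reboots, unresponsive agents, and similar problems. Please help take a look. Below is a brief analysis of the 4 machines:
Log location: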
ftp://00744532:HX5ej.ge@dropbox-huashan.h3c.com
Issue 1: on 9-19 at 10:59 the monitoring system detected no response
Serial Number: 210200A00JN17A001976
SDS: server started at 11:15 (manual start)
71 | Informational | 1 | 0 | 0 | 2018-09-20 03:10:57 | 2018-09-19 19:10:57 | EventType: Button / Switch, Event: Reset Button pressed, Data2: 1, Data3: 3 |
72 | Informational | 1 | 0 | 0 | 2018-09-20 03:11:29 | 2018-09-19 19:11:29 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0 |
73 | Informational | 1 | 0 | 0 | 2018-09-19 19:15:24 | 2018-09-19 11:15:24 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128 |
Sosreport: nothing was logged while the system was down
Sep 19 10:01:28 datanode08 sz[44045]: [root] solrconfig.xml_model/ZMODEM: 73126 Bytes, 29542 BPS
Sep 19 19:17:51 datanode08 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 19 19:17:51 datanode08 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5326" x-info="http://www.rsyslog.com"] start
Sep 19 19:17:51 datanode08 kernel: Initializing cgroup subsys cpuset
Issue 2: reboot, monitoring reports memory errors
Serial Number: 210200A00JN187000420
SDS: no hardware errors, but there are reboot records
271 | Informational | 1 | 0 | 0 | 2018-09-19 18:02:21 | 2018-09-19 10:02:21 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0 |
272 | Informational | 1 | 0 | 0 | 2018-09-19 18:02:45 | 2018-09-19 10:02:45 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128 |
273 | Informational | 1 | 0 | 0 | 2018-09-19 18:04:01 | 2018-09-19 10:04:01 | EventType: OEM, Event: Adapter is ok., Data2: 255 |
274 | Informational | 1 | 0 | 0 | 2018-09-19 18:04:02 | 2018-09-19 10:04:02 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255 |
275 | Informational | 1 | 0 | 0 | 2018-09-19 19:58:55 | 2018-09-19 11:58:55 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0 |
276 | Informational | 1 | 0 | 0 | 2018-09-19 19:58:58 | 2018-09-19 11:58:58 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128 |
277 | Informational | 1 | 0 | 0 | 2018-09-19 20:00:07 | 2018-09-19 12:00:07 | EventType: OEM, Event: Adapter is ok., Data2: 255 |
278 | Informational | 1 | 0 | 0 | 2018-09-19 20:00:08 | 2018-09-19 12:00:08 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255 |
Sosreport: there are reboot records and the times roughly match; no MCE errors were seen
Sep 19 03:14:16 datanode12 rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date.
Sep 19 06:40:02 datanode12 auditd[5151]: Audit daemon rotating log files
Sep 19 18:05:14 datanode12 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 19 18:05:14 datanode12 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5271" x-info="http://www.rsyslog.com"] start
Sep 19 18:05:14 datanode12 kernel: Initializing cgroup subsys cpuset
Sep 19 11:41:08 datanode12 ntpd[6161]: 0.0.0.0 0615 05 clock_sync
Sep 19 11:41:09 datanode12 ntpd[6161]: 0.0.0.0 c618 08 no_sys_peer
Sep 19 20:01:34 datanode12 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 19 20:01:34 datanode12 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5209" x-info="http://www.rsyslog.com"] start
Sep 19 20:01:34 datanode12 kernel: Initializing cgroup subsys cpuset
Issue 3: reboot, memory errors
Serial Number: 210200A00JN187000424
SDS: the server rebooted, but no memory errors were seen
Informational | 1 | 0 | 0 | 2018-09-19 11:02:03 | 2018-09-19 03:02:03 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0 |
Informational | 1 | 0 | 0 | 2018-09-19 11:02:23 | 2018-09-19 03:02:23 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128 |
Informational | 1 | 0 | 0 | 2018-09-19 11:03:30 | 2018-09-19 03:03:30 | EventType: OEM, Event: Adapter is ok., Data2: 255 |
Informational | 1 | 0 | 0 | 2018-09-19 11:03:32 | 2018-09-19 03:03:32 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255 |
Sosreport: the server rebooted
Sep 19 03:30:01 datanode09 auditd[5158]: Audit daemon rotating log files
Sep 19 11:04:54 datanode09 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 19 11:04:54 datanode09 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5200" x-info="http://www.rsyslog.com"] start
Sep 19 11:04:54 datanode09 kernel: Initializing cgroup subsys cpuset
Sep 19 11:04:54 datanode09 kernel: Initializing cgroup subsys cpu
Issue 4: monitoring agent unresponsive, ICMP unreachable, reboot
Chassis Serial Number=210235A1Y7N176000003
SDS: no hardware errors recorded
Sosreport: the system rebooted
Sep 19 14:32:39 datanode05 sz[10421]: [taskctl] data.txt/ZMODEM: 14447460 Bytes, 1615222 BPS
Sep 19 23:56:22 datanode05 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Sep 19 23:56:22 datanode05 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5243" x-info="http://www.rsyslog.com"] start
Sep 19 23:56:22 datanode05 kernel: Initializing cgroup subsys cpuset
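Each sosreport excerpt above locates a reboot by the gap before the first `imklog ... started` line. That search can be sketched as a small function; this is a minimal example, assuming rsyslog with imklog as in the excerpts, and the function name `find_boots` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: for every boot marker in a syslog file, print the last
# line logged before it, so the silent window around a reboot is visible.
# Usage: find_boots /var/log/messages
find_boots() {
    awk '
        /kernel: imklog .* started/ {
            if (prev != "") print "last before boot: " prev
            print "boot marker:      " $0
            print ""
        }
        { prev = $0 }
    ' "$1"
}
```

If the line before a boot marker is ordinary activity with no shutdown messages, the system most likely went down without a clean shutdown, which is the case kdump is meant to capture.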