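A small pitfall in kdump configuration on Linux 3.10
Previously, on 2.6-series Linux kernels, when a module should not be loaded in the reserved (capture) kernel, it could be excluded with the blacklist option in /etc/kdump.conf: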
blacklist <list of kernel modules>
Recently I found that a SAS driver was misbehaving, so I planned to block it the same way. The result was an error:
[root@localhost ~]# service kdump restart
Redirecting to /bin/systemctl restart kdump.service
Job for kdump.service failed because the control process exited with error code. See "systemctl status kdump.service" and "journalctl -xe" for details.
[root@localhost ~]# systemctl status kdump.service
* kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-11-28 11:58:28 UTC; 10s ago
  Process: 60563 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 60572 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 60572 (code=exited, status=1/FAILURE)

Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Deprecated kdump config option: blacklist. Refer to kdump.conf manpage for alternatives.
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Starting kdump: [FAILED]
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Nov 28 11:58:28 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
Nov 28 11:58:28 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service failed.
[root@localhost ~]# journalctl -xe
Nov 28 11:58:28 localhost.localdomain kdumpctl[60563]: kexec: unloaded kdump kernel
Nov 28 11:58:28 localhost.localdomain kdumpctl[60563]: Stopping kdump: [OK]
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Deprecated kdump config option: blacklist. Refer to kdump.conf manpage for alternatives.
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Starting kdump: [FAILED]
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Nov 28 11:58:28 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
-- Subject: Unit kdump.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kdump.service has failed.
--
-- The result is failed.
Nov 28 11:58:28 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service failed.
Nov 28 11:58:28 localhost.localdomain polkitd[2087]: Unregistered Authentication Agent for unix-process:60547:533046 (system bus name :1.5128, object path /org/freedesktop/PolicyKit1/AuthenticationAgent
[root@localhost ~]#
So the blacklist option turns out to be deprecated. Following the hint in the log:
man kdump.conf shows the following:
blacklist option was recently being used to prevent loading modules in initramfs. General terminology for blacklist has been that module is present in initramfs but it is not actually loaded in kernel. Hence retaining blacklist option creates more confusing behavior. It has been deprecated. Instead, use rd.driver.blacklist option on second kernel to blacklist a certain module. One can edit /etc/sysconfig/kdump.conf and edit KDUMP_COMMANDLINE_APPEND to pass kernel command line options. Refer to dracut.cmdline man page for more details on module blacklist option.
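Fine. Following the new requirement, I set out to edit /etc/sysconfig/kdump.conf, only to find that this file does not exist. The exact path in the manpage is not the key point, though; what matters is the KDUMP_COMMANDLINE_APPEND setting.
On my system the file that actually holds it is /etc/sysconfig/kdump, so I edited the KDUMP_COMMANDLINE_APPEND line there. For the exact format, see: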
man dracut.cmdline
rd.driver.blacklist=<drivername>[,<drivername>,...]
    do not load kernel module <drivername>. This parameter can be specified multiple times.
rd.driver.pre=<drivername>[,<drivername>,...]
    force loading kernel module <drivername>. This parameter can be specified multiple times.
rd.driver.post=<drivername>[,<drivername>,...]
    force loading kernel module <drivername> after all automatic loading modules have been loaded. This parameter can be specified multiple times.
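Putting the two manpages together, the change amounts to appending an rd.driver.blacklist=<drivername> entry to the KDUMP_COMMANDLINE_APPEND line. The snippet below is only a sketch of one way to do that: add_kdump_blacklist is a hypothetical helper, not part of kexec-tools, and it assumes the variable sits on a single line with a double-quoted value. The module name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with kexec-tools): append
# rd.driver.blacklist=<module> to KDUMP_COMMANDLINE_APPEND in a
# sysconfig-style file.
# Assumption: the variable is on one line and its value is double-quoted.
add_kdump_blacklist() {
    conf="$1"     # file to edit, e.g. /etc/sysconfig/kdump
    module="$2"   # kernel module to keep out of the capture kernel

    # Skip the edit if this module is already blacklisted on that line.
    if grep -q "^KDUMP_COMMANDLINE_APPEND=.*rd\.driver\.blacklist=${module}[ \",]" "$conf"; then
        return 0
    fi

    # Insert the new option just before the closing quote (GNU sed -i).
    sed -i 's/^KDUMP_COMMANDLINE_APPEND="\(.*\)"$/KDUMP_COMMANDLINE_APPEND="\1 rd.driver.blacklist='"$module"'"/' "$conf"
}

# Example usage (placeholder module name), followed by a service restart:
#   add_kdump_blacklist /etc/sysconfig/kdump mydriver
#   service kdump restart
```

Working on a copy of the file first and diffing the result is a cheap safeguard, since a malformed KDUMP_COMMANDLINE_APPEND line may again keep the service from starting.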
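Also note that after changing the configuration you have to restart the kdump service. With a changed blacklist the restart is fairly slow, because whenever the configuration changes (for example the dump path is modified or entries are added to the blacklist), the kdump initramfs image has to be regenerated; otherwise the old image would still be used when a crash happens. These images are the files under /boot whose names end in kdump.img: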
[root@localhost ~]# ls -l /boot/*kdump*
-rw------- 1 root root 16878919 Nov 29 01:02 /boot/initramfs-3.10.0-693.5.2.el7.x86_64kdump.img
-rw------- 1 root root 35261890 Nov 27 07:04 /boot/initramfs-3.10.0caq1.0kdump.img
-rw------- 1 root root 36508192 Nov 24 06:21 /boot/initramfs-3.10.0kdump.img
[root@localhost ~]#
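A rough way to tell whether an image predates the last configuration change is to compare modification times. This is a minimal sketch and not necessarily how kdumpctl itself decides when to rebuild; kdump_img_is_stale is a hypothetical helper, and the image path in the usage comment just follows the naming pattern seen in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the image is missing
# or the config file was modified more recently than the image.
kdump_img_is_stale() {
    conf="$1"   # configuration file, e.g. /etc/kdump.conf
    img="$2"    # kdump initramfs image under /boot
    [ ! -e "$img" ] || [ "$conf" -nt "$img" ]
}

# Example usage:
#   img="/boot/initramfs-$(uname -r)kdump.img"
#   if kdump_img_is_stale /etc/kdump.conf "$img"; then
#       echo "kdump image looks older than the config; restart kdump"
#   fi
```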
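Finally, note that if the reserved (capture) kernel itself panics while loading drivers or running, there is no further kernel left to take over, so the only information available is what gets printed on the screen or over a serial console. I once hit a case where the reserved memory was too small and the capture kernel itself ran out of memory (OOM), so no crash dump could be collected. The current amount of reserved memory can be checked with: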
[root@localhost ~]# cat /sys/kernel/kexec_crash_size
536870912
To check whether the capture kernel is loaded, use:
[root@localhost ~]# cat /sys/kernel/kexec_crash_loaded
1
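The two sysfs checks can be wrapped in small helpers for use in a health-check script. This is a sketch only: kdump_reserved_mib and kdump_is_loaded are hypothetical names, and the optional directory argument exists purely so the helpers can be pointed at a test directory instead of /sys/kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the crash kernel reservation in MiB.
# kexec_crash_size reports the reservation in bytes.
kdump_reserved_mib() {
    dir="${1:-/sys/kernel}"
    bytes=$(cat "$dir/kexec_crash_size") || return 1
    echo $((bytes / 1024 / 1024))
}

# Hypothetical helper: succeed if a capture kernel is currently loaded
# (kexec_crash_loaded contains 1).
kdump_is_loaded() {
    dir="${1:-/sys/kernel}"
    [ "$(cat "$dir/kexec_crash_loaded")" = "1" ]
}

# Example usage:
#   echo "reserved: $(kdump_reserved_mib) MiB"
#   kdump_is_loaded || echo "warning: no capture kernel loaded"
```

With the value shown above (536870912 bytes), kdump_reserved_mib would print 512.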