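Notes on resolving a host reboot failure caused by Docker
Operating system: CentOS Linux release 7.4.1708 (Core)
Kernel version: 3.10.0-693.el7.x86_64
Check the system log /var/log/messages. Around the time of the failure, a container is force-stopped and started again at 15:50:11, and the next entries after 15:50:14 are kernel boot messages at 15:55:25, with no shutdown messages in between: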
Jan 5 15:50:01 hanginx01 systemd: Started Session 196 of user root.
Jan 5 15:50:01 hanginx01 systemd: Starting Session 196 of user root.
Jan 5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.479595119+08:00" level=info msg="Container d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd failed to exit within 10 seconds of signal 15 - using the force"
Jan 5 15:50:11 hanginx01 containerd: time="2020-01-05T15:50:11.670173843+08:00" level=info msg="shim reaped" id=d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd
Jan 5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.679960565+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan 5 15:50:11 hanginx01 kernel: device vethb1da919 left promiscuous mode
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan 5 15:50:11 hanginx01 NetworkManager[1028]: <info> [1578210611.8637] device (vethf33af93): driver 'veth' does not support carrier detection.
Jan 5 15:50:11 hanginx01 NetworkManager[1028]: <info> [1578210611.8652] manager: (vethf33af93): new Veth device (/org/freedesktop/NetworkManager/Devices/445)
Jan 5 15:50:11 hanginx01 NetworkManager[1028]: <info> [1578210611.8680] device (vethb1da919): released from master device docker0
Jan 5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.890774286+08:00" level=warning msg="d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd cleanup: failed to unmount IPC: umount /home/docker/containers/d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd/mounts/shm, flags: 0x2: no such file or directory"
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered blocking state
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered disabled state
Jan 5 15:50:11 hanginx01 kernel: device veth0d967d8 entered promiscuous mode
Jan 5 15:50:11 hanginx01 kernel: IPv6: ADDRCONF(NETDEV_UP): veth0d967d8: link is not ready
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered blocking state
Jan 5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered forwarding state
Jan 5 15:50:11 hanginx01 NetworkManager[1028]: <info> [1578210611.9005] manager: (veth4feb836): new Veth device (/org/freedesktop/NetworkManager/Devices/446)
Jan 5 15:50:11 hanginx01 NetworkManager[1028]: <info> [1578210611.9018] manager: (veth0d967d8): new Veth device (/org/freedesktop/NetworkManager/Devices/447)
Jan 5 15:50:11 hanginx01 containerd: time="2020-01-05T15:50:11.918219527+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd/shim.sock" debug=false pid=28696
Jan 5 15:50:11 hanginx01 kernel: IPVS: Creating netns size=2040 id=150
Jan 5 15:50:12 hanginx01 NetworkManager[1028]: <info> [1578210612.0821] device (veth0d967d8): link connected
Jan 5 15:50:12 hanginx01 NetworkManager[1028]: <info> [1578210612.0822] device (docker0): link connected
Jan 5 15:50:12 hanginx01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth0d967d8: link becomes ready
Jan 5 15:50:14 hanginx01 ntpd[1046]: Listen normally on 156 veth0d967d8 fe80::847:2cff:feb9:1a90 UDP 123
Jan 5 15:50:14 hanginx01 ntpd[1046]: Deleting interface #155 vethb1da919, fe80::b81f:e0ff:fe16:22bb#123, interface stats: received=0, sent=0, dropped=0, active_time=605 secs
Jan 5 15:55:25 hanginx01 kernel: microcode: microcode updated early to revision 0xb000021, date = 2017-03-01
Jan 5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpuset
Jan 5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpu
Jan 5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpuacct
Jan 5 15:55:25 hanginx01 kernel: Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017
Jan 5 15:55:25 hanginx01 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Jan 5 15:55:25 hanginx01 kernel: e820: BIOS-provided physical RAM map:
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009afff] usable
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x000000000009b000-0x000000000009ffff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000078888fff] usable
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000078889000-0x0000000079a3afff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079a3b000-0x0000000079a9efff] ACPI data
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079a9f000-0x0000000079ff9fff] ACPI NVS
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079ffa000-0x000000008fffffff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed44fff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
Jan 5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000100000000-0x000000407fffffff] usable
Jan 5 15:55:25 hanginx01 kernel: NX (Execute Disable) protection: active
Jan 5 15:55:25 hanginx01 kernel: SMBIOS 3.0 present.
Jan 5 15:55:25 hanginx01 kernel: e820: last_pfn = 0x4080000 max_arch_pfn = 0x400000000
Jan 5 15:55:25 hanginx01 kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Jan 5 15:55:25 hanginx01 kernel: e820: last_pfn = 0x78889 max_arch_pfn = 0x400000000
Jan 5 15:55:25 hanginx01 kernel: found SMP MP-table at [mem 0x000fdcb0-0x000fdcbf] mapped at [ffff8800000fdcb0]
Jan 5 15:55:25 hanginx01 kernel: Using GB pages for direct mapping
Jan 5 15:55:25 hanginx01 kernel: RAMDISK: [mem 0x357ea000-0x36becfff]
Jan 5 15:55:25 hanginx01 kernel: Early table checksum verification disabled
Jan 5 15:55:25 hanginx01 kernel: ACPI: RSDP 00000000000f0530 00024 (v02 ALASKA)
Jan 5 15:55:25 hanginx01 kernel: ACPI: XSDT 0000000079a4f098 000B4 (v01 ALASKA A M I 01072009 AMI 00010013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: FACP 0000000079a838d0 0010C (v05 ALASKA A M I 01072009 AMI 00010013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: DSDT 0000000079a4f1e0 346EE (v02 ALASKA A M I 01072009 INTL 20091013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: FACS 0000000079ff8f80 00040
Jan 5 15:55:25 hanginx01 kernel: ACPI: APIC 0000000079a839e0 00224 (v03 ALASKA A M I 01072009 AMI 00010013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: FPDT 0000000079a83c08 00044 (v01 ALASKA A M I 01072009 AMI 00010013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: FIDT 0000000079a83c50 0009C (v01 ALASKA A M I 01072009 AMI 00010013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: SPMI 0000000079a83cf0 00041 (v05 ALASKA A M I 00000000 AMI. 00000000)
Jan 5 15:55:25 hanginx01 kernel: ACPI: MCFG 0000000079a83d38 0003C (v01 ALASKA A M I 01072009 MSFT 00000097)
Jan 5 15:55:25 hanginx01 kernel: ACPI: UEFI 0000000079a83d78 00042 (v01 ALASKA A M I 01072009 00000000)
Jan 5 15:55:25 hanginx01 kernel: ACPI: HPET 0000000079a83dc0 00038 (v01 ALASKA A M I 00000001 INTL 20091013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: MSCT 0000000079a83df8 00090 (v01 ALASKA A M I 00000001 INTL 20091013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: SLIT 0000000079a83e88 00030 (v01 ALASKA A M I 00000001 INTL 20091013)
Jan 5 15:55:25 hanginx01 kernel: ACPI: SRAT 0000000079a83eb8 01158 (v03 ALASKA A M I 00000001 INTL 20091013)
Jan 5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0x90000000-0xfed1bfff]
Jan 5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xfed1c000-0xfed44fff]
Jan 5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xfed45000-0xfeffffff]
Jan 5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
Jan 5 15:55:25 hanginx01 kernel: e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
Jan 5 15:55:25 hanginx01 kernel: Booting paravirtualized kernel on bare hardware
Jan 5 15:55:25 hanginx01 kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:2
Jan 5 15:55:25 hanginx01 kernel: PERCPU: Embedded 33 pages/cpu @ffff881fffa00000 s97048 r8192 d29928 u262144
Jan 5 15:55:25 hanginx01 kernel: Built 2 zonelists in Zone order, mobility grouping on. Total pages: 66030059
Jan 5 15:55:25 hanginx01 kernel: Policy zone: Normal
Jan 5 15:55:25 hanginx01 kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Jan 5 15:55:25 hanginx01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)
Jan 5 15:55:25 hanginx01 kernel: x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
Jan 5 15:55:25 hanginx01 kernel: xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
Jan 5 15:55:25 hanginx01 kernel: Memory: 5886008k/270532608k available (6886k kernel code, 2219892k absent, 4482536k reserved, 4545k data, 1764k init)
Jan 5 15:55:25 hanginx01 kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=2
Jan 5 15:55:25 hanginx01 kernel: Hierarchical RCU implementation.
Jan 5 15:55:25 hanginx01 kernel: #011RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=32.
Jan 5 15:55:25 hanginx01 kernel: NR_IRQS:327936 nr_irqs:1496 0
Jan 5 15:55:25 hanginx01 kernel: Console: colour VGA+ 80x25
Jan 5 15:55:25 hanginx01 kernel: console [tty0] enabled
Jan 5 15:55:25 hanginx01 kernel: allocated 1073741824 bytes of page_cgroup
Jan 5 15:55:25 hanginx01 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups
Jan 5 15:55:25 hanginx01 kernel: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
Jan 5 15:55:25 hanginx01 kernel: tsc: Fast TSC calibration using PIT
Jan 5 15:55:25 hanginx01 kernel: tsc: Detected 2095.256 MHz processor
Jan 5 15:55:25 hanginx01 kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 4190.51 BogoMIPS (lpj=2095256)
Jan 5 15:55:25 hanginx01 kernel: pid_max: default: 32768 minimum: 301
Jan 5 15:55:25 hanginx01 kernel: Security Framework initialized
Jan 5 15:55:25 hanginx01 kernel: SELinux: Initializing.
Jan 5 15:55:25 hanginx01 kernel: Yama: becoming mindful.
Jan 5 15:55:25 hanginx01 kernel: Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
Jan 5 15:55:25 hanginx01 kernel: Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
Jan 5 15:55:25 hanginx01 kernel: random: fast init done
Jan 5 15:55:25 hanginx01 kernel: Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Jan 5 15:55:25 hanginx01 kernel: Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Jan 5 15:55:25 hanginx01 kernel: Initializing cgroup subsys memory
Jan 5 15:55:25 hanginx01 kernel: Initializing cgroup subsys devices
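To pick out the moment of the reboot from a long /var/log/messages, a small helper along the following lines may help. It is only a sketch: the function name last_entry_before_boot is made up for illustration, and it assumes the traditional syslog layout seen above (month, day, time as the first three fields) and that each boot logs a "kernel: Linux version" banner. It prints the last entry logged before the timestamp of each boot, so you can see whether anything resembling a shutdown was recorded.

```shell
# Hypothetical helper (not part of any tool): for every kernel boot banner in
# the syslog file "$1", print the last entry that carries an earlier timestamp.
last_entry_before_boot() {
    awk '
    {
        ts = $1 " " $2 " " $3              # e.g. "Jan 5 15:55:25"
        if (ts != cur_ts) {                # timestamp changed:
            before = last                  # remember the last line of the old one
            cur_ts = ts
        }
        last = $0
    }
    /kernel: Linux version/ {
        print "last entry before boot: " before
        print "boot banner: " $0
    }' "$1"
}
```

Usage would be something like last_entry_before_boot /var/log/messages. In the log above, the entry it reports before the boot is an ntpd message rather than anything shutdown-related, which matches an unexpected reboot.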
Either of the following two methods resolves the problem:
Solution 1:
Change /etc/docker/daemon.json to:
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
Then restart the docker service and run docker info|grep Cgroup. If the result shows systemd (the default is cgroupfs), the change has taken effect.
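The check in Solution 1 can also be scripted. The sketch below assumes that docker info prints a line of the form "Cgroup Driver: <name>", as the grep above relies on; the function names cgroup_driver and is_systemd_driver are illustrative only.

```shell
# Print the cgroup driver named in `docker info` output read from stdin.
# Assumes a line of the form "Cgroup Driver: systemd" (leading spaces allowed).
cgroup_driver() {
    awk -F': *' '/Cgroup Driver/ { print $2; exit }'
}

# Succeed (exit status 0) only if the driver read from stdin is systemd.
is_systemd_driver() {
    [ "$(cgroup_driver)" = "systemd" ]
}
```

For example, after restarting docker: docker info 2>/dev/null | is_systemd_driver && echo "cgroup driver is systemd".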
Solution 2:
# Upgrade the Docker version
yum remove docker docker-engine docker-common \
    docker-client docker-client-latest docker-latest docker-latest-logrotate \
    docker-logrotate docker-selinux docker-engine-selinux -y
yum install yum-utils lvm2 device-mapper-persistent-data -y
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --disable docker-ce-edge docker-ce-test
yum install docker-ce.x86_64 -y
yum update containerd.io -y
# Upgrade the kernel version (proceed with caution in production; a reboot is required after the kernel upgrade)
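Note on the kernel upgrade: once the installation finishes, the server must be rebooted. Afterwards, uname -a shows that the kernel version has been upgraded to 3.10.0-1062.4.3.el7.x86_64.
If conditions allow, Solution 2 is recommended.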
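To confirm after the reboot that the running kernel is at least the version mentioned above, a comparison like the following could be used. This is a sketch that relies on GNU sort -V for version ordering; the function name kernel_at_least is made up for illustration.

```shell
# Succeed if kernel release "$1" is the same as or newer than "$2".
# Uses GNU sort -V (version sort): "$2" must sort first or be equal to "$1".
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: kernel_at_least "$(uname -r)" 3.10.0-1062.4.3.el7.x86_64 && echo "kernel upgraded".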