k8s cluster node fails to boot after a power outage: CentOS server cannot start, "Failed to start Login Service"

Reference:
https://blog.csdn.net/hedao0515/article/details/129718094

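The root cause first: after the power loss some files were not fully written, which left the filesystem in an inconsistent state. It has to be repaired with the filesystem's native tool.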

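  1. Reboot the machine. At the Linux kernel selection (GRUB) menu, press e to edit the boot entry, append init=/bin/bash to the end of the line that starts with linux16, then press ctrl+x to boot.
    The machine may fail to boot and land in a shell by default; in that case go straight to step 2.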

  2. cd /dev/mapper

  3. ls

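  4. Run the repair. Note that -L zeroes the XFS log, so changes that had not been flushed to disk may be lost.

  5. xfs_repair -L /dev/mapper/centos-root
    I also ran the same command against /dev/mapper/centos-swap and /dev/mapper/control, but those are the swap volume and the device-mapper control node, not XFS filesystems, so only centos-root actually needs the repair.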

  6. When the repair finishes, run init 6 to reboot; the system then comes back up.
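
Since xfs_repair is meant for XFS filesystems, one way to approach the repair step is to check each device's type before touching it. The following is only a minimal sketch, assuming blkid is available in the rescue shell. The names fs_type, repair_if_xfs and DRY_RUN are hypothetical and not part of the original procedure.

```shell
#!/bin/bash
# Sketch only: repair a device with xfs_repair only if it holds an XFS
# filesystem. DRY_RUN=1 (the default here) just prints what would be run.

DRY_RUN="${DRY_RUN:-1}"

# fs_type (hypothetical helper): print the filesystem type of a device.
# Prints nothing when blkid cannot identify the device.
fs_type() {
    blkid -o value -s TYPE "$1" 2>/dev/null
}

# repair_if_xfs (hypothetical helper): repair one device if it is XFS,
# otherwise report that it was skipped.
repair_if_xfs() {
    local dev="$1"
    local fstype
    fstype="$(fs_type "$dev")"
    if [ "$fstype" != "xfs" ]; then
        echo "skip $dev (type: ${fstype:-unknown})"
        return 0
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: xfs_repair -L $dev"
    else
        # -L zeroes the XFS log; unflushed changes may be lost.
        xfs_repair -L "$dev"
    fi
}
```

To use it, call repair_if_xfs once for each entry listed under /dev/mapper, first with the default DRY_RUN=1 to review the output, then with DRY_RUN=0 to actually run the repair.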

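However, once the operating system was back up, the k8s cluster was broken: etcd's snap file could not be found, presumably because of the power loss. After backing up the yaml files, the only option left was to rebuild the cluster.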

[ root@k8s-master01 ~ ] #docker logs -f 8b5da86c9c3b
2024-01-29 03:13:49.628037 I | etcdmain: etcd Version: 3.3.15
2024-01-29 03:13:49.628081 I | etcdmain: Git SHA: 94745a4ee
2024-01-29 03:13:49.628084 I | etcdmain: Go Version: go1.12.9
2024-01-29 03:13:49.628086 I | etcdmain: Go OS/Arch: linux/amd64
2024-01-29 03:13:49.628089 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2024-01-29 03:13:49.628115 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2024-01-29 03:13:49.628141 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2024-01-29 03:13:49.628418 I | embed: listening for peers on https://192.168.39.101:2380
2024-01-29 03:13:49.628453 I | embed: listening for client requests on 127.0.0.1:2379
2024-01-29 03:13:49.628462 I | embed: listening for client requests on 192.168.39.101:2379
2024-01-29 03:13:49.629621 I | etcdserver: recovered store from snapshot at index 7440752
2024-01-29 03:13:49.630902 C | etcdserver: recovering backend from snapshot error: database snapshot file path error: snap: snapshot file doesn't exist
panic: recovering backend from snapshot error: database snapshot file path error: snap: snapshot file doesn't exist
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xbc415e]

goroutine 1 [running]:
github.com/coreos/etcd/etcdserver.NewServer.func1(0xc0002e9d38, 0xc0002e8ab8)
	/Users/leegyuho/go/src/github.com/coreos/etcd/etcdserver/server.go:293 +0x3e
panic(0xdfa7e0, 0xc000090d60)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/coreos/etcd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc0001b35c0, 0xfbbf6c, 0x2a, 0xc0002e8b88, 0x1, 0x1)
	/Users/leegyuho/go/src/github.com/coreos/etcd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:83 +0x135
github.com/coreos/etcd/etcdserver.NewServer(0x7ffd6eb14e62, 0x14, 0x0, 0x0, 0x0, 0x0, 0xc00011cd00, 0x1, 0x1, 0xc00011ce00, ...)
	/Users/leegyuho/go/src/github.com/coreos/etcd/etcdserver/server.go:388 +0x2c7b
github.com/coreos/etcd/embed.StartEtcd(0xc000260000, 0xc000260480, 0x0, 0x0)
	/Users/leegyuho/go/src/github.com/coreos/etcd/embed/etcd.go:179 +0x7da
github.com/coreos/etcd/etcdmain.startEtcd(0xc000260000, 0xf974b7, 0x6, 0x1, 0xc0001e0e00)
	/Users/leegyuho/go/src/github.com/coreos/etcd/etcdmain/etcd.go:181 +0x40
github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
	/Users/leegyuho/go/src/github.com/coreos/etcd/etcdmain/etcd.go:102 +0x13fb
github.com/coreos/etcd/etcdmain.Main()
	/Users/leegyuho/go/src/github.com/coreos/etcd/etcdmain/main.go:46 +0x38
main.main()
	/Users/leegyuho/go/src/github.com/coreos/etcd/main.go:28 +0x20

With etcd unable to start, the apiserver cannot start either, and so the whole cluster stays down. etcd data really matters!!!
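
To avoid being left with nothing but a rebuild, one option is to take regular etcd snapshots. The following is a rough sketch, not what was done here. It assumes an etcdctl binary is available on the host and that the cluster uses the kubeadm certificate layout, with server.crt and server.key sitting next to the peer certificates seen in the log. BACKUP_DIR, snapshot_path and backup_etcd are hypothetical names.

```shell
#!/bin/bash
# Sketch only: save a timestamped etcd snapshot so that a damaged data
# directory can be restored from a file instead of rebuilding the cluster.

BACKUP_DIR="${BACKUP_DIR:-/var/backups/etcd}"    # hypothetical location
ENDPOINT="${ENDPOINT:-https://127.0.0.1:2379}"   # assumed TLS client URL
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki/etcd}"   # directory seen in the log

# snapshot_path (hypothetical helper): build the target file name.
snapshot_path() {
    echo "$BACKUP_DIR/etcd-$(date +%Y%m%d-%H%M%S).db"
}

# backup_etcd (hypothetical helper): save one snapshot via the v3 API.
backup_etcd() {
    local target
    target="$(snapshot_path)"
    mkdir -p "$BACKUP_DIR"
    # server.crt / server.key are assumed names (kubeadm defaults).
    ETCDCTL_API=3 etcdctl \
        --endpoints="$ENDPOINT" \
        --cacert="$PKI_DIR/ca.crt" \
        --cert="$PKI_DIR/server.crt" \
        --key="$PKI_DIR/server.key" \
        snapshot save "$target"
}
```

Calling backup_etcd from a cron job, and copying the resulting files off the node, would keep a recent snapshot available even when the local disk is damaged.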

posted @ 2024-01-29 11:16  jasmine456