sealos pitfall notes
Preface
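This post records the pitfalls I ran into while installing sealos. There is almost no reliable material on this online, perhaps because the tool is fairly niche, so I hope it helps anyone who finds this article through a search.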
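What is sealos
Sealos is a cloud operating system distribution with Kubernetes as its kernel. A standalone operating system such as a Linux distribution lets you install and use standalone applications like PPT, Word and Excel. A cloud operating system simply replaces those standalone applications with cloud applications such as databases, object storage and message queues, all of which are distributed and highly available. Sealos is a cloud operating system that can run all kinds of distributed applications: with Sealos, you have a cloud of your own.
For details see the official "Introduction | sealos" page; I won't repeat it here.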
Resources
- Introduction | sealos
- labring/sealos: Sealos is a Kubernetes distribution, a general-purpose Cloud Operating System designed for managing cloud-native applications. Demo: https://cloud.sealos.io (github.com)
- sealerio/sealer: Build, Share and Run Both Your Kubernetes Cluster and Distributed Applications (Project under CNCF) (github.com)
- The base images used can be found here: labring's Profile | Docker Hub
- sealos 4.1.7 download: https://github.com/labring/sealos/releases/download/v4.1.7/sealos_4.1.7_linux_amd64.tar.gz
- crictl download: https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.25.0/crictl-v1.25.0-linux-amd64.tar.gz
- Command reference: Kubernetes lifecycle management | sealos
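Architecture
- The official docs do not include an architecture diagram, so it has to be distilled from the code
- Read the code to learn the design patterns and code structure, and how the basic operations are implemented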
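Installation
Official procedure
sealos 4.0
# Read before installing
1. Only the root user is supported for now; non-root users and sudo are not supported.
2. Installation commands can only be run on a node that belongs to the cluster.
3. Uninstall any existing Docker installation beforehand.
4. Kubernetes offline packages built for version 3.0 cannot be installed with sealos 4.0.
5. If the password given to the run command contains special characters, wrap it in single quotes.
6. Offline installation example:
Offline installation example for 4.0: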
---
# Package the images; run on a machine with internet access
sealos pull labring/kubernetes:v1.24.0
sealos pull labring/calico:v3.22.1
sealos save -o kubernetes.tar labring/kubernetes:v1.24.0
sealos save -o calico.tar labring/calico:v3.22.1
---
# Load the images; run on the internal-network machine
sealos load -i kubernetes.tar
sealos load -i calico.tar
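The pull/save/load steps above can be scripted for a whole list of images. This is a minimal sketch, assuming the `sealos pull`, `sealos save -o` and `sealos load -i` syntax shown above; the helper names `tar_name`, `bundle_images`, `load_images` and the `name_tag.tar` file naming are my own, not part of sealos.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: bundle images on an online host, load them on an offline host.

# Turn "labring/calico:v3.22.1" into "calico_v3.22.1.tar".
tar_name() {
  local ref=$1
  local name=${ref##*/}     # drop the registry/namespace prefix
  echo "${name/:/_}.tar"    # replace the tag separator ":" with "_"
}

# Run on the machine with internet access.
bundle_images() {
  local img
  for img in "$@"; do
    sealos pull "$img" || return 1
    sealos save -o "$(tar_name "$img")" "$img" || return 1
  done
}

# Run on the offline machine, in the directory that holds the tarballs.
load_images() {
  local img
  for img in "$@"; do
    sealos load -i "$(tar_name "$img")" || return 1
  done
}
```

On the online machine run `bundle_images labring/kubernetes:v1.24.0 labring/calico:v3.22.1`, copy `kubernetes_v1.24.0.tar` and `calico_v3.22.1.tar` over, then run `load_images` with the same arguments on the offline machine.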
Hosts
Host | Role |
---|---|
10.55.10.107 | sealos install host, also the master node |
10.55.10.106 | node 1 |
10.55.10.97 | node 2 |
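Optionally set up passwordless SSH between the hosts, which makes troubleshooting easier
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
vim ~/.ssh/authorized_keys # add the public keys of the other hosts
vim /etc/ssh/sshd_config # allow root login: PermitRootLogin yes
systemctl restart sshd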
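Instead of editing `authorized_keys` by hand on every host, the public key can be pushed with `ssh-copy-id`. A minimal sketch, assuming the node IPs from the table above and that password login for root is still enabled on the nodes; `distribute_key` and `NODES` are hypothetical names.

```shell
#!/usr/bin/env bash
# Node list taken from the host table above; adjust as needed.
NODES=(10.55.10.106 10.55.10.97)

# Usage: distribute_key <public-key-file> <host>...
distribute_key() {
  local key=$1
  shift
  local host
  for host in "$@"; do
    # ssh-copy-id appends the key to ~/.ssh/authorized_keys on the remote host;
    # it prompts for the root password once per host.
    ssh-copy-id -i "$key" "root@$host" || return 1
  done
}
```

Run `distribute_key ~/.ssh/id_rsa.pub "${NODES[@]}"`; afterwards `ssh root@10.55.10.106 true` should succeed without a password prompt.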
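Pre-checks and file preparation
# On these hosts only the mounted /data01 disk supports overlay (its xfs filesystem has ftype=1), so the install was never going to be as simple as the official docs above make it look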
[root@test-d-010055010107 data01]# xfs_info /data01
meta-data=/dev/vdb isize=512 agcount=4, agsize=5242880 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=20971520, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=10240, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
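The two checks above can be turned into small pass/fail filters. This is a sketch under my own assumptions: the function names are hypothetical, and it only looks for `ftype=1` in `xfs_info` output and for module names in the first column of `lsmod` output. Note that `nf_conntrack_ipv4` exists on the 3.10 CentOS 7 kernel used here; on kernels 4.19 and newer it was merged into `nf_conntrack`.

```shell
#!/usr/bin/env bash
# overlayfs needs d_type; on xfs that means the filesystem was created with ftype=1.
# Reads `xfs_info` output on stdin and succeeds only if ftype=1 is present.
xfs_has_dtype() {
  grep -Eq '(^|[[:space:]])ftype=1([[:space:]]|$)'
}

# Reads `lsmod` output on stdin; succeeds only if every named module is listed.
modules_loaded() {
  local listing mod
  listing=$(cat)
  for mod in "$@"; do
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' <<<"$listing" || return 1
  done
}
```

For example `xfs_info /data01 | xfs_has_dtype && echo "/data01 supports overlay"` and `lsmod | modules_loaded ip_vs nf_conntrack_ipv4 || echo "kernel modules missing"`.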
# File preparation: pull the images on a remote machine, then save them as image archives
ctr image import kubernetes.tar
ctr image import calico.tar
ctr images export calico.tar docker.io/labring/calico:v3.22.1
wget https://github.com/labring/sealos/releases/download/v4.1.4/sealos_4.1.4_linux_amd64.tar.gz \
&& tar zxvf sealos_4.1.4_linux_amd64.tar.gz sealos && chmod +x sealos && mv sealos /usr/bin
# sealos 4.1.4 and 4.1.7 differ in their Global Flags, and 4.1.4 has a bug that kept it from deploying a cluster on these hosts, so version 4.1.7 is needed
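Since 4.1.7 was the version that finally worked, here is a sketch that installs a given release using the same download URL pattern as in the resources list. The function names are hypothetical, and it assumes `wget`, `tar` and `install` are available and that it runs as root.

```shell
#!/usr/bin/env bash
# Print the release tarball URL for a sealos version (default 4.1.7).
sealos_url() {
  local v=${1:-4.1.7}
  echo "https://github.com/labring/sealos/releases/download/v${v}/sealos_${v}_linux_amd64.tar.gz"
}

# Download the release, extract only the sealos binary, install it to /usr/bin.
install_sealos() {
  local v=${1:-4.1.7}
  local tmp rc
  tmp=$(mktemp -d) || return 1
  wget -qO "$tmp/sealos.tar.gz" "$(sealos_url "$v")" \
    && tar -zxf "$tmp/sealos.tar.gz" -C "$tmp" sealos \
    && install -m 0755 "$tmp/sealos" /usr/bin/sealos
  rc=$?
  rm -rf "$tmp"   # clean up the temporary download directory either way
  return $rc
}
```

Then `install_sealos 4.1.7 && sealos version` should report the new version.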
Single-node installation
# Ran into a filesystem-format problem, so the storage root directory has to be specified
[root@test-d-010055010107 data01]# ./sealos run
Error: kernel does not support overlay fs: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type is not supported.: driver not supported
kernel does not support overlay fs: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type is not supported.: driver not supported
# Loading the image archive fails too; the archive format has to be specified with -t
[root@test-d-010055010107 data01]# sealos --root /data01/ --runroot /data01/ load -i kubernetes.tar
Error: loading index: open /var/tmp/oci1097864579/index.json: no such file or directory
loading index: open /var/tmp/oci1097864579/index.json: no such file or directory
# Frequently used commands
mkdir /data01/sealos
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker load -i calico.tar -t docker-archive
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker load -i new-kubernetes.tar -t oci-archive
sealos load --help
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker run localhost/labring/kuberentes:v1.24 --single # running by image name fails, so the image ID is used directly below
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker run 133c6a0a0d5f --single
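Choosing between `-t docker-archive` and `-t oci-archive` by trial and error is tedious, so here is a heuristic of my own (not sealos behavior) that inspects the tarball: a `docker save`-style archive has a top-level `manifest.json`, and an OCI image layout has an `oci-layout` file. As far as I know `ctr images export` writes both by default, and `calico.tar` above loaded fine as `docker-archive`, so `manifest.json` wins when both are present.

```shell
#!/usr/bin/env bash
# Heuristic guess of the -t value for `sealos load`, based on the archive's file list.
archive_transport() {
  local entries
  entries=$(tar -tf "$1") || return 1
  if grep -Eqx '(\./)?manifest\.json' <<<"$entries"; then
    echo docker-archive      # docker save style (also written by ctr export)
  elif grep -Eqx '(\./)?oci-layout' <<<"$entries"; then
    echo oci-archive         # plain OCI image layout
  else
    echo "unknown archive layout: $1" >&2
    return 1
  fi
}
```

For example `sealos --root /data01/sealos --runroot /data01/sealos/docker load -i calico.tar -t "$(archive_transport calico.tar)"`. If the guess is wrong, fall back to passing `-t` explicitly as above.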
# Reset the installation
sealos --debug --root /data01/sealos --runroot /data01/sealos/docker reset
# Shorten the command with an alias
alias s="sealos --debug --root /data01/sealos --runroot /data01/sealos/docker "
s run 133c6a0a0d5f --single
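The alias works in an interactive shell, but bash does not expand aliases in non-interactive scripts unless `shopt -s expand_aliases` is set. A shell function with the same name is a sketch of a script-friendly alternative; `SEALOS_ROOT` is a hypothetical variable holding the storage directory used above.

```shell
#!/usr/bin/env bash
# Storage directory on the ftype=1 disk, as used in the commands above.
SEALOS_ROOT=/data01/sealos

# Same flags as the alias, but usable inside scripts; all arguments are passed through.
s() {
  sealos --debug --root "$SEALOS_ROOT" --runroot "$SEALOS_ROOT/docker" "$@"
}
```

`s images` and `s run 133c6a0a0d5f --single` then behave the same as with the alias.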
[root@test-d-010055010107 sealos]# s images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/labring/kubernetes v1.24 133c6a0a0d5f 10 days ago 635 MB
docker.io/labring/helm v3.8.2 1123e8b4b455 7 months ago 45.1 MB
docker.io/labring/calico v3.22.1 29516dc98b4b 9 months ago 546 MB
# sealos version must be >= v4.1.0
s reset
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b --single
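Because running by image name failed and the image IDs had to be copied by hand, the IDs can also be pulled out of the `images` listing. A minimal sketch, assuming the column layout shown above (REPOSITORY, TAG, IMAGE ID); `image_id` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Read `sealos images` output on stdin; print the IMAGE ID for a repository and tag.
# Exits non-zero if no matching row is found.
image_id() {
  awk -v repo="$1" -v tag="$2" '
    NR > 1 && $1 == repo && $2 == tag { print $3; found = 1 }
    END { exit !found }'
}
```

For example `k8s=$(sealos --root /data01/sealos --runroot /data01/sealos/docker images | image_id docker.io/labring/kubernetes v1.24)`, and likewise for the helm and calico images.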
# Starting image-cri-shim by hand still fails, with these errors
/usr/bin/image-cri-shim -f /etc/image-cri-shim.yaml
fatal failed to setup image_shim, cri/shim: failed to register image service: falling using CRI v1 image API, please using other cri support v1 CRI API
fatal failed to setup image_shim, cri/shim: failed to register image service: falling using CRI v1alpha2 image API, please using other cri support v1alpha2 CRI API
# Checking containerd turns up error messages
[root@test-d-010055010107 sealos]# systemctl status containerd -l
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2023-03-27 17:49:48 CST; 16h ago
Docs: https://containerd.io
Main PID: 7077 (containerd)
Memory: 13.9M
CGroup: /system.slice/containerd.service
└─7077 /usr/bin/containerd
Mar 27 17:49:48 test-d-010055010107 systemd[1]: Starting containerd container runtime...
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229104592+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.overlayfs" error="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs does not support d_type. If the backing filesystem is xfs, please reformat with ftype=1 to enable d_type support"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229191393+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229403283+08:00" level=warning msg="could not use snapshotter overlayfs in metadata plugin" error="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs does not support d_type. If the backing filesystem is xfs, please reformat with ftype=1 to enable d_type support"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.229420619+08:00" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Mar 27 17:49:48 test-d-010055010107 containerd[7077]: time="2023-03-27T17:49:48.238313538+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="failed to create CRI service: failed to find snapshotter \"overlayfs\""
Mar 27 17:49:48 test-d-010055010107 systemd[1]: Started containerd container runtime.
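The key lines in this log are the `failed to load plugin` warnings. This sketch (a hypothetical helper, assuming the log format shown above) extracts just the plugin IDs; if `io.containerd.grpc.v1.cri` appears, containerd is not serving the CRI API at all, which is consistent with the image-cri-shim error earlier.

```shell
#!/usr/bin/env bash
# Read containerd log text on stdin; print the ID of each plugin that failed to load.
failed_plugins() {
  # "failed to load plugin <id>" -> the 5th whitespace-separated field is <id>
  grep -o 'failed to load plugin [^"]*' | awk '{ print $5 }'
}
```

For example `journalctl -u containerd --no-pager | failed_plugins`.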
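# Suspecting containerd was not set up properly, install crictl to inspect it
tar zxvf crictl-v1.25.0-linux-amd64.tar.gz -C /usr/local/bin
# The output confirms the problem: the CRI runtime service is unavailable. Try to fix it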
[root@test-d-010055010107 sealos]# crictl info
E0328 10:07:11.802780 10291 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
FATA[0000] getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
# Check containerd's overlayfs settings and move its data directory to /data01
cp -r /var/lib/container* /data01/
vim /etc/containerd/config.toml # set root = "/data01/containerd"
# containerd and image-cri-shim now start successfully
systemctl restart containerd
systemctl restart image-cri-shim
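The `vim` edit above can be made non-interactive. A sketch, assuming the config has an unindented top-level `root = ...` line (as containerd's default config does) and that GNU `sed -i` is available; `set_containerd_root` is a hypothetical helper. Stopping containerd before copying its data directory is my own precaution, not something the original steps did.

```shell
#!/usr/bin/env bash
# Usage: set_containerd_root <config.toml> <new-root-dir>
# Rewrites only the unindented top-level `root` key; nested plugin keys are untouched.
set_containerd_root() {
  local config=$1 new_root=$2
  grep -q '^root = ' "$config" || { echo "no top-level root in $config" >&2; return 1; }
  sed -i "s#^root = .*#root = \"$new_root\"#" "$config"
}
```

For example `systemctl stop containerd && cp -a /var/lib/containerd /data01/ && set_containerd_root /etc/containerd/config.toml /data01/containerd && systemctl start containerd && systemctl restart image-cri-shim`.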
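# Next came "open /root/.sealos/default/etc/admin.conf: no such file or directory"; according to the GitHub issue the fix is to upgrade to 4.1.7. That solved it, but re-running the install then failed and could not continue from the previous attempt
s reset # start over
# However, the newly installed containerd still lived in /var/lib/containerd, so I needed a way to change that path; after reading the docs I guessed that setting the criData environment variable might help
# Modified command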
s run 133c6a0a0d5f --single --env criData=/data01/containerd
# It does work: containerd is installed under /data01/containerd, although criData in /root/.sealos/default/Clusterfile still shows /var/lib/containerd
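Because the Clusterfile still reports the old criData, it is safer to ask containerd itself where its data lives. A sketch: `containerd config dump` prints the effective configuration, and this hypothetical filter prints its top-level `root` value, assuming the `root = "..."` formatting used in config.toml.

```shell
#!/usr/bin/env bash
# Read a containerd TOML config on stdin; print the unindented top-level `root` value.
containerd_root() {
  sed -n 's/^root = "\(.*\)"$/\1/p'
}
```

Here `containerd config dump | containerd_root` should print `/data01/containerd`.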
# Installed successfully
# but the node never becomes Ready
[root@test-d-010055010107 sealos]# kubectl get node
NAME STATUS ROLES AGE VERSION
test-d-010055010107 NotReady control-plane 8m56s v1.24.0
KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
[root@test-d-010055010107 sealos]# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
5d3572591a876 77b49675beae1 12 minutes ago Running kube-proxy 0 dc61529f47415 kube-proxy-vjjqv
9559b3a7d80ec aebe758cef4cd 12 minutes ago Running etcd 0 1a1846fb97f25 etcd-test-d-010055010107
00a5f23d7d227 529072250ccc6 12 minutes ago Running kube-apiserver 0 b65e60cdc8996 kube-apiserver-test-d-010055010107
91b737d89b72e e3ed7dee73e93 12 minutes ago Running kube-scheduler 0 e682c3fb7cc11 kube-scheduler-test-d-010055010107
dd3a2ea10b7c7 88784fb4ac2f6 12 minutes ago Running kube-controller-manager 0 d3177bd65479c kube-controller-manager-test-d-010055010107
[root@test-d-010055010107 sealos]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-qfnf5 0/1 Pending 0 3h24m
kube-system coredns-6d4b75cb6d-xzjz5 0/1 Pending 0 3h24m
kube-system etcd-test-d-010055010107 1/1 Running 0 3h24m
kube-system kube-apiserver-test-d-010055010107 1/1 Running 0 3h24m
kube-system kube-controller-manager-test-d-010055010107 1/1 Running 0 3h24m
kube-system kube-proxy-vjjqv 1/1 Running 0 3h24m
kube-system kube-scheduler-test-d-010055010107 1/1 Running 0 3h24m
[root@test-d-010055010107 sealos]# journalctl -xeu kubelet
Mar 28 11:43:40 test-d-010055010107 kubelet[20385]: E0328 11:43:40.678552 20385 kubelet.go:2344] "Container runtime network not ready" networkReady="NetworkReady=f
Mar 28 11:43:45 test-d-010055010107 kubelet[20385]: E0328 11:43:45.679314 20385 kubelet.go:2344] "Container runtime network not ready" networkReady="NetworkReady=f
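With many pods, scanning these tables by eye gets slow. These two hypothetical awk filters assume the default `kubectl get node` and `kubectl get pod -A` column layouts shown above and print only the rows that need attention. Pods in `Completed` state are also reported, which is acceptable for a quick check.

```shell
#!/usr/bin/env bash
# Read `kubectl get node` output on stdin; print nodes whose STATUS is not exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Read `kubectl get pod -A` output on stdin; print namespace/name of pods not "Running".
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 }'
}
```

For example `kubectl get node | not_ready_nodes` and `kubectl get pod -A | unhealthy_pods`; `kubectl describe node <name>` then shows the condition message, such as the `cni plugin not initialized` error above.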
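# According to the GitHub issue this is caused by calico not being installed, so reinstall with the calico image
s reset # note: this does not delete /root/.sealos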
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b --single --env criData=/data01/containerd
# Everything looks normal now
[root@test-d-010055010107 sealos]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-6b44b54755-qsmkl 0/1 Pending 0 115s
calico-system calico-node-7grz7 1/1 Running 0 115s
calico-system calico-typha-6f9598cfd9-2sr27 1/1 Running 0 115s
kube-system coredns-6d4b75cb6d-6fncr 1/1 Running 0 2m2s
kube-system coredns-6d4b75cb6d-b8czk 1/1 Running 0 2m2s
kube-system etcd-test-d-010055010107 1/1 Running 1 2m16s
kube-system kube-apiserver-test-d-010055010107 1/1 Running 1 2m18s
kube-system kube-controller-manager-test-d-010055010107 1/1 Running 1 2m16s
kube-system kube-proxy-wnp2g 1/1 Running 0 2m3s
kube-system kube-scheduler-test-d-010055010107 1/1 Running 1 2m16s
tigera-operator tigera-operator-d7957f5cc-5wfc4 1/1 Running 0 2m2s
[root@test-d-010055010107 sealos]# kubectl get node
NAME STATUS ROLES AGE VERSION
test-d-010055010107 Ready control-plane 2m25s v1.24.0
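Cluster installation
With the single-node experience behind me and every pitfall already hit, I went straight to installing the cluster
# Cluster installation attempt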
alias s="sealos --debug --root /data01/sealos --runroot /data01/sealos/docker "
s run 133c6a0a0d5f 1123e8b4b455 29516dc98b4b -e defaultVIP=10.55.10.108 -e criData=/data01/containerd --masters 10.55.10.107 --nodes 10.55.10.97,10.55.10.106 --passwd 112233
[root@test-d-010055010107 ~]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
test-d-010055010097 Ready <none> 65s v1.24.0 10.55.10.97 <none> CentOS Linux 7 (Core) 3.10.0-693.11.6.el7.x86_64 containerd://1.7.0
test-d-010055010106 Ready <none> 76s v1.24.0 10.55.10.106 <none> CentOS Linux 7 (Core) 3.10.0-693.11.6.el7.x86_64 containerd://1.7.0
test-d-010055010107 Ready control-plane 95s v1.24.0 10.55.10.107 <none> CentOS Linux 7 (Core) 3.10.0-693.11.6.el7.x86_64 containerd://1.7.0
# Looks fine
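The cluster command above hardcodes the host list and the password. A sketch that builds the comma-separated `--masters`/`--nodes` values from arrays and takes the password as a single quoted argument, so special characters reach sealos intact (the point of the single-quote advice in the pre-install notes). `MASTERS`, `NODES`, `join_by` and `cluster_run` are hypothetical names; the VIP and criData values are copied from the command above.

```shell
#!/usr/bin/env bash
MASTERS=(10.55.10.107)
NODES=(10.55.10.97 10.55.10.106)

# Join the remaining arguments with the single-character separator in $1.
join_by() {
  local IFS=$1
  shift
  echo "$*"
}

# Usage: cluster_run <root-password> <image-id>...
cluster_run() {
  local passwd=$1
  shift
  sealos --debug --root /data01/sealos --runroot /data01/sealos/docker \
    run "$@" -e defaultVIP=10.55.10.108 -e criData=/data01/containerd \
    --masters "$(join_by , "${MASTERS[@]}")" \
    --nodes "$(join_by , "${NODES[@]}")" \
    --passwd "$passwd"
}
```

Running `read -rs -p 'root password: ' PW; cluster_run "$PW" 133c6a0a0d5f 1123e8b4b455 29516dc98b4b` also keeps the password out of the shell history.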
Reference links that helped solve the problems
- unsupported graph driver: vfs · Issue #1576 · sealerio/sealer (github.com)
- 概览 | sealer (Overview | sealer); some problems may also need this document
- Question: Can sealos load -i use docker save -o image.tar? · Issue #2526 · labring/sealos (github.com)
- crictl安装 (Installing crictl) - 小吉猫 - 博客园 (cnblogs.com)
- Containerd 安装过程以及踩的坑_/var/lib/containerd (Containerd installation process and pitfalls) - Aisaka81's blog - CSDN
- error Applied to cluster error: read admin.conf error in guest: open /root/.sealos/default/etc/admin.conf: no such file or directory · Issue #2548 · labring/sealos (github.com)
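- sealos 4.0 fails on the first install; installing again prints nothing and still does not succeed · Issue #1207 · labring/sealos (github.com)
- Single-node install: the API Server does not come up and kubelet cannot start either · Issue #2313 · labring/sealos (github.com)
- BUG: node NotReady in a single-node deployment · Issue #1663 · labring/sealos (github.com)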
- sealos NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized · Issue #704 · labring/sealos (github.com)
- linux journalctl 命令 (the Linux journalctl command) - sparkdev - 博客园 (cnblogs.com): commands for viewing logs on Linux
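Takeaways
- Versions change often, command-line flags change between releases, and bugs are buried deep.
- Problems have to be untangled patiently, one layer at a time; installing the tools Kubernetes troubleshooting depends on, such as ctr/crictl, ahead of time helps.
- I also joined the official DingTalk group, but questions and inquiries there basically went unanswered.
- Follow the GitHub issues; they were the only really valuable reference.
- Reposting is prohibited.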