Deploying a Kubernetes Cluster on Ubuntu 20.04 (Docker-based)
This post summarizes the pitfalls I hit and the overall workflow for deploying a k8s cluster.
Versions involved: docker-v27.4.1, cri-dockerd-v0.3.16, kubeadm-v1.28.15
Note that my machines are arm64; x86 and amd64 users can still follow along as a reference.
So far this covers the basic setup (up to cluster initialization).
What is k8s
Kubernetes is an open-source container orchestration engine that automates the deployment, scaling, and management of containerized applications.
Put simply, k8s turns the Docker containers running on different machines into one cluster, so that a large number of containers can be managed in one place.
The following container runtimes are commonly used to build a cluster; Docker is used here.
- containerd
- CRI-O
- Docker Engine
- Mirantis Container Runtime
Cluster preparation
Cluster layout
Hostname | Node IP | Role |
---|---|---|
k8s-master | 192.168.223.129 | k8s-master |
k8s-slave1 | * | k8s-slave1 |
k8s-slave2 | * | k8s-slave2 |
Network layout
Type | CIDR |
---|---|
Pod network | 10.244.0.0/16 |
Service network | 10.96.0.0/12 |
Node network | 192.168.223.0/24 |
Time synchronization (needed on all three hosts)
# Step 1: check the current time
date
> Thu Sep 7 05:39:21 AM UTC 2024
# Step 2: switch the timezone
timedatectl set-timezone Asia/Shanghai
date
> Thu Sep 7 01:39:51 PM CST 2024
# Step 3: install the ntpdate time-sync tool
apt install ntpdate
# Step 4: schedule the sync with a cron job
crontab -e
> Select an editor. To change later, run 'select-editor'.
> 1. /bin/nano <---- easiest
> 2. /usr/bin/vim.basic
> 3. /usr/bin/vim.tiny
> 4. /bin/ed
> Choose 1-4 [1]: 2 (pick option 2 for vim)
0 0 * * * ntpdate ntp.aliyun.com
# Step 5: confirm the cron entry was saved
crontab -l
> 0 0 * * * ntpdate ntp.aliyun.com
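As an alternative to the ntpdate cron job, Ubuntu 20.04 ships systemd-timesyncd; a minimal sketch, assuming you prefer systemd-managed sync and that ntp.aliyun.com is reachable:
# Point systemd-timesyncd at the Aliyun NTP server and turn NTP on
sudo sed -i 's/^#\?NTP=.*/NTP=ntp.aliyun.com/' /etc/systemd/timesyncd.conf
sudo timedatectl set-ntp true
# "NTP service: active" and "System clock synchronized: yes" indicate it is working
timedatectl status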
Basic host configuration
Set the hostname and /etc/hosts entries
# Change the hostname
hostnamectl set-hostname k8s-master
# Add hosts entries for name resolution
vim /etc/hosts
> 127.0.0.1 localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
> 192.168.223.129 k8s-master
> 192.168.223.* k8s-slave1
> 192.168.223.* k8s-slave2
Verify
ping k8s-master
> PING k8s-master (192.168.223.129) 56(84) bytes of data.
> 64 bytes from k8s-master (192.168.223.129): icmp_seq=1 ttl=64 time=0.109 ms
> 64 bytes from k8s-master (192.168.223.129): icmp_seq=2 ttl=64 time=0.174 ms
> 64 bytes from k8s-master (192.168.223.129): icmp_seq=3 ttl=64 time=0.139 ms
> 64 bytes from k8s-master (192.168.223.129): icmp_seq=4 ttl=64 time=0.121 ms
> --- k8s-master ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3051ms
> rtt min/avg/max/mdev = 0.109/0.135/0.174/0.024 ms
# Ping the other two nodes as well
Set the iptables FORWARD policy to ACCEPT
iptables -P FORWARD ACCEPT
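Note that iptables -P FORWARD ACCEPT does not survive a reboot. A small sketch to persist it, assuming you are willing to install the iptables-persistent package:
# Install the persistence helper (it offers to save the current rules during install)
sudo apt install -y iptables-persistent
# Set the policy and save it so it is restored at boot
sudo iptables -P FORWARD ACCEPT
sudo netfilter-persistent save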
Disable swap, the firewall, and SELinux
# Step 1: turn off swap
swapoff -a
# Prevent the swap partition from being mounted at boot
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# Step 2: disable SELinux (stock Ubuntu does not ship SELinux; skip this or let it fail harmlessly)
sed -ri 's#(SELINUX=).*#\1disabled#' /etc/selinux/config
setenforce 0
# Step 3: turn off the firewall (firewalld only exists if you installed it; ufw is Ubuntu's default)
systemctl disable firewalld && systemctl stop firewalld
ufw disable
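A quick check that the changes took effect, using only standard tools:
swapon --show # no output means swap is off
free -h | grep -i swap # the Swap line should read 0B
ufw status # should print "Status: inactive"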
Adjust kernel parameters to allow forwarded traffic
# Kernel tuning: load the required modules at boot
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set the required sysctl parameters; they persist across reboots and allow traffic to be forwarded
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply and verify
sudo sysctl --system
> * Applying /etc/sysctl.d/k8s.conf ...
> net.bridge.bridge-nf-call-ip6tables = 1
> net.bridge.bridge-nf-call-iptables = 1
> net.ipv4.ip_forward = 1
lsmod | grep br_netfilter
> br_netfilter 28672 0
> bridge 233472 1 br_netfilter
lsmod | grep overlay
> overlay 143360 10
Enable IPVS and load its modules
# Install the related tools
apt install -y ipset ipvsadm
# Configure the modules to load automatically at boot
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
# Load the modules now
sudo modprobe ip_vs
sudo modprobe ip_vs_rr
sudo modprobe ip_vs_wrr
sudo modprobe ip_vs_sh
sudo modprobe nf_conntrack
# Verify the modules loaded successfully
lsmod |grep -e ip_vs -e nf_conntrack
> ip_vs_sh 20480 0
> ip_vs_wrr 20480 0
> ip_vs_rr 20480 0
> ip_vs 192512 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
> nf_conntrack 184320 5 xt_conntrack,nf_nat,xt_nat,xt_MASQUERADE,ip_vs
> nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
> nf_defrag_ipv4 16384 1 nf_conntrack
> libcrc32c 16384 5 nf_conntrack,nf_nat,btrfs,raid456,ip_vs
Install Docker
k8s is there to manage containers; the container runtime used here is Docker.
Install the dependencies
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Trust Docker's GPG key
The Aliyun mirror is used here; you can search for docker-ce on its mirror site.
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
Add the apt repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Install Docker Engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
(Appendix) Find and install a specific Docker version
apt-cache madison docker-ce
# Install the chosen version
apt-get -y install docker-ce=[VERSION]
Edit the Docker daemon config
# Create the config directory
sudo mkdir -p /etc/docker
# Write the following config
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": [
"https://8xpk5wnt.mirror.aliyuncs.com"
]
}
EOF
# Reload the config and restart Docker
sudo systemctl daemon-reload && sudo systemctl restart docker
- "exec-opts": ["native.cgroupdriver=systemd"]: Docker's default cgroup driver is cgroupfs; the Kubernetes docs recommend systemd.
- registry-mirrors: an Aliyun registry mirror to speed up image pulls.
Enable Docker at boot
systemctl enable --now docker
Verify
# Check the cgroup driver configuration
docker info | grep Cgroup
# Pull an image and run it (Docker Hub can be hard to reach from mainland China, hence the alternative registry)
docker run --name some-nginx -d -p 8080:80 registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
Install cri-dockerd
Since around k8s v1.20 the built-in Docker integration (dockershim) has been deprecated, and it was removed in v1.24, so kubelet can no longer talk to Docker directly; cri-dockerd provides the CRI API used to connect the two.
Put simply, using Docker directly will just fail; the runtime k8s connects to by default is containerd.
This section shows the tarball install; the rpm route should be even simpler, see: https://aluopy.cn/docker/cri-dockerd-install/
Download the release package
Releases page: https://github.com/Mirantis/cri-dockerd/releases/tag/v0.3.16
If you can reach GitHub directly, just wget the package. Mind your server's architecture.
I will attach the amd64 tar and rpm packages later; the tarball install is explained here.
# Download to /opt (arm64 here to match my machines; x86_64 users should grab the amd64 tarball instead)
sudo wget -P /opt https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.16/cri-dockerd-0.3.16.arm64.tgz
# Extract and install the binary
tar -xvzf /opt/cri-dockerd-0.3.16.arm64.tgz -C /tmp
sudo mv /tmp/cri-dockerd /usr/local/bin/
Write the systemd unit files
Two unit files are needed so that systemd can manage cri-dockerd: cri-docker.service and cri-docker.socket.
# cri-docker.service (quote 'EOF' so the shell does not expand $MAINPID inside the heredoc)
cat > /lib/systemd/system/cri-docker.service << 'EOF'
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket
[Service]
Type=notify
ExecStart=/usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
# cri-docker.socket
cat > /lib/systemd/system/cri-docker.socket << 'EOF'
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service
[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
EOF
Enable and start the units
# Enable them at boot and start them; this creates the socket
systemctl enable --now cri-docker.socket cri-docker.service
# The socket path now exists under /var/run; you will need it for kubeadm init!
ll /var/run/cri-dockerd.sock
> srw-rw---- 1 root docker 0 Jan 10 19:53 /var/run/cri-dockerd.sock=
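Once the socket exists, crictl can be pointed at it to confirm the CRI endpoint answers. This assumes crictl is available; it arrives as the cri-tools dependency when kubeadm is installed in the next section, or can be installed separately.
crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock version
crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock info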
Install and initialize Kubernetes
Install the k8s packages
mkdir -p /etc/apt/keyrings
# Download the GPG key
curl -fsSL https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/deb/Release.key |
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the apt source
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/deb/ /" |
tee /etc/apt/sources.list.d/kubernetes.list
# Update the index and install kubelet, kubeadm, and kubectl
apt-get update
apt-get install -y kubelet kubeadm kubectl
# Hold the versions so automatic upgrades don't break the cluster
sudo apt-mark hold kubelet kubeadm kubectl
Only the install commands are shown here; the flow mirrors the Docker install.
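A quick way to confirm the packages landed and the hold is in place:
kubeadm version -o short
kubelet --version
kubectl version --client
apt-mark showhold # should list kubeadm, kubectl and kubelet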
Adjust kubelet parameters (only the master node needs this step; the other nodes just need the packages installed)
Whether startup succeeds depends on getting these parameters right.
1. Set the kubelet cgroup driver to systemd
vim /etc/default/kubelet
> KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
# Write the line above, then press ESC and type ":wq" to save and exit
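Equivalently, without opening an editor (a sketch; it overwrites /etc/default/kubelet, so merge by hand if you already keep other flags there):
cat <<EOF | sudo tee /etc/default/kubelet
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF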
Generate the default config and edit it
# Generate the default config (I put it directly under /opt; pick any location)
kubeadm config print init-defaults > kubeadm.yaml
# Pay close attention to the settings in this file, this part is important!!!
# cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.223.129 # change this to the master node's reachable IP
bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock # point this at the socket created by cri-dockerd
imagePullPolicy: IfNotPresent
  name: k8s-master # change this to your own master node's hostname
taints: null
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers # switch to a domestic mirror; keep the default if you can reach the upstream registry
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
networking:
dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16 # add this line: the CIDR used by pods
serviceSubnet: 10.96.0.0/12
scheduler: {}
- localAPIEndpoint.advertiseAddress: the master node's address
- nodeRegistration.criSocket: the socket path exposed by the CRI shim
- imageRepository: the registry the control-plane images are pulled from
- networking.podSubnet: the CIDR used by pods; copying it as-is is fine, just make sure it does not overlap with your host network
Pre-pull the images
# List the required images
kubeadm config images list --config kubeadm.yaml
> registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.0
> registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.0
> registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.0
> registry.aliyuncs.com/google_containers/kube-proxy:v1.28.0
> registry.aliyuncs.com/google_containers/pause:3.9
> registry.aliyuncs.com/google_containers/etcd:3.5.15-0
> registry.aliyuncs.com/google_containers/coredns:v1.10.1
# Pull them all ahead of time
kubeadm config images pull --config kubeadm.yaml
> [config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.0
> [config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.0
> [config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.0
> [config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.28.0
> [config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
> [config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.15-0
> [config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1
# After pulling, double-check with docker images
docker images
> REPOSITORY TAG IMAGE ID CREATED SIZE
> registry.aliyuncs.com/google_containers/etcd 3.5.15-0 27e3830e1402 5 months ago 139MB
> registry.aliyuncs.com/google_containers/kube-apiserver v1.28.0 00543d2fe5d7 17 months ago 119MB
> registry.aliyuncs.com/google_containers/kube-controller-manager v1.28.0 46cc66ccc7c1 17 months ago 116MB
> registry.aliyuncs.com/google_containers/kube-scheduler v1.28.0 762dce4090c5 17 months ago 57.8MB
> registry.aliyuncs.com/google_containers/kube-proxy v1.28.0 940f54a5bcae 17 months ago 68.3MB
> registry.aliyuncs.com/google_containers/coredns v1.10.1 97e04611ad43 23 months ago 51.4MB
> registry.aliyuncs.com/google_containers/pause 3.9 829e9de338bd 2 years ago 514kB
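Before the real initialization, kubeadm can validate the config and walk through the steps without changing the host, which makes a useful sanity check (a sketch; --dry-run is an upstream kubeadm init flag):
kubeadm init --config kubeadm.yaml --dry-run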
Initialize
# If everything above succeeded, start the initialization on the master node
kubeadm init --config kubeadm.yaml
On success it prints something like:
···
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.223.129:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:c8954c91e9186ff1c485b5cd2701b69d5227ef33d98a5395268eb02332666611
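Two notes on the join command, both assumptions based on this setup rather than output from the run above: with cri-dockerd installed a worker may expose more than one CRI socket, so pass it explicitly; and the bootstrap token expires after 24h, after which a fresh join command can be generated on the master.
# On each worker node, append the socket so kubeadm does not have to guess:
kubeadm join 192.168.223.129:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:c8954c91e9186ff1c485b5cd2701b69d5227ef33d98a5395268eb02332666611 \
--cri-socket unix:///var/run/cri-dockerd.sock
# On the master, regenerate a join command once the token has expired:
kubeadm token create --print-join-command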
Verify
Finally, run the commands from the success message and confirm the master node shows up.
# Run these exactly as printed after a successful init; no changes needed
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check that the master node is registered and that the control-plane containers are running in Docker
kubectl get nodes
> NAME STATUS ROLES AGE VERSION
> k8s-master NotReady control-plane 3h46m v1.28.15
docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 1218ac19cf38 940f54a5bcae "/usr/local/bin/kube…" 4 hours ago Up 4 hours k8s_kube-proxy_kube-proxy-qctdx_kube-system_1760cfa5-36f2-4cdf-8633-c21ea1b062e9_0
> b3048947a366 registry.aliyuncs.com/google_containers/pause:3.9 "/pause" 4 hours ago Up 4 hours k8s_POD_kube-proxy-qctdx_kube-system_1760cfa5-36f2-4cdf-8633-c21ea1b062e9_0
> 05c9b056d2ba 00543d2fe5d7 "kube-apiserver --ad…" 4 hours ago Up 4 hours k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_57f73696a474af51d70e9b1c94ce436a_0
> 8e2f18909852 762dce4090c5 "kube-scheduler --au…" 4 hours ago Up 4 hours k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_de1d2e45d6b9a2308036bffaaadd3eac_0
> 3ae551e86403 27e3830e1402 "etcd --advertise-cl…" 4 hours ago Up 4 hours k8s_etcd_etcd-k8s-master_kube-system_08cdc7707b74498e298e27f1832b2cac_0
> 8a68b0f2521a 46cc66ccc7c1 "kube-controller-man…" 4 hours ago Up 4 hours k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_d78de7bd32e364977b61756008eae129_0
> 1fbd55621266 registry.aliyuncs.com/google_containers/pause:3.9 "/pause" 4 hours ago Up 4 hours k8s_POD_kube-controller-manager-k8s-master_kube-system_d78de7bd32e364977b61756008eae129_0
> 65f41294aba6 registry.aliyuncs.com/google_containers/pause:3.9 "/pause" 4 hours ago Up 4 hours k8s_POD_etcd-k8s-master_kube-system_08cdc7707b74498e298e27f1832b2cac_0
> 2cd06e9a4ecc registry.aliyuncs.com/google_containers/pause:3.9 "/pause" 4 hours ago Up 4 hours k8s_POD_kube-apiserver-k8s-master_kube-system_57f73696a474af51d70e9b1c94ce436a_0
> f5a3742507bf registry.aliyuncs.com/google_containers/pause:3.9 "/pause" 4 hours ago Up 4 hours k8s_POD_kube-scheduler-k8s-master_kube-system_de1d2e45d6b9a2308036bffaaadd3eac_0
Install the Calico network plugin (run on the master node only)
Official method (via tigera-operator.yaml and custom-resources.yaml)
Reference: the Calico official documentation
# Download the manifests (substitute a different version number if you need to)
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/tigera-operator.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/custom-resources.yaml
# tigera-operator.yaml needs no changes; apply it as-is
kubectl create -f tigera-operator.yaml
# Verify that the operator is up and running
[root@k8s-master01 opt]# kubectl get ns
NAME STATUS AGE
default Active 67s
kube-node-lease Active 67s
kube-public Active 67s
kube-system Active 67s
tigera-operator Active 16s
[root@k8s-master01 opt]# kubectl get pods -n tigera-operator
NAME READY STATUS RESTARTS AGE
tigera-operator-94d7f7696-wts92 1/1 Running 0 23s
# Edit custom-resources.yaml, mainly the cidr (it must match the podSubnet field in kubeadm.yaml)
[root@k8s-master01 opt]# cat custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
name: default
spec:
# Configures Calico networking.
calicoNetwork:
# Note: The ipPools section cannot be modified post-install.
ipPools:
- blockSize: 26
cidr: 10.244.0.0/16
encapsulation: VXLANCrossSubnet
natOutgoing: Enabled
nodeSelector: all()
registry: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
name: default
spec: {}
# Apply the custom resources
kubectl create -f custom-resources.yaml
# Watch the pods until they are all running normally
watch kubectl get pods -n calico-system
## Healthy output looks like this
[root@k8s-master01 ~]# kubectl get pods -n calico-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-69c46f4d57-29wml 0/1 Running 1 (92s ago) 6m16s
calico-node-tgjtt 0/1 Running 2 (20s ago) 6m17s
calico-typha-8795ffdfb-jdl2j 0/1 Running 2 (27s ago) 6m18s
csi-node-driver-cvn4d 2/2 Running 0 6m17s
# In the end all nodes should be in the Ready state
kubectl get nodes
root@k8s-master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 20h v1.28.15
k8s-slave Ready <none> 20h v1.28.15
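With all nodes Ready, a quick smoke test confirms that pods get addresses from the 10.244.0.0/16 pool (a sketch; it reuses the openanolis nginx image from earlier, but any reachable image works):
kubectl run net-test --image=registry.openanolis.cn/openanolis/nginx:1.14.1-8.6 --restart=Never
kubectl get pod net-test -o wide # the IP column should fall inside 10.244.0.0/16
kubectl delete pod net-test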
Alternative: install via a single calico.yaml into kube-system
# Download the manifest (adjust the version if needed)
wget --no-check-certificate https://projectcalico.docs.tigera.io/archive/v3.25/manifests/calico.yaml
# Two parts need editing (the file is long, so vim is the easiest way)
# 1. Add the NIC name
- name: CLUSTER_TYPE
value: "k8s,bgp"
## Lines added below
- name: IP_AUTODETECTION_METHOD
value: "interface=<NIC name, e.g. eth0>"
##
# 2. Change the CIDR
# The default IPv4 pool to create on startup if none exists. Pod IPs will be
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
### Lines changed below
- name: CALICO_IPV4POOL_CIDR
value: "10.244.0.0/16"
###
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
value: "true"
# Set Felix endpoint to host default action to ACCEPT.
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: "ACCEPT"
# Disable IPv6 on Kubernetes.
- name: FELIX_IPV6SUPPORT
value: "false"
- name: FELIX_HEALTHENABLED
value: "true"
# 3. Apply the manifest
kubectl create -f calico.yaml
# 4. Check the pods and the node status
kubectl get pods -n kube-system
kubectl get nodes
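Instead of polling by hand, you can also wait on the rollouts directly (assuming the manifest keeps the default calico-node and calico-kube-controllers names):
kubectl -n kube-system rollout status daemonset/calico-node
kubectl -n kube-system rollout status deployment/calico-kube-controllers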
Error roundup
- Building the cluster on cloud servers
The public IP of a cloud server is not bound to a local NIC; it is an address the gateway assigns for external access. As a result the init process keeps retrying the bind, hangs for a long time, and eventually fails. (Digging in showed that the etcd container could not bind the host port, so the config file has to be changed.)
# Edit the kubeadm.yaml from earlier
vim kubeadm.yaml
# Contents
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 47.109.190.25
bindPort: 6443
nodeRegistration:
criSocket: unix:///var/run/cri-dockerd.sock
imagePullPolicy: IfNotPresent
name: k8s-master01
taints: null
---
apiServer:
timeoutForControlPlane: 4m0s
  certSANs: # add the certificate SANs
- 47.109.190.25
- 172.26.165.152
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
local:
dataDir: /var/lib/etcd
extraArgs:
      listen-client-urls: https://0.0.0.0:2379 # change these two etcd parameters
listen-peer-urls: https://0.0.0.0:2380
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
networking:
dnsDomain: cluster.local
podSubnet: 10.244.0.0/16
serviceSubnet: 10.96.0.0/12
scheduler: {}
- Certificate errors
# The error looks like:
E0112 12:04:01.567779 3725 memcache.go:265] couldn't get current server API group list: Get "https://172.26.165.152:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
# In theory this error should no longer appear once the apiserver SANs have been added.
# Most likely the certificate was not picked up; regenerate it and copy the kubeconfig over again
kubeadm init phase certs apiserver --config=kubeadm.yaml
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- CrashLoopBackOff while configuring the Calico network plugin
I have hit two causes: one is a connection problem, i.e. the pod cannot reach the api-server as above; the other is a mismatch between the server architecture and the image architecture, which takes a node down (easy to run into when using domestic mirrors).
- How to debug
# Check the logs with these two commands
kubectl logs (pod name) -n (namespace name)
kubectl describe po (pod name) -n (namespace name)
### Example
[root@k8s-master01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-665548954f-45n68 1/1 Running 0 43h
calico-node-kvg4m 0/1 Init:CrashLoopBackOff 52 (20h ago) 43h
calico-node-z2k5z 1/1 Running 0 43h
coredns-66f779496c-2mtrb 1/1 Running 0 43h
coredns-66f779496c-4psf9 1/1 Running 0 43h
etcd-k8s-master01 1/1 Running 0 43h
kube-apiserver-k8s-master01 1/1 Running 0 43h
kube-controller-manager-k8s-master01 1/1 Running 0 43h
kube-proxy-s9tct 1/1 Running 0 43h
kube-proxy-wgrng 1/1 Running 0 43h
kube-scheduler-k8s-master01 1/1 Running 0 43h
# Follow-up commands
kubectl logs calico-node-kvg4m -n kube-system
kubectl describe pods calico-node-kvg4m -n kube-system
###
The logs usually make the cause of the crash clear.
- For connection or authentication errors, regenerating the certificate and pointing KUBECONFIG at the admin config is enough:
kubeadm init phase certs apiserver --config=kubeadm.yaml
export KUBECONFIG=/etc/kubernetes/admin.conf
- For architecture errors (this is especially likely on arm64 with domestic mirrors), fetching the pod log shows something like:
kubectl logs calico-node-l6d2s -n calico-system -c flexvol-driver
> exec /usr/local/bin/flexvol.sh: exec format error
The mirrored image swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-calico-cni:v3.17.2 defaults to amd64, so it has to be replaced with an arm64 build such as
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/typha:v3.28.0-linuxarm64
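To check which architecture a pulled image actually targets before swapping tags, plain docker commands are enough (a sketch):
# Compare the image architecture with the node architecture
docker image inspect swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/typha:v3.28.0-linuxarm64 --format '{{.Os}}/{{.Architecture}}'
uname -m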
- Image pull errors: ImagePullBackOff, ErrImageNeverPull
These come from docker.io being hard to reach. There are two fixes.
- For Calico 3.26+ users who install via custom-resources.yaml, add this one line to the spec:
registry: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io
(Note this only suits non-arm servers, otherwise the images may be incompatible.)
# The file then looks like this
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
name: default
spec:
# Configures Calico networking.
calicoNetwork:
# Note: The ipPools section cannot be modified post-install.
ipPools:
- blockSize: 26
cidr: 10.244.0.0/16
encapsulation: VXLANCrossSubnet
natOutgoing: Enabled
nodeSelector: all()
registry: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
name: default
spec: {}
- For users of calico.yaml (downloaded via
wget --no-check-certificate https://projectcalico.docs.tigera.io/archive/v3.25/manifests/calico.yaml
):
# If you are comfortable with vim, search for the image fields and edit them by hand
# Or run the following substitution instead
sed -i 's#docker.io/#swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/#g' calico.yaml
- Finally, re-apply whichever manifest you use:
kubectl apply -f calico.yaml
kubectl apply -f custom-resources.yaml
- Installing Portainer fails with: Warning FailedScheduling 2s (x4 over 15m) default-scheduler 0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
- Reason: kubeadm init does not install a default storage class; install one yourself:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
### Contents of local-path-storage.yaml, for direct copy-paste
apiVersion: v1
kind: Namespace
metadata:
name: local-path-storage
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: local-path-provisioner-service-account
namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: local-path-provisioner-role
namespace: local-path-storage
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: local-path-provisioner-role
rules:
- apiGroups: [""]
resources: ["nodes", "persistentvolumeclaims", "configmaps", "pods", "pods/log"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "patch"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: local-path-provisioner-bind
namespace: local-path-storage
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: local-path-provisioner-role
subjects:
- kind: ServiceAccount
name: local-path-provisioner-service-account
namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: local-path-provisioner-bind
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: local-path-provisioner-role
subjects:
- kind: ServiceAccount
name: local-path-provisioner-service-account
namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: local-path-provisioner
namespace: local-path-storage
spec:
replicas: 1
selector:
matchLabels:
app: local-path-provisioner
template:
metadata:
labels:
app: local-path-provisioner
spec:
serviceAccountName: local-path-provisioner-service-account
containers:
- name: local-path-provisioner
image: rancher/local-path-provisioner:v0.0.30
imagePullPolicy: IfNotPresent
command:
- local-path-provisioner
- --debug
- start
- --config
- /etc/config/config.json
volumeMounts:
- name: config-volume
mountPath: /etc/config/
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: CONFIG_MOUNT_PATH
value: /etc/config/
volumes:
- name: config-volume
configMap:
name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
name: local-path-config
namespace: local-path-storage
data:
config.json: |-
{
"nodePathMap":[
{
"node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
"paths":["/opt/local-path-provisioner"]
}
]
}
setup: |-
#!/bin/sh
set -eu
mkdir -m 0777 -p "$VOL_DIR"
teardown: |-
#!/bin/sh
set -eu
rm -rf "$VOL_DIR"
helperPod.yaml: |-
apiVersion: v1
kind: Pod
metadata:
name: helper-pod
spec:
priorityClassName: system-node-critical
tolerations:
- key: node.kubernetes.io/disk-pressure
operator: Exists
effect: NoSchedule
containers:
- name: helper-pod
image: busybox
imagePullPolicy: IfNotPresent
###
# Check that the storage class was created
kubectl get storageclass
# Set it as the default
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
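To confirm the default storage class actually provisions volumes, a minimal test is a PVC plus a pod that mounts it (a sketch; the names are arbitrary and the busybox image must be pullable in your environment):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 128Mi
---
apiVersion: v1
kind: Pod
metadata:
  name: local-path-test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo ok > /data/ok && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: local-path-test
EOF
# The PVC stays Pending until the pod is scheduled (WaitForFirstConsumer), then turns Bound
kubectl get pvc local-path-test
# Clean up afterwards
kubectl delete pod local-path-test-pod && kubectl delete pvc local-path-test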