Deploying a Highly Available Kubernetes Cluster with kubeadm on CentOS 7.8
Original author: Zhangguanzhang
Original link: http://zhangguanzhang.github.io/2019/11/24/kubeadm-base-use/
Part 1: Basic system configuration
We assume your system is an up-to-date, minimal installation.
1. Synchronize time
yum install chrony -y
systemctl enable chronyd && systemctl restart chronyd
2. Disable swap
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
3. Disable the firewall and SELinux
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config
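Optionally, a quick sanity check (not in the original article) that steps 2 and 3 took effect:
swapon -s              # should print nothing
getenforce             # should print Permissive or Disabled
firewall-cmd --state   # should report "not running"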
4. Disable NetworkManager. If your IPs are not managed by NetworkManager, disabling it and using the network service is recommended; we still use the network service here.
systemctl disable NetworkManager && systemctl stop NetworkManager
systemctl restart network
5. Install the EPEL repository and replace it with the Aliyun EPEL mirror
yum install epel-release wget -y
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
6. Install dependencies
yum install -y \
curl \
git \
conntrack-tools \
psmisc \
nfs-utils \
jq \
socat \
bash-completion \
ipset \
ipvsadm \
conntrack \
libseccomp \
net-tools \
crontabs \
sysstat \
unzip \
iftop \
nload \
strace \
bind-utils \
tcpdump \
telnet \
lsof \
htop
Part 2: Kernel modules that must be loaded at boot for kube-proxy in IPVS mode
Following best practice, we load them with systemd-modules-load instead of putting modprobe lines in /etc/rc.local.
vim /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
systemctl daemon-reload && systemctl enable --now systemd-modules-load.service
Verify that the kernel modules are loaded
[root@k8s-m1 ~]# lsmod | grep ip_v
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145497  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          139264  1 ip_vs
libcrc32c              12644  3 xfs,ip_vs,nf_conntrack
Part 3: Set system parameters
All machines need the system parameters in /etc/sysctl.d/k8s.conf set. IPv6 support is still not great, so IPv6 is disabled here as well.
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# let iptables see bridged traffic (required by Kubernetes networking)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.overcommit_memory=1
vm.panic_on_oom=0
EOF
If kube-proxy uses IPVS, set the following TCP keepalive parameters to avoid connection timeouts.
cat <<EOF >> /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# fixes long-connection timeouts in IPVS mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF
sysctl --system
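Optionally spot-check that the key values took effect (the bridge-nf keys only exist once br_netfilter is loaded, which was done in Part 2):
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.ipv4.tcp_keepalive_time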
Tune journal logging to avoid collecting logs twice and wasting system resources, raise the default open-file limits for services started by systemd, and disable reverse DNS lookups for SSH.
# the first two lines do not exist on apt-based systems; running them anyway does no harm
sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf
sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config
Maximum number of open files; following convention, set it in a drop-in file.
cat>/etc/security/limits.d/kubernetes.conf<<EOF
* soft nproc 131072
* hard nproc 131072
* soft nofile 131072
* hard nofile 131072
root soft nproc 131072
root hard nproc 131072
root soft nofile 131072
root hard nofile 131072
EOF
Docker's official kernel check script recommends (RHEL7/CentOS7: User namespaces disabled; add 'user_namespace.enable=1' to boot command line); on yum-based systems, enable it with the following command:
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
Part 4: Install Docker
Check whether the kernel and its modules are suitable for running Docker (Linux only). The script may fail to download because of network restrictions; drop the redirection first to see whether the script is reachable at all.
curl -s https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh > check-config.sh
bash ./check-config.sh
These days the Docker storage driver should always be overlay2 (do not use devicemapper, it has far too many pitfalls); in the script output, pay particular attention to whether overlay2 is shown in green.
Here we use the year-versioned docker-ce. Suppose we want Kubernetes v1.18.5: go to https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG, open the matching CHANGELOG-1.18.md and search for "The list of validated docker versions remain" to find the Docker versions validated upstream. Your Docker version does not strictly have to be in that list; 19.03 has been tested and works (19.03+ also fixes a runc performance bug). Here we install Docker with the official install script (it supports both CentOS and Ubuntu).
export VERSION=19.03
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun
On all machines, configure registry mirrors and make Docker use the systemd cgroup driver, which is the official recommendation; see https://kubernetes.io/docs/setup/cri/
mkdir -p /etc/docker/
cat>/etc/docker/daemon.json<<EOF
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "bip": "169.254.123.1/24",
    "oom-score-adjust": -1000,
    "registry-mirrors": [
        "https://fz5yth0r.mirror.aliyuncs.com",
        "https://dockerhub.mirrors.nwafu.edu.cn/",
        "https://mirror.ccs.tencentyun.com",
        "https://docker.mirrors.ustc.edu.cn/",
        "https://reg-mirror.qiniu.com",
        "http://hub-mirror.c.163.com/",
        "https://registry.docker-cn.com"
    ],
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true"
    ],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "3"
    }
}
EOF
Do not enable Live Restore. In some extreme cases (for example containers stuck in the Dead state) the only fix is restarting the Docker daemon; with live restore enabled, the only fix left is rebooting the machine.
Copy the bash completion script
cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
Start Docker and check that its info output looks sane
systemctl enable --now docker
docker info
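Optionally confirm that daemon.json was picked up, e.g. the cgroup driver should be systemd and the storage driver overlay2:
docker info 2>/dev/null | grep -E 'Cgroup Driver|Storage Driver'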
Part 5: Deploy kube-nginx
Here we use nginx as a local proxy. Because the local proxy runs on every machine, we need no SLB and are not affected by the restriction that VIPs cannot be used inside a cloud VPC; the trade-off is that every machine has to run its own nginx.
Configure /etc/hosts on every machine
[root@k8s-m1 src]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 apiserver.k8s.local
192.168.50.101 apiserver01.k8s.local
192.168.50.102 apiserver02.k8s.local
192.168.50.103 apiserver03.k8s.local
192.168.50.101 k8s-m1
192.168.50.102 k8s-m2
192.168.50.103 k8s-m3
192.168.50.104 k8s-node1
192.168.50.105 k8s-node2
192.168.50.106 k8s-node3
Generate the nginx configuration file on every machine. The three apiserver hosts entries above can be omitted if you write IPs instead of domain names in the config below, but then changing an IP means reloading nginx. Unlike the original author, I compile nginx by hand here.
mkdir -p /etc/kubernetes
[root@k8s-m1 src]# cat /etc/kubernetes/nginx.conf
user nginx nginx;
worker_processes auto;
events {
    worker_connections 20240;
    use epoll;
}
error_log /var/log/kube_nginx_error.log info;
stream {
    upstream kube-servers {
        hash $remote_addr consistent;
        server apiserver01.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver02.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver03.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
    }
    server {
        listen 8443 reuseport;
        proxy_connect_timeout 3s;
        # generous timeout
        proxy_timeout 3000s;
        proxy_pass kube-servers;
    }
}
Because the local proxy lives on every machine (no SLB, no VPC VIP restriction), we compile and install kube-nginx; it has to be installed on every machine.
yum install gcc gcc-c++ -y
groupadd nginx
useradd -r -g nginx nginx
wget http://nginx.org/download/nginx-1.16.1.tar.gz -P /usr/local/src/
cd /usr/local/src/
tar zxvf nginx-1.16.1.tar.gz
cd nginx-1.16.1/
./configure --with-stream --without-http --prefix=/usr/local/kube-nginx --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
make && make install
# write the systemd unit
[root@k8s-m1 src]# cat /usr/lib/systemd/system/kube-nginx.service
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=forking
ExecStartPre=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx -t
ExecStart=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx
ExecReload=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx
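Optionally verify on each machine that the local proxy is up and listening on 8443 (connections will only be proxied successfully once the apiservers exist):
systemctl status kube-nginx --no-pager
ss -tlnp | grep 8443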
Part 6: Deploy with kubeadm
1. Configure the Aliyun Kubernetes repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
EOF
2. Master nodes
A Kubernetes node is essentially kubelet plus a CRI (usually Docker). kubectl is a client that reads a kubeconfig and talks to kube-apiserver to operate the cluster, and kubeadm is the deployment tool. So the masters need all three installed, while nodes generally do not need kubectl.
Install the packages
yum install -y \
    kubeadm-1.18.5 \
    kubectl-1.18.5 \
    kubelet-1.18.5 \
    --disableexcludes=kubernetes && \
    systemctl enable kubelet
Install the packages on node machines
yum install -y \
    kubeadm-1.18.5 \
    kubelet-1.18.5 \
    --disableexcludes=kubernetes && \
    systemctl enable kubelet
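Optionally double-check that the installed versions match the release you plan to deploy:
kubeadm version -o short
kubelet --version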
Configure the cluster information (on the first master)
Print the default init configuration
kubeadm config print init-defaults > initconfig.yaml
# the default init cluster parameters for reference
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-m1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We only care about and keep the ClusterConfiguration section, then modify it. Refer to the v1beta2 documentation below; on lower versions it may be v1beta1, where some fields differ from the newer API, so look them up on godoc yourself.
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#hdr-Basics
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#pkg-constants
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ClusterConfiguration
Change the IPs and similar values to match your own environment; if you do not know how to calculate CIDRs, leave them alone. For controlPlaneEndpoint write a domain name (if the internal network has no DNS, putting it in /etc/hosts on every machine also works), an SLB address, or a VIP. For the reasoning and caveats see https://zhangguanzhang.github.io/2019/03/11/k8s-ha/, where the HA design is explained in detail. The final YAML is below:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/k8sxio
kubernetesVersion: v1.18.5 # if the listed image version is wrong, put the correct version number here
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
networking: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Networking
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
controlPlaneEndpoint: apiserver.k8s.local:8443 # for a single master, write the master's IP or omit this
apiServer: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#APIServer
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority,PodPreset"
    runtime-config: api/all=true,settings.k8s.io/v1alpha1=true
    storage-backend: etcd3
    etcd-servers: https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379
  certSANs:
  - 10.96.0.1 # first IP of the service CIDR
  - 127.0.0.1 # with multiple masters, lets you debug quickly via localhost if the load balancer breaks
  - localhost
  - apiserver.k8s.local # load balancer domain name or VIP
  - 192.168.50.101
  - 192.168.50.102
  - 192.168.50.103
  - apiserver01.k8s.local
  - apiserver02.k8s.local
  - apiserver03.k8s.local
  - master
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
controllerManager: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ControlPlaneComponent
  extraArgs:
    bind-address: "0.0.0.0"
    experimental-cluster-signing-duration: 867000h
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
dns: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#DNS
  type: CoreDNS # or kube-dns
  imageRepository: coredns # azk8s.cn is gone, use the official coredns image on Docker Hub
  imageTag: 1.6.7 # the Aliyun registry currently only has 1.6.7; see Docker Hub for the latest
etcd: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
  local:
    imageRepository: quay.io/coreos
    imageTag: v3.4.7
    dataDir: /var/lib/etcd
    serverCertSANs: # localhost, 127.0.0.1 and ::1 are included by default for server and peer, no need to list them
    - master
    - 192.168.50.101
    - 192.168.50.102
    - 192.168.50.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    peerCertSANs:
    - master
    - 192.168.50.101
    - 192.168.50.102
    - 192.168.50.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    extraArgs: # etcd has no extraVolumes yet
      auto-compaction-retention: "1h"
      max-request-bytes: "33554432"
      quota-backend-bytes: "8589934592"
      enable-v2: "false" # disable etcd v2 api
#  external: # configure like this when using external etcd, https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
#    endpoints:
#    - "https://172.19.0.2:2379"
#    - "https://172.19.0.3:2379"
#    - "https://172.19.0.4:2379"
#    caFile: "/etc/kubernetes/pki/etcd/ca.crt"
#    certFile: "/etc/kubernetes/pki/etcd/etcd.crt"
#    keyFile: "/etc/kubernetes/pki/etcd/etcd.key"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration # https://godoc.org/k8s.io/kube-proxy/config/v1alpha1#KubeProxyConfiguration
mode: ipvs # or iptables
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # scheduling algorithm
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration # https://godoc.org/k8s.io/kubelet/config/v1beta1#KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true # set to false if swap is enabled
- For swap see the last line (failSwapOn). The apiserver extraArgs are there to enable PodPreset; on 1.16 and earlier, the runtime-config value should be api/all,settings.k8s.io/v1alpha1=true
- For a single master, change the value of controlPlaneEndpoint to the first master's IP
- The supported etcd versions can be looked up in the code: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go#L422-L430
Check the file for mistakes, ignoring warnings; a real mistake throws an error, otherwise the output ends with something containing the string kubeadm join xxx
kubeadm init --config initconfig.yaml --dry-run
Check that the images are correct; if the version number is wrong, set kubernetesVersion in the YAML to your actual version
kubeadm config images list --config initconfig.yaml
Pre-pull the images
kubeadm config images pull --config initconfig.yaml
# sample output
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-apiserver:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-controller-manager:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-scheduler:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-proxy:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/pause:3.1
[config/images] Pulled quay.azk8s.cn/coreos/etcd:v3.4.7
[config/images] Pulled coredns/coredns:1.6.3
Part 7: kubeadm init
Run the init below only on the first master
# --upload-certs uploads the control-plane certificates to the cluster, so we do not have to distribute them to the other masters by hand
# note: in v1.15+ this is a stable flag; older versions must use --experimental-upload-certs instead
kubeadm init --config initconfig.yaml --upload-certs
If it times out, check whether kubelet failed to start; for debugging see https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/systemctl-running-debug
Write down the token printed by init, then copy the kubeconfig for kubectl; kubectl's default kubeconfig path is ~/.kube/config
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
The init YAML is actually stored in a ConfigMap in the cluster, so we can look at it at any time; it is also used when other nodes and masters join
kubectl -n kube-system get cm kubeadm-config -o yaml
If you run a single master and do not plan to add worker nodes, remove the taint on the master; skip this if you are going to do the multi-master steps next
kubectl taint nodes --all node-role.kubernetes.io/master-
Set up RBAC for the health-check endpoints
kube-apiserver's HTTP health-check routes are protected by authorization; we need to open them up for monitoring or for SLB health checks. YAML file: https://github.com/zhangguanzhang/Kubernetes-ansible-base/blob/roles/master/files/healthz-rbac.yml
kubectl apply -f https://raw.githubusercontent.com/zhangguanzhang/Kubernetes-ansible-base/roles/master/files/healthz-rbac.yml
Configure the Kubernetes control-plane components on the other masters
Manual copy (only needed on older versions without certificate upload; skip this if you passed the upload-certs option to kubeadm init above)
Copy the CA certificates from the first master to the other masters. To avoid interactive password prompts we install sshpass; here 'zhangguanzhang' is the root password.
yum install sshpass -y
alias ssh='sshpass -p zhangguanzhang ssh -o StrictHostKeyChecking=no'
alias scp='sshpass -p zhangguanzhang scp -o StrictHostKeyChecking=no'
Copy the CA certificates to the other master nodes
for node in 172.19.0.3 172.19.0.4;do
    ssh $node 'mkdir -p /etc/kubernetes/pki/etcd'
    scp -r /etc/kubernetes/pki/ca.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/sa.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/front-proxy-ca.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/etcd/ca.* $node:/etc/kubernetes/pki/etcd/
done
Join the other masters
kubeadm join apiserver.k8s.local:8443 --token vo6qyo.4cm47w561q9p830v \
    --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671 \
    --control-plane --certificate-key ba869da2d611e5afba5f9959a5f18891c20fb56d90592225765c0b965e3d8783
If you forget the token, list tokens with kubeadm token list or create a new one with kubeadm token create
The sha256 value can be obtained with the following command
openssl x509 -pubkey -in \
    /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //'
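Note that the --certificate-key from the init output only stays valid for about two hours. If it has expired by the time you join another master, you can (on v1.16+) re-upload the certificates and print a fresh key with:
kubeadm init phase upload-certs --upload-certs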
Install the kubectl completion script
kubectl completion bash > /etc/bash_completion.d/kubectl
Configure etcdctl on all masters
Copy etcdctl out of the etcd container
docker cp `docker ps -a | awk '/k8s_etcd/{print $1}'`:/usr/local/bin/etcdctl /usr/local/bin/etcdctl
Since around 1.13 (I forget the exact version) Kubernetes talks to etcd via the v3 API by default; here we set up etcdctl's parameters
cat >/etc/profile.d/etcd.sh<<'EOF'
ETCD_CERET_DIR=/etc/kubernetes/pki/etcd/
ETCD_CA_FILE=ca.crt
ETCD_KEY_FILE=healthcheck-client.key
ETCD_CERT_FILE=healthcheck-client.crt
ETCD_EP=https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379
alias etcd_v2="etcdctl --cert-file ${ETCD_CERET_DIR}/${ETCD_CERT_FILE} \
    --key-file ${ETCD_CERET_DIR}/${ETCD_KEY_FILE} \
    --ca-file ${ETCD_CERET_DIR}/${ETCD_CA_FILE} \
    --endpoints $ETCD_EP"
alias etcd_v3="ETCDCTL_API=3 \
    etcdctl \
    --cert ${ETCD_CERET_DIR}/${ETCD_CERT_FILE} \
    --key ${ETCD_CERET_DIR}/${ETCD_KEY_FILE} \
    --cacert ${ETCD_CERET_DIR}/${ETCD_CA_FILE} \
    --endpoints $ETCD_EP"
EOF
Re-ssh into the machine or load the environment manually: . /etc/profile.d/etcd.sh
[root@k8s-m1 ~]# etcd_v3 endpoint status --write-out=table
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.50.101:2379 | 9fdaf6a25119065e |   3.4.7 |  3.1 MB |     false |      false |         5 |     305511 |             305511 |        |
| https://192.168.50.102:2379 | a3d9d41cf6d05e08 |   3.4.7 |  3.1 MB |      true |      false |         5 |     305511 |             305511 |        |
| https://192.168.50.103:2379 | 3b34476e501895d4 |   3.4.7 |  3.0 MB |     false |      false |         5 |     305511 |             305511 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
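With these aliases loaded you can also run a quick health check of all members, for example:
etcd_v3 endpoint health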
Configure the etcd backup script
mkdir -p /opt/etcd
cat>/opt/etcd/etcd_cron.sh<<'EOF'
#!/bin/bash
set -e
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

: ${bak_dir:=/root/} # default backup directory, change it to an existing directory
: ${cert_dir:=/etc/kubernetes/pki/etcd/}
: ${endpoints:=https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379}

bak_prefix='etcd-'
cmd_suffix='date +%Y-%m-%d-%H:%M'
bak_suffix='.db'

# assign the normalized command-line arguments to the positional parameters ($1,$2,...)
temp=`getopt -n $0 -o c:d: -u -- "$@"`
[ $? != 0 ] && {
    echo '
Examples:
  # just save once
  bash $0 /tmp/etcd.db
  # save in crontab and keep 5
  bash $0 -c 5
'
    exit 1
}
set -- $temp

# -c number of backup copies to keep
# -d directory to store the backups in
while true;do
    case "$1" in
        -c)
            [ -z "$bak_count" ] && bak_count=$2
            printf -v null %d "$bak_count" &>/dev/null || \
                { echo 'the value of the -c must be number';exit 1; }
            shift 2
            ;;
        -d)
            [ ! -d "$2" ] && mkdir -p $2
            bak_dir=$2
            shift 2
            ;;
        *)
            [[ -z "$1" || "$1" == '--' ]] && { shift;break; }
            echo "Internal error!"
            exit 1
            ;;
    esac
done

function etcd_v2(){
    etcdctl --cert-file $cert_dir/healthcheck-client.crt \
        --key-file $cert_dir/healthcheck-client.key \
        --ca-file $cert_dir/ca.crt \
        --endpoints $endpoints $@
}

function etcd_v3(){
    ETCDCTL_API=3 etcdctl \
        --cert $cert_dir/healthcheck-client.crt \
        --key $cert_dir/healthcheck-client.key \
        --cacert $cert_dir/ca.crt \
        --endpoints $endpoints $@
}

etcd::cron::save(){
    cd $bak_dir/
    etcd_v3 snapshot save $bak_prefix$($cmd_suffix)$bak_suffix
    rm_files=`ls -t $bak_prefix*$bak_suffix | tail -n +$[bak_count+1]`
    if [ -n "$rm_files" ];then
        rm -f $rm_files
    fi
}

main(){
    [ -n "$bak_count" ] && etcd::cron::save || etcd_v3 snapshot save $@
}

main $@
EOF
PS: with etcd 3.4.x and later, this script can only connect to and back up a single node, so it has to run on every master, with the etcd endpoints changed to that master's own IP address:
For example:
: ${endpoints:=https://192.168.50.101:2379}
Add the following to crontab -e to automatically keep four backup copies
0 2 * * * bash /opt/etcd/etcd_cron.sh -c 4 -d /opt/etcd/ &>/dev/null
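Backups only matter if you can restore them. A rough single-member restore sketch (the snapshot filename and paths below are illustrative, not from the original article; a multi-member restore additionally needs the --initial-cluster flags, so test on a non-production cluster first):
mv /etc/kubernetes/manifests/etcd.yaml /root/   # stop the etcd static pod
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/etcd-2020-07-01-02:00.db --data-dir /var/lib/etcd-restored
# point the etcd manifest's data hostPath at /var/lib/etcd-restored (or move the data back to /var/lib/etcd), then:
mv /root/etcd.yaml /etc/kubernetes/manifests/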
Nodes
Repeat the earlier steps on each node:
- Apply the system configuration
- Set the hostname
- Install docker-ce
- Set up /etc/hosts and nginx
- Configure the repositories and install kubeadm and kubelet
Joining works the same as for a master: prepare the environment and Docker in advance, then join without the --control-plane flag. If there is only one master, join against the value of controlPlaneEndpoint.
kubeadm join apiserver.k8s.local:8443 --token vo6qyo.4cm47w561q9p830v \
    --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671
[root@k8s-m1 ~]# kubectl get node
NAME        STATUS   ROLES    AGE    VERSION
k8s-m1      Ready    master   23h    v1.18.5
k8s-m2      Ready    master   23h    v1.18.5
k8s-m3      Ready    master   23h    v1.18.5
k8s-node1   Ready    node     23h    v1.18.5
k8s-node2   Ready    node     121m   v1.18.5
k8s-node3   Ready    node     82m    v1.18.5
Addons (from here to the end, run everything on any single master)
The container network is not set up yet, so coredns cannot get an IP and stays Pending. I deploy flannel here; if you understand BGP you can use calico instead
The YAML comes from flannel's official GitHub: https://github.com/coreos/flannel/tree/master/Documentation
Modifications
- If you use PSP before 1.16, policy/v1beta1 has to be changed to extensions/v1beta1; no change is needed here
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
- Change the RBAC apiVersion to plain v1 instead of v1beta1, using the following command
sed -ri '/apiVersion: rbac/s#v1.+#v1#' kube-flannel.yml
- The official YAML ships DaemonSets for four architectures; delete everything except amd64, roughly from line 227 to the end
sed -ri '227,$d' kube-flannel.yml
- If you changed the pod CIDR, change it here too. If all nodes sit on the same layer-2 network you can switch vxlan to the better-performing host-gw mode; with vxlan, the security group must allow UDP port 8472
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
- Adjust the limits; they must be higher than the requests
  limits:
    cpu: "200m"
    memory: "100Mi"
Deploy flannel
I did not seem to run into the following error myself. Since 1.15 the node CIDR is an array rather than a single value, so flannel 0.11 and earlier can fail with the error below; see:
https://github.com/kubernetes/kubernetes/blob/v1.15.0/staging/src/k8s.io/api/core/v1/types.go#L3890-L3893
https://github.com/kubernetes/kubernetes/blob/v1.18.2/staging/src/k8s.io/api/core/v1/types.go#L4206-L4216
Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned
Patch it manually, and remember to patch any node added later as well
nodes=`kubectl get node --no-headers | awk '{print $1}'`
for node in $nodes;do
    cidr=`kubectl get node "$node" -o jsonpath='{.spec.podCIDRs[0]}'`
    [ -z "$(kubectl get node $node -o jsonpath='{.spec.podCIDR}')" ] && {
        kubectl patch node "$node" -p '{"spec":{"podCIDR":"'"$cidr"'"}}'
    }
done
The final kube-flannel.yml looks like this:
[root@k8s-m1 ~]# cat kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.12.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.12.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "200m"
            memory: "100Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
host-gw mode is used here because I ran into a UDP kernel bug with vxlan; for details see https://zhangguanzhang.github.io/2020/05/23/k8s-vxlan-63-timeout/
kubectl apply -f kube-flannel.yml
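Since host-gw is used, every node should end up with plain routes to the other nodes' pod subnets once flannel is up; a quick check (not from the original article) could be:
kubectl -n kube-system get pod -l app=flannel -o wide
ip route | grep 10.244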
Verify that the cluster works
kubectl -n kube-system get pod -o wide
Once all pods in the kube-system namespace are Running, we test cluster usability
cat<<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: zhangguanzhang/centos
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
Wait for the pods to reach Running
Verify cluster DNS
$ kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
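You can also check that Service traffic is actually forwarded by IPVS, for example by requesting the nginx Service from the busybox pod (assuming the zhangguanzhang/centos image ships curl) and looking at the IPVS table on a node:
kubectl exec -ti busybox -- curl -sI nginx | head -n1   # expect HTTP/1.1 200 OK
ipvsadm -ln | grep -A3 $(kubectl get svc nginx -o jsonpath='{.spec.clusterIP}')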
For more detail on what kubeadm does during this process and its full set of options, see the original article linked at the top
Adding a new node
1. Initialize CentOS 7
The initialization script
#!/bin/bash
#---- time synchronization ----
echo "configuring time sync"
yum install chrony -y
mv /etc/chrony.conf /etc/chrony.conf.bak
cat>/etc/chrony.conf<<EOF
server ntp.aliyun.com iburst
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
keyfile /etc/chrony.keys
commandkey 1
generatecommandkey
logchange 0.5
logdir /var/log/chrony
EOF
/usr/bin/systemctl enable chronyd
/usr/bin/systemctl restart chronyd

#--- disable swap ---
echo "disabling swap"
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

#--- disable the firewall and selinux ---
echo "disabling firewalld and selinux"
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config

#--- disable NetworkManager ---
echo "disabling NetworkManager"
systemctl disable NetworkManager
systemctl stop NetworkManager

#--- install epel and switch to the Aliyun epel mirror ---
yum install epel-release wget -y
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

#--- install dependencies ---
echo "installing dependencies"
yum install -y \
    curl \
    git \
    conntrack-tools \
    psmisc \
    nfs-utils \
    jq \
    socat \
    bash-completion \
    ipset \
    ipvsadm \
    conntrack \
    libseccomp \
    net-tools \
    crontabs \
    sysstat \
    unzip \
    iftop \
    nload \
    strace \
    bind-utils \
    tcpdump \
    telnet \
    lsof \
    htop

#--- kernel modules required for ipvs mode, loaded at boot ---
echo "loading modules required for ipvs mode"
cat>/etc/modules-load.d/ipvs.conf<<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
EOF
systemctl daemon-reload
systemctl enable --now systemd-modules-load.service

#--- system parameters ---
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# let iptables see bridged traffic (required by Kubernetes networking)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.overcommit_memory=1
vm.panic_on_oom=0
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# fixes long-connection timeouts in ipvs mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF
sysctl --system

#--- journal, systemd limits and ssh tweaks ---
sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf
sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config

#--- maximum open files ---
cat>/etc/security/limits.d/kubernetes.conf<<EOF
* soft nproc 131072
* hard nproc 131072
* soft nofile 131072
* hard nofile 131072
root soft nproc 131072
root hard nproc 131072
root soft nofile 131072
root hard nofile 131072
EOF

#--- set user_namespace.enable=1 ---
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
2. Compile and install nginx
yum install gcc gcc-c++ -y
tar zxvf nginx-1.16.1.tar.gz
cd nginx-1.16.1/
./configure --with-stream --without-http --prefix=/usr/local/kube-nginx --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
make && make install
groupadd nginx
useradd -r -g nginx nginx
systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx
3. Regenerate the token
kubeadm token create
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
kubeadm join apiserver.k8s.local:8443 --token 8ceduc.cy0r23j2hpsw80ff --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "vo6qyo"
If you see the error above, the old token has expired and you need to regenerate it as shown.
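On recent kubeadm versions you can also have it print a complete worker join command (new token plus CA cert hash) in one step:
kubeadm token create --print-join-command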