Automated Deployment of a Highly Available Kubernetes Cluster with SaltStack
SaltStack Automated Deployment of HA-Kubernetes
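- This project is hosted on GitHub and is updated from time to time. Issues are welcome: https://github.com/skymyyang/salt-k8s-ha
- SaltStack automated deployment of Kubernetes v1.12.5 (supports HA, mutual TLS authentication, RBAC authorization, Flannel networking, an etcd cluster, kube-proxy using LVS, and more).
- For SaltStack automated deployment of Kubernetes v1.13.4, switch to the 1.13-Release branch.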
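Version details: Release-v1.12.5
- Tested on: CentOS 7.6
- salt-ssh: salt-ssh 2018.3.3 (Oxygen)
- Kubernetes: v1.12.5
- Etcd: v3.3.10
- Docker: the latest version is fine
- Flannel: v0.10.0
- CNI-Plugins: v0.7.0
- Recommended layout: at least three Master nodes, with hostname resolution configured (required)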
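Architecture
- Roles are defined with Salt Grains, for flexibility.
- Configuration items are managed with Salt Pillar, for security.
- States are executed over Salt SSH, so no agent has to be installed, for portability.
- Kubernetes v1.12.5, the stable release at the time of writing, is used for stability.
- HAProxy and keepalived provide high availability for the cluster.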
0. System initialization (required)
- Set the hostname!

    [root@linux-node1 ~]# cat /etc/hostname
    linux-node1
    [root@linux-node2 ~]# cat /etc/hostname
    linux-node2
    [root@linux-node3 ~]# cat /etc/hostname
    linux-node3
    [root@linux-node4 ~]# cat /etc/hostname
    linux-node4
- Configure /etc/hosts so that every hostname resolves

    [root@linux-node1 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.150.141 linux-node1
    192.168.150.142 linux-node2
    192.168.150.143 linux-node3
    192.168.150.144 linux-node4
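Because every later step depends on these names resolving, it can be worth checking the hosts file on each node before moving on. The following is a minimal sketch, not part of the project: `missing_hosts` is a hypothetical helper that only inspects a hosts-format file and does not query DNS.

```shell
# missing_hosts FILE NAME...
# Prints every NAME that has no entry in the hosts-format FILE.
# Hypothetical helper for illustration only.
missing_hosts() (
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        if ! sed 's/#.*//' "$file" | awk -v n="$name" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit found ? 0 : 1 }'; then
            echo "$name"
        fi
    done
)

# Example (run on each node); no output means all four names are present:
# missing_hosts /etc/hosts linux-node1 linux-node2 linux-node3 linux-node4
```

The function body runs in a subshell, so its variables do not leak into the calling shell.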
- Disable SELinux, the firewall, and NetworkManager

    systemctl disable --now firewalld NetworkManager
    setenforce 0
    sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config
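The sed expression above is terse, so here is a small sketch that applies the same expression to a scratch file instead of the real /etc/selinux/config. The wrapper name `disable_selinux_in` and the sample file contents are assumptions for illustration; GNU sed is assumed for `-r` and `-i`.

```shell
# disable_selinux_in FILE: apply the same sed expression to FILE.
# Hypothetical wrapper; the expression itself is the one used above.
disable_selinux_in() {
    sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' "$1"
}

# Try it on a scratch copy that resembles a typical config file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# SELINUX= can take one of these three values:
SELINUX=enforcing
SELINUXTYPE=targeted
EOF
disable_selinux_in "$sample"
cat "$sample"
rm -f "$sample"
```

Only the uncommented `SELINUX=` line is rewritten: the address skips lines where a `#` precedes the match, and `SELINUXTYPE=` does not contain the literal `SELINUX=`.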
- Set up the time synchronization client

    yum install chrony -y
    cat <<EOF > /etc/chrony.conf
    server ntp.aliyun.com iburst
    stratumweight 0
    driftfile /var/lib/chrony/drift
    rtcsync
    makestep 10 3
    bindcmdaddress 127.0.0.1
    bindcmdaddress ::1
    keyfile /etc/chrony.keys
    commandkey 1
    generatecommandkey
    logchange 0.5
    logdir /var/log/chrony
    EOF
    systemctl restart chronyd
    systemctl enable --now chronyd
- Upgrade the kernel

    # The kernels shipped by the stock package repositories are too old. After Docker is
    # installed, both CentOS and Ubuntu hit the following bug, which is still present in 4.15 kernels:
    #   kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
    # Install the required packages
    yum install wget git jq psmisc vim perl -y
    # Upgrading the kernel requires the elrepo yum repository; import the elrepo key and install the repository first
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
    # List the available kernels
    yum --disablerepo="*" --enablerepo="elrepo-kernel" list available --showduplicates
    # Install a kernel version of your choice
    export Kernel_Version=4.18.16-1
    wget http://mirror.rc.usf.edu/compute_lock/elrepo/kernel/el7/x86_64/RPMS/kernel-ml{,-devel}-${Kernel_Version}.el7.elrepo.x86_64.rpm
    yum localinstall -y kernel-ml*
    # Check that this kernel ships the nf_conntrack_ipv4 module
    find /lib/modules -name '*nf_conntrack_ipv4*' -type f
    # Change the default boot entry. The old kernel is now entry 1; the newly installed kernel is
    # inserted at the front as entry 0. (Skip this step if you pick the kernel by hand on every boot.)
    grub2-set-default 0 && grub2-mkconfig -o /etc/grub2.cfg
    # Confirm that the default kernel points at the one just installed
    grubby --default-kernel
    # Docker's kernel check script suggests (RHEL7/CentOS7: User namespaces disabled; add
    # 'user_namespace.enable=1' to boot command line); enable it with the following command
    grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
    # Reboot to load the new kernel
    reboot
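After the reboot it is easy to end up on the old kernel if the GRUB default was not changed. The sketch below compares the running kernel against the version installed above; `version_ge` is a hypothetical helper, and it assumes GNU `sort` with the `-V` (version sort) option.

```shell
# version_ge A B: succeeds when version string A is >= B,
# using the ordering of GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=4.18.16          # the kernel version chosen in the step above
running=$(uname -r)
if version_ge "$running" "$required"; then
    echo "kernel $running is at least $required"
else
    echo "kernel $running is older than $required; check the GRUB default entry"
fi
```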
- Configure the kernel modules that IPVS needs to have loaded at boot (all machines)

    $ :> /etc/modules-load.d/ipvs.conf
    $ module=(
        ip_vs
        ip_vs_lc
        ip_vs_wlc
        ip_vs_rr
        ip_vs_wrr
        ip_vs_lblc
        ip_vs_lblcr
        ip_vs_dh
        ip_vs_sh
        ip_vs_fo
        ip_vs_nq
        ip_vs_sed
        ip_vs_ftp
      )
    $ for kernel_module in "${module[@]}"; do
        /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
      done
    $ systemctl enable --now systemd-modules-load.service
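To confirm that the modules listed in ipvs.conf were actually loaded, the list can be compared against /proc/modules. This is a minimal sketch; `unloaded_modules` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a saved copy of the module list.

```shell
# unloaded_modules CONF [LOADED]
# Prints each module named in CONF (one per line) that is absent from LOADED,
# a file in /proc/modules format (module name in the first column).
unloaded_modules() (
    conf=$1
    loaded=${2:-/proc/modules}
    while read -r mod; do
        [ -n "$mod" ] || continue
        awk -v m="$mod" '$1 == m { found = 1 } END { exit found ? 0 : 1 }' "$loaded" \
            || echo "$mod"
    done < "$conf"
)

# Example; no output means every listed module is loaded:
# unloaded_modules /etc/modules-load.d/ipvs.conf
```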
- Set the required kernel parameters in /etc/sysctl.d/k8s.conf

    $ cat <<EOF > /etc/sysctl.d/k8s.conf
    # https://github.com/moby/moby/issues/31208
    # ipvsadm -l --timeout
    # Fixes the long-lived connection timeout problem in IPVS mode; any value below 900 works
    net.ipv4.tcp_keepalive_time = 600
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_keepalive_probes = 10
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
    net.ipv4.neigh.default.gc_stale_time = 120
    net.ipv4.conf.all.rp_filter = 0
    net.ipv4.conf.default.rp_filter = 0
    net.ipv4.conf.default.arp_announce = 2
    net.ipv4.conf.lo.arp_announce = 2
    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.ip_forward = 1
    net.ipv4.tcp_max_tw_buckets = 5000
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_syn_backlog = 1024
    net.ipv4.tcp_synack_retries = 2
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.netfilter.nf_conntrack_max = 2310720
    fs.inotify.max_user_watches=89100
    fs.may_detach_mounts = 1
    fs.file-max = 52706963
    fs.nr_open = 52706963
    net.bridge.bridge-nf-call-arptables = 1
    vm.swappiness = 0
    vm.overcommit_memory=1
    vm.panic_on_oom=0
    EOF
    $ sysctl --system

- Check all of the prerequisites above carefully; if any of them is missing, the deployment will not succeed!
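Some of these keys (the `net.bridge.*` ones, for example) only exist once the corresponding kernel module is loaded, so `sysctl --system` can skip settings without being obvious about it. The sketch below compares the file against the live values; `sysctl_pairs` and `sysctl_diff` are hypothetical helper names, and the parser assumes simple `key = value` lines like the ones above.

```shell
# sysctl_pairs FILE: prints "key value" for every setting in a sysctl.d file,
# skipping comments and blank lines and trimming the spaces around "=".
sysctl_pairs() {
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*=[[:space:]]*/ /' \
        -e 's/[[:space:]]*$//' "$1"
}

# sysctl_diff FILE: prints every setting whose live value differs from FILE.
sysctl_diff() {
    sysctl_pairs "$1" | while read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null)
        [ "$have" = "$want" ] || echo "$key: want $want, have ${have:-<unset>}"
    done
}

# Example; no output means every setting is in effect:
# sysctl_diff /etc/sysctl.d/k8s.conf
```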
1. Set up passwordless SSH login from the deployment node to all nodes (including itself)

    [root@linux-node1 ~]# ssh-keygen -t rsa
    [root@linux-node1 ~]# ssh-copy-id linux-node1
    [root@linux-node1 ~]# ssh-copy-id linux-node2
    [root@linux-node1 ~]# ssh-copy-id linux-node3
    [root@linux-node1 ~]# ssh-copy-id linux-node4
    [root@linux-node1 ~]# scp /etc/hosts linux-node2:/etc/
    [root@linux-node1 ~]# scp /etc/hosts linux-node3:/etc/
    [root@linux-node1 ~]# scp /etc/hosts linux-node4:/etc/
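With more than a handful of nodes, the same commands are easier to generate in a loop. The sketch below only prints the commands so they can be reviewed before being run; `ssh_setup_cmds` and `ssh_ok` are hypothetical helper names, and the node names are the ones used in this document.

```shell
# ssh_setup_cmds DEPLOY_NODE NODE...
# Prints the key-distribution commands for each NODE. /etc/hosts is not
# copied to DEPLOY_NODE itself, since the file originates there.
ssh_setup_cmds() (
    deploy=$1
    shift
    for node in "$@"; do
        echo "ssh-copy-id $node"
        [ "$node" = "$deploy" ] || echo "scp /etc/hosts $node:/etc/"
    done
)

# ssh_ok NODE: succeeds when key-based login works without a password prompt.
ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Print the commands for review.
ssh_setup_cmds linux-node1 linux-node1 linux-node2 linux-node3 linux-node4
```

Once the printed commands look right they can be piped to `sh`, and `ssh_ok linux-node2` can then confirm that login no longer prompts for a password.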
2. Install Salt SSH and clone the project code.
2.1 Install Salt SSH (note: older versions of Salt SSH do not support defining Grains in the roster; version 2017.7.4 or later is required)

    [root@linux-node1 ~]# yum install -y https://mirrors.aliyun.com/saltstack/yum/redhat/salt-repo-latest-2.el7.noarch.rpm
    [root@linux-node1 ~]# sed -i "s/repo.saltstack.com/mirrors.aliyun.com\/saltstack/g" /etc/yum.repos.d/salt-latest.repo
    [root@linux-node1 ~]# yum install -y salt-ssh git unzip
2.2 Fetch the project code and place it in the /srv directory

    [root@linux-node1 ~]# git clone https://github.com/skymyyang/salt-k8s-ha.git
    [root@linux-node1 ~]# cd salt-k8s-ha/
    [root@linux-node1 ~]# mv * /srv/
    [root@linux-node1 srv]# /bin/cp /srv/roster /etc/salt/roster
    [root@linux-node1 srv]# /bin/cp /srv/master /etc/salt/master
2.3 Download the binaries. You can fetch them from the official sources yourself; for the convenience of users in mainland China, k8s-v1.12.5-auto.zip is also available on Baidu Cloud. After downloading, place the archive under /srv/salt/k8s/ and unzip it there so that the files directory ends up in /srv/salt/k8s/.
Kubernetes binary download address: https://pan.baidu.com/s/1Ag2ocpVmkg-uEoV13A7HFw

    [root@linux-node1 ~]# cd /srv/salt/k8s/
    [root@linux-node1 k8s]# unzip k8s-v1.12.5-auto.zip
    [root@linux-node1 k8s]# rm -f k8s-v1.12.5-auto.zip
    [root@linux-node1 k8s]# ls -l files/
    total 0
    drwxr-xr-x 2 root root  94 Jan 18 19:19 cfssl-1.2
    drwxr-xr-x 2 root root 195 Jan 18 19:19 cni-plugins-amd64-v0.7.4
    drwxr-xr-x 3 root root 123 Jan 18 19:19 etcd-v3.3.10-linux-amd64
    drwxr-xr-x 2 root root  47 Jan 18 19:19 flannel-v0.10.0-linux-amd64
    drwxr-xr-x 3 root root  17 Jan 18 19:19 k8s-v1.12.5
3. Machines managed by Salt SSH and their role assignment
- k8s-role: sets the Kubernetes role of the machine
- etcd-role: sets the etcd role. To deploy a single etcd instance, set it on one machine only
- etcd-name: required on every machine that has etcd-role set

    [root@linux-node1 ~]# vim /etc/salt/roster
    linux-node1:
      host: 192.168.150.141
      user: root
      priv: /root/.ssh/id_rsa
      minion_opts:
        grains:
          k8s-role: master
          etcd-role: node
          etcd-name: etcd-node1
    linux-node2:
      host: 192.168.150.142
      user: root
      priv: /root/.ssh/id_rsa
      minion_opts:
        grains:
          k8s-role: master
          etcd-role: node
          etcd-name: etcd-node2
    linux-node3:
      host: 192.168.150.143
      user: root
      priv: /root/.ssh/id_rsa
      minion_opts:
        grains:
          k8s-role: master
          etcd-role: node
          etcd-name: etcd-node3
    linux-node4:
      host: 192.168.150.144
      user: root
      priv: /root/.ssh/id_rsa
      minion_opts:
        grains:
          k8s-role: node
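Roster entries are repetitive, so they can be generated rather than typed. The sketch below prints one entry per call; `roster_entry` is a hypothetical helper, the two-space YAML nesting is an assumption about the roster layout, and the `user` and `priv` values are simply the ones used in this document.

```shell
# roster_entry NAME IP K8S_ROLE [ETCD_NAME]
# Prints one roster entry. When ETCD_NAME is given, etcd-role and etcd-name
# are set together, since etcd-name is required whenever etcd-role is set.
roster_entry() {
    printf '%s:\n' "$1"
    printf '  host: %s\n' "$2"
    printf '  user: root\n'
    printf '  priv: /root/.ssh/id_rsa\n'
    printf '  minion_opts:\n'
    printf '    grains:\n'
    printf '      k8s-role: %s\n' "$3"
    if [ -n "${4:-}" ]; then
        printf '      etcd-role: node\n'
        printf '      etcd-name: %s\n' "$4"
    fi
}

# A master that also runs etcd, followed by a plain worker node.
roster_entry linux-node1 192.168.150.141 master etcd-node1
roster_entry linux-node4 192.168.150.144 node
```

Redirecting the output to a scratch file and reviewing it before copying it to /etc/salt/roster keeps a typo from breaking the existing roster.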
4. Adjust the configuration parameters. This project stores its configuration in Salt Pillar.

    [root@linux-node1 ~]# vim /srv/pillar/k8s.sls
    # IP addresses of the Masters (must be changed)
    MASTER_IP_M1: "192.168.150.141"
    MASTER_IP_M2: "192.168.150.142"
    MASTER_IP_M3: "192.168.150.143"
    # Hostnames of the Masters, as full FQDNs (must be changed)
    MASTER_H1: "linux-node1"
    MASTER_H2: "linux-node2"
    MASTER_H3: "linux-node3"
    # etcd cluster client endpoints (must be changed)
    ETCD_ENDPOINTS: "https://192.168.150.141:2379,https://192.168.150.142:2379,https://192.168.150.143:2379"
    FLANNEL_ETCD_PREFIX: "/kubernetes/network"
    # etcd initial cluster member list (must be changed)
    ETCD_CLUSTER: "etcd-node1=https://192.168.150.141:2380,etcd-node2=https://192.168.150.142:2380,etcd-node3=https://192.168.150.143:2380"
    # The local IP address is obtained automatically from the FQDN grains;
    # make sure the hostname resolves to the local IP address
    NODE_IP: {{ grains['fqdn_ip4'][0] }}
    HOST_NAME: {{ grains['fqdn'] }}
    # Bootstrap token; you can generate your own
    BOOTSTRAP_TOKEN: "be8dad.da8a699a46edc482"
    TOKEN_ID: "be8dad"
    TOKEN_SECRET: "da8a699a46edc482"
    ENCRYPTION_KEY: "8eVtmpUpYjMvH8wKZtKCwQPqYRqM14yvtXPLJdhu0gA="
    # Service IP address range
    SERVICE_CIDR: "10.1.0.0/16"
    # Kubernetes service IP (pre-allocated from SERVICE_CIDR)
    CLUSTER_KUBERNETES_SVC_IP: "10.1.0.1"
    # Kubernetes DNS service IP (pre-allocated from SERVICE_CIDR)
    CLUSTER_DNS_SVC_IP: "10.1.0.2"
    # NodePort port range
    NODE_PORT_RANGE: "20000-40000"
    # Pod IP address range
    POD_CIDR: "10.2.0.0/16"
    # DNS domain of the cluster
    CLUSTER_DNS_DOMAIN: "cluster.local."
    # Docker Registry address
    #DOCKER_REGISTRY: "https://192.168.150.135:5000"
    # VIP address of the Masters (must be changed)
    MASTER_VIP: "192.168.150.253"
    # Network interface name
    VIP_IF: "ens32"
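Several pillar values are derived from the same list of master IPs, and the token values are meant to be replaced with your own. The sketch below shows one way to produce them; the helper names are hypothetical, the `etcd-nodeN` naming follows the roster in this document, and the token is drawn from hex characters, a subset of the `[a-z0-9]{6}.[a-z0-9]{16}` bootstrap token format.

```shell
# etcd_endpoints IP...: comma-separated client URLs (port 2379).
etcd_endpoints() (
    out=
    for ip in "$@"; do
        out="${out:+$out,}https://$ip:2379"
    done
    echo "$out"
)

# etcd_cluster IP...: name=peer-URL pairs (port 2380), named etcd-node1..N
# in argument order.
etcd_cluster() (
    out=
    i=1
    for ip in "$@"; do
        out="${out:+$out,}etcd-node$i=https://$ip:2380"
        i=$((i + 1))
    done
    echo "$out"
)

# random_hex N: N lowercase hex characters read from /dev/urandom.
random_hex() {
    head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n' | cut -c "1-$1"
}

MASTERS="192.168.150.141 192.168.150.142 192.168.150.143"
TOKEN_ID=$(random_hex 6)
TOKEN_SECRET=$(random_hex 16)

echo "ETCD_ENDPOINTS: \"$(etcd_endpoints $MASTERS)\""
echo "ETCD_CLUSTER: \"$(etcd_cluster $MASTERS)\""
echo "BOOTSTRAP_TOKEN: \"$TOKEN_ID.$TOKEN_SECRET\""
echo "TOKEN_ID: \"$TOKEN_ID\""
echo "TOKEN_SECRET: \"$TOKEN_SECRET\""
# 32 random bytes, base64-encoded, the same length as the sample key above.
echo "ENCRYPTION_KEY: \"$(head -c 32 /dev/urandom | base64)\""
```

The printed lines can be pasted over the corresponding entries in /srv/pillar/k8s.sls.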
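5. Run the SaltStack states
- Test Salt SSH connectivity

    [root@linux-node1 ~]# salt-ssh '*' test.ping

Run the highstate; it deploys the matching services on each machine according to the role defined for it.
- Deploy etcd. etcd is a base component and must be deployed first; target the nodes that run etcd

    [root@linux-node1 ~]# salt-ssh -L 'linux-node1,linux-node2,linux-node3' state.sls k8s.etcd

- Deploy the Kubernetes cluster

    [root@linux-node1 ~]# salt-ssh '*' state.highstate

Because the packages are large, this step takes a while (5+ minutes, depending on hardware). Have a coffee and take a break; if a run fails, simply run it again.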
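Since a failed run can simply be repeated, the retry can be scripted. The following is a minimal sketch of a generic retry wrapper; `retry` is a hypothetical helper, and it relies only on the exit status of the wrapped command. Whether salt-ssh reports state failures through its exit status may depend on the Salt version, so the command output is still worth reading.

```shell
# retry MAX DELAY CMD...
# Runs CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Exits with the status of the last attempt.
retry() (
    max=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && exit 0
        status=$?
        [ "$n" -ge "$max" ] && exit "$status"
        echo "attempt $n/$max failed (exit $status); retrying in ${delay}s" >&2
        n=$((n + 1))
        sleep "$delay"
    done
)

# Example: up to three attempts, 30 seconds apart.
# retry 3 30 salt-ssh '*' state.highstate
```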
6. Test the Kubernetes installation

    # Verify etcd first
    [root@linux-node1 ~]# source /etc/profile
    [root@linux-node1 ~]# etcdctl --endpoints=https://192.168.150.141:2379 \
      --ca-file=/opt/kubernetes/ssl/ca.pem \
      --cert-file=/opt/kubernetes/ssl/etcd.pem \
      --key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health
    [root@linux-node1 ~]# etcdctl --endpoints=https://192.168.150.141:2379 \
      --ca-file=/opt/kubernetes/ssl/ca.pem \
      --cert-file=/opt/kubernetes/ssl/etcd.pem \
      --key-file=/opt/kubernetes/ssl/etcd-key.pem member list
    [root@linux-node1 ~]# kubectl get cs
    NAME                 STATUS    MESSAGE              ERROR
    controller-manager   Healthy   ok
    scheduler            Healthy   ok
    etcd-2               Healthy   {"health":"true"}
    etcd-1               Healthy   {"health":"true"}
    etcd-0               Healthy   {"health":"true"}
    [root@linux-node1 ~]# kubectl get node
    NAME          STATUS   ROLES    AGE   VERSION
    linux-node1   Ready    master   14m   v1.12.5
    linux-node2   Ready    master   24m   v1.12.5
    linux-node3   Ready    master   24m   v1.12.5
    linux-node4   Ready    node     30m   v1.12.5
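Reading the STATUS column by eye works for four nodes; for a larger cluster the check can be scripted. The sketch below filters the default `kubectl get node` table; `not_ready_nodes` is a hypothetical helper, and it treats any status other than exactly `Ready` (including `Ready,SchedulingDisabled`) as not ready.

```shell
# not_ready_nodes: reads `kubectl get node` output on stdin and prints the
# name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example; no output means every node is Ready:
# kubectl get node | not_ready_nodes
```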
7. Test the Kubernetes cluster and the Flannel network

    [root@linux-node1 ~]# kubectl create deployment nginx --image=nginx:alpine
    deployment.apps/nginx created

Pulling the image may take a little while; wait for it to finish.

    [root@linux-node1 ~]# kubectl get pod
    NAME                     READY   STATUS    RESTARTS   AGE
    nginx-54458cd494-8fj47   1/1     Running   0          13s
    [root@linux-node1 ~]# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE    IP          NODE          NOMINATED NODE   READINESS GATES
    nginx-54458cd494-8fj47   1/1     Running   0          111s   10.2.70.3   linux-node1   <none>           <none>

Test connectivity

    [root@linux-node1 ~]# ping -c 1 10.2.70.3
    PING 10.2.70.3 (10.2.70.3) 56(84) bytes of data.
    64 bytes from 10.2.70.3: icmp_seq=1 ttl=61 time=2.02 ms

    --- 10.2.70.3 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 2.028/2.028/2.028/0.000 ms
    [root@linux-node1 ~]# curl --head http://10.2.70.3
    HTTP/1.1 200 OK
    Server: nginx/1.15.8
    Date: Wed, 27 Feb 2019 09:52:48 GMT
    Content-Type: text/html
    Content-Length: 612
    Last-Modified: Thu, 31 Jan 2019 23:32:11 GMT
    Connection: keep-alive
    ETag: "5c53857b-264"
    Accept-Ranges: bytes

Test scaling: scale the nginx Deployment out to 2 Pod replicas

    [root@linux-node1 ~]# kubectl scale deployment nginx --replicas=2
    deployment.extensions/nginx scaled
    [root@linux-node1 ~]# kubectl get pod
    NAME                     READY   STATUS    RESTARTS   AGE
    nginx-54458cd494-8fj47   1/1     Running   0          5m4s
    nginx-54458cd494-qzhpf   1/1     Running   0          17s
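A quick sanity check on the Flannel setup is that Pod IPs fall inside the POD_CIDR set in the pillar (10.2.0.0/16 in this document). The sketch below does the membership test with plain shell arithmetic; `ip_to_int` and `ip_in_cidr` are hypothetical helpers, IPv4 only, and they assume a shell with 64-bit arithmetic and octets written without leading zeros.

```shell
# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr IP NETWORK/BITS: succeeds when IP lies inside the CIDR block.
ip_in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# The Pod IP reported by `kubectl get pod -o wide` above, checked against POD_CIDR.
if ip_in_cidr 10.2.70.3 10.2.0.0/16; then
    echo "10.2.70.3 is inside POD_CIDR"
else
    echo "10.2.70.3 is outside POD_CIDR"
fi
```

The same check applies to service addresses, for example `ip_in_cidr 10.1.0.2 10.1.0.0/16` for CLUSTER_DNS_SVC_IP against SERVICE_CIDR.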
8. How to add a Kubernetes node
- Set up passwordless SSH login to the new node and add its entry to /etc/hosts. Make sure every node can resolve it.
- Add the new machine to /etc/salt/roster.
- Run the SaltStack state: salt-ssh '*' state.highstate

    [root@linux-node1 ~]# vim /etc/salt/roster
    linux-node5:
      host: 192.168.150.145
      user: root
      priv: /root/.ssh/id_rsa
      minion_opts:
        grains:
          k8s-role: node
    [root@linux-node1 ~]# salt-ssh 'linux-node5' state.highstate
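When nodes are added repeatedly, it is easy to paste an entry for a host that is already in the roster, which leaves a duplicate top-level key in the YAML. The sketch below is a small guard against that; `roster_has` is a hypothetical helper and assumes host names appear as unindented `NAME:` lines, as in the roster shown in this document.

```shell
# roster_has FILE NAME: succeeds when FILE already contains an unindented
# "NAME:" line, i.e. a top-level roster entry for that host.
roster_has() {
    awk -v key="$2:" '
        $0 !~ /^[[:space:]]/ && $1 == key { found = 1 }
        END { exit found ? 0 : 1 }' "$1"
}

# Example:
# if roster_has /etc/salt/roster linux-node5; then
#     echo "linux-node5 is already in the roster"
# fi
```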
9. What to do next?
You can now install the essential Kubernetes add-ons. For how to install them, refer to the project's original repository.