基于Containerd安装部署高可用Kubernetes集群

转载自:https://blog.weiyigeek.top/2021/7-30-623.html

简述

Kubernetes(后续简称k8s)是 Google 于2014年6月开源的一个容器编排引擎,使用Go语言开发,支持自动化部署、大规模可伸缩,以及对云平台中多个主机上的容器化应用进行管理。其目标是让容器化应用的部署更加简单高效,提供了资源调度、部署管理、服务发现、扩容缩容、状态监控、维护等一整套功能,致力于成为跨主机集群的自动化部署、自动化扩展以及运行应用程序容器的平台,并能与 Containerd、Calico 等一系列 CNCF 及云原生生态项目配合使用。

环境准备

主机规划

温馨提示: 此处使用的是 Ubuntu 20.04 操作系统,该系统已做安全加固和内核优化,符合等保2.0要求【SecOpsDev/Ubuntu-InitializeSecurity.sh at master · WeiyiGeek/SecOpsDev (github.com)】。如你的 Linux 未进行相应配置,环境可能与本文有些许差异;如需对 Windows Server、Ubuntu、CentOS 进行安全加固,请参照如下加固脚本进行加固。

加固脚本地址: https://github.com/WeiyiGeek/SecOpsDev/blob/master/OS-操作系统/Linux/Ubuntu/Ubuntu-InitializeSecurity.sh

加固脚本内容如下:

#!/bin/bash
# @Author: WeiyiGeek
# @Description: Ubuntu TLS Security Initiate
# @Create Time:  2019年9月1日 16:43:33
# @Last Modified time: 2021-11-15 11:06:31
# @E-mail: master@weiyigeek.top
# @Blog: https://www.weiyigeek.top
# @wechat: WeiyiGeeker
# @Github: https://github.com/WeiyiGeek/SecOpsDev/tree/master/OS-操作系统/Linux/
# @Version: 3.3
#-------------------------------------------------#
# 脚本主要功能说明:
# (1) Ubuntu 系统初始化操作包括IP地址设置、基础软件包更新以及安装加固。
# (2) Ubuntu 系统容器以及JDK相关环境安装。
# (3) Ubuntu 系统中异常错误日志解决。
# (4) Ubuntu 系统常规服务安装配置,加入数据备份目录。
# (5) Ubuntu 脚本错误优化、添加禁用cloud-init
#-------------------------------------------------#

## 系统全局变量定义
HOSTNAME=Ubuntu-Security-Template
IP=192.168.1.2
GATEWAY=192.168.1.1
DNSIP=("223.5.5.5" "223.6.6.6")
SSHPORT=20211
DefaultUser="WeiyiGeek"  # 系统创建的用户名称非root用户
ROOTPASS=WeiyiGeek       # 密码建议12位以上且包含数字、大小写字母以及特殊字符。
APPPASS=WeiyiGeek

## 名称: err 、info 、warning
## 用途:全局Log信息打印函数
## 参数: $@
log::err() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S')]: \033[31mERROR: $@ \033[0m\n"
}
log::info() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S')]: \033[32mINFO: $@ \033[0m\n"
}
log::warning() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S')]: \033[33mWARNING: $@ \033[0m\n"
}


## 名称: os::Network
## 用途: 网络配置相关操作脚本包括(IP地址修改)
## 参数: 无
os::Network () {
  log::info "[-] 操作系统网络配置相关脚本,开始执行....."

# (1) IP地址与主机名称设置
sudo cp /etc/netplan/00-installer-config.yaml{,.bak}
mkdir /opt/init/
sudo tee /opt/init/network.sh <<'EOF'
#!/bin/bash
CURRENT_IP=$(hostname -I | cut -f 1 -d " ")
GATEWAY=$(hostname -I | cut -f 1,2,3 -d ".")
if [[ $# -lt 3 ]];then
  echo "Usage: $0 IP Gateway Hostname"
  exit
fi
echo "IP:${1} # GATEWAY:${2} # HOSTNAME:${3}"
sudo sed -i "s#${CURRENT_IP}#${1}#g" /etc/netplan/00-installer-config.yaml
sudo sed -i "s#${GATEWAY}.1#${2}#g" /etc/netplan/00-installer-config.yaml
sudo hostnamectl set-hostname ${3} 
sudo netplan apply
EOF
sudo chmod +x /opt/init/network.sh

# (2) 本地主机名解析设置
sed -i "s/127.0.1.1\s.\w.*$/127.0.1.1 ${HOSTNAME}/g" /etc/hosts
grep -q "^\$(hostname -I)\s.\w.*$" /etc/hosts && sed -i "s/\$(hostname -I)\s.\w.*$/${IPADDR} ${HOSTNAME}" /etc/hosts || echo "${IPADDR} ${HOSTNAME}" >> /etc/hosts

# (3) 系统DNS域名解析服务设置
cp -a /etc/resolv.conf{,.bak}
for dns in ${DNSIP[@]};do echo "nameserver ${dns}" >> /etc/resolv.conf;done

sudo /opt/init/network.sh ${IP} ${GATEWAY} ${HOSTNAME}
log::info "[*] network configure modifiy successful! restarting Network........."
}


## 名称: os::Software
## 用途: 操作系统软件包管理及更新源配置
## 参数: 无
os::Software () {
  log::info "[-] 操作系统软件包管理及更新源配置相关脚本,开始执行....."

# (1) 卸载多余软件,例如 snap 软件及其服务
sudo systemctl stop snapd snapd.socket #停止snapd相关的进程服务
sudo apt autoremove --purge -y snapd
sudo systemctl daemon-reload
sudo rm -rf ~/snap /snap /var/snap /var/lib/snapd /var/cache/snapd /run/snapd

# (2) 软件源设置与系统更新
sudo cp /etc/apt/sources.list{,.bak}
sudo tee /etc/apt/sources.list <<'EOF'
#阿里云Mirrors - Ubuntu
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
EOF

# (3) 内核版本升级以及常规软件安装
sudo apt autoclean && sudo apt update && sudo apt upgrade -y
sudo apt install -y nano vim git unzip wget ntpdate dos2unix net-tools tree htop ncdu nload sysstat psmisc bash-completion fail2ban  gcc g++ make jq nfs-common rpcbind libpam-cracklib

# (4) 代理方式进行更新
# sudo apt autoclean && sudo apt -o Acquire::http::proxy="http://proxy.weiyigeek.top/" update && sudo apt -o Acquire::http::proxy="http://proxy.weiyigeek.top" upgrade -y
# sudo apt install -o Acquire::http::proxy="http://proxy.weiyigeek.top/" -y nano vim git unzip wget ntpdate dos2unix net-tools tree htop ncdu nload sysstat psmisc bash-completion fail2ban
}


## 名称: os::TimedataZone
## 用途: 操作系统时间与时区同步配置
## 参数: 无
os::TimedataZone () {
  log::info "[*] 操作系统系统时间时区配置相关脚本,开始执行....."

# (1) 时间同步服务端容器(可选也可以用外部ntp服务器) : docker run -d --rm --cap-add SYS_TIME -e ALLOW_CIDR=0.0.0.0/0 -p 123:123/udp geoffh1977/chrony
echo "同步前的时间: $(date -R)"

# 方式1.Chrony 客户端配置
apt install -y chrony
grep -q "192.168.12.254" /etc/chrony/chrony.conf || sudo tee -a /etc/chrony/chrony.conf <<'EOF'
pool 192.168.4.254 iburst maxsources 1
pool 192.168.10.254 iburst maxsources 1
pool 192.168.12.254 iburst maxsources 1
pool ntp.aliyun.com iburst maxsources 4
keyfile /etc/chrony/chrony.keys
driftfile /var/lib/chrony/chrony.drift
logdir /var/log/chrony
maxupdateskew 100.0
rtcsync
# 允许跳跃式校时 如果在前 3 次校时中时间差大于 1.0s
makestep 1 3
EOF
systemctl enable chrony && systemctl restart chrony && systemctl status chrony -l

# 方式2
# sudo ntpdate 192.168.10.254 || sudo ntpdate 192.168.12.215 || sudo ntpdate ntp1.aliyun.com

# 方式3
# echo 'NTP=192.168.10.254 192.168.4.254' >> /etc/systemd/timesyncd.conf
# echo 'FallbackNTP=ntp.aliyun.com' >> /etc/systemd/timesyncd.conf
# systemctl restart systemd-timesyncd.service

# (2) 时区与地区设置: 
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo timedatectl set-timezone Asia/Shanghai
# sudo dpkg-reconfigure tzdata  # 修改确认
# sudo bash -c "echo 'Asia/Shanghai' > /etc/timezone" # 与上一条命令一样
# 将当前的 UTC 时间写入硬件时钟 (硬件时间默认为UTC)
sudo timedatectl set-local-rtc 0
# 启用NTP时间同步:
sudo timedatectl set-ntp yes
# 校准时间服务器-时间同步(推荐使用chronyc进行平滑同步)
sudo chronyc tracking
# 手动校准-强制更新时间
# chronyc -a makestep
# 系统时钟同步硬件时钟
# sudo hwclock --systohc
sudo hwclock -w

# (3) 重启依赖于系统时间的服务
sudo systemctl restart rsyslog.service cron.service
log::info "[*] Tie confmigure modifiy successful! restarting chronyd rsyslog.service crond.service........."
timedatectl
}


## 名称: os::Security
## 用途: 操作系统安全加固配置脚本(符合等保要求-三级要求)
## 参数: 无
os::Security () {
  log::info "正在进行->操作系统安全加固(符合等保要求-三级要求)配置"

# (0) 系统用户核查配置
  log::info "[-] 锁定或者删除多余的系统账户以及创建低权限用户"
userdel -r lxd
groupdel lxd
defaultuser=(root daemon bin sys games man lp mail news uucp proxy www-data backup list irc gnats nobody systemd-network systemd-resolve systemd-timesync messagebus syslog _apt tss uuidd tcpdump landscape pollinate usbmux sshd systemd-coredump _chrony)
for i in $(cat /etc/passwd | cut -d ":" -f 1,7);do
  flag=0; name=${i%%:*}; terminal=${i##*:}
  if [[ "${terminal}" == "/bin/bash" || "${terminal}" == "/bin/sh" ]];then
    log::warning "${i} 用户,shell终端为 /bin/bash 或者 /bin/sh"
  fi
  for j in ${defaultuser[@]};do
    if [[ "${name}" == "${j}" ]];then
      flag=1
      break;
    fi
  done
  if [[ $flag -eq 0 ]];then
    log::warning "${i} 非默认用户"
  fi
done
cp /etc/shadow /etc/shadow-`date +%Y%m%d`.bak
# 锁定系统默认账户(不存在的账户将被忽略)
for u in adm daemon bin sys lp uucp nuucp smmsplp mail operator games gopher ftp nobody nobody4 noaccess listen webservd rpm dbus avahi mailnull nscd vcsa rpc rpcuser nfs sshd pcap ntp haldaemon distcache webalizer squid xfs gdm sabayon named; do
  passwd -l ${u} >/dev/null 2>&1
done

# (2) 用户密码设置和口令策略设置
  log::info "[-]  配置满足策略的root管理员密码 "
echo "root:${ROOTPASS}" | chpasswd   # Ubuntu 的 passwd 不支持 --stdin 参数, 使用 chpasswd 设置密码

log::info "[-] 配置满足策略的app普通用户密码(根据需求配置)"
groupadd application
useradd -m -s /bin/bash -c "application primary user" -g application app 
echo "app:${APPPASS}" | chpasswd

  log::info "[-] 强制用户在下次登录时更改密码 "
chage -d 0 -m 0 -M 90 -W 15 root && passwd --expire root 
chage -d 0 -m 0 -M 90 -W 15 ${DefaultUser} && passwd --expire ${DefaultUser} 
chage -d 0 -m 0 -M 90 -W 15 app && passwd --expire app

  log::info "[-] 用户口令复杂性策略设置 (密码过期周期0~90、到期前15天提示、密码长度至少15、复杂度设置至少有一个大小写、数字、特殊字符、密码三次不能一样、尝试次数为三次)"
egrep -q "^\s*PASS_MIN_DAYS\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)PASS_MIN_DAYS\s+\S*(\s*#.*)?\s*$/\PASS_MIN_DAYS  0/" /etc/login.defs || echo "PASS_MIN_DAYS  0" >> /etc/login.defs
egrep -q "^\s*PASS_MAX_DAYS\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)PASS_MAX_DAYS\s+\S*(\s*#.*)?\s*$/\PASS_MAX_DAYS  90/" /etc/login.defs || echo "PASS_MAX_DAYS  90" >> /etc/login.defs
egrep -q "^\s*PASS_WARN_AGE\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)PASS_WARN_AGE\s+\S*(\s*#.*)?\s*$/\PASS_WARN_AGE  15/" /etc/login.defs || echo "PASS_WARN_AGE  15" >> /etc/login.defs
egrep -q "^\s*PASS_MIN_LEN\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)PASS_MIN_LEN\s+\S*(\s*#.*)?\s*$/\PASS_MIN_LEN  15/" /etc/login.defs || echo "PASS_MIN_LEN  15" >> /etc/login.defs

egrep -q "^password\s.+pam_cracklib.so\s+\w+.*$" /etc/pam.d/common-password && sed -ri '/^password\s.+pam_cracklib.so/{s/pam_cracklib.so\s+\w+.*$/pam_cracklib.so retry=3 minlen=15 ucredit=-1 lcredit=-1 dcredit=-1 ocredit=-1 difok=1/g;}' /etc/pam.d/common-password
egrep -q "^password\s.+pam_unix.so\s+\w+.*$" /etc/pam.d/common-password && sed -ri '/^password\s.+pam_unix.so/{s/pam_unix.so\s+\w+.*$/pam_unix.so obscure use_authtok try_first_pass sha512 remember=3/g;}' /etc/pam.d/common-password

  log::info "[-] 存储用户密码的文件,其内容经过sha512加密,所以非常注意其权限"
touch /etc/security/opasswd && chown root:root /etc/security/opasswd && chmod 600 /etc/security/opasswd 


# (3) 用户sudo权限以及重要目录和文件的权限设置
  log::info "[-] 用户sudo权限以及重要目录和文件的新建默认权限设置"
# 如uBuntu安装时您创建的用户 WeiyiGeek 防止直接通过 sudo passwd 修改root密码(此时必须要求输入WeiyiGeek密码后才可修改root密码)
# Tips: Sudo允许授权用户权限以另一个用户(通常是root用户)的身份运行程序, 
# DefaultUser="weiyigeek"
sed -i "/# Members of the admin/i ${DefaultUser} ALL=(ALL) PASSWD:ALL" /etc/sudoers


  log::info "[-] 配置用户 umask 为022 "
egrep -q "^\s*umask\s+\w+.*$" /etc/profile && sed -ri "s/^\s*umask\s+\w+.*$/umask 022/" /etc/profile || echo "umask 022" >> /etc/profile
egrep -q "^\s*umask\s+\w+.*$" /etc/bash.bashrc && sed -ri "s/^\s*umask\s+\w+.*$/umask 022/" /etc/bashrc || echo "umask 022" >> /etc/bash.bashrc
# log::info "[-] 设置用户目录创建默认权限, (初始为077比较严格),在设置 umask 为022 及 777 - 022 "
# egrep -q "^\s*(umask|UMASK)\s+\w+.*$" /etc/login.defs && sed -ri "s/^\s*(umask|UMASK)\s+\w+.*$/UMASK 022/" /etc/login.defs || echo "UMASK 022" >> /etc/login.defs

  log::info "[-] 设置或恢复重要目录和文件的权限"
chmod 755 /etc; 
chmod 777 /tmp; 
chmod 700 /etc/inetd.conf&>/dev/null 2&>/dev/null; 
chmod 644 /etc/passwd; 
chmod 640 /etc/shadow; 
chmod 644 /etc/group; 
chmod 755 /etc/security; 
chmod 644 /etc/services; 
chmod 750 /etc/rc*.d
chmod 600 ~/.ssh/authorized_keys

  log::info "[-] 删除潜在威胁文件 "
find / -maxdepth 3 -name hosts.equiv | xargs rm -rf
find / -maxdepth 3 -name .netrc | xargs rm -rf
find / -maxdepth 3 -name .rhosts | xargs rm -rf


# (4) SSHD 服务安全加固设置
log::info "[-] sshd 服务安全加固设置"
# 严格模式
sudo egrep -q "^\s*StrictModes\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*StrictModes\s+.+$/StrictModes yes/" /etc/ssh/sshd_config || echo "StrictModes yes" >> /etc/ssh/sshd_config
# 监听端口更改
if [ -z "${SSHPORT}" ];then export SSHPORT=20211;fi
sudo egrep -q "^\s*Port\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*Port\s+.+$/Port ${SSHPORT}/" /etc/ssh/sshd_config || echo "Port ${SSHPORT}" >> /etc/ssh/sshd_config
# 禁用X11转发以及端口转发
sudo egrep -q "^\s*X11Forwarding\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*X11Forwarding\s+.+$/X11Forwarding no/" /etc/ssh/sshd_config || echo "X11Forwarding no" >> /etc/ssh/sshd_config
sudo egrep -q "^\s*X11UseLocalhost\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*X11UseLocalhost\s+.+$/X11UseLocalhost yes/" /etc/ssh/sshd_config || echo "X11UseLocalhost yes" >> /etc/ssh/sshd_config
sudo egrep -q "^\s*AllowTcpForwarding\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*AllowTcpForwarding\s+.+$/AllowTcpForwarding no/" /etc/ssh/sshd_config || echo "AllowTcpForwarding no" >> /etc/ssh/sshd_config
sudo egrep -q "^\s*AllowAgentForwarding\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*AllowAgentForwarding\s+.+$/AllowAgentForwarding no/" /etc/ssh/sshd_config || echo "AllowAgentForwarding no" >> /etc/ssh/sshd_config
# 关闭禁用用户的 .rhosts 文件  ~/.ssh/.rhosts 来做为认证: 缺省IgnoreRhosts yes 
egrep -q "^(#)?\s*IgnoreRhosts\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^(#)?\s*IgnoreRhosts\s+.+$/IgnoreRhosts yes/" /etc/ssh/sshd_config || echo "IgnoreRhosts yes" >> /etc/ssh/sshd_config
# 禁止root远程登录(推荐配置-根据需求配置)
egrep -q "^\s*PermitRootLogin\s+.+$" /etc/ssh/sshd_config && sed -ri "s/^\s*PermitRootLogin\s+.+$/PermitRootLogin no/" /etc/ssh/sshd_config || echo "PermitRootLogin no" >> /etc/ssh/sshd_config
# 登陆前后欢迎提示设置
egrep -q "^\s*(banner|Banner)\s+\W+.*$" /etc/ssh/sshd_config && sed -ri "s/^\s*(banner|Banner)\s+\W+.*$/Banner \/etc\/issue/" /etc/ssh/sshd_config || \
echo "Banner /etc/issue" >> /etc/ssh/sshd_config
log::info "[-] 远程SSH登录前后提示警告Banner设置"
# SSH登录前警告Banner
sudo tee /etc/issue <<'EOF'
****************** [ 安全登陆 (Security Login) ] *****************
Authorized only. All activity will be monitored and reported.By Security Center.
Owner: WeiyiGeek, Site: https://www.weiyigeek.top
EOF
# SSH登录后提示Banner
sed -i '/^fi/a\\n\necho "\\e[1;37;41;5m################## 安全运维 (Security Operation) ####################\\e[0m"\necho "\\e[32mLogin success. Please execute the commands and operation data carefully.By WeiyiGeek.\\e[0m"' /etc/update-motd.d/00-header


# (5) 用户远程登录失败次数与终端超时设置 
  log::info "[-] 用户远程连续登录失败10次锁定帐号5分钟包括root账号"
sed -ri "/^\s*auth\s+required\s+pam_tally2.so\s+.+(\s*#.*)?\s*$/d" /etc/pam.d/sshd 
sed -ri '2a auth required pam_tally2.so deny=10 unlock_time=300 even_deny_root root_unlock_time=300' /etc/pam.d/sshd 
# 宿主机控制台登陆(可选)
# sed -ri "/^\s*auth\s+required\s+pam_tally2.so\s+.+(\s*#.*)?\s*$/d" /etc/pam.d/login
# sed -ri '2a auth required pam_tally2.so deny=5 unlock_time=300 even_deny_root root_unlock_time=300' /etc/pam.d/login

  log::info "[-] 设置登录超时时间为10分钟 "
egrep -q "^\s*(export|)\s*TMOUT\S\w+.*$" /etc/profile && sed -ri "s/^\s*(export|)\s*TMOUT.\S\w+.*$/export TMOUT=600\nreadonly TMOUT/" /etc/profile || echo -e "export TMOUT=600\nreadonly TMOUT" >> /etc/profile
egrep -q "^\s*.*ClientAliveInterval\s\w+.*$" /etc/ssh/sshd_config && sed -ri "s/^\s*.*ClientAliveInterval\s\w+.*$/ClientAliveInterval 600/" /etc/ssh/sshd_config || echo "ClientAliveInterval 600" >> /etc/ssh/sshd_config


# (5) 切换用户日志记录或者切换命令更改(可选)
  log::info "[-] 切换用户日志记录和切换命令更改名称为SU "
egrep -q "^(\s*)SULOG_FILE\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)SULOG_FILE\s+\S*(\s*#.*)?\s*$/\SULOG_FILE  \/var\/log\/.history\/sulog/" /etc/login.defs || echo "SULOG_FILE  /var/log/.history/sulog" >> /etc/login.defs
egrep -q "^\s*SU_NAME\s+\S*(\s*#.*)?\s*$" /etc/login.defs && sed -ri "s/^(\s*)SU_NAME\s+\S*(\s*#.*)?\s*$/\SU_NAME  SU/" /etc/login.defs || echo "SU_NAME  SU" >> /etc/login.defs
mkdir -vp /usr/local/bin /var/log/.backups /var/log/.history /var/log/.history/sulog
cp /usr/bin/su /var/log/.backups/su.bak
mv /usr/bin/su /usr/bin/SU
# 只能写入不能删除其目标目录中的文件
chmod -R 1777 /var/log/.history
chattr -R +a /var/log/.history 
chattr +a /var/log/.backups

# (6) 用户终端执行的历史命令记录
log::info "[-] 用户终端执行的历史命令记录 "
egrep -q "^HISTSIZE\W\w+.*$" /etc/profile && sed -ri "s/^HISTSIZE\W\w+.*$/HISTSIZE=101/" /etc/profile || echo "HISTSIZE=101" >> /etc/profile
# 方式1
sudo tee /etc/profile.d/history-record.sh <<'EOF'
# 历史命令执行记录文件路径
LOGTIME=$(date +%Y%m%d-%H-%M-%S)
export HISTFILE="/var/log/.history/${USER}.${LOGTIME}.history"
if [ ! -f ${HISTFILE} ];then
  touch ${HISTFILE}
fi
chmod 600 /var/log/.history/${USER}.${LOGTIME}.history
# 历史命令执行文件大小记录设置
HISTFILESIZE=128
HISTTIMEFORMAT="%F_%T $(whoami)#$(who -u am i 2>/dev/null| awk '{print $NF}'|sed -e 's/[()]//g'):"
EOF
# 方式2.未能成功(如后续有小伙伴成功了欢迎留言分享)
# sudo tee /usr/local/bin/history.sh <<'EOF'
# #!/bin/bash
# logfiletime=$(date +%Y%m%d-%H-%M-%S)
# # unalias "history"
# if [ $# -eq 0 ];then history;fi
# for i in $*;do
#   if [ "$i" = "-c" ];then 
#     mv ~/.bash_history > /var/log/.history/${logfiletime}.history
#     history -c
#     continue;
#   fi
# done
# alias history="source /usr/local/bin/history.sh"
# EOF


# (7) GRUB 安全设置 (需要手动设置请按照需求设置)
  log::info "[-] 系统 GRUB 安全设置(防止物理接触从grub菜单中修改密码) "
# Grub 关键文件备份
cp -a /etc/grub.d/00_header /var/log/.backups 
cp -a /etc/grub.d/10_linux /var/log/.backups 
# 设置Grub菜单界面显示时间
sed -i -e 's|GRUB_TIMEOUT_STYLE=hidden|#GRUB_TIMEOUT_STYLE=hidden|g' -e 's|GRUB_TIMEOUT=0|GRUB_TIMEOUT=3|g' /etc/default/grub
sed -i -e 's|set timeout_style=${style}|#set timeout_style=${style}|g' -e 's|set timeout=${timeout}|set timeout=3|g' /etc/grub.d/00_header
# 创建认证密码 (此处密码: WeiyiGeek)
sudo grub-mkpasswd-pbkdf2
# Enter password:
# Reenter password:
# PBKDF2 hash of your password is grub.pbkdf2.sha512.10000.21AC9CEF61B96972BF6F918D2037EFBEB8280001045ED32DFDDCC260591CC6BC8957CF25A6755904A7053E97940A9E4CD5C1EF833C1651C1BCF09D899BED4C7C.9691521F5BB34CD8AEFCED85F4B830A86EC93B61A31885BCBE3FEE927D54EFDEE69FA8B51DBC00FCBDB618D4082BC22B2B6BA4161C7E6B990C4E5CFC9E9748D7
# 设置认证用户以及password_pbkdf2认证
tee -a /etc/grub.d/00_header <<'END'
cat <<'EOF'
# GRUB Authentication
set superusers="grub"
password_pbkdf2 grub grub.pbkdf2.sha512.10000.21AC9CEF61B96972BF6F918D2037EFBEB8280001045ED32DFDDCC260591CC6BC8957CF25A6755904A7053E97940A9E4CD5C1EF833C1651C1BCF09D899BED4C7C.9691521F5BB34CD8AEFCED85F4B830A86EC93B61A31885BCBE3FEE927D54EFDEE69FA8B51DBC00FCBDB618D4082BC22B2B6BA4161C7E6B990C4E5CFC9E9748D7
EOF
END
# 设置进入正式系统不需要认证如进入单用户模式进行重置账号密码时需要进行认证。 (高敏感数据库系统不建议下述操作)
# 在191和193 分别加入--user=grub 和 --unrestricted
# 191       echo "menuentry --user=grub '$(echo "$title" | grub_quote)' ${CLASS} \$menuentry_id_option 'gnulinux-$version-$type-$boot_device_id' {" | sed "s/^/$submenu_indentation/"  # 如果按e进行menu菜单则需要用grub进行认证
# 192   else
# 193       echo "menuentry --unrestricted '$(echo "$os" | grub_quote)' ${CLASS} \$menuentry_id_option 'gnulinux-simple-$boot_device_id' {" | sed "s/^/$submenu_indentation/"          # 正常进入系统则不认证

sed -i '/echo "$title" | grub_quote/ { s/menuentry /menuentry --user=grub /;}' /etc/grub.d/10_linux
sed -i '/echo "$os" | grub_quote/ { s/menuentry /menuentry --unrestricted /;}' /etc/grub.d/10_linux

# Ubuntu 方式更新GRUB从而生成boot启动文件。
update-grub


# (8) 操作系统防火墙启用以及策略设置
  log::info "[-] 系统防火墙启用以及规则设置 "
systemctl enable ufw.service && systemctl start ufw.service && ufw enable
sudo ufw allow proto tcp to any port 20211
# 重启修改配置相关服务
systemctl restart sshd
}



## 名称: os::Operation 
## 用途: 操作系统安全运维设置
## 参数: 无
os::Operation () {
  log::info "[-] 操作系统安全运维设置相关脚本"

# (0) 禁用ctrl+alt+del组合键对系统重启 (必须要配置,我曾入过坑)
  log::info "[-] 禁用控制台ctrl+alt+del组合键重启"
mv /usr/lib/systemd/system/ctrl-alt-del.target /var/log/.backups/ctrl-alt-del.target-$(date +%Y%m%d).bak

# (1) 设置文件删除回收站别名
  log::info "[-] 设置文件删除回收站别名(防止误删文件) "
sudo tee /etc/profile.d/alias.sh <<'EOF'
# User specific aliases and functions
# 删除回收站
# find ~/.trash -delete
# 删除空目录
# find ~/.trash -type d -delete
alias rm="sh /usr/local/bin/remove.sh"
EOF
sudo tee /usr/local/bin/remove.sh <<'EOF'
#!/bin/sh
# 定义回收站文件夹目录.trash
trash="/.trash"
deltime=$(date +%Y%m%d-%H-%M-%S)
TRASH_DIR="${HOME}${trash}/${deltime}"
# 建立回收站目录当不存在的时候
if [ ! -e ${TRASH_DIR} ];then
   mkdir -p ${TRASH_DIR}
fi
for i in $*;do
  if [ "$i" = "-rf" ];then continue;fi
  # 防止误操作
  if [ "$i" = "/" ];then echo '# Danger delete command, Not delete / directory!';exit -1;fi
  #定义秒时间戳
  STAMP=$(date +%s)
  #得到文件名称(非文件夹),参考man basename
  fileName=$(basename $i)
  #将输入的参数,对应文件mv至.trash目录,文件后缀,为当前的时间戳
  mv $i ${TRASH_DIR}/${fileName}.${STAMP}
done
EOF
sudo chmod 775 /usr/local/bin/remove.sh /etc/profile.d/alias.sh /etc/profile.d/history-record.sh
sudo chmod a+x /usr/local/bin/remove.sh /etc/profile.d/alias.sh /etc/profile.d/history-record.sh
source /etc/profile.d/alias.sh  /etc/profile.d/history-record.sh


# (2) 解决普通定时任务无法后台定时执行
log::info "[-] 解决普通定时任务无法后台定时执行 "
linenumber=`expr $(egrep -n "pam_unix.so\s$" /etc/pam.d/common-session-noninteractive | cut -f 1 -d ":") - 2`
sudo sed -ri "${linenumber}a session [success=1 default=ignore] pam_succeed_if.so service in cron quiet use_uid" /etc/pam.d/common-session-noninteractive


# (3) 解决 ubuntu20.04 multipath add missing path 错误
# 添加以下内容,sda视本地环境做调整
tee -a /etc/multipath.conf <<'EOF'
blacklist {
  devnode "^sda"
}
EOF
# 重启multipath-tools服务
sudo service multipath-tools restart

# (4) 禁用 Ubuntu 中的 cloud-init
# 在 /etc/cloud 目录下创建 cloud-init.disabled 文件,注意重启后生效
sudo touch /etc/cloud/cloud-init.disabled
}



## 名称: os::optimizationn
## 用途: 操作系统优化设置(内核参数)
## 参数: 无
os::optimizationn () {
log::info "[-] 正在进行操作系统内核参数优化设置......."

# (1) 系统内核参数的配置(/etc/sysctl.conf)
log::info "[-] 系统内核参数的配置/etc/sysctl.conf"

# /etc/sysctl.d/99-kubernetes-cri.conf
egrep -q "^(#)?net.ipv4.ip_forward.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv4.ip_forward.*|net.ipv4.ip_forward = 1|g"  /etc/sysctl.conf || echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
# egrep -q "^(#)?net.bridge.bridge-nf-call-ip6tables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-ip6tables.*|net.bridge.bridge-nf-call-ip6tables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf 
# egrep -q "^(#)?net.bridge.bridge-nf-call-iptables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-iptables.*|net.bridge.bridge-nf-call-iptables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.ipv6.conf.all.disable_ipv6.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv6.conf.all.disable_ipv6.*|net.ipv6.conf.all.disable_ipv6 = 1|g" /etc/sysctl.conf || echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.ipv6.conf.default.disable_ipv6.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv6.conf.default.disable_ipv6.*|net.ipv6.conf.default.disable_ipv6 = 1|g" /etc/sysctl.conf || echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.ipv6.conf.lo.disable_ipv6.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv6.conf.lo.disable_ipv6.*|net.ipv6.conf.lo.disable_ipv6 = 1|g" /etc/sysctl.conf || echo "net.ipv6.conf.lo.disable_ipv6 = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.ipv6.conf.all.forwarding.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv6.conf.all.forwarding.*|net.ipv6.conf.all.forwarding = 1|g" /etc/sysctl.conf || echo "net.ipv6.conf.all.forwarding = 1"  >> /etc/sysctl.conf
egrep -q "^(#)?vm.max_map_count.*" /etc/sysctl.conf && sed -ri "s|^(#)?vm.max_map_count.*|vm.max_map_count = 262144|g" /etc/sysctl.conf || echo "vm.max_map_count = 262144"  >> /etc/sysctl.conf

tee -a /etc/sysctl.conf <<'EOF'
# 调整提升服务器负载能力之外,还能够防御小流量的Dos、CC和SYN攻击
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_fastopen = 3
# 优化TCP的可使用端口范围及提升服务器并发能力(注意一般流量小的服务器上没必要设置如下参数)
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 16384
net.ipv4.ip_local_port_range = 1024 65535
# 优化核套接字TCP的缓存区
net.core.netdev_max_backlog = 8192
net.core.somaxconn = 8192
net.core.rmem_max = 12582912
net.core.rmem_default = 6291456
net.core.wmem_max = 12582912
net.core.wmem_default = 6291456
EOF


# (2) Linux 系统的最大进程数和最大文件打开数限制
log::info "[-] Linux 系统的最大进程数和最大文件打开数限制"
egrep -q "^\s*ulimit -HSn\s+\w+.*$" /etc/profile && sed -ri "s/^\s*ulimit -HSn\s+\w+.*$/ulimit -HSn 65535/" /etc/profile || echo "ulimit -HSn 65535" >> /etc/profile
egrep -q "^\s*ulimit -HSu\s+\w+.*$" /etc/profile && sed -ri "s/^\s*ulimit -HSu\s+\w+.*$/ulimit -HSu 65535/" /etc/profile || echo "ulimit -HSu 65535" >> /etc/profile

tee -a /etc/security/limits.conf <<'EOF'
# ulimit -HSn 65535
# ulimit -HSu 65535
*  soft  nofile  65535
*  hard  nofile  65535
*  soft  nproc   65535
*  hard  nproc   65535
# End of file
EOF
# sed -i "/# End/i *  soft  nproc   65535" /etc/security/limits.conf
# sed -i "/# End/i *  hard  nproc   65535" /etc/security/limits.conf
sysctl -p

# 需重启生效
reboot
}

## 名称: system::swap
## 用途: Liunx 系统创建SWAP交换分区(默认2G)
## 参数: $1(几G)
system::swap () {
  if [ -z "$1" ];then
    sudo dd if=/dev/zero of=/swapfile bs=1024 count=2097152   # 2G Swap 分区 1024 * 1024 , centos 以 1000 为标准
  else
    number=$(echo "${1}*1024*1024"|bc)
    sudo dd if=/dev/zero of=/swapfile bs=1024 count=${number}   # 2G Swap 分区 1024 * 1024 , centos 以 1000 为标准
  fi

  sudo mkswap /swapfile && sudo swapon /swapfile
  if [ $(grep -c "/swapfile" /etc/fstab) -eq 0 ];then
sudo tee -a /etc/fstab <<'EOF'
/swapfile swap swap defaults 0 0
EOF
fi
sudo swapon --show && sudo free -h
}
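# 调用示例(假设需要创建 4G 交换分区, 参数单位为 GB; 不传参数时默认创建 2G):
# system::swap 4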




## 名称: software::Java
## 用途: java 环境安装配置
## 参数: 无
software::Java () {
  # 基础变量
  JAVA_FILE="/root/Downloads/jdk-8u211-linux-x64.tar.gz"
  JAVA_SRC="/usr/local/"
  JAVA_DIR="/usr/local/jdk"
  # 环境配置
  sudo tar -zxvf ${JAVA_FILE} -C ${JAVA_SRC}
  sudo rm -rf /usr/local/jdk 
  JAVA_SRC=$(ls /usr/local/ | grep "jdk")
  sudo ln -s ${JAVA_SRC} ${JAVA_DIR}
  export PATH=${JAVA_DIR}/bin:${PATH}
  sudo cp /etc/profile /etc/profile.$(date +%Y%m%d-%H%M%S).bak
  sudo tee -a /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=/usr/local/jdk/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
EOF
  java -version
}


## 名称: software::docker
## 用途: 软件安装之Docker安装
## 参数: 无
# 帮助: https://docs.docker.com/engine/install/ubuntu/
# Ubuntu Focal 20.04 (LTS)
# Ubuntu Bionic 18.04 (LTS)
# Ubuntu Xenial 16.04 (LTS)
software::docker () {
  # 1.卸载旧版本 
  sudo apt-get remove docker docker-engine docker.io containerd runc
  
  # 2.更新apt包索引并安装包以允许apt在HTTPS上使用存储库
  sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

  # 3.添加Docker官方GPG密钥
  sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

  # 4.通过搜索指纹的最后8个字符进行密钥验证
  sudo apt-key fingerprint 0EBFCD88

  # 5.设置稳定存储库
  sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

  # 6.Install Docker Engine 默认最新版本
  sudo apt-get update && sudo apt-get install -y docker-ce=5:20.10.7~3-0~ubuntu-focal docker-ce-cli=5:20.10.7~3-0~ubuntu-focal containerd.io docker-compose
  # - 强制IPv4
  # sudo apt-get -o Acquire::ForceIPv4=true  install -y docker-ce=5:19.03.15~3-0~ubuntu-focal docker-ce-cli=5:19.03.15~3-0~ubuntu-focal containerd.io docker-compose

  # 7.安装特定版本的Docker引擎,请在repo中列出可用的版本
  apt-cache madison docker-ce
  # docker-ce | 5:20.10.6~3-0~ubuntu-focal| https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
  # docker-ce | 5:19.03.15~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu  xenial/stable amd64 Packages
  # 使用第二列中的版本字符串安装特定的版本,例如:5:18.09.1~3-0~ubuntu-xenial。
  # $sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io

  #8.将当前用户加入docker用户组然后重新登陆当前用户使得低权限用户
  sudo gpasswd -a ${USER} docker
  # sudo gpasswd -a weiyigeek docker

  #9.加速器建立
  mkdir -vp /etc/docker/
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://xlx9erfu.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "live-restore": true,
  "dns": ["192.168.12.254","223.6.6.6"],
  "insecure-registries": [ "harbor.weiyigeek.top","harbor.cloud"]
}
EOF
  # "data-root":"/monitor/container",
  # "insecure-registries": ["harbor.weiyigeek.top"]
  # 9.自启与启动
  sudo systemctl daemon-reload
  sudo systemctl enable docker 
  sudo systemctl restart docker

  # 10.退出登陆生效
  exit
}


## 名称: disk::Lvsmanager
## 用途: uBuntu操作系统磁盘 LVS 逻辑卷添加与配置(扩容流程)
## 参数: 无
disk::Lvsmanager () {
  echo "\n分区信息:"
  sudo df -Th
  sudo lsblk
  echo -e "\n 磁盘信息:"
  sudo fdisk -l
  echo -e "\n PV物理卷查看:"
  sudo pvscan
  echo -e "\n vgs虚拟卷查看:"
  sudo vgs
  echo -e "\n lvscan逻辑卷扫描:"
  sudo lvscan
  echo -e "\n 分区扩展"
  echo "Ubuntu \n lvextend -L +74G /dev/ubuntu-vg/ubuntu-lv"
  echo "lsblk"
  echo -e "ubuntu general \n # resize2fs -p -F /dev/mapper/ubuntu--vg-ubuntu--lv"
}

# 安全加固过程临时文件清理为基线镜像做准备
unalias rm
find ~/.trash/* -delete
find /home/ -type d -name .trash -exec find {} -delete \;
find /var/log -name "*.gz" -delete
find /var/log -name "*log.*" -delete
find /var/log -name "vmware-*.*.log" -delete
find /var/log -name "*.log-*" -delete
find /var/log -name "*.log" -exec truncate -s 0 {} \;
find /tmp/* -delete

软件版本

操作系统
Ubuntu 20.04 LTS - 5.4.0-107-generic

TLS证书签发
cfssl - v1.6.1
cfssl-certinfo - v1.6.1
cfssljson - v1.6.1

高可用软件
ipvsadm - 1:1.31-1
haproxy - 2.0.13-2
keepalived - 1:2.0.19-2

ETCD数据库
etcd - v3.5.4

容器运行时
containerd.io - 1.6.4

Kubernetes有关组件版本
kubeadm - v1.23.6
kube-apiserver - v1.23.6
kube-controller-manager - v1.23.6
kubectl - v1.23.6
kubelet - v1.23.6
kube-proxy - v1.23.6
kube-scheduler - v1.23.6

网络插件&辅助软件
calico - v3.22
coredns - v1.9.1
kubernetes-dashboard - v2.5.1
k9s - v0.25.18

温馨提示: 上述环境所使用到的相关软件及插件我已打包,方便大家进行下载,可访问如下链接。
下载地址: http://share.weiyigeek.top/f/36158960-578443238-a1a5fa (访问密码: 2099)

网络规划

/kubernetes-cluster-binary-install# tree .
.
├── calico
│   └── calico-v3.22.yaml
├── certificate
│   ├── admin-csr.json
│   ├── apiserver-csr.json
│   ├── ca-config.json
│   ├── ca-csr.json
│   ├── cfssl
│   ├── cfssl-certinfo
│   ├── cfssljson
│   ├── controller-manager-csr.json
│   ├── etcd-csr.json
│   ├── kube-scheduler-csr.json
│   ├── proxy-csr.json
│   └── scheduler-csr.json
├── containerd.io
│   └── config.toml
├── coredns
│   ├── coredns.yaml
│   ├── coredns.yaml.sed
│   └── deploy.sh
├── cri-containerd-cni-1.6.4-linux-amd64.tar.gz
├── etcd-v3.5.4-linux-amd64.tar.gz
├── k9s
├── kubernetes-dashboard
│   ├── kubernetes-dashboard.yaml
│   └── rbac-dashboard-admin.yaml
├── kubernetes-server-linux-amd64.tar.gz
└── nginx.yaml

安装部署

1.基础主机环境准备配置

步骤 01.【所有主机】主机名设置按照上述主机规划进行设置。

# 例如, 在10.10.107.223主机中运行。
hostnamectl set-hostname master-223

# 例如, 在10.10.107.227主机中运行。
hostnamectl set-hostname node-2

步骤 02.【所有主机】将规划中的主机名称与IP地址进行硬解析。

sudo tee -a /etc/hosts <<'EOF'
10.10.107.223 master-223
10.10.107.224 master-224
10.10.107.225 master-225
10.10.107.226 node-1
10.10.107.227 node-2
10.10.107.222 weiyigeek.cluster.k8s
EOF

步骤 03.验证每个节点上IP、MAC 地址和 product_uuid 的唯一性,保证其能相互正常通信

# 使用命令 ip link 或 ifconfig -a 来获取网络接口的 MAC 地址
ifconfig -a
# 使用命令 查看 product_uuid 校验
sudo cat /sys/class/dmi/id/product_uuid

步骤 04.【所有主机】系统时间同步与时区设置

date -R
sudo ntpdate ntp.aliyun.com
sudo timedatectl set-timezone Asia/Shanghai
# 或者
# sudo dpkg-reconfigure tzdata
sudo timedatectl set-local-rtc 0
timedatectl

步骤 05.【所有主机】禁用系统交换分区

swapoff -a && sed -i 's|^/swap.img|#/swap.img|g' /etc/fstab
# 验证交换分区是否被禁用
free | grep "Swap:"

步骤 06.【所有主机】系统内核参数调整

# 禁用 swap 分区
egrep -q "^(#)?vm.swappiness.*" /etc/sysctl.conf && sed -ri "s|^(#)?vm.swappiness.*|vm.swappiness = 0|g"  /etc/sysctl.conf || echo "vm.swappiness = 0" >> /etc/sysctl.conf
# 允许转发
egrep -q "^(#)?net.ipv4.ip_forward.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv4.ip_forward.*|net.ipv4.ip_forward = 1|g"  /etc/sysctl.conf || echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# - 允许 iptables 检查桥接流量
egrep -q "^(#)?net.bridge.bridge-nf-call-iptables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-iptables.*|net.bridge.bridge-nf-call-iptables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.bridge.bridge-nf-call-ip6tables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-ip6tables.*|net.bridge.bridge-nf-call-ip6tables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf

步骤 07.【所有主机】禁用系统防火墙

ufw disable && systemctl disable ufw && systemctl stop ufw

步骤 08.【master-225 主机】使用 master-225 主机的公钥实现免密码登录其它主机(可选),方便文件在各主机间上传下载。

# 生成 ed25519 格式的公私钥对
ssh-keygen -t ed25519

# 例如,在master-225 主机上使用密钥登录到 master-223 设置 (其它主机同样)
ssh-copy-id -p 20211 weiyigeek@10.10.107.223
  # /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_ed25519.pub"
  # Are you sure you want to continue connecting (yes/no/[fingerprint])? yes # 输入yes
  # weiyigeek@10.10.107.223's password: # 输入主机密码
  # Number of key(s) added: 1
  # Now try logging into the machine, with:   "ssh -p '20211' 'weiyigeek@10.10.107.223'"
  # and check to make sure that only the key(s) you wanted were added.
ssh-copy-id -p 20211 weiyigeek@10.10.107.224
ssh-copy-id -p 20211 weiyigeek@10.10.107.226
ssh-copy-id -p 20211 weiyigeek@10.10.107.227

2.负载均衡管理工具安装与内核加载

步骤 01.安装ipvs模块以及负载均衡相关依赖。

# 查看可用版本
sudo apt-cache madison ipvsadm
  # ipvsadm |   1:1.31-1 | http://mirrors.aliyun.com/ubuntu focal/main amd64 Packages

# 安装
sudo apt -y install ipvsadm ipset sysstat conntrack

# 锁定版本 
apt-mark hold ipvsadm
  # ipvsadm set on hold.

步骤 02.将模块加载到内核中(开机自动设置-需要重启机器生效)

tee /etc/modules-load.d/k8s.conf <<'EOF'
# netfilter
br_netfilter

# containerd
overlay

# nf_conntrack
nf_conntrack

# ipvs
ip_vs
ip_vs_lc
ip_vs_lblc
ip_vs_lblcr
ip_vs_rr
ip_vs_wrr
ip_vs_sh
ip_vs_dh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
ip_tables
ip_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
xt_set
EOF

步骤 03.手动加载模块到内核中

mkdir -vp /etc/modules.d/
tee /etc/modules.d/k8s.modules <<'EOF'
#!/bin/bash
# netfilter 模块 允许 iptables 检查桥接流量
modprobe -- br_netfilter
# containerd
modprobe -- overlay
# nf_conntrack
modprobe -- nf_conntrack
# ipvs
modprobe -- ip_vs
modprobe -- ip_vs_lc
modprobe -- ip_vs_lblc
modprobe -- ip_vs_lblcr
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- ip_vs_dh
modprobe -- ip_vs_fo
modprobe -- ip_vs_nq
modprobe -- ip_vs_sed
modprobe -- ip_vs_ftp
modprobe -- ip_tables
modprobe -- ip_set
modprobe -- ipt_set
modprobe -- ipt_rpfilter
modprobe -- ipt_REJECT
modprobe -- ipip
modprobe -- xt_set
EOF

chmod 755 /etc/modules.d/k8s.modules && bash /etc/modules.d/k8s.modules && lsmod | grep -e ip_vs -e nf_conntrack
  # ip_vs_sh               16384  0
  # ip_vs_wrr              16384  0
  # ip_vs_rr               16384  0
  # ip_vs                 155648  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
  # nf_conntrack          139264  1 ip_vs
  # nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
  # nf_defrag_ipv4         16384  1 nf_conntrack
  # libcrc32c              16384  5 nf_conntrack,btrfs,xfs,raid456,ip_vs

sysctl --system

温馨提示: 在 kernel 4.19 及以上版本中使用 nf_conntrack 模块,而在 4.19 以下版本中则需使用 nf_conntrack_ipv4 模块。
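
可通过如下示例命令确认当前内核版本;若内核低于 4.19,可参考如下 sed(仅为示意,请按实际环境调整)将上述两处配置中的 nf_conntrack 替换为 nf_conntrack_ipv4:

uname -r
  # 5.4.0-107-generic
# sed -i 's/nf_conntrack$/nf_conntrack_ipv4/' /etc/modules-load.d/k8s.conf /etc/modules.d/k8s.modules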

3.高可用HAproxy与Keepalived软件安装配置

描述: 由于是测试学习环境, 此处我未专门准备两台HA服务器, 而是直接采用master节点机器,如果是正式环境建议独立出来。

步骤 01.【Master节点机器】安装下载 haproxy (HA代理健康检测) 与 keepalived (虚拟路由协议-主从)。

# 查看可用版本
sudo apt-cache madison haproxy keepalived
  #  haproxy | 2.0.13-2ubuntu0.5 | http://mirrors.aliyun.com/ubuntu focal-security/main amd64 Packages
  # keepalived | 1:2.0.19-2ubuntu0.2 | http://mirrors.aliyun.com/ubuntu focal-updates/main amd64 Packages

# 安装
sudo apt -y install haproxy keepalived

# 锁定版本 
apt-mark hold haproxy keepalived

步骤 02.【Master节点机器】进行 HAProxy 配置,其配置目录为 /etc/haproxy/,所有节点配置是一致的。

sudo cp /etc/haproxy/haproxy.cfg{,.bak}
tee /etc/haproxy/haproxy.cfg<<'EOF'
global
  user haproxy
  group haproxy
  maxconn 2000
  daemon
  log /dev/log local0
  log /dev/log local1 err
  chroot /var/lib/haproxy
  stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
  stats timeout 30s
  # Default SSL material locations
  ca-base /etc/ssl/certs
  crt-base /etc/ssl/private
  # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
  ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
  ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
  log     global
  mode    http
  option  httplog
  option  dontlognull
  timeout connect 5000
  timeout client  50000
  timeout server  50000
  timeout http-request 15s
  timeout http-keep-alive 15s
  # errorfile 400 /etc/haproxy/errors/400.http
  # errorfile 403 /etc/haproxy/errors/403.http
  # errorfile 408 /etc/haproxy/errors/408.http
  # errorfile 500 /etc/haproxy/errors/500.http
  # errorfile 502 /etc/haproxy/errors/502.http
  # errorfile 503 /etc/haproxy/errors/503.http
  # errorfile 504 /etc/haproxy/errors/504.http

# 注意: 管理HAproxy (可选)
# frontend monitor-in
#   bind *:33305
#   mode http
#   option httplog
#   monitor-uri /monitor

# 注意: 基于四层代理, 16443 为 VIP 的 ApiServer 控制平面端口, 由于 HAProxy 与 master 节点部署在同一台机器上, 所以不能使用 6443 端口.
frontend k8s-master
  bind 0.0.0.0:16443
  bind 127.0.0.1:16443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

# 注意: Master 节点的默认 Apiserver 是6443端口
backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server master-223 10.10.107.223:6443 check
  server master-224 10.10.107.224:6443 check
  server master-225 10.10.107.225:6443 check
EOF
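
温馨提示: 写入配置后,建议先校验 HAProxy 配置文件语法无误再继续后续操作(示例命令,预期输出为配置有效的提示):

haproxy -c -f /etc/haproxy/haproxy.cfg
  # Configuration file is valid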

步骤 03.【Master节点机器】进行 KeepAlived 配置,其配置目录为 /etc/keepalived/

# 创建配置目录,分别在各个master节点执行。
mkdir -vp /etc/keepalived
# __ROLE__ 角色: MASTER 或者 BACKUP
# __NETINTERFACE__ 宿主机物理网卡名称 例如我的ens32
# __IP__ 宿主机物理IP地址
# __VIP__ 虚拟VIP地址
sudo tee /etc/keepalived/keepalived.conf <<'EOF'
! Configuration File for keepalived
global_defs {
  router_id LVS_DEVEL
  script_user root
  enable_script_security
}
vrrp_script chk_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 5
  weight -5
  fall 2  
  rise 1
}
vrrp_instance VI_1 {
  state __ROLE__
  interface __NETINTERFACE__
  mcast_src_ip __IP__
  virtual_router_id 51
  priority 101
  advert_int 2
  authentication {
    auth_type PASS
    auth_pass K8SHA_KA_AUTH
  }
  virtual_ipaddress {
    __VIP__
  }
  # HA 健康检查
  # track_script {
  #   chk_apiserver
  # }
}
EOF

# 此处 master-225 性能较好所以配置为Master (master-225 主机上执行)
# master-225 10.10.107.225 => MASTER
sed -i -e 's#__ROLE__#MASTER#g' \
  -e 's#__NETINTERFACE__#ens32#g' \
  -e 's#__IP__#10.10.107.225#g' \
  -e 's#__VIP__#10.10.107.222#g' /etc/keepalived/keepalived.conf 

# master-224 10.10.107.224 => BACKUP  (master-224 主机上执行)
sed -i -e 's#__ROLE__#BACKUP#g' \
  -e 's#__NETINTERFACE__#ens32#g' \
  -e 's#__IP__#10.10.107.224#g' \
  -e 's#__VIP__#10.10.107.222#g' /etc/keepalived/keepalived.conf 

# master-223 10.10.107.223 => BACKUP  (master-223 主机上执行)
sed -i -e 's#__ROLE__#BACKUP#g' \
  -e 's#__NETINTERFACE__#ens32#g' \
  -e 's#__IP__#10.10.107.223#g' \
  -e 's#__VIP__#10.10.107.222#g' /etc/keepalived/keepalived.conf

温馨提示: 注意上述配置中的健康检查部分已注释关闭,你需要在 K8S 集群建立完成后再开启,即取消如下配置的注释:

track_script {
  chk_apiserver
}
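
开启时可参考如下示例命令取消上述注释并重启服务(各 master 节点均需执行,也可直接手动编辑配置文件):

sed -i '/# track_script {/,/# }/ s/# //' /etc/keepalived/keepalived.conf
sudo systemctl restart keepalived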

步骤 04.【Master节点机器】进行配置 KeepAlived 健康检查文件。

sudo tee /etc/keepalived/check_apiserver.sh <<'EOF'
#!/bin/bash
err=0
for k in $(seq 1 3)
do
  check_code=$(pgrep haproxy)
  if [[ $check_code == "" ]]; then
    err=$(expr $err + 1)
    sleep 1
    continue
  else
    err=0
    break
  fi
done

if [[ $err != "0" ]]; then
  echo "systemctl stop keepalived"
  /usr/bin/systemctl stop keepalived
  exit 1
else
  exit 0
fi
EOF
sudo chmod +x /etc/keepalived/check_apiserver.sh

步骤 05.【Master节点机器】启动 haproxy 、keepalived 相关服务及测试VIP漂移。

# 重载 Systemd 设置 haproxy 、keepalived 开机自启以及立即启动
sudo systemctl daemon-reload
sudo systemctl enable --now haproxy && sudo systemctl enable --now keepalived
# Synchronizing state of haproxy.service with SysV service script with /lib/systemd/systemd-sysv-install.
# Executing: /lib/systemd/systemd-sysv-install enable haproxy
# Synchronizing state of keepalived.service with SysV service script with /lib/systemd/systemd-sysv-install.
# Executing: /lib/systemd/systemd-sysv-install enable keepalived

# 在 master-223 主机中发现vip地址在其主机上。
root@master-223:~$ ip addr
  # 2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
  #     link/ether 00:0c:29:00:0f:8f brd ff:ff:ff:ff:ff:ff
  #     inet 10.10.107.223/24 brd 10.10.107.255 scope global ens32
  #        valid_lft forever preferred_lft forever
  #     inet 10.10.107.222/32 scope global ens32
  #        valid_lft forever preferred_lft forever
# 其它两台主机上通信验证。
root@master-224:~$ ping 10.10.107.222
root@master-225:~$ ping 10.10.107.222
# 手动验证VIP漂移,我们将该服务器上keepalived停止掉。
root@master-223:~$ pgrep haproxy
  # 6320
  # 6321
root@master-223:~$ /usr/bin/systemctl stop keepalived

# 此时,发现VIP已经飘到master-225主机中
root@master-225:~$ ip addr show ens32
  # 2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
  #     link/ether 00:0c:29:93:28:61 brd ff:ff:ff:ff:ff:ff
  #     inet 10.10.107.225/24 brd 10.10.107.255 scope global ens32
  #       valid_lft forever preferred_lft forever
  #     inet 10.10.107.222/32 scope global ens32
  #       valid_lft forever preferred_lft forever

至此,HAProxy 与 Keepalived 配置就告一段落了。

4.配置部署etcd集群与etcd证书签发

描述: 创建一个高可用的ETCD集群,此处我们在【master-225】机器中操作。

步骤 01.【master-225】创建一个配置与相关文件存放的目录, 以及下载获取cfssl工具进行CA证书制作与签发(cfssl工具往期文章参考地址: https://blog.weiyigeek.top/2019/10-21-12.html#3-CFSSL-生成 )。

# 工作目录创建
mkdir -vp /app/k8s-init-work && cd /app/k8s-init-work

# cfssl 最新下载地址: https://github.com/cloudflare/cfssl/releases
# cfssl 相关工具拉取 (如果拉取较慢,建议使用某雷下载,然后上传到服务器里)
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl_1.6.1_linux_amd64 -o /usr/local/bin/cfssl
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssljson_1.6.1_linux_amd64 -o /usr/local/bin/cfssljson
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl-certinfo_1.6.1_linux_amd64 -o /usr/local/bin/cfssl-certinfo

# 赋予执行权限
chmod +x /usr/local/bin/cfssl*

/app# cfssl version
# Version: 1.2.0
# Revision: dev
# Runtime: go1.6

温馨提示:

  • cfssl : CFSSL 命令行工具
  • cfssljson : 用于从cfssl程序中获取JSON输出,并将证书、密钥、证书签名请求文件CSR和Bundle写入到文件中。
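  • cfssl-certinfo : 用于查看证书文件的详细信息

例如(示例用法),证书签发完成后可使用如下命令查看证书内容:

cfssl-certinfo -cert ca.pem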

步骤 02.利用上述 cfssl 工具创建 CA 证书。

# - CA 证书签名请求配置文件
cfssl print-defaults csr > ca-csr.json
tee ca-csr.json <<'EOF'
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "k8s",
      "OU": "System"
    }
  ],
  "ca": {
    "expiry": "87600h"
  }
}
EOF

# 关键参数解析:
CN: Common Name,浏览器使用该字段验证网站是否合法,一般写的是域名,非常重要。
key:生成证书的算法
hosts:表示哪些主机名(域名)或者IP可以使用此csr申请的证书,为空或者""表示所有的都可以使用(本例中没有`"hosts": [""]`字段)
names:常见属性设置
  * C: Country, 国家
  * ST: State,州或者是省份
  * L: Locality Name,地区,城市
  * O: Organization Name,组织名称,公司名称(在k8s中常用于指定Group,进行RBAC绑定)
  * OU: Organization Unit Name,组织单位名称,公司部门

# - CA 证书策略配置文件
cfssl print-defaults config > ca-config.json
tee ca-config.json <<'EOF'
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "expiry": "87600h",
        "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ]
      },
      "etcd": {
        "expiry": "87600h",
        "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ]
      }
    }
  }
}
EOF

# 关键参数解析:
default 默认策略,指定了证书的默认有效期是10年(87600h)
profile 自定义策略配置
  * kubernetes:表示该配置(profile)的用途是为kubernetes生成证书及相关的校验工作
  * signing:表示该证书可用于签名其它证书;生成的 ca.pem 证书中 CA=TRUE
  * server auth:表示可以用该 CA 对 server 提供的证书进行验证
  * client auth:表示可以用该 CA 对 client 提供的证书进行验证
  * expiry:也表示过期时间,如果不写以default中的为准

# - 执行cfssl gencert 命令生成CA证书
# 利用CA证书签名请求配置文件 ca-csr.json 生成CA证书和CA私钥和CSR(证书签名请求):
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
  # 2022/04/27 16:49:37 [INFO] generating a new CA key and certificate from CSR
  # 2022/04/27 16:49:37 [INFO] generate received request
  # 2022/04/27 16:49:37 [INFO] received CSR
  # 2022/04/27 16:49:37 [INFO] generating key: rsa-2048
  # 2022/04/27 16:49:37 [INFO] encoded CSR
  # 2022/04/27 16:49:37 [INFO] signed certificate with serial number 245643466964695827922023924375276493244980966303
$ ls
  # ca-config.json  ca.csr  ca-csr.json  ca-key.pem  ca.pem
$ openssl x509 -in ca.pem -text -noout | grep "Not"
  # Not Before: Apr 27 08:45:00 2022 GMT
  # Not After : Apr 24 08:45:00 2032 GMT

温馨提示: 如果将 expiry 设置为87600h 表示证书过期时间为十年。

步骤 03.配置ETCD证书相关文件以及生成其证书

# etcd 证书请求文件
tee etcd-csr.json <<'EOF'
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.10.107.223",
    "10.10.107.224",
    "10.10.107.225",
    "etcd1",
    "etcd2",
    "etcd3"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "etcd",
      "OU": "System"
    }
  ]
}
EOF

# 利用ca证书签发生成etcd证书
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd

$ ls etcd*
etcd.csr  etcd-csr.json  etcd-key.pem  etcd.pem
$ openssl x509 -in etcd.pem -text -noout | grep  "X509v3 Subject Alternative Name" -A 1
  # X509v3 Subject Alternative Name:
  #   DNS:etcd1, DNS:etcd2, DNS:etcd3, IP Address:127.0.0.1, IP Address:10.10.107.223, IP Address:10.10.107.224, IP Address:10.10.107.225

步骤 04.【所有Master节点主机】下载部署ETCD集群, 首先我们需要下载etcd软件包, 可在 Github Release 页面找到最新版本的etcd下载路径(https://github.com/etcd-io/etcd/releases/)。

# 下载
wget https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz
tar -zxvf etcd-v3.5.4-linux-amd64.tar.gz
cp -a etcd-v3.5.4-linux-amd64/etcd* /usr/local/bin/

# 版本 
etcd --version
  # etcd Version: 3.5.4
  # Git SHA: 08407ff76
  # Go Version: go1.16.15
  # Go OS/Arch: linux/amd64

# 复制到其它master主机上
scp -P 20211 ./etcd-v3.5.4-linux-amd64.tar.gz weiyigeek@master-223:~
scp -P 20211 ./etcd-v3.5.4-linux-amd64.tar.gz weiyigeek@master-224:~

# 分别在master-223与master-224执行, 解压到 /usr/local/ 目录同样复制二进制文件到 /usr/local/bin/
tar -zxvf /home/weiyigeek/etcd-v3.5.4-linux-amd64.tar.gz -C /usr/local/
cp -a /usr/local/etcd-v3.5.4-linux-amd64/etcd* /usr/local/bin/

温馨提示: etcd 官网地址 ( https://etcd.io/)

步骤 05.创建etcd集群所需的配置文件。

# 证书准备
mkdir -vp /etc/etcd/pki/
cp *.pem /etc/etcd/pki/
ls /etc/etcd/pki/
  # ca-key.pem  ca.pem  etcd-key.pem  etcd.pem

# 上传到~家目录,并需要将其复制到 /etc/etcd/pki/ 目录中
scp -P 20211 *.pem weiyigeek@master-224:~
scp -P 20211 *.pem weiyigeek@master-223:~
  # ****************** [ 安全登陆 (Security Login) ] *****************
  # Authorized only. All activity will be monitored and reported.By Security Center.
  # ca-key.pem             100% 1675     3.5MB/s   00:00
  # ca.pem                 100% 1375     5.2MB/s   00:00
  # etcd-key.pem           100% 1679     7.0MB/s   00:00
  # etcd.pem               100% 1399     5.8MB/s   00:00

# master-225 执行
tee /etc/etcd/etcd.conf <<'EOF'
# [成员配置]
# member 名称
ETCD_NAME=etcd1
# 存储数据的目录(注意需要建立)
ETCD_DATA_DIR="/var/lib/etcd/data"
# 用于监听客户端etcdctl或者curl连接
ETCD_LISTEN_CLIENT_URLS="https://10.10.107.225:2379,https://127.0.0.1:2379"
# 用于监听集群中其它member的连接
ETCD_LISTEN_PEER_URLS="https://10.10.107.225:2380"

# [证书配置]
# ETCD_CERT_FILE=/etc/etcd/pki/etcd.pem
# ETCD_KEY_FILE=/etc/etcd/pki/etcd-key.pem
# ETCD_TRUSTED_CA_FILE=/etc/kubernetes/pki/ca.pem
# ETCD_CLIENT_CERT_AUTH=true
# ETCD_PEER_CLIENT_CERT_AUTH=true
# ETCD_PEER_CERT_FILE=/etc/etcd/pki/etcd.pem
# ETCD_PEER_KEY_FILE=/etc/etcd/pki/etcd-key.pem
# ETCD_PEER_TRUSTED_CA_FILE=/etc/kubernetes/pki/ca.pem

# [集群配置]
# 本机地址用于通知客户端,客户端通过此IPs与集群通信;
ETCD_ADVERTISE_CLIENT_URLS="https://10.10.107.225:2379"
# 本机地址用于通知集群member与member通信
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.107.225:2380"
# 描述集群中所有节点的信息,本member根据此信息去联系其他member
ETCD_INITIAL_CLUSTER="etcd1=https://10.10.107.225:2380,etcd2=https://10.10.107.224:2380,etcd3=https://10.10.107.223:2380"
# 集群状态新建集群时候设置为new,若是想加入某个已经存在的集群设置为existing
ETCD_INITIAL_CLUSTER_STATE=new
EOF


# master-224 执行
tee /etc/etcd/etcd.conf <<'EOF'
# [成员配置]
# member 名称
ETCD_NAME=etcd2
# 存储数据的目录(注意需要建立)
ETCD_DATA_DIR="/var/lib/etcd/data"
# 用于监听客户端etcdctl或者curl连接
ETCD_LISTEN_CLIENT_URLS="https://10.10.107.224:2379,https://127.0.0.1:2379"
# 用于监听集群中其它member的连接
ETCD_LISTEN_PEER_URLS="https://10.10.107.224:2380"

# [集群配置]
# 本机地址用于通知客户端,客户端通过此IPs与集群通信;
ETCD_ADVERTISE_CLIENT_URLS="https://10.10.107.224:2379"
# 本机地址用于通知集群member与member通信
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.107.224:2380"
# 描述集群中所有节点的信息,本member根据此信息去联系其他member
ETCD_INITIAL_CLUSTER="etcd1=https://10.10.107.225:2380,etcd2=https://10.10.107.224:2380,etcd3=https://10.10.107.223:2380"
# 集群状态新建集群时候设置为new,若是想加入某个已经存在的集群设置为existing
ETCD_INITIAL_CLUSTER_STATE=new
EOF


# master-223 执行
tee /etc/etcd/etcd.conf <<'EOF'
# [成员配置]
# member 名称
ETCD_NAME=etcd3
# 存储数据的目录(注意需要建立)
ETCD_DATA_DIR="/var/lib/etcd/data"
# 用于监听客户端etcdctl或者curl连接
ETCD_LISTEN_CLIENT_URLS="https://10.10.107.223:2379,https://127.0.0.1:2379"
# 用于监听集群中其它member的连接
ETCD_LISTEN_PEER_URLS="https://10.10.107.223:2380"

# [集群配置]
# 本机地址用于通知客户端,客户端通过此IPs与集群通信;
ETCD_ADVERTISE_CLIENT_URLS="https://10.10.107.223:2379"
# 本机地址用于通知集群member与member通信
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.107.223:2380"
# 描述集群中所有节点的信息,本member根据此信息去联系其他member
ETCD_INITIAL_CLUSTER="etcd1=https://10.10.107.225:2380,etcd2=https://10.10.107.224:2380,etcd3=https://10.10.107.223:2380"
# 集群状态新建集群时候设置为new,若是想加入某个已经存在的集群设置为existing
ETCD_INITIAL_CLUSTER_STATE=new
EOF

步骤 06.【所有Master节点主机】创建配置 etcd 的 systemd 管理配置文件,并启动其服务。

mkdir -vp /var/lib/etcd/
cat > /usr/lib/systemd/system/etcd.service <<EOF
[Unit]
Description=Etcd Server
Documentation=https://github.com/etcd-io/etcd
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd \
  --client-cert-auth \
  --trusted-ca-file /etc/etcd/pki/ca.pem \
  --cert-file /etc/etcd/pki/etcd.pem \
  --key-file /etc/etcd/pki/etcd-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file /etc/etcd/pki/ca.pem \
  --peer-cert-file /etc/etcd/pki/etcd.pem \
  --peer-key-file /etc/etcd/pki/etcd-key.pem
Restart=on-failure
RestartSec=5
LimitNOFILE=65535
LimitNPROC=65535

[Install]
WantedBy=multi-user.target
EOF

# 重载 systemd && 开机启动与手动启用etcd服务
systemctl daemon-reload && systemctl enable --now etcd.service

步骤 07.【所有Master节点主机】查看各个master节点的etcd集群服务是否正常及其健康状态。

# 服务查看
systemctl status etcd.service

# 利用 etcdctl 工具查看集群成员信息
export ETCDCTL_API=3
etcdctl --endpoints=https://10.10.107.225:2379,https://10.10.107.224:2379,https://10.10.107.223:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem" \
--write-out=table member list
  # +------------------+---------+-------+----------------------------+----------------------------+------------+
  # |        ID        | STATUS  | NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
  # +------------------+---------+-------+----------------------------+----------------------------+------------+
  # | 144934d02ad45ec7 | started | etcd3 | https://10.10.107.223:2380 | https://10.10.107.223:2379 |      false |
  # | 2480d95a2df867a4 | started | etcd2 | https://10.10.107.224:2380 | https://10.10.107.224:2379 |      false |
  # | 2e8fddd3366a3d88 | started | etcd1 | https://10.10.107.225:2380 | https://10.10.107.225:2379 |      false |
  # +------------------+---------+-------+----------------------------+----------------------------+------------+

# 集群节点信息
etcdctl --endpoints=https://10.10.107.225:2379,https://10.10.107.224:2379,https://10.10.107.223:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem"  \
--write-out=table endpoint status
  # +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
  # |          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
  # +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
  # | https://10.10.107.225:2379 | 2e8fddd3366a3d88 |   3.5.4 |   20 kB |     false |      false |         3 |         12 |                 12 |        |
  # | https://10.10.107.224:2379 | 2480d95a2df867a4 |   3.5.4 |   20 kB |      true |      false |         3 |         12 |                 12 |        |
  # | https://10.10.107.223:2379 | 144934d02ad45ec7 |   3.5.4 |   20 kB |     false |      false |         3 |         12 |                 12 |        |
  # +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

# 集群节点健康状态
etcdctl --endpoints=https://10.10.107.225:2379,https://10.10.107.224:2379,https://10.10.107.223:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem"  \
--write-out=table endpoint health
  # +----------------------------+--------+-------------+-------+
  # |          ENDPOINT          | HEALTH |    TOOK     | ERROR |
  # +----------------------------+--------+-------------+-------+
  # | https://10.10.107.225:2379 |   true |  9.151813ms |       |
  # | https://10.10.107.224:2379 |   true | 10.965914ms |       |
  # | https://10.10.107.223:2379 |   true | 11.165228ms |       |
  # +----------------------------+--------+-------------+-------+

# 集群节点性能测试
etcdctl --endpoints=https://10.10.107.225:2379,https://10.10.107.224:2379,https://10.10.107.223:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem"  \
--write-out=table endpoint check perf
# 59 / 60 Boo......oom ! 98.33%
# PASS: Throughput is 148 writes/s
# Slowest request took too long: 1.344053s
# Stddev too high: 0.143059s
# FAIL
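
除性能测试外,也可以做一次简单的键值写入与读取来验证集群功能是否正常(示例,其中键名 /weiyigeek/test 仅作演示):

etcdctl --endpoints=https://10.10.107.225:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem" \
put /weiyigeek/test "hello etcd"
  # OK
etcdctl --endpoints=https://10.10.107.223:2379 \
--cacert="/etc/etcd/pki/ca.pem" --cert="/etc/etcd/pki/etcd.pem" --key="/etc/etcd/pki/etcd-key.pem" \
get /weiyigeek/test
  # /weiyigeek/test
  # hello etcd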

5.Containerd 运行时安装部署

步骤 01.【所有节点】在各主机中安装二进制版本的 containerd.io 运行时服务,Kubernetes 通过 CRI 插件连接 containerd 服务,从而控制容器的生命周期。

# 从 Github 中下载最新的版本的 cri-containerd-cni 
wget https://github.com/containerd/containerd/releases/download/v1.6.4/cri-containerd-cni-1.6.4-linux-amd64.tar.gz

# 解压到当前cri-containerd-cni目录中。
mkdir -vp cri-containerd-cni
tar -zxvf cri-containerd-cni-1.6.4-linux-amd64.tar.gz -C cri-containerd-cni

步骤 02.查看其文件以及配置文件路径信息。

$ tree ./cri-containerd-cni/
.
├── etc
│   ├── cni
│   │   └── net.d
│   │       └── 10-containerd-net.conflist
│   ├── crictl.yaml
│   └── systemd
│       └── system
│           └── containerd.service
├── opt
│   ├── cni
│   │   └── bin
│   │       ├── bandwidth
│   │       ├── bridge
│   │       ├── dhcp
│   │       ├── firewall
│   │       ├── host-device
│   │       ├── host-local
│   │       ├── ipvlan
│   │       ├── loopback
│   │       ├── macvlan
│   │       ├── portmap
│   │       ├── ptp
│   │       ├── sbr
│   │       ├── static
│   │       ├── tuning
│   │       ├── vlan
│   │       └── vrf
│   └── containerd
│       └── cluster
│           ├── gce
│           │   ├── cloud-init
│           │   │   ├── master.yaml
│           │   │   └── node.yaml
│           │   ├── cni.template
│           │   ├── configure.sh
│           │   └── env
│           └── version
└── usr
    └── local
        ├── bin
        │   ├── containerd
        │   ├── containerd-shim
        │   ├── containerd-shim-runc-v1
        │   ├── containerd-shim-runc-v2
        │   ├── containerd-stress
        │   ├── crictl
        │   ├── critest
        │   ├── ctd-decoder
        │   └── ctr
        └── sbin
            └── runc

# 然后在所有节点上将上述解压出的目录复制到对应系统目录中
cd ./cri-containerd-cni/
cp -r etc/ /
cp -r opt/ /
cp -r usr/ /

步骤 03.【所有节点】创建 containerd 配置目录,并生成和修改 config.toml 。

mkdir -vp /etc/containerd

# 默认配置生成
containerd config default >/etc/containerd/config.toml
ls /etc/containerd/config.toml
  # /etc/containerd/config.toml

# pause 镜像源
sed -i "s#k8s.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g"  /etc/containerd/config.toml

# 使用 SystemdCgroup
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml

# docker.io mirror
sed -i '/registry.mirrors]/a\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]' /etc/containerd/config.toml
sed -i '/registry.mirrors."docker.io"]/a\ \ \ \ \ \ \ \ \ \ endpoint = ["https://xlx9erfu.mirror.aliyuncs.com","https://docker.mirrors.ustc.edu.cn"]' /etc/containerd/config.toml

# gcr.io mirror
sed -i '/registry.mirrors]/a\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]' /etc/containerd/config.toml
sed -i '/registry.mirrors."gcr.io"]/a\ \ \ \ \ \ \ \ \ \ endpoint = ["https://gcr.mirrors.ustc.edu.cn"]' /etc/containerd/config.toml

# k8s.gcr.io mirror
sed -i '/registry.mirrors]/a\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]' /etc/containerd/config.toml
sed -i '/registry.mirrors."k8s.gcr.io"]/a\ \ \ \ \ \ \ \ \ \ endpoint = ["https://gcr.mirrors.ustc.edu.cn/google-containers/","https://registry.cn-hangzhou.aliyuncs.com/google_containers/"]' /etc/containerd/config.toml

# quay.io mirror
sed -i '/registry.mirrors]/a\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]' /etc/containerd/config.toml
sed -i '/registry.mirrors."quay.io"]/a\ \ \ \ \ \ \ \ \ \ endpoint = ["https://quay.mirrors.ustc.edu.cn"]' /etc/containerd/config.toml

步骤 04.配置 crictl 客户端工具的 runtime 与镜像(image)端点。

# 手动设置临时生效
# crictl config runtime-endpoint /run/containerd/containerd.sock
# /run/containerd/containerd.sock 

# 配置文件设置永久生效
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

步骤 05.重载 systemd自启和启动containerd.io服务。

systemctl daemon-reload && systemctl enable --now containerd.service
systemctl status containerd.service
ctr version
  # Client:
  #   Version:  1.5.11
  #   Revision: 3df54a852345ae127d1fa3092b95168e4a88e2f8
  #   Go version: go1.17.8

  # Server:
  #   Version:  1.5.11
  #   Revision: 3df54a852345ae127d1fa3092b95168e4a88e2f8
  #   UUID: 71a28bbb-6ed6-408d-a873-e394d48b35d8
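
温馨提示: 也可以使用 crictl 验证 CRI 端点是否可用(示例命令,假设前述 /etc/crictl.yaml 已配置且 containerd 已正常启动):

crictl version
crictl info | head -n 20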

步骤 06.用于根据OCI规范生成和运行容器的CLI工具 runc 版本查看

runc -v
  # runc version 1.1.1
  # commit: v1.1.1-0-g52de29d7
  # spec: 1.0.2-dev
  # go: go1.17.9
  # libseccomp: 2.5.1

温馨提示: 当执行默认 runc 时提示 runc: symbol lookup error: runc: undefined symbol: seccomp_notify_respond ,是由于上述软件包中自带的 runc 对系统依赖较多所致,建议单独下载 runc 二进制项目(https://github.com/opencontainers/runc/)进行替换安装。

wget https://github.com/opencontainers/runc/releases/download/v1.1.1/runc.amd64

# 执行权限赋予
chmod +x runc.amd64

# 替换掉 /usr/local/sbin/ 路径原软件包中的 runc
mv runc.amd64 /usr/local/sbin/runc

6.Kubernetes 集群安装部署

1) 二进制软件包下载安装 (手动-此处以该实践为例)

步骤 01.【master-225】手动从k8s.io下载Kubernetes软件包并解压安装, 当前最新版本为 v1.23.6 。

# 如果下载较慢请使用某雷(迅雷)等下载工具
wget -L https://dl.k8s.io/v1.23.6/kubernetes-server-linux-amd64.tar.gz

# 解压
/app/k8s-init-work/tools# tar -zxf kubernetes-server-linux-amd64.tar.gz

# 安装二进制文件到 /usr/local/bin
/app/k8s-init-work/tools/kubernetes/server/bin# ls
  # apiextensions-apiserver  kube-apiserver.docker_tag           kube-controller-manager.tar  kube-log-runner        kube-scheduler
  # kubeadm                  kube-apiserver.tar                  kubectl                      kube-proxy             kube-scheduler.docker_tag
  # kube-aggregator          kube-controller-manager             kubectl-convert              kube-proxy.docker_tag  kube-scheduler.tar
  # kube-apiserver           kube-controller-manager.docker_tag  kubelet                      kube-proxy.tar         mounter

cp kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm  kubectl /usr/local/bin

# 验证安装
kubectl version
  # Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

步骤 02.【master-225】利用scp将kubernetes相关软件包分发到其它机器中。

scp -P 20211 kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm kubectl weiyigeek@master-223:/tmp
scp -P 20211 kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm kubectl weiyigeek@master-224:/tmp
scp -P 20211 kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm kubectl weiyigeek@node-1:/tmp
scp -P 20211 kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm kubectl weiyigeek@node-2:/tmp

# 将如下二进制文件复制到指定 bin 目录中(注意: node 节点只需要 kube-proxy、kubelet,kubeadm 在此处可选)
cp kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubeadm kubectl /usr/local/bin/

步骤 03.【所有节点】在集群所有节点上创建准备如下目录

mkdir -vp /etc/kubernetes/{manifests,pki,ssl,cfg} /var/log/kubernetes /var/lib/kubelet

2) 部署配置 kube-apiserver

描述: 它是集群所有服务请求访问的入口点, 通过 API 接口操作集群中的资源。

步骤 01.【master-225】创建apiserver证书请求文件并使用上一章生成的CA签发证书。

# 创建证书申请文件,注意下述文件hosts字段中IP为所有Master/LB/VIP IP,为了方便后期扩容可以多写几个预留的IP。
# 同时还需要填写 service 网络的首个IP(一般是 kube-apiserver 指定的 service-cluster-ip-range 网段的第一个IP,如 10.96.0.1)。
tee apiserver-csr.json <<'EOF'
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "10.10.107.223",
    "10.10.107.224",
    "10.10.107.225",
    "10.10.107.222",
    "10.96.0.1",
    "weiyigeek.cluster.k8s",
    "master-223",
    "master-224",
    "master-225",
    "kubernetes",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

# 使用自签CA签发 kube-apiserver HTTPS证书
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes apiserver-csr.json | cfssljson -bare apiserver

$ ls apiserver*
apiserver.csr  apiserver-csr.json  apiserver-key.pem  apiserver.pem

# 复制到自定义目录
cp *.pem /etc/kubernetes/ssl/

步骤 02.【master-225】创建TLS机制所需TOKEN

cat > /etc/kubernetes/bootstrap-token.csv << EOF
$(head -c 16 /dev/urandom | od -An -t x | tr -d ' '),kubelet-bootstrap,10001,"system:bootstrappers"
EOF
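
温馨提示: 可查看生成的 token 文件内容进行确认,其格式为 token,用户名,UID,用户组(示例输出中 token 为随机生成,实际值以您的环境为准):

cat /etc/kubernetes/bootstrap-token.csv
  # 例如: c47ffb939f5ca36231d9e3121a252940,kubelet-bootstrap,10001,"system:bootstrappers"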

温馨提示: 启用 TLS Bootstrapping 机制Node节点kubelet和kube-proxy要与kube-apiserver进行通信,必须使用CA签发的有效证书才可以,当Node节点很多时,这种客户端证书颁发需要大量工作,同样也会增加集群扩展复杂度。
为了简化流程,Kubernetes引入了TLS bootstraping机制来自动颁发客户端证书,kubelet会以一个低权限用户自动向apiserver申请证书,kubelet的证书由apiserver动态签署。所以强烈建议在Node上使用这种方式,目前主要用于kubelet,kube-proxy还是由我们统一颁发一个证书。

步骤 03.【master-225】创建 kube-apiserver 配置文件

cat > /etc/kubernetes/cfg/kube-apiserver.conf <<'EOF'
KUBE_APISERVER_OPTS="--apiserver-count=3 \
--advertise-address=10.10.107.225 \
--allow-privileged=true \
--authorization-mode=RBAC,Node \
--bind-address=0.0.0.0 \
--enable-aggregator-routing=true \
--enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
--enable-bootstrap-token-auth=true \
--token-auth-file=/etc/kubernetes/bootstrap-token.csv \
--secure-port=6443 \
--service-node-port-range=30000-32767 \
--service-cluster-ip-range=10.96.0.0/16 \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem  \
--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem \
--kubelet-client-certificate=/etc/kubernetes/ssl/apiserver.pem \
--kubelet-client-key=/etc/kubernetes/ssl/apiserver-key.pem \
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
--etcd-cafile=/etc/kubernetes/ssl/ca.pem \
--etcd-certfile=/etc/kubernetes/ssl/etcd.pem \
--etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem \
--etcd-servers=https://10.10.107.225:2379,https://10.10.107.224:2379,https://10.10.107.223:2379 \
--service-account-issuer=https://kubernetes.default.svc.cluster.local \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--proxy-client-cert-file=/etc/kubernetes/ssl/apiserver.pem \
--proxy-client-key-file=/etc/kubernetes/ssl/apiserver-key.pem \
--requestheader-allowed-names=kubernetes \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User \
--requestheader-client-ca-file=/etc/kubernetes/ssl/ca.pem \
--v=2 \
--event-ttl=1h \
--feature-gates=TTLAfterFinished=true \
--logtostderr=false \
--log-dir=/var/log/kubernetes"
EOF

# 审计日志可选
# --audit-log-maxage=30
# --audit-log-maxbackup=3
# --audit-log-maxsize=100
# --audit-log-path=/var/log/kubernetes/kube-apiserver.log"

# –logtostderr:启用日志
# —v:日志等级
# –log-dir:日志目录
# –etcd-servers:etcd集群地址
# –bind-address:监听地址
# –secure-port:https安全端口
# –advertise-address:集群通告地址
# –allow-privileged:启用授权
# –service-cluster-ip-range:Service虚拟IP地址段
# –enable-admission-plugins:准入控制模块
# –authorization-mode:认证授权,启用RBAC授权和节点自管理
# –enable-bootstrap-token-auth:启用TLS bootstrap机制
# –token-auth-file:bootstrap token文件
# –service-node-port-range:Service nodeport类型默认分配端口范围
# –kubelet-client-xxx:apiserver访问kubelet客户端证书
# –tls-xxx-file:apiserver https证书
# –etcd-xxxfile:连接Etcd集群证书
# –audit-log-xxx:审计日志

# 温馨提示: 在 1.23.* 版本之后请勿使用如下参数。
Flag --enable-swagger-ui has been deprecated,
Flag --insecure-port has been deprecated,
Flag --alsologtostderr has been deprecated,
Flag --logtostderr has been deprecated, will be removed in a future release,
Flag --log-dir has been deprecated, will be removed in a future release,
Flag --feature-gates=TTLAfterFinished=true. It will be removed in a future release. (还可使用)

步骤 04.【master-225】创建kube-apiserver服务管理配置文件

cat > /lib/systemd/system/kube-apiserver.service << "EOF"
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=etcd.service
Wants=etcd.service

[Service]
EnvironmentFile=-/etc/kubernetes/cfg/kube-apiserver.conf
ExecStart=/usr/local/bin/kube-apiserver $KUBE_APISERVER_OPTS
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536
LimitNPROC=65535

[Install]
WantedBy=multi-user.target
EOF

步骤 05.将上述创建生成的文件目录同步到集群的其它master节点上

# master - 225  /etc/kubernetes/ 目录结构
tree /etc/kubernetes/
/etc/kubernetes/
├── bootstrap-token.csv
├── cfg
│   └── kube-apiserver.conf
├── manifests
├── pki
└── ssl
    ├── apiserver-key.pem
    ├── apiserver.pem
    ├── ca-key.pem
    ├── ca.pem
    ├── etcd-key.pem
    └── etcd.pem

# 证书及kube-apiserver.conf配置文件
scp -P 20211 -r /etc/kubernetes/ weiyigeek@master-223:/tmp
scp -P 20211 -r /etc/kubernetes/ weiyigeek@master-224:/tmp

# kube-apiserver 服务管理配置文件
scp -P 20211 /lib/systemd/system/kube-apiserver.service weiyigeek@master-223:/tmp
scp -P 20211 /lib/systemd/system/kube-apiserver.service weiyigeek@master-224:/tmp

# 【master-223】 【master-224】 执行如下命令将上传到/tmp相关文件放入指定目录中
cd /tmp/ && cp -r /tmp/kubernetes/ /etc/
mv kube-apiserver.service /lib/systemd/system/kube-apiserver.service
mv kube-apiserver kubectl  kube-proxy  kube-scheduler kubeadm  kube-controller-manager  kubelet /usr/local/bin

步骤 06.分别修改 /etc/kubernetes/cfg/kube-apiserver.conf 文件中 --advertise-address=10.10.107.225

# 【master-223】
sed -i 's#--advertise-address=10.10.107.225#--advertise-address=10.10.107.223#g' /etc/kubernetes/cfg/kube-apiserver.conf
# 【master-224】
sed -i 's#--advertise-address=10.10.107.225#--advertise-address=10.10.107.224#g' /etc/kubernetes/cfg/kube-apiserver.conf

步骤 07.【master节点】完成上述操作后分别在三台master节点上启动apiserver服务。

# 重载systemd与自启设置
systemctl daemon-reload 
systemctl enable --now kube-apiserver
systemctl status kube-apiserver

# 测试api-server
curl --insecure https://10.10.107.222:16443/
curl --insecure https://10.10.107.223:6443/
curl --insecure https://10.10.107.224:6443/
curl --insecure https://10.10.107.225:6443/

# 测试结果
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
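
温馨提示: 除了通过 curl 访问外,也可以在各 master 节点本地确认 kube-apiserver 服务状态与 6443 端口监听情况(示例命令):

systemctl is-active kube-apiserver
ss -tnlp | grep 6443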

3) 部署配置 kubectl

描述: 它是集群管理客户端工具,与 API-Server 服务请求交互, 实现资源的查看与管理。

步骤 01.【master-225】创建kubectl证书请求文件CSR并生成证书

tee admin-csr.json <<'EOF'
{
  "CN": "admin",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "system:masters",
      "OU": "System"
    }
  ]
}
EOF

# 证书文件生成
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin

# 复制证书相关文件到指定目录
ls admin*
  # admin.csr  admin-csr.json  admin-key.pem  admin.pem
cp admin*.pem /etc/kubernetes/ssl/

步骤 02.【master-225】生成 kubeconfig 配置文件 admin.conf,它是 kubectl 的配置文件,包含访问 apiserver 的所有信息,如 apiserver 地址、CA 证书和自身使用的客户端证书。

cd /etc/kubernetes 

# 配置集群信息
# 此处也可以采用域名的形式 (https://weiyigeek.cluster.k8s:16443 )
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://10.10.107.222:16443 --kubeconfig=admin.conf
  # Cluster "kubernetes" set.

# 配置集群认证用户
kubectl config set-credentials admin --client-certificate=/etc/kubernetes/ssl/admin.pem --client-key=/etc/kubernetes/ssl/admin-key.pem --embed-certs=true --kubeconfig=admin.conf
  # User "admin" set.

# 配置上下文
kubectl config set-context kubernetes --cluster=kubernetes --user=admin --kubeconfig=admin.conf
  # Context "kubernetes" created.

# 使用上下文
kubectl config use-context kubernetes --kubeconfig=admin.conf
  # Switched to context "kubernetes".
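
温馨提示: 生成完成后可查看该 kubeconfig 的概要信息,确认集群地址与用户配置无误(示例命令,证书内容会自动省略显示为 DATA+OMITTED):

kubectl config view --kubeconfig=admin.conf --minify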

步骤 03.【master-225】准备kubectl配置文件并进行角色绑定。

mkdir /root/.kube && cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes --kubeconfig=/root/.kube/config
  # clusterrolebinding.rbac.authorization.k8s.io/kube-apiserver:kubelet-apis created

# 该 config 的内容
$ cat /root/.kube/config
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: base64(CA证书)
    server: https://10.10.107.222:16443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: admin
  name: kubernetes
current-context: kubernetes
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate-data: base64(用户证书)
    client-key-data: base64(用户证书KEY)

步骤 04.【master-225】查看集群状态

export KUBECONFIG=$HOME/.kube/config

# 查看集群信息
kubectl cluster-info
  # Kubernetes control plane is running at https://10.10.107.222:16443
  # To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# 查看集群组件状态
kubectl get componentstatuses
  # NAME                 STATUS      MESSAGE                                                                                        ERROR
  # controller-manager   Unhealthy   Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
  # scheduler            Unhealthy   Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused
  # etcd-0               Healthy     {"health":"true","reason":""}
  # etcd-1               Healthy     {"health":"true","reason":""}
  # etcd-2               Healthy     {"health":"true","reason":""}

# 查看命名空间以及所有名称中资源对象
kubectl get all --all-namespaces
  # NAMESPACE   NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  # default     service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   8h
kubectl get ns
  # NAME              STATUS   AGE
  # default           Active   8h
  # kube-node-lease   Active   8h
  # kube-public       Active   8h
  # kube-system       Active   8h

温馨提示: 由于我们还未进行 controller-manager 与 scheduler 部署所以此时其状态为 Unhealthy。

步骤 05.【master-225】同步kubectl配置文件到集群其它master节点

ssh -p 20211 weiyigeek@master-223 'mkdir ~/.kube/'
ssh -p 20211 weiyigeek@master-224 'mkdir ~/.kube/'

scp -P 20211 $HOME/.kube/config weiyigeek@master-223:~/.kube/
scp -P 20211 $HOME/.kube/config weiyigeek@master-224:~/.kube/

# 【master-223】
weiyigeek@master-223:~$ kubectl get services
  # NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  # kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   8h

步骤 06.配置 kubectl 命令补全 (建议新手勿用,等待后期熟悉相关命令后使用)

# 安装补全工具
apt install -y bash-completion
source /usr/share/bash-completion/bash_completion

# 方式1
source <(kubectl completion bash)

# 方式2
kubectl completion bash > ~/.kube/completion.bash.inc
source ~/.kube/completion.bash.inc

# 自动
tee $HOME/.bash_profile <<'EOF'
# source <(kubectl completion bash)
source ~/.kube/completion.bash.inc
EOF
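
温馨提示: 若习惯使用 k 作为 kubectl 的别名,可参照 kubectl 官方文档为别名同样启用补全(示例,假设已按上述方式加载补全脚本):

echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc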

至此 kubectl 集群客户端配置 完毕.

4) 部署配置 kube-controller-manager

描述: 它是集群中的控制器组件,其内部包含多个控制器资源, 实现对象的自动化控制中心。

步骤 01.【master-225】创建 kube-controller-manager 证书请求文件CSR并生成证书

tee controller-manager-csr.json <<'EOF'
{
  "CN": "system:kube-controller-manager",
  "hosts": [
    "127.0.0.1",
    "10.10.107.223",
    "10.10.107.224",
    "10.10.107.225"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "system:kube-controller-manager",
      "OU": "System"
    }
  ]
}
EOF

# 说明:
* hosts 列表包含所有 kube-controller-manager 节点 IP;
* CN 为 system:kube-controller-manager;
* O 为 system:kube-controller-manager,kubernetes 内置的 ClusterRoleBindings system:kube-controller-manager 赋予 kube-controller-manager 工作所需的权限

# 利用 CA 颁发 kube-controller-manager 证书文件
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes controller-manager-csr.json | cfssljson -bare controller-manager

$ ls controller*
controller-manager.csr  controller-manager-csr.json  controller-manager-key.pem  controller-manager.pem

# 复制证书
cp controller-manager*.pem /etc/kubernetes/ssl

步骤 02.创建 kube-controller-manager 的 controller-manager.conf 配置文件.

cd /etc/kubernetes/

# 设置集群 也可采用 (https://weiyigeek.cluster.k8s:16443 ) 域名形式。
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://10.10.107.222:16443 --kubeconfig=controller-manager.conf

# 设置认证
kubectl config set-credentials system:kube-controller-manager --client-certificate=/etc/kubernetes/ssl/controller-manager.pem --client-key=/etc/kubernetes/ssl/controller-manager-key.pem --embed-certs=true --kubeconfig=controller-manager.conf

# 设置上下文
kubectl config set-context system:kube-controller-manager --cluster=kubernetes --user=system:kube-controller-manager --kubeconfig=controller-manager.conf

# 切换上下文
kubectl config use-context system:kube-controller-manager --kubeconfig=controller-manager.conf

步骤 03.准备创建 kube-controller-manager 配置文件。

cat > /etc/kubernetes/cfg/kube-controller-manager.conf << "EOF"
KUBE_CONTROLLER_MANAGER_OPTS="--allocate-node-cidrs=true \
--bind-address=127.0.0.1 \
--secure-port=10257 \
--authentication-kubeconfig=/etc/kubernetes/controller-manager.conf \
--authorization-kubeconfig=/etc/kubernetes/controller-manager.conf \
--cluster-name=kubernetes \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--controllers=*,bootstrapsigner,tokencleaner \
--cluster-cidr=10.128.0.0/16 \
--service-cluster-ip-range=10.96.0.0/16 \
--use-service-account-credentials=true \
--root-ca-file=/etc/kubernetes/ssl/ca.pem \
--service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
--tls-cert-file=/etc/kubernetes/ssl/controller-manager.pem \
--tls-private-key-file=/etc/kubernetes/ssl/controller-manager-key.pem \
--leader-elect=true \
--cluster-signing-duration=87600h \
--v=2 \
--logtostderr=false \
--log-dir=/var/log/kubernetes \
--kubeconfig=/etc/kubernetes/controller-manager.conf"
EOF

# 温馨提示:
Flag --logtostderr has been deprecated, will be removed in a future release,
Flag --log-dir has been deprecated, will be removed in a future release
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt

步骤 04.创建 kube-controller-manager 服务启动文件

cat > /lib/systemd/system/kube-controller-manager.service << "EOF"
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/cfg/kube-controller-manager.conf
ExecStart=/usr/local/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

步骤 05.【master-225】同样分发上述文件到其它master节点中。

# controller-manager 证书 及 kube-controller-manager.conf 配置文件、kube-controller-manager.service 服务管理配置文件
scp -P 20211 /etc/kubernetes/ssl/controller-manager.pem /etc/kubernetes/ssl/controller-manager-key.pem /etc/kubernetes/controller-manager.conf /etc/kubernetes/cfg/kube-controller-manager.conf /lib/systemd/system/kube-controller-manager.service weiyigeek@master-223:/tmp
scp -P 20211 /etc/kubernetes/ssl/controller-manager.pem /etc/kubernetes/ssl/controller-manager-key.pem /etc/kubernetes/controller-manager.conf /etc/kubernetes/cfg/kube-controller-manager.conf /lib/systemd/system/kube-controller-manager.service weiyigeek@master-224:/tmp

# 【master-223】 【master-224】 执行如下命令将上传到/tmp相关文件放入指定目录中
mv controller-manager*.pem /etc/kubernetes/ssl/
mv controller-manager.conf /etc/kubernetes/controller-manager.conf
mv kube-controller-manager.conf /etc/kubernetes/cfg/kube-controller-manager.conf
mv kube-controller-manager.service /lib/systemd/system/kube-controller-manager.service

步骤 06.重载 systemd 以及自动启用 kube-controller-manager 服务。

systemctl daemon-reload 
systemctl enable --now kube-controller-manager
systemctl status kube-controller-manager

步骤 07.启动运行 kube-controller-manager 服务后查看集群组件状态, 发现原本 controller-manager 是 Unhealthy 的状态已经变成了 Healthy 状态。

kubectl get componentstatuses
  # NAME                 STATUS      MESSAGE                                                                                        ERROR
  # scheduler            Unhealthy   Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused
  # controller-manager   Healthy     ok
  # etcd-2               Healthy     {"health":"true","reason":""}
  # etcd-0               Healthy     {"health":"true","reason":""}
  # etcd-1               Healthy     {"health":"true","reason":""}

至此 kube-controller-manager 服务的安装、配置完毕!

5) 部署配置 kube-scheduler

描述: 在集群中kube-scheduler调度器组件, 负责任务调度选择合适的节点进行分配任务.

步骤 01.【master-225】创建 kube-scheduler 证书请求文件CSR并生成证书

tee scheduler-csr.json <<'EOF'
{
  "CN": "system:kube-scheduler",
  "hosts": [
    "127.0.0.1",
    "10.10.107.223",
    "10.10.107.224",
    "10.10.107.225"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "system:kube-scheduler",
      "OU": "System"
    }
  ]
}
EOF

# 利用 CA 颁发 kube-scheduler 证书文件
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes scheduler-csr.json | cfssljson -bare scheduler

$ ls scheduler*
scheduler-csr.json scheduler.csr  scheduler-key.pem  scheduler.pem

# 复制证书
cp scheduler*.pem /etc/kubernetes/ssl

步骤 02.完成后我们需要创建 kube-scheduler 的 kubeconfig 配置文件。

cd /etc/kubernetes/

# 设置集群 也可采用 (https://weiyigeek.cluster.k8s:16443 ) 域名形式。
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://10.10.107.222:16443 --kubeconfig=scheduler.conf

# 设置认证
kubectl config set-credentials system:kube-scheduler --client-certificate=/etc/kubernetes/ssl/scheduler.pem --client-key=/etc/kubernetes/ssl/scheduler-key.pem --embed-certs=true --kubeconfig=scheduler.conf

# 设置上下文
kubectl config set-context system:kube-scheduler --cluster=kubernetes --user=system:kube-scheduler --kubeconfig=scheduler.conf

# 切换上下文
kubectl config use-context system:kube-scheduler --kubeconfig=scheduler.conf

步骤 03.创建 kube-scheduler 服务配置文件

cat > /etc/kubernetes/cfg/kube-scheduler.conf << "EOF"
KUBE_SCHEDULER_OPTS="--address=127.0.0.1 \
--secure-port=10259 \
--kubeconfig=/etc/kubernetes/scheduler.conf \
--authentication-kubeconfig=/etc/kubernetes/scheduler.conf \
--authorization-kubeconfig=/etc/kubernetes/scheduler.conf \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/scheduler.pem \
--tls-private-key-file=/etc/kubernetes/ssl/scheduler-key.pem \
--leader-elect=true \
--alsologtostderr=true \
--logtostderr=false \
--log-dir=/var/log/kubernetes \
--v=2"
EOF

步骤 04.创建 kube-scheduler 服务启动配置文件

cat > /lib/systemd/system/kube-scheduler.service << "EOF"
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/cfg/kube-scheduler.conf
ExecStart=/usr/local/bin/kube-scheduler $KUBE_SCHEDULER_OPTS
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

步骤 05.【master-225】同样分发上述文件到其它master节点中。

# scheduler 证书 及 kube-scheduler.conf 配置文件、kube-scheduler.service 服务管理配置文件
scp -P 20211 /etc/kubernetes/ssl/scheduler.pem /etc/kubernetes/ssl/scheduler-key.pem /etc/kubernetes/scheduler.conf /etc/kubernetes/cfg/kube-scheduler.conf /lib/systemd/system/kube-scheduler.service weiyigeek@master-223:/tmp
scp -P 20211 /etc/kubernetes/ssl/scheduler.pem /etc/kubernetes/ssl/scheduler-key.pem /etc/kubernetes/scheduler.conf /etc/kubernetes/cfg/kube-scheduler.conf /lib/systemd/system/kube-scheduler.service weiyigeek@master-224:/tmp

# 【master-223】 【master-224】 执行如下命令将上传到/tmp相关文件放入指定目录中
mv scheduler*.pem /etc/kubernetes/ssl/
mv scheduler.conf /etc/kubernetes/scheduler.conf
mv kube-scheduler.conf /etc/kubernetes/cfg/kube-scheduler.conf
mv kube-scheduler.service /lib/systemd/system/kube-scheduler.service

步骤 06.【所有master节点】重载 systemd 以及自动启用 kube-scheduler 服务。

systemctl daemon-reload 
systemctl enable --now kube-scheduler
systemctl status kube-scheduler

步骤 07.【所有master节点】验证所有master节点各个组件状态, 正常状态下如下, 如有错误请排查通过后在进行后面操作。

kubectl get componentstatuses
  # Warning: v1 ComponentStatus is deprecated in v1.19+
  # NAME                 STATUS    MESSAGE                         ERROR
  # controller-manager   Healthy   ok
  # scheduler            Healthy   ok
  # etcd-0               Healthy   {"health":"true","reason":""}
  # etcd-1               Healthy   {"health":"true","reason":""}
  # etcd-2               Healthy   {"health":"true","reason":""}

6) 部署配置 kubelet

步骤 01.【master-225】读取 BOOTSTRAP_TOKEN 并创建 kubelet 的 bootstrap kubeconfig 配置文件 kubelet-bootstrap.kubeconfig。

cd /etc/kubernetes/

# 读取 bootstrap-token 值
BOOTSTRAP_TOKEN=$(awk -F "," '{print $1}' /etc/kubernetes/bootstrap-token.csv)

# BOOTSTRAP_TOKEN="123456.httpweiyigeektop"

# 设置集群 也可采用 (https://weiyigeek.cluster.k8s:16443 ) 域名形式。
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://weiyigeek.cluster.k8s:16443 --kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig

# 设置认证
kubectl config set-credentials kubelet-bootstrap --token=${BOOTSTRAP_TOKEN} --kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig

# 设置上下文
kubectl config set-context default --cluster=kubernetes --user=kubelet-bootstrap --kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig

# 切换上下文
kubectl config use-context default  --kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig

步骤 02.完成后我们需要进行集群角色绑定。

# 角色授权
kubectl create clusterrolebinding cluster-system-anonymous --clusterrole=cluster-admin --user=kubelet-bootstrap
kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap  --kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig

# 授权 kubelet 创建 CSR 
kubectl create clusterrolebinding create-csrs-for-bootstrapping --clusterrole=system:node-bootstrapper --group=system:bootstrappers

# 对 CSR 进行批复
# 允许 kubelet 请求并接收新的证书
kubectl create clusterrolebinding auto-approve-csrs-for-group --clusterrole=system:certificates.k8s.io:certificatesigningrequests:nodeclient --group=system:bootstrappers

# 允许 kubelet 对其客户端证书执行续期操作
kubectl create clusterrolebinding auto-approve-renewals-for-nodes --clusterrole=system:certificates.k8s.io:certificatesigningrequests:selfnodeclient --group=system:bootstrappers

步骤 03.创建 kubelet 配置文件,配置参考地址(https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/)

# yaml 格式
# 此处为了通用性推荐使用如下格式
cat > /etc/kubernetes/cfg/kubelet-config.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
address: 0.0.0.0
port: 10250
readOnlyPort: 10255
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/ssl/ca.pem
authorization:
  mode: AlwaysAllow
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
contentType: application/vnd.kubernetes.protobuf
cpuCFSQuota: true
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 10s
enableControllerAttachDetach: true
enableDebuggingHandlers: true
enforceNodeAllocatable:
- pods
eventBurst: 10
eventRecordQPS: 5
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 5m0s
failSwapOn: true
fileCheckFrequency: 20s
hairpinMode: promiscuous-bridge
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 20s
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m0s
iptablesDropBit: 15
iptablesMasqueradeBit: 14
kubeAPIBurst: 10
kubeAPIQPS: 5
makeIPTablesUtilChains: true
maxOpenFiles: 1000000
maxPods: 110
nodeStatusUpdateFrequency: 10s
oomScoreAdj: -999
podPidsLimit: -1
registryBurst: 10
registryPullQPS: 5
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 2m0s
serializeImagePulls: true
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
volumeStatsAggPeriod: 1m0s
EOF
# [master-225] 主机执行则替换为其主机地址, 其它节点也是对应的IP地址, 或者直接 0.0.0.0 全地址监听这样也就不用修改了。
# sed -i 's#__IP__#10.10.107.225#g' /etc/kubernetes/cfg/kubelet-config.yaml

# json 格式 (不推荐了解就好)
cat > /etc/kubernetes/cfg/kubelet.json << "EOF"
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/ssl/ca.pem"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "address": "__IP__",
  "port": 10250,
  "readOnlyPort": 10255,
  "cgroupDriver": "systemd",                    
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.96.0.10"]
}
EOF

温馨提示: 上述 kubelet.json 中 address 需要修改为当前主机IP地址, 例如在master-225主机中应该更改为 10.10.107.225 , 此处我设置为了 0.0.0.0 表示监听所有网卡,若有指定网卡的需求请指定其IP地址。

步骤 04.【master-225】创建 kubelet 服务启动管理文件

# 工作目录创建
mkdir -vp /var/lib/kubelet

# 创建 kubelet.service 服务
cat > /lib/systemd/system/kubelet.service << "EOF"
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
--bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \
--config=/etc/kubernetes/cfg/kubelet-config.yaml \
--cert-dir=/etc/kubernetes/ssl \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--network-plugin=cni \
--root-dir=/etc/cni/net.d \
--cni-bin-dir=/opt/cni/bin \
--cni-conf-dir=/etc/cni/net.d \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
--rotate-certificates \
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 \
--image-pull-progress-deadline=15m \
--alsologtostderr=true \
--logtostderr=false \
--log-dir=/var/log/kubernetes \
--v=2 \
--node-labels=node.kubernetes.io/node=''
StartLimitInterval=0
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

步骤 05.将 master-225 节点上的 kubelet 相关文件复制到所有节点之中。

# 复制配置文件到其它节点 /tmp 目录
scp -P 20211 /etc/kubernetes/kubelet-bootstrap.kubeconfig /etc/kubernetes/cfg/kubelet-config.yaml /lib/systemd/system/kubelet.service weiyigeek@master-223:/tmp
scp -P 20211 /etc/kubernetes/kubelet-bootstrap.kubeconfig /etc/kubernetes/cfg/kubelet-config.yaml /lib/systemd/system/kubelet.service weiyigeek@master-224:/tmp
scp -P 20211 /etc/kubernetes/kubelet-bootstrap.kubeconfig /etc/kubernetes/cfg/kubelet-config.yaml /lib/systemd/system/kubelet.service weiyigeek@node-1:/tmp
scp -P 20211 /etc/kubernetes/kubelet-bootstrap.kubeconfig /etc/kubernetes/cfg/kubelet-config.yaml /lib/systemd/system/kubelet.service weiyigeek@node-2:/tmp

# 分别将 /tmp 目录中的kubelet配置文件复制到对应目录
sudo mkdir -vp /var/lib/kubelet /etc/kubernetes/cfg/
cp /tmp/kubelet-bootstrap.kubeconfig /etc/kubernetes/kubelet-bootstrap.kubeconfig
cp /tmp/kubelet-config.yaml /etc/kubernetes/cfg/kubelet-config.yaml
cp /tmp/kubelet.service /lib/systemd/system/kubelet.service

步骤 06.【所有节点】重载 systemd 以及设置自启并启动 kubelet 服务。

systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl status -l kubelet.service
# systemctl restart kubelet.service

步骤 07.利用kubectl工具查看集群中所有节点信息。

$ kubectl get node -o wide
NAME         STATUS     ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master-223   NotReady   <none>   17h     v1.23.6   10.10.107.223   <none>        Ubuntu 20.04.3 LTS   5.4.0-92-generic    containerd://1.6.4
master-224   NotReady   <none>   19s     v1.23.6   10.10.107.224   <none>        Ubuntu 20.04.3 LTS   5.4.0-92-generic    containerd://1.6.4
master-225   NotReady   <none>   3d22h   v1.23.6   10.10.107.225   <none>        Ubuntu 20.04.3 LTS   5.4.0-109-generic   containerd://1.6.4
node-1       NotReady   <none>   17h     v1.23.6   10.10.107.226   <none>        Ubuntu 20.04.2 LTS   5.4.0-96-generic    containerd://1.6.4
node-2       NotReady   <none>   17h     v1.23.6   10.10.107.227   <none>        Ubuntu 20.04.2 LTS   5.4.0-96-generic    containerd://1.6.4

温馨提示: 由上述结果可知各个节点 STATUS 处于 NotReady , 这是由于节点之间的POD无法通信还缺少网络插件,当我们安装好 kube-proxy 与 calico 就可以变成 Ready 状态了。

步骤 08.确认 kubelet 服务启动成功后,我们可以执行如下命令查看 kubelet-bootstrap 申请颁发的证书,如果 CONDITION 不为 Approved,Issued 则需要排查是否存在错误(若需手动批复可参考下方示例)。

$ kubectl get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR           CONDITION
csr-b949p   7m55s   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Approved,Issued
csr-c9hs4   3m34s   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Approved,Issued
csr-r8vhp   5m50s   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Approved,Issued
csr-zb4sr   3m40s   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Approved,Issued
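
温馨提示: 若某个 CSR 长时间处于 Pending 状态,可手动批复后再查看其状态(示例,csr-b949p 为上述输出中的名称,请替换为实际值):

kubectl certificate approve csr-b949p
kubectl get csr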

7) 部署配置 kube-proxy

描述: kube-proxy 是集群中负责维护节点网络规则的组件,使得您可以在集群内、集群外正确地与 Pod 进行网络通信,同时它也是服务负载均衡与流量转发的关键组件。

步骤 01.【master-225】创建 kube-proxy 证书请求文件CSR并生成证书

tee proxy-csr.json <<'EOF'
{
  "CN": "system:kube-proxy",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ChongQing",
      "ST": "ChongQing",
      "O": "system:kube-proxy",
      "OU": "System"
    }
  ]
}
EOF

# 利用 CA 颁发 kube-proxy 证书文件
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes proxy-csr.json | cfssljson -bare proxy

$ ls proxy*
proxy.csr  proxy-csr.json  proxy-key.pem  proxy.pem

# 复制证书
cp proxy*.pem /etc/kubernetes/ssl

步骤 02.完成后我们需要创建 kube-proxy 的 kubeconfig 配置文件。

cd /etc/kubernetes/

# 设置集群 也可采用 (https://weiyigeek.cluster.k8s:16443 ) 域名形式。
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://10.10.107.222:16443 --kubeconfig=kube-proxy.kubeconfig

# 设置认证
kubectl config set-credentials kube-proxy --client-certificate=/etc/kubernetes/ssl/proxy.pem --client-key=/etc/kubernetes/ssl/proxy-key.pem --embed-certs=true --kubeconfig=kube-proxy.kubeconfig

# 设置上下文
kubectl config set-context default --cluster=kubernetes --user=kube-proxy --kubeconfig=kube-proxy.kubeconfig

# 切换上下文
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

步骤 03.创建 kube-proxy 服务配置文件

cat > /etc/kubernetes/cfg/kube-proxy.yaml << "EOF"
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
healthzBindAddress: 0.0.0.0:10256
metricsBindAddress: 0.0.0.0:10249
hostnameOverride: __HOSTNAME__
clusterCIDR: 10.128.0.0/16
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
mode: ipvs
ipvs:
  excludeCIDRs:
  - 10.128.0.1/32
EOF

温馨提示: ProxyMode 表示的是 Kubernetes 代理服务器所使用的模式.

  • 目前在 Linux 平台上有三种可用的代理模式:'userspace'(相对较老,即将被淘汰)、 'iptables'(相对较新,速度较快)、'ipvs'(最新,在性能和可扩缩性上表现好)。
  • 目前在 Windows 平台上有两种可用的代理模式:'userspace'(相对较老,但稳定)和 'kernelspace'(相对较新,速度更快)。

步骤 04.创建 kube-proxy.service 服务启动管理文件

mkdir -vp /var/lib/kube-proxy
cat > /lib/systemd/system/kube-proxy.service << "EOF"
[Unit]
Description=Kubernetes Kube-proxy Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
  --config=/etc/kubernetes/cfg/kube-proxy.yaml \
  --alsologtostderr=true \
  --logtostderr=false \
  --log-dir=/var/log/kubernetes \
  --v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

步骤 05.同步【master-225】节点中如下文件到其它节点对应目录之上。

scp -P 20211 /etc/kubernetes/kube-proxy.kubeconfig  /etc/kubernetes/ssl/proxy.pem /etc/kubernetes/ssl/proxy-key.pem /etc/kubernetes/cfg/kube-proxy.yaml /lib/systemd/system/kube-proxy.service weiyigeek@master-223:/tmp
scp -P 20211 /etc/kubernetes/kube-proxy.kubeconfig  /etc/kubernetes/ssl/proxy.pem /etc/kubernetes/ssl/proxy-key.pem /etc/kubernetes/cfg/kube-proxy.yaml /lib/systemd/system/kube-proxy.service weiyigeek@master-224:/tmp
scp -P 20211 /etc/kubernetes/kube-proxy.kubeconfig  /etc/kubernetes/ssl/proxy.pem /etc/kubernetes/ssl/proxy-key.pem /etc/kubernetes/cfg/kube-proxy.yaml /lib/systemd/system/kube-proxy.service weiyigeek@node-1:/tmp
scp -P 20211 /etc/kubernetes/kube-proxy.kubeconfig  /etc/kubernetes/ssl/proxy.pem /etc/kubernetes/ssl/proxy-key.pem /etc/kubernetes/cfg/kube-proxy.yaml /lib/systemd/system/kube-proxy.service weiyigeek@node-2:/tmp

# 复制到对应节点上并创建对应目录
cd /tmp
cp kube-proxy.kubeconfig /etc/kubernetes/kube-proxy.kubeconfig
cp proxy*.pem /etc/kubernetes/ssl/
cp kube-proxy.yaml /etc/kubernetes/cfg/kube-proxy.yaml 
cp kube-proxy.service /lib/systemd/system/kube-proxy.service
mkdir -vp /var/lib/kube-proxy

# 温馨提示: 请特别注意各节点主机名称,需将 kube-proxy.yaml 中的 hostnameOverride 修改为当前执行节点的主机名称。
sed -i "s#__HOSTNAME__#$(hostname)#g" /etc/kubernetes/cfg/kube-proxy.yaml

步骤 06.同样重载systemd服务与设置kube-proxy服务自启,启动后查看其服务状态

systemctl daemon-reload
systemctl enable --now kube-proxy
# systemctl restart kube-proxy
systemctl status kube-proxy
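
温馨提示: 由于 kube-proxy 采用 ipvs 模式,服务启动后可在节点上查看 ipvs 转发规则进行验证(示例,假设节点上已安装或可安装 ipvsadm 工具):

apt install -y ipvsadm
ipvsadm -Ln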

8) 部署配置 Calico 插件

描述: 前面在节点加入到集群时,你会发现其节点状态为 NotReady , 我们说过部署calico插件可以让Pod与集群正常通信。

步骤 01.在【master-225】节点上拉取最新版本的 calico 部署清单,当前最新版本为 v3.22,官方项目地址 (https://github.com/projectcalico/calico)

# 拉取 calico 部署清单
wget https://docs.projectcalico.org/v3.22/manifests/calico.yaml

步骤 02.修改 calico.yaml 文件的中如下 K/V, 即 Pod 获取IP地址的地址池, 从网络规划中我们设置为 10.128.0.0/16, 注意默认情况下如下字段是注释的且默认地址池为192.168.0.0/16。

$ vim calico.yaml
# The default IPv4 pool to create on startup if none exists. Pod IPs will be
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
- name: CALICO_IPV4POOL_CIDR
  value: "10.128.0.0/16"

步骤 03.部署 calico 到集群之中。

kubectl apply -f calico.yaml
  # configmap/calico-config created
  # ....
  # poddisruptionbudget.policy/calico-kube-controllers created

步骤 04.查看calico网络插件在各节点上部署结果,状态为Running表示部署成功。

$ kubectl get pod -A -o wide
  # NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
  # kube-system   calico-kube-controllers-6f7b5668f7-m55gf   1/1     Running             0          56m   <none>          master-224   <none>           <none>
  # kube-system   calico-node-6cmgl                          1/1     Running             0          48m   10.10.107.226   node-1       <none>           <none>
  # kube-system   calico-node-bx59c                          1/1     Running             0          50m   10.10.107.227   node-2       <none>           <none>
  # kube-system   calico-node-ft29c                          1/1     Running             0          50m   10.10.107.225   master-225   <none>           <none>
  # kube-system   calico-node-gkz76                          1/1     Running             0          49m   10.10.107.223   master-223   <none>           <none>
  # kube-system   calico-node-nbnwx                          1/1     Running             0          47m   10.10.107.224   master-224   <none>           <none>

步骤 05.查看集群中各个节点状态,状态为 Ready 表示节点正常。

kubectl get node
  # NAME         STATUS   ROLES    AGE     VERSION
  # master-223   Ready    <none>   3d      v1.23.6
  # master-224   Ready    <none>   2d6h    v1.23.6
  # master-225   Ready    <none>   6d5h    v1.23.6
  # node-1       Ready    <none>   2d23h   v1.23.6
  # node-2       Ready    <none>   2d23h   v1.23.6

步骤 06.此处我们扩展一下,您可能会发现二进制安装的 Kubernetes 集群中 master 节点的 ROLES 为 <none>,而不是我们常见的 master 名称;并且自 K8s 1.24 版本以后,kubeadm 安装的集群也不再将运行控制面组件的节点标记为"master",只是因为这个词被认为是"具有攻击性的、无礼的(offensive)"。近几年一些用 master-slave 表示主从节点的计算机系统纷纷改掉术语,slave 前两年就已经销声匿迹,现在看来 master 也不能用了,不过我们可以通过如下方式自定义设置节点角色名称。

# 分别将主节点设置为control-plane,工作节点角色设置为work、
# node-role.kubernetes.io/节点角色名称=
kubectl label nodes master-223 node-role.kubernetes.io/control-plane=
kubectl label nodes master-224 node-role.kubernetes.io/control-plane=
kubectl label nodes master-225 node-role.kubernetes.io/control-plane=
kubectl label nodes node-1 node-role.kubernetes.io/work=
kubectl label nodes node-2 node-role.kubernetes.io/work=
  # node/node-2 labeled

# 此时再次执行查看
$ kubectl get node
NAME         STATUS   ROLES           AGE    VERSION
master-223   Ready    control-plane   3d     v1.23.6
master-224   Ready    control-plane   2d6h   v1.23.6
master-225   Ready    control-plane   6d5h   v1.23.6
node-1       Ready    work            3d     v1.23.6
node-2       Ready    work            3d     v1.23.6
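
温馨提示: 若需要移除上述自定义的角色标签,只需在标签键后追加 - 号即可(示例):

kubectl label nodes node-1 node-role.kubernetes.io/work-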

9) 部署配置 CoreDNS 插件

描述: 从名称可以看出 CoreDNS 插件用于 DNS 和服务发现,使得集群中可以通过服务名称访问相应的后端 Pod;不管后端 Pod 地址如何改变,其总会第一时间更新对应服务域名的解析记录,该项目同样毕业于 CNCF。

官网地址: https://coredns.io/
项目地址: https://github.com/coredns/coredns

步骤 01.通过参考Github(https://github.com/coredns/deployment/tree/master/kubernetes)中coredns项目在K8S部署说明,我们需要下载 deploy.sh 与 coredns.yaml.sed 两个文件.

wget -L https://github.com/coredns/deployment/raw/master/kubernetes/deploy.sh
wget -L https://github.com/coredns/deployment/raw/master/kubernetes/coredns.yaml.sed
chmod +x deploy.sh

步骤 02.生成coredns部署清单yaml文件

# -i DNS-IP 参数指定 集群 dns IP 地址, 注意需要与前面 kubelet-config.yaml 中的 clusterDNS 字段值保持一致
# -d CLUSTER-DOMAIN 参数指定 集群域名名称, 注意需要与前面 kubelet-config.yaml 中的 clusterDomain 字段值保持一致
./deploy.sh -i 10.96.0.10 > coredns.yaml

温馨提示: 实际上我们可以手动替换 coredns.yaml.sed 内容。

# 下述变量取值与 deploy.sh 的默认值保持一致
CLUSTER_DNS_IP=10.96.0.10
CLUSTER_DOMAIN=cluster.local
REVERSE_CIDRS="in-addr.arpa ip6.arpa"
STUBDOMAINS=""
UPSTREAMNAMESERVER="/etc/resolv.conf"
YAML_TEMPLATE=coredns.yaml.sed

sed -e "s/CLUSTER_DNS_IP/$CLUSTER_DNS_IP/g" \
    -e "s/CLUSTER_DOMAIN/$CLUSTER_DOMAIN/g" \
    -e "s?REVERSE_CIDRS?$REVERSE_CIDRS?g" \
    -e "s@STUBDOMAINS@${STUBDOMAINS}@g" \
    -e "s?UPSTREAMNAMESERVER?$UPSTREAMNAMESERVER?g" \
    "${YAML_TEMPLATE}" > coredns.yaml

步骤 03.查看生成的coredns部署清单yaml文件(coredns.yml)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
  - apiGroups:
    - ""
    resources:
    - endpoints
    - services
    - pods
    - namespaces
    verbs:
    - list
    - watch
  - apiGroups:
    - discovery.k8s.io
    resources:
    - endpointslices
    verbs:
    - list
    - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
spec:
  # replicas: not specified here:
  # 1. Default is 1.
  # 2. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      nodeSelector:
        kubernetes.io/os: linux
      affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
               - key: k8s-app
                 operator: In
                 values: ["kube-dns"]
             topologyKey: kubernetes.io/hostname
      containers:
      - name: coredns
        image: coredns/coredns:1.9.1
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
      dnsPolicy: Default
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
            - key: Corefile
              path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP

步骤 04.在集群中部署coredns,并查看其部署状态。

kubectl apply -f coredns.yaml
  # serviceaccount/coredns created
  # clusterrole.rbac.authorization.k8s.io/system:coredns created
  # clusterrolebinding.rbac.authorization.k8s.io/system:coredns created
  # configmap/coredns created
  # deployment.apps/coredns created
  # service/kube-dns created

# 温馨提示: 为了加快部署可以先拉取coredns镜像到master节点 
ctr -n k8s.io i pull docker.io/coredns/coredns:1.9.1
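
温馨提示: coredns 部署完成后,建议创建一个临时 Pod 验证集群内部域名解析是否正常(示例,busybox:1.28 镜像仅用于测试,命令执行完成后 Pod 会自动删除):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
  # 若输出中 Server 为 10.96.0.10 且能解析出 kubernetes.default 对应地址,则说明 DNS 解析正常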

应用部署测试

1.部署Nginx Web服务

方式1.利用create快速部署由deployment管理的nginx应用

# 在默认的名称空间中创建由 deployment 资源控制器管理的Nginx服务
$ kubectl create deployment --image=nginx:latest --port=5701 --replicas 1 hello-nginx
deployment.apps/hello-nginx created

# 创建的POD
$ kubectl get pod -o wide --show-labels
NAME                          READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES   LABELS
hello-nginx-7f4ff84cb-mjw79   1/1     Running   0          65s   10.128.17.203   master-225   <none>           <none>            app=hello-nginx,pod-template-hash=7f4ff84cb

# 为hello-nginx部署创建一个服务,该部署服务于端口80,并连接到端口80上的Pod容器
$ kubectl expose deployment hello-nginx --port=80 --target-port=80
service/hello-nginx exposed

$ kubectl get svc hello-nginx
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hello-nginx   ClusterIP   10.96.122.135   <none>        80/TCP    23s

# 服务验证
$ curl -I 10.96.122.135
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Sun, 08 May 2022 09:58:27 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 28 Dec 2021 15:28:38 GMT
Connection: keep-alive
ETag: "61cb2d26-267"
Accept-Ranges: bytes

方式2.当我们也可以利用资源清单创建nginx服务用于部署我的个人主页

步骤 01.准备StatefulSet控制器管理的资源清单

tee nginx.yaml <<'EOF'
apiVersion:  apps/v1
kind: StatefulSet
metadata:
  name: nginx-web
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx-service" 

  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: workdir
        emptyDir: {}
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - name: http
          protocol: TCP
          containerPort: 80
        volumeMounts:
        - name: workdir
          mountPath: "/usr/share/nginx/html"
      initContainers:
      - name: init-html
        image: bitnami/git:2.36.1
        command: ['sh', '-c', "git clone --depth=1 https://github.com/WeiyiGeek/WeiyiGeek.git /app/html"]
        volumeMounts:
        - name: workdir
          mountPath: "/app/html"
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30001
      protocol: TCP
  selector:
    app: nginx
EOF

步骤 02.部署资源以及Pod信息查看

$ kubectl apply -f nginx.yaml
statefulset.apps/nginx-web created
service/nginx-service created

# 初始化Pod容器
$ kubectl get pod nginx-web-0
NAME                          READY   STATUS     RESTARTS   AGE
nginx-web-0                   0/1     Init:0/1   0          3m19s

# 正常运行
$ kubectl get pod nginx-web-0 -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
nginx-web-0   1/1     Running   0          12m   10.128.251.72   master-224   <none>           <none>

# nginx-service 服务后端查看
$ kubectl describe svc nginx-service
Name:                     nginx-service
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx
Type:                     NodePort  # 此处为 NodePort 暴露服务
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.96.14.218
IPs:                      10.96.14.218
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30001/TCP  # 暴露的外部端口
Endpoints:                10.128.251.72:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

步骤 03.通过客户端浏览器访问10.10.107.225:30001即可访问我们部署的nginx应用,同时通过kubectl logs -f nginx-web-0查看pod日志.

2.部署K8s原生Dashboard UI

描述:Kubernetes Dashboard是Kubernetes集群的通用、基于web的UI。它允许用户管理集群中运行的应用程序并对其进行故障排除,以及管理集群本身。

项目地址: https://github.com/kubernetes/dashboard/

步骤 01.从Github中拉取dashboard部署资源清单,当前最新版本v2.5.1

# 下载部署
wget -L https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml
kubectl apply -f recommended.yaml
grep "image:" recommended.yaml
  # image: kubernetesui/dashboard:v2.5.1
  # image: kubernetesui/metrics-scraper:v1.0.7

# 或者一条命令搞定部署
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml
  # serviceaccount/kubernetes-dashboard created
  # service/kubernetes-dashboard created
  # secret/kubernetes-dashboard-certs created
  # secret/kubernetes-dashboard-csrf created
  # secret/kubernetes-dashboard-key-holder created
  # configmap/kubernetes-dashboard-settings created
  # role.rbac.authorization.k8s.io/kubernetes-dashboard created
  # clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
  # rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
  # clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
  # deployment.apps/kubernetes-dashboard created
  # service/dashboard-metrics-scraper created
  # deployment.apps/dashboard-metrics-scraper created

步骤 02.查看部署的dashboard相关资源是否正常。

$ kubectl get deploy,svc -n kubernetes-dashboard  -o wide
NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS                  IMAGES                                SELECTOR
deployment.apps/dashboard-metrics-scraper   1/1     1            1           7m45s   dashboard-metrics-scraper   kubernetesui/metrics-scraper:v1.0.7   k8s-app=dashboard-metrics-scraper
deployment.apps/kubernetes-dashboard        1/1     1            1           7m45s   kubernetes-dashboard        kubernetesui/dashboard:v2.5.1         k8s-app=kubernetes-dashboard

NAME                                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE     SELECTOR
service/dashboard-metrics-scraper   ClusterIP   10.96.37.134   <none>        8000/TCP   7m45s   k8s-app=dashboard-metrics-scraper
service/kubernetes-dashboard        ClusterIP   10.96.26.57    <none>        443/TCP    7m45s   k8s-app=kubernetes-dashboard

# 编辑 service/kubernetes-dashboard 服务将端口通过nodePort方式进行暴露为30443。
$ kubectl edit svc -n kubernetes-dashboard kubernetes-dashboard
# service/kubernetes-dashboard edited
apiVersion: v1
kind: Service
.....
spec:
.....
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443
    nodePort: 30443  # 新增
  selector:
    k8s-app: kubernetes-dashboard
  sessionAffinity: None
  type: NodePort     # 修改

步骤 03.默认仪表板部署包含运行所需的最小RBAC权限集,而要想使用dashboard操作集群中的资源,通常我们还需要自定义创建kubernetes-dashboard管理员角色。
权限控制参考地址: https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/README.md

# 创建后最小权限的Token(只能操作kubernetes-dashboard名称空间下的资源)
kubectl get sa -n kubernetes-dashboard kubernetes-dashboard
kubectl describe secrets -n kubernetes-dashboard kubernetes-dashboard-token-jhdpb | grep '^token:'|awk '{print $2}'

Kubernetes Dashboard 支持几种不同的用户身份验证方式:

  • Authorization header
  • Bearer Token (默认)
  • Username/password
  • Kubeconfig file (默认)

温馨提示: 此处使用Bearer Token方式, 为了方便演示我们向 Dashboard 的服务帐户授予管理员权限 (Admin privileges), 而在生产环境中通常不建议如此操作, 而是指定一个或者多个名称空间下的资源进行操作。

tee rbac-dashboard-admin.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard-admin
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dashboard-admin
  namespace: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: dashboard-admin
    namespace: kubernetes-dashboard
EOF

kubectl apply -f rbac-dashboard-admin.yaml
  # serviceaccount/dashboard-admin created
  # clusterrolebinding.rbac.authorization.k8s.io/dashboard-admin created

Step 04. Get the name of the secret created for the dashboard-admin service account and extract its authentication token, which is then used to log in to the dashboard deployed above.

kubectl get sa -n kubernetes-dashboard dashboard-admin -o yaml | grep "\- name" | awk '{print $3}'
  # dashboard-admin-token-crh7v
kubectl describe secrets -n kubernetes-dashboard dashboard-admin-token-crh7v | grep "^token:" | awk '{print $2}'
  # The authentication token returned:
eyJhbGciOiJSUzI1NiIsImtpZCI6IkJXdm1YSGNSQ3VFSEU3V0FTRlJKcU10bWxzUDZPY3lfU0lJOGJjNGgzRXMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tY3JoN3YiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNDY3ODEwMDMtM2MzNi00NWE1LTliYTQtNDY3MTQ0OWE2N2E0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.X10AzWBxaHObYGoOqjfw3IYkhn8L5E7najdGSeLavb94LX5BY8_rCGizkWgNgNyvUe39NRP8r8YBU5sy9F2K-kN9_5cxUX125cj1drLDmgPJ-L-1m9-fs-luKnkDLRE5ENS_dgv7xsFfhtN7s9prgdqLw8dIrhshHVwflM_VOXW5D26QR6izy2AgPNGz9cRh6x2znrD-dpUNHO1enzvGzlWj7YhaOUFl310V93hh6EEc57gAwmDQM4nWP44KiaAiaW1cnC38Xs9CbWYxjsfxd3lObWShOd3knFk5PUVSBHo0opEv3HQ_-gwu6NGV6pLMY52p_JO1ECPSDnblVbVtPQ
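
Note that the secret-based lookup above relies on the token Secret that older releases create automatically for every ServiceAccount. If your cluster runs Kubernetes 1.24 or later, that Secret is no longer generated and you can request a short-lived token directly instead (shown here only as a hedge for newer versions):

kubectl -n kubernetes-dashboard create token dashboard-admin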

Step 05. Use the token above to log in to the Kubernetes Dashboard UI.

3. Deploying K9s to Manage the Cluster

Description: k9s is a CLI for managing Kubernetes clusters. It provides a terminal UI for interacting with your cluster: by wrapping kubectl functionality, k9s continuously watches Kubernetes for changes and offers follow-up commands to act on the resources you are viewing. Put simply, k9s lets developers quickly inspect and fix the day-to-day issues of running Kubernetes.
Official site: https://k9scli.io/
Reference: https://github.com/derailed/k9s

Here we take the binary package installation as an example.

# 1. Download with wget, using -c to resume broken transfers and -b to run in the background
wget -b -c https://github.com/derailed/k9s/releases/download/v0.25.18/k9s_Linux_x86_64.tar.gz

# 2. Extract the archive and remove the leftover files
tar -zxf k9s_Linux_x86_64.tar.gz
rm  k9s_Linux_x86_64.tar.gz  LICENSE  README.md

# 3. Copy the kubernetes admin kubeconfig into the home directory
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf

# 4. Just run it; if you are already comfortable with vim keybindings, you will pick up k9s very quickly.
/nfsdisk-31/newK8s-Backup/tools# ./k9s

# 5. Command to quit k9s
:quit
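
Optionally, install the binary somewhere on the PATH so it can be launched from any directory; the target path /usr/local/bin is a common convention, not a requirement of k9s.

# Install the extracted binary and start it against an explicit kubeconfig.
sudo install -m 0755 k9s /usr/local/bin/k9s
k9s --kubeconfig $HOME/.kube/config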

Pitfalls and Fixes

With a binary-deployed K8S cluster, after installing the calico network plugin one calico-node-xxx pod stays in Running but its READY column is 0/1; how to fix it

  • Error message:
kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-6f7b5668f7-52247   1/1     Running             0          13m
kube-system   calico-node-kb27z                          0/1     Running             0          13m

$ kubectl describe pod -n kube-system calico-node-kb27z
Warning  Unhealthy 14m  kubelet Readiness probe failed: 2022-05-07 13:15:12.460 [INFO][204] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.10.107.223,10.10.107.224,10.10.107.225,10.10.107.226  # key point
  • Cause: the calico-node daemonset's default policy is to take the IP of the first NIC it detects as the calico node IP. Since NIC names are not consistent across the cluster hosts, calico may pick the wrong interface, and in that case you must use the IP_AUTODETECTION_METHOD field to specify a NIC-name pattern or an IP address.

  • Fix (a one-command alternative is sketched after the snippets below):

# Check the name of the NIC carrying cluster traffic (ens32 here)
ip addr
  # 2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
  #     link/ether 00:0c:29:a5:66:de brd ff:ff:ff:ff:ff:ff
  #     inet 10.10.107.227/24 brd 10.10.107.255 scope global ens32

# Edit the calico daemonset and add the IP_AUTODETECTION_METHOD key/value; interface takes a regular expression, set to en.* here so it matches the NIC name
kubectl edit daemonsets.apps -n kube-system calico-node
- name: IP
  value: autodetect
- name: IP_AUTODETECTION_METHOD
  value: interface=en.*
# IP_AUTODETECTION_METHOD	
# The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See IP Autodetection methods for details of the valid methods. [Default: first-found]

# Alternative: autodetect the node IP from the interface that can reach a given address
- name: IP_AUTODETECTION_METHOD
  value: can-reach=114.114.114.114
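
The same environment variable can also be set without opening an editor. The one-liner below is an equivalent alternative (not from the original article); kubectl set env triggers a rolling restart of the calico-node pods.

kubectl -n kube-system set env daemonset/calico-node "IP_AUTODETECTION_METHOD=interface=en.*"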

Binary-deployed containerd 1.6.3 bumps the CNI config version used for loopback to 1.0.0, which breaks the CNI loopback plugin

  • Error message:
Warning  FailedCreatePodSandBox  4m19s (x1293 over 34m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ae7a5d4c6906fac5114679c4b375ecc02e0cebcf4406962f01b40220064a8c1c": plugin type="loopback" failed (add): incompatible CNI versions; config is "1.0.0", plugin supports ["0.1.0" "0.2.0" "0.3.0" "0.3.1" "0.4.0"]
  • Cause: a bug in containerd 1.6.3

  • Fix: reinstall and redeploy cri-containerd-cni-1.6.4-linux-amd64 (a download-and-extract sketch follows the version check below)

# containerd -version
containerd github.com/containerd/containerd v1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
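
A rough sketch of the reinstall, assuming the official cri-containerd-cni-1.6.4-linux-amd64.tar.gz bundle from the containerd GitHub releases page; verify the URL and checksum for your architecture before running it.

wget -c https://github.com/containerd/containerd/releases/download/v1.6.4/cri-containerd-cni-1.6.4-linux-amd64.tar.gz
sudo systemctl stop containerd
# The bundle unpacks binaries, CNI plugins and service files relative to /
sudo tar -C / -xzf cri-containerd-cni-1.6.4-linux-amd64.tar.gz
sudo systemctl daemon-reload && sudo systemctl restart containerd
# Then re-check the version with: containerd -version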

When deploying coredns in the cluster, the pods show CrashLoopBackOff and report Readiness probe failed with connection refused on port 8181

  • Error message:
kubectl get pod -n kube-system -o wide
    # coredns-659648f898-kvgrl                  0/1     CrashLoopBackOff   5 (73s ago)   4m14s   10.128.17.198   master-225   <none>           <none>

kubectl describe pod -n kube-system coredns-659648f898-crttw 
  # Warning  Unhealthy  2m34s                 kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  # Warning  Unhealthy  95s                   kubelet            Readiness probe failed: Get "http://10.128.251.66:8181/ready": dial tcp 10.128.251.66:8181: connect: connection refused
  • Cause: coredns maps the host's /etc/resolv.conf into the container, and when it loads that config it cannot reach nameserver 127.0.0.53 (the local systemd-resolved stub), so the coredns pod cannot start properly. Moreover, after we edit /etc/resolv.conf by hand, systemd-resolved.service periodically regenerates the file and overwrites the change, so the quick workarounds found online are only temporary: the pod falls back into the same broken state after its next restart.
$ ls -alh /etc/resolv.conf
  # lrwxrwxrwx 1 root root 39 Feb  2  2021 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

$ cat /etc/resolv.conf
  # nameserver 127.0.0.53
  # options edns0 trust-ad
  • Fix: the recommended approach is to make /etc/resolv.conf a symlink to /run/systemd/resolve/resolv.conf (or drop the symlink and manage the file directly); an alternative kubelet-level fix is sketched after the commands below.
# Set the DNS servers in systemd-resolved first, so /etc/resolv.conf will not be overwritten again later.
tee -a /etc/systemd/resolved.conf <<'EOF'
DNS=223.6.6.6
DNS=10.96.0.10
EOF
systemctl restart systemd-resolved.service
systemd-resolve --status

# Recreate the symbolic link so it points at the real resolv.conf
sudo rm -f /etc/resolv.conf 
ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

# Delete the failing coredns pods, selected here by pod label
kubectl delete pod -n kube-system -l k8s-app=kube-dns

# Scale coredns to 3 replicas and check whether they come up healthy.
kubectl scale deployment -n kube-system coredns --replicas 3
# deployment.apps/coredns scaled

$ cat /etc/resolv.conf
nameserver 223.6.6.6
nameserver 10.96.0.10
nameserver 192.168.10.254
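
An alternative that avoids touching /etc/resolv.conf at all is to point the kubelet at the real resolver file via the resolvConf field of its KubeletConfiguration. The sketch below assumes the kubelet config lives at /var/lib/kubelet/config.yaml, which may differ in a binary deployment, so check your kubelet unit file first.

# Point the kubelet at systemd-resolved's real upstream list instead of the 127.0.0.53 stub.
sudo sed -i 's#^resolvConf:.*#resolvConf: /run/systemd/resolve/resolv.conf#' /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet
# Recreate the coredns pods so they pick up the new upstream servers.
kubectl delete pod -n kube-system -l k8s-app=kube-dns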

When logging in to kubernetes-dashboard with the token after granting admin permissions, authentication keeps failing with Unauthorized (401): Invalid credentials provided

  • Error message: Unauthorized (401): Invalid credentials provided
$ kubectl logs -f --tail 50 -n kubernetes-dashboard kubernetes-dashboard-fb8648fd9-8pl5c
2022/05/09 00:48:50 [2022-05-09T00:48:50Z] Incoming HTTP/2.0 POST /api/v1/login request from 10.128.17.194:54676: { contents hidden }
2022/05/09 00:48:50 Non-critical error occurred during resource retrieval: the server has asked for the client to provide credentials
  • Cause: the RBAC objects we created are fine; the problem is that kubectl get secrets returns the token base64-encoded, whereas we should run kubectl describe secrets to obtain the decoded token.

  • Fix (a one-liner that decodes the token directly follows these commands):

kubectl get sa -n kubernetes-dashboard dashboard-admin -o yaml | grep "\- name" | awk '{print $3}'
  # dashboard-admin-token-crh7v
kubectl describe secrets -n kubernetes-dashboard dashboard-admin-token-crh7v | grep "^token:" | awk '{print $2}'
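
If you prefer to skip the grep/awk parsing, the same token can be pulled out in one go with a go-template that decodes the base64 value. This assumes the ServiceAccount still references an auto-created token Secret (pre-1.24 behaviour):

kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get sa dashboard-admin -o jsonpath='{.secrets[0].name}') \
  -o go-template='{{.data.token | base64decode}}'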