Binary deployment of a k8s cluster
1. Environment overview:
cat >> /etc/hosts <<EOF
10.0.0.202 master
10.0.0.197 node1
10.0.0.163 node2
EOF
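Running the heredoc above twice leaves duplicate entries in /etc/hosts. A minimal sketch of an idempotent variant follows; the helper name add_host_entry and the HOSTS_FILE variable are illustrative and not part of the original steps. HOSTS_FILE defaults to a scratch file so the sketch can be tried safely.

```shell
#!/usr/bin/env bash
# HOSTS_FILE is an assumed variable; point it at /etc/hosts on a real node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host_entry IP NAME: append the mapping only if NAME is not present yet
add_host_entry() {
  local ip="$1" name="$2"
  # -w matches the host name as a whole word, -q suppresses output
  if ! grep -qw -- "$name" "$HOSTS_FILE" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
  fi
}

add_host_entry 10.0.0.202 master
add_host_entry 10.0.0.197 node1
add_host_entry 10.0.0.163 node2
add_host_entry 10.0.0.197 node1   # repeated call changes nothing
cat "$HOSTS_FILE"
```

On a real node this could be run as HOSTS_FILE=/etc/hosts with root privileges.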
Mount the data disk:
mkdir /data
mkfs.xfs -f /dev/vdb
mount /dev/vdb /data
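- Install Docker and change Docker's data directory.
- Prepare the Kubernetes v1.16.4 binaries; see CHANGELOG-1.16 for details. (omitted)
- Initialize the hostnames and set up passwordless SSH trust between the hosts.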
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub node1
ssh-copy-id -i ~/.ssh/id_rsa.pub node2
- Disable SELinux and iptables.
- Create the deployment directories and add /data/kubernetes/bin to the PATH environment variable.
mkdir -p /data/kubernetes/{cfg,bin,ssl,log}
echo 'export PATH=$PATH:/data/kubernetes/bin' >> /etc/profile
source /etc/profile
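2. Creating the CA certificate manually:
The certificates are generated with CFSSL, an open-source PKI/TLS toolkit from CloudFlare. CFSSL consists of a command-line tool and an HTTP API service for signing, verifying, and bundling TLS certificates, and it is written in Go. Certificate contents are defined in JSON, which keeps them easy to read.
CFSSL includes:
- a set of tools for building a custom TLS PKI
- the cfssl program, the CFSSL command-line tool
- the multirootca program, a certificate authority server that can use multiple signing keys
- the mkbundle program, used to build certificate pool bundles
- the cfssljson program, which takes the JSON output of cfssl and multirootca and writes certificates, keys, CSRs, and bundles to disk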
2.1 Prepare the cfssl binaries (all nodes)
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x cfssl*
mv cfssl-certinfo_linux-amd64 /data/kubernetes/bin/cfssl-certinfo
mv cfssljson_linux-amd64 /data/kubernetes/bin/cfssljson
mv cfssl_linux-amd64 /data/kubernetes/bin/cfssl
- Distribute them to every k8s node. Passwordless SSH is already set up, so no password is needed. Only one node is shown; the others are the same.
scp /data/kubernetes/bin/cfssl* root@node1:/data/kubernetes/bin/
2.1.1 Initialize cfssl
mkdir /data/src && cd /data/src
cfssl print-defaults config > config.json
cfssl print-defaults csr > csr.json
2.1.2 Create the JSON configuration file used to generate the CA
cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "8760h"
}
}
}
}
EOF
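Notes:
- ca-config.json: can define multiple profiles, each with its own expiry, usages, and other parameters; a profile is selected later when signing a certificate.
- signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE.
- server auth: a client can use this CA to verify the certificate presented by a server.
- client auth: a server can use this CA to verify the certificate presented by a client.
2.1.3 Create the JSON configuration file for the CA certificate signing request (CSR)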
[root@master src]# cat ca-csr.json
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
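Notes:
"CN": Common Name. kube-apiserver extracts this field from the certificate as the user name of the request; browsers use it to check whether a website is legitimate.
"O": Organization. kube-apiserver extracts this field from the certificate as the group the requesting user belongs to.
2.1.4 Generate the CA certificate (ca.pem) and key (ca-key.pem)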
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
2.1.5 Copy the certificates into place
cp ca.csr ca.pem ca-key.pem ca-config.json /data/kubernetes/ssl/
2.1.6 Publish to the other nodes (all nodes)
scp ca.csr ca.pem ca-key.pem ca-config.json node1:/data/kubernetes/ssl/
scp ca.csr ca.pem ca-key.pem ca-config.json node2:/data/kubernetes/ssl/
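The same pair of scp commands recurs for almost every file in this walkthrough. A small loop can take over the repetition; this is only a sketch, and the names distribute, NODES and DRY_RUN are made up for illustration. With DRY_RUN=1 (the default here) it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# NODES and DRY_RUN are illustrative variables, not part of the original steps.
NODES="${NODES:-node1 node2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the scp commands instead of running them

# distribute REMOTE_DIR FILE...: copy the files to REMOTE_DIR on every node
distribute() {
  local dest="$1"; shift
  local node
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "scp $* ${node}:${dest}"
    else
      scp "$@" "${node}:${dest}"
    fi
  done
}

distribute /data/kubernetes/ssl/ ca.csr ca.pem ca-key.pem ca-config.json
```

Setting DRY_RUN=0 performs the actual copies, relying on the passwordless SSH configured earlier.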
3. etcd cluster
Download a release package from the https://github.com/coreos/etcd/releases page (v3.3.18 is used here):
wget https://github.com/etcd-io/etcd/releases/download/v3.3.18/etcd-v3.3.18-linux-amd64.tar.gz
tar zxf etcd-v3.3.18-linux-amd64.tar.gz
cd etcd-v3.3.18-linux-amd64
# Distribute the etcd binaries
chmod +x etcd*
cp etcd etcdctl /data/kubernetes/bin/
scp etcd* node1:/data/kubernetes/bin/
scp etcd* node2:/data/kubernetes/bin/
3.1 Create the etcd certificate signing request
[root@etcd1 ssl]# cat etcd-csr.json
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"10.0.0.202",
"10.0.0.163",
"10.0.0.197"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
- The hosts field lists the etcd node IPs authorized to use this certificate.
3.2 Generate the etcd certificate and private key:
cfssl gencert -ca=/data/kubernetes/ssl/ca.pem \
-ca-key=/data/kubernetes/ssl/ca-key.pem \
-config=/data/kubernetes/ssl/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd
[root@etcd1 ssl]# ls etcd*
etcd.csr etcd-csr.json etcd-key.pem etcd.pem
[root@etcd1 ssl]#
3.3 Distribute the generated certificate and private key to each etcd node
scp etcd*.pem root@node1:/data/kubernetes/ssl
scp etcd*.pem root@node2:/data/kubernetes/ssl
3.4 Write the etcd configuration file
[root@master cfg]# mkdir -p /var/lib/etcd/default.etcd
[root@etcd1 cfg]# cat etcd.conf
#[member]
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
#ETCD_SNAPSHOT_COUNTER="10000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
ETCD_LISTEN_PEER_URLS="https://10.0.0.202:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.0.0.202:2379,https://127.0.0.1:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
#ETCD_CORS=""
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.0.202:2380"
# if you use different ETCD_NAME (e.g. test),
# set ETCD_INITIAL_CLUSTER value for this name, i.e. "test=http://..."
ETCD_INITIAL_CLUSTER="etcd1=https://10.0.0.202:2380,etcd2=https://10.0.0.197:2380,etcd3=https://10.0.0.163:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="k8s-etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://10.0.0.202:2379"
#[security]
ETCD_CLIENT_CERT_AUTH="true"
ETCD_CA_FILE="/data/kubernetes/ssl/ca.pem"
ETCD_CERT_FILE="/data/kubernetes/ssl/etcd.pem"
ETCD_KEY_FILE="/data/kubernetes/ssl/etcd-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CA_FILE="/data/kubernetes/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/data/kubernetes/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/data/kubernetes/ssl/etcd-key.pem"
3.5 Create the etcd systemd service
cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
EnvironmentFile=/data/kubernetes/cfg/etcd.conf
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /data/kubernetes/bin/etcd"
[Install]
WantedBy=multi-user.target
- Reload systemd:
systemctl daemon-reload
- Distribute the configuration file to each etcd node:
scp etcd.conf node1:/data/kubernetes/cfg/
scp etcd.conf node2:/data/kubernetes/cfg/
scp /etc/systemd/system/etcd.service node1:/etc/systemd/system
scp /etc/systemd/system/etcd.service node2:/etc/systemd/system
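- Edit the configuration file on the other etcd nodes. Every member of the etcd cluster needs one; remember to change the etcd name and the listen addresses.
Only one node is shown here.
[root@etcd2 bin]# vim /data/kubernetes/cfg/etcd.conf
#[member]
ETCD_NAME="etcd2"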
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
#ETCD_SNAPSHOT_COUNTER="10000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
ETCD_LISTEN_PEER_URLS="https://10.0.0.197:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.0.0.197:2379,https://127.0.0.1:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
#ETCD_CORS=""
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.0.197:2380"
# if you use different ETCD_NAME (e.g. test),
# set ETCD_INITIAL_CLUSTER value for this name, i.e. "test=http://..."
ETCD_INITIAL_CLUSTER="etcd1=https://10.0.0.202:2380,etcd2=https://10.0.0.197:2380,etcd3=https://10.0.0.163:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="k8s-etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://10.0.0.197:2379"
#[security]
ETCD_CLIENT_CERT_AUTH="true"
ETCD_CA_FILE="/data/kubernetes/ssl/ca.pem"
ETCD_CERT_FILE="/data/kubernetes/ssl/etcd.pem"
ETCD_KEY_FILE="/data/kubernetes/ssl/etcd-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CA_FILE="/data/kubernetes/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/data/kubernetes/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/data/kubernetes/ssl/etcd-key.pem"
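Only a handful of lines differ between the members' etcd.conf files, and the member name has to match one of the entries in ETCD_INITIAL_CLUSTER. The sketch below prints just those per-node lines and refuses a name/IP pair that is not in the cluster string. The function name render_etcd_member is hypothetical; the names and addresses are the ones from the configuration above.

```shell
#!/usr/bin/env bash
# Same value as ETCD_INITIAL_CLUSTER in the configuration above.
CLUSTER="etcd1=https://10.0.0.202:2380,etcd2=https://10.0.0.197:2380,etcd3=https://10.0.0.163:2380"

# render_etcd_member NAME IP: print the etcd.conf lines that differ per node
render_etcd_member() {
  local name="$1" ip="$2"
  # Reject a name/peer-URL pair that is not listed in the initial cluster
  case ",${CLUSTER}," in
    *",${name}=https://${ip}:2380,"*) ;;
    *) echo "error: ${name}=${ip} is not in ETCD_INITIAL_CLUSTER" >&2; return 1 ;;
  esac
  cat <<EOF
ETCD_NAME="${name}"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
EOF
}

render_etcd_member etcd2 10.0.0.197
```

The printed lines can be compared against the file on each node before starting the service.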
3.5.1 Start the etcd service
[root@etcd2 bin]# systemctl daemon-reload
[root@etcd2 bin]# systemctl enable etcd
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
[root@etcd2 bin]# systemctl start etcd
Once the other two machines are configured, start etcd on them as well; all three members must be running.
3.6 Verify the cluster
etcdctl --endpoints=https://10.0.0.202:2379 \
--ca-file=/data/kubernetes/ssl/ca.pem \
--cert-file=/data/kubernetes/ssl/etcd.pem \
--key-file=/data/kubernetes/ssl/etcd-key.pem cluster-health
Result:
[root@etcd1 bin]# etcdctl --endpoints=https://10.0.0.202:2379 \
> --ca-file=/data/kubernetes/ssl/ca.pem \
> --cert-file=/data/kubernetes/ssl/etcd.pem \
> --key-file=/data/kubernetes/ssl/etcd-key.pem cluster-health
member 4cf18011db57d17a is healthy: got healthy result from https://10.0.0.202:2379
member 85fd5487299e2fbd is healthy: got healthy result from https://10.0.0.197:2379
member f96d77d9089bd1e3 is healthy: got healthy result from https://10.0.0.163:2379
cluster is healthy
3.7 etcd cluster troubleshooting
- The following error appears:
[root@master ~]# systemctl status etcd
● etcd.service - Etcd Server
Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-12-27 22:23:49 EST; 29min ago
Main PID: 19730 (etcd)
CGroup: /system.slice/etcd.service
└─19730 /data/kubernetes/bin/etcd
Dec 27 22:24:04 master etcd[19730]: health check for peer c91b0b181670aad could not connect: dial tcp 10.0.0.163:2380: connect: connection refused (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Dec 27 22:24:04 master etcd[19730]: health check for peer c91b0b181670aad could not connect: dial tcp 10.0.0.163:2380: connect: connection refused (prober "ROUND_TRIPPER_SNAPSHOT")
Dec 27 22:24:05 master etcd[19730]: updated the cluster version from 3.0 to 3.3
- Fix:
[root@master ~]# cd /var/lib/etcd/
[root@master etcd]# ll
total 0
drwxr-xr-x. 3 root root 20 Dec 28 11:23 default.etcd
[root@master etcd]# rm -rf default.etcd/
Delete the files under /var/lib/etcd and restart the etcd service.
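Removing the data directory discards whatever that member had stored, which is acceptable for a cluster that is still being bootstrapped. A more cautious variant renames the directory instead of deleting it. This is a sketch only: the helper name move_aside is made up, and the demo runs on a scratch directory rather than on /var/lib/etcd.

```shell
#!/usr/bin/env bash
# move_aside DIR: rename DIR to DIR.bak.TIMESTAMP and print the new path
move_aside() {
  local dir="${1%/}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  local backup
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  mv -- "$dir" "$backup" && echo "$backup"
}

# Demo on a scratch directory; on a real node the argument would be
# /var/lib/etcd/default.etcd, with the etcd service stopped first.
scratch="$(mktemp -d)"
mkdir -p "$scratch/default.etcd/member"
move_aside "$scratch/default.etcd"
```

On a node the sequence would be systemctl stop etcd, then move_aside /var/lib/etcd/default.etcd, then systemctl start etcd.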
~Install Docker on the nodes~
# Step 1: install the required system tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: add the package repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: update the cache and install Docker CE
sudo yum makecache fast
sudo yum -y install docker-ce
# Step 4: start the Docker service
sudo systemctl start docker
curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
4. Flannel network deployment
- Create the certificate JSON file for flanneld
[root@master ssl]# cat flanneld-csr.json
{
"CN": "flanneld",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
4.1 Generate and distribute the certificate
cfssl gencert -ca=/data/kubernetes/ssl/ca.pem \
-ca-key=/data/kubernetes/ssl/ca-key.pem \
-config=/data/kubernetes/ssl/ca-config.json \
-profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
This generates the PEM certificate files.
Note that they must be distributed to every node in the cluster.
cp flanneld*.pem /data/kubernetes/ssl/
scp flanneld*.pem node1:/data/kubernetes/ssl/
scp flanneld*.pem node2:/data/kubernetes/ssl/
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
[root@master ~]# tar zxvf flannel-v0.11.0-linux-amd64.tar.gz -C /data/src/flanneld/
flanneld
mk-docker-opts.sh
README.md
[root@master ~]# cd /data/src/flanneld/
[root@master flanneld]# cp flanneld mk-docker-opts.sh /data/kubernetes/bin/
# Distribute to the other nodes
[root@master flanneld]# scp flanneld mk-docker-opts.sh node1:/data/kubernetes/bin/
flanneld 100% 34MB 79.3MB/s 00:00
mk-docker-opts.sh 100% 2139 1.3MB/s 00:00
[root@master flanneld]# scp flanneld mk-docker-opts.sh node2:/data/kubernetes/bin/
flanneld 100% 34MB 98.5MB/s 00:00
mk-docker-opts.sh 100% 2139 1.6MB/s 00:00
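- Configure flannel, then distribute the file so that every node has a copy for flannel to use. The -etcd-endpoints value must list the client URLs of your own etcd members (10.0.0.202, 10.0.0.197 and 10.0.0.163 in the environment described in section 1).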
[root@master cfg]# cat flannel
FLANNEL_ETCD="-etcd-endpoints=https://10.0.0.99:2379,https://10.0.0.111:2379,https://10.0.0.11:2379"
FLANNEL_ETCD_KEY="-etcd-prefix=/kubernetes/network"
FLANNEL_ETCD_CAFILE="--etcd-cafile=/data/kubernetes/ssl/ca.pem"
FLANNEL_ETCD_CERTFILE="--etcd-certfile=/data/kubernetes/ssl/flanneld.pem"
FLANNEL_ETCD_KEYFILE="--etcd-keyfile=/data/kubernetes/ssl/flanneld-key.pem"
scp /data/kubernetes/cfg/flannel node1:/data/kubernetes/cfg/flannel
scp /data/kubernetes/cfg/flannel node2:/data/kubernetes/cfg/flannel
scp /usr/lib/systemd/system/flannel.service node1:/usr/lib/systemd/system/flannel.service
scp /usr/lib/systemd/system/flannel.service node2:/usr/lib/systemd/system/flannel.service
- Set up the flannel systemd service
cat /usr/lib/systemd/system/flannel.service
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
Before=docker.service
[Service]
EnvironmentFile=-/data/kubernetes/cfg/flannel
ExecStart=/data/kubernetes/bin/flanneld ${FLANNEL_ETCD} ${FLANNEL_ETCD_KEY} ${FLANNEL_ETCD_CAFILE} ${FLANNEL_ETCD_CERTFILE} ${FLANNEL_ETCD_KEYFILE}
ExecStartPost=/data/kubernetes/bin/mk-docker-opts.sh -d /run/flannel/docker
Type=notify
[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
The mk-docker-opts.sh script writes the Pod subnet assigned to flanneld into /run/flannel/docker; when Docker starts later, it uses the environment variables in this file to configure the docker0 bridge.
wget https://github.com/containernetworking/plugins/releases/download/v0.8.3/cni-plugins-linux-amd64-v0.8.3.tgz
mkdir -p /data/kubernetes/bin/cni
tar zxf cni-plugins-linux-amd64-v0.8.3.tgz -C /data/kubernetes/bin/cni
scp -r /data/kubernetes/bin/cni/* node1:/data/kubernetes/bin/cni/
scp -r /data/kubernetes/bin/cni/* node2:/data/kubernetes/bin/cni/
Set the etcd key
Flannel stores its subnet information in etcd, so make sure it can connect to etcd, then write the predefined network range:
/data/kubernetes/bin/etcdctl --ca-file /data/kubernetes/ssl/ca.pem --cert-file /data/kubernetes/ssl/flanneld.pem --key-file /data/kubernetes/ssl/flanneld-key.pem \
--no-sync -C https://10.0.0.99:2379,https://10.0.0.111:2379,https://10.0.0.11:2379 \
mk /kubernetes/network/config '{ "Network": "10.2.0.0/16", "Backend": { "Type": "vxlan", "VNI": 1 }}' >/dev/null 2>&1
[root@etcd1 ssl]# /data/kubernetes/bin/etcdctl --ca-file /data/kubernetes/ssl/ca.pem --cert-file /data/kubernetes/ssl/flanneld.pem --key-file /data/kubernetes/ssl/flanneld-key.pem \
> --no-sync -C https://10.0.0.99:2379,https://10.0.0.111:2379,https://10.0.0.11:2379 \
> mk /kubernetes/network/config '{ "Network": "10.2.0.0/16", "Backend": { "Type": "vxlan", "VNI": 1 }}'
{ "Network": "10.2.0.0/16", "Backend": { "Type": "vxlan", "VNI": 1 }}
Check that the custom network was written to etcd:
etcdctl --endpoints=https://10.0.0.99:2379 \
--ca-file=/data/kubernetes/ssl/ca.pem \
--cert-file=/data/kubernetes/ssl/etcd.pem \
--key-file=/data/kubernetes/ssl/etcd-key.pem get /kubernetes/network/config
5. Binary deployment of Docker
5.1 Download Docker
wget https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar -xf docker-19.03.5.tgz
mkdir -p /usr/local/docker/bin
cp docker/docker* /usr/local/docker/bin/
yum install -y yum-utils device-mapper-persistent-data lvm2
5.1.2 Edit the Docker systemd service (configure Docker to use flannel)
cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io
After=network-online.target firewalld.service flannel.service
Wants=network-online.target
Requires=flannel.service
[Service]
Environment="PATH=/usr/local/docker/bin:/bin:/sbin:/usr/bin:/usr/sbin"
EnvironmentFile=-/run/flannel/docker
#ExecStart=/usr/local/docker/bin/dockerd --log-level=error $DOCKER_NETWORK_OPTIONS
ExecStart=/usr/local/docker/bin/dockerd $DOCKER_OPTS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
- Distribute the Docker binaries and the systemd unit
scp docker/docker* node1:/usr/local/docker/bin/
scp docker/docker* node2:/usr/local/docker/bin/
scp /usr/lib/systemd/system/docker.service node1:/usr/lib/systemd/system/docker.service
scp /usr/lib/systemd/system/docker.service node2:/usr/lib/systemd/system/docker.service
systemctl daemon-reload
systemctl enable docker
systemctl start docker
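Note that Docker's startup depends on flannel here: if the network does not come up, Docker will not start either. The Service section also adds EnvironmentFile=-/run/flannel/docker; this file is generated by mk-docker-opts.sh when flannel starts. Its contents show that the subnet is assigned through --bip.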
[root@master ~]# cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=10.2.10.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1400"
DOCKER_OPTS=" --bip=10.2.10.1/24 --ip-masq=true --mtu=1400"
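The file consists of plain KEY="value" lines, so it can also be read from a shell to check what dockerd will be started with. The sketch below works on a scratch copy with the same content as the output above, so it runs without flanneld; the variable names env_file and bip are illustrative. Note that systemd parses EnvironmentFile with its own rules, so sourcing the file in a shell is only an approximation that happens to agree for simple values like these.

```shell
#!/usr/bin/env bash
# Scratch copy with the same content as /run/flannel/docker shown above.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
DOCKER_OPT_BIP="--bip=10.2.10.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1400"
DOCKER_OPTS=" --bip=10.2.10.1/24 --ip-masq=true --mtu=1400"
EOF

# Load the variables into the current shell
. "$env_file"

# Strip the flag name to get the address docker0 will receive
bip="${DOCKER_OPT_BIP#--bip=}"
echo "dockerd options:${DOCKER_OPTS}"
echo "docker0 address: ${bip}"
```

On a node, pointing env_file at /run/flannel/docker and comparing bip with the address reported by ip addr show docker0 gives a quick consistency check.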
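- Check the flannel and Docker installation
docker0 and flannel.1 have addresses in the same subnet, and each node is assigned a different subnet. The flannel deployment is complete.
5.2 Docker can also be installed with yum
- Add the yum repository on every node: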
cd /etc/yum.repos.d/
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
- Configure Docker to use flannel:
[root@master ~]# vim /usr/lib/systemd/system/docker.service
# In [Unit], modify After and add Requires
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service flannel.service
Wants=network-online.target
Requires=flannel.service
[Service]
# Add EnvironmentFile=-/run/flannel/docker so that docker0 gets the address assigned by flannel
Type=notify
EnvironmentFile=-/run/flannel/docker
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecStart=/usr/bin/dockerd $DOCKER_OPTS
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
......
systemctl daemon-reload
systemctl restart docker
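6. Master node deployment
The Kubernetes master node runs the following components:
kube-apiserver
kube-scheduler
kube-controller-manager
kube-scheduler and kube-controller-manager can run in clustered mode: leader election picks one working process and the others stay blocked.
For now these three components are deployed on the same machine:
kube-scheduler, kube-controller-manager and kube-apiserver are closely related in function;
only one kube-scheduler and one kube-controller-manager process can be active at a time; if several are running, a leader must be elected.
This document covers deploying a single Kubernetes master node; it does not set up a highly available master cluster.
Deploying a load balancer is planned for a later write-up: clients (kubectl, kubelet, kube-proxy) would reach kube-apiserver through the LB's VIP, giving a highly available master cluster.
The master node communicates with Pods on the worker nodes over the Pod network, so Flannel must also be deployed on the master.
6.1 Prepare the packages (a proxy may be needed)
See: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md
After unpacking, copy kube-apiserver, kube-controller-manager, kube-scheduler and kubectl to /data/kubernetes/bin on the master node.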
tar xf kubernetes-server-linux-amd64.tar.gz
cd kubernetes/server/bin
cp kube-scheduler kube-apiserver kube-controller-manager kubectl /data/kubernetes/bin/
6.2 Configure the Kubernetes certificates
6.2.1 Create the JSON configuration file for the CSR
All nodes in the k8s cluster are included in hosts.
[root@master ssl]# cat kubernetes-csr.json
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"10.0.0.99",
"10.0.0.111",
"10.0.0.11",
"10.0.0.163",
"10.0.0.202",
"10.0.0.197",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
6.2.2 Generate the Kubernetes certificate and private key
cfssl gencert -ca=/data/kubernetes/ssl/ca.pem \
-ca-key=/data/kubernetes/ssl/ca-key.pem \
-config=/data/kubernetes/ssl/ca-config.json \
-profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
- Distribute the certificates to all nodes in the cluster
scp kubernetes*.pem node1:/data/kubernetes/ssl/
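6.2.3 Deploy the kube-apiserver component
- The apiserver provides the REST API for cluster management, including authentication and authorization, data validation, and cluster state changes.
- Only the API server talks to etcd directly;
- other components query or modify data through the API server;
- it serves as the hub for data exchange and communication between the other components.
Create the TLS bootstrapping token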
[root@master ssl]# head -c 16 /dev/urandom | od -An -t x | tr -d ' '
7b54ba2ddce122d1784ac6a243be7fde
Create the token file used by the apiserver
Column 1: a random string; you can generate your own
Column 2: user name
Column 3: UID
Column 4: user group
[root@master ssl]# cat bootstrap-token.csv
7b54ba2ddce122d1784ac6a243be7fde,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
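Generating the token and writing the four-column line can be combined into one step, which avoids copy-paste mistakes between the two. A minimal sketch, assuming the same user name, UID and group as above; the function name make_bootstrap_csv is hypothetical and the demo writes to a scratch file.

```shell
#!/usr/bin/env bash
# make_bootstrap_csv PATH: generate a 32-hex-character token, write the CSV
# line to PATH, and print the token
make_bootstrap_csv() {
  local out="$1" token
  token="$(head -c 16 /dev/urandom | od -An -t x | tr -d ' \n')"
  # columns: token, user name, UID, group
  printf '%s,kubelet-bootstrap,10001,"system:kubelet-bootstrap"\n' "$token" > "$out"
  chmod 600 "$out"   # the token is a credential
  echo "$token"
}

csv="$(mktemp)"
token="$(make_bootstrap_csv "$csv")"
cat "$csv"
```

The printed token is the value that later goes into the kubelet bootstrap kubeconfig.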
Create the basic-auth user name and password file
[root@master ssl]# cat basic-auth.csv
admin,admin,1
readonly,readonly,2
Distribute the files to the node
scp /data/kubernetes/ssl/bootstrap-token.csv basic-auth.csv node1:/data/kubernetes/ssl/
6.2.4 Create the kube-apiserver systemd unit file
cat /usr/lib/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/data/kubernetes/bin/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction \
--bind-address=0.0.0.0 \
--insecure-bind-address=0.0.0.0 \
--authorization-mode=Node,RBAC \
--runtime-config=rbac.authorization.k8s.io/v1 \
--kubelet-https=true \
--anonymous-auth=false \
--basic-auth-file=/data/kubernetes/ssl/basic-auth.csv \
--enable-bootstrap-token-auth \
--token-auth-file=/data/kubernetes/ssl/bootstrap-token.csv \
--service-cluster-ip-range=10.1.0.0/16 \
--service-node-port-range=20000-40000 \
--tls-cert-file=/data/kubernetes/ssl/kubernetes.pem \
--tls-private-key-file=/data/kubernetes/ssl/kubernetes-key.pem \
--client-ca-file=/data/kubernetes/ssl/ca.pem \
--service-account-key-file=/data/kubernetes/ssl/ca-key.pem \
--etcd-cafile=/data/kubernetes/ssl/ca.pem \
--etcd-certfile=/data/kubernetes/ssl/kubernetes.pem \
--etcd-keyfile=/data/kubernetes/ssl/kubernetes-key.pem \
--etcd-servers=https://10.0.0.99:2379,https://10.0.0.111:2379,https://10.0.0.11:2379 \
--enable-swagger-ui=true \
--allow-privileged=true \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/data/kubernetes/log/api-audit.log \
--event-ttl=1h \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
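Point the flags at the certificates generated earlier and make sure the apiserver can reach etcd; --etcd-servers must list the client URLs of your own etcd members.
Flag reference:
--logtostderr log to standard error instead of files
--v log level
--etcd-servers etcd cluster addresses
--bind-address listen address
--secure-port HTTPS secure port
--advertise-address address advertised to the cluster
--allow-privileged allow privileged containers
--service-cluster-ip-range virtual IP range for Services
--enable-admission-plugins admission control plugins
--authorization-mode authorization modes; enables RBAC and Node authorization
--enable-bootstrap-token-auth enable TLS bootstrap token authentication
--token-auth-file token file
--service-node-port-range port range allocated to NodePort Services
6.2.5 Start the apiserver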
systemctl daemon-reload
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl status kube-apiserver
[root@master ssl]# ps -aux | grep kube-apiserver
root 25785 3.5 8.4 549756 326552 ? Ssl 21:47 0:22 /data/kubernetes/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction --bind-address=0.0.0.0 --insecure-bind-address=0.0.0.0 --authorization-mode=Node,RBAC --runtime-config=rbac.authorization.k8s.io/v1 --kubelet-https=true --anonymous-auth=false --basic-auth-file=/data/kubernetes/ssl/basic-auth.csv --enable-bootstrap-token-auth --token-auth-file=/data/kubernetes/ssl/bootstrap-token.csv --service-cluster-ip-range=10.1.0.0/16 --service-node-port-range=20000-40000 --tls-cert-file=/data/kubernetes/ssl/kubernetes.pem --tls-private-key-file=/data/kubernetes/ssl/kubernetes-key.pem --client-ca-file=/data/kubernetes/ssl/ca.pem --service-account-key-file=/data/kubernetes/ssl/ca-key.pem --etcd-cafile=/data/kubernetes/ssl/ca.pem --etcd-certfile=/data/kubernetes/ssl/kubernetes.pem --etcd-keyfile=/data/kubernetes/ssl/kubernetes-key.pem --etcd-servers=https://10.0.0.99:2379,https://10.0.0.111:2379,https://10.0.0.11:2379 --enable-swagger-ui=true --allow-privileged=true --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/data/kubernetes/log/api-audit.log --event-ttl=1h --v=2 --logtostderr=false --log-dir=/data/kubernetes/log
root 26557 0.0 0.0 112712 964 pts/0 S+ 21:58 0:00 grep --color=auto kube-apiserver
[root@master ~]# netstat -tulnp | grep kube-apiserve
tcp6 0 0 :::6443 :::* LISTEN 18024/kube-apiserve
tcp6 0 0 :::8080 :::* LISTEN 18024/kube-apiserve
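The listening ports show that the API server serves on 6443 and also on the insecure port 8080, which kube-scheduler and kube-controller-manager use here.
6.3 Prepare the kube-scheduler service unit
- The scheduler assigns Pods to nodes in the cluster.
- It watches kube-apiserver for Pods that have not yet been assigned a node
- and assigns a node to each of them according to the scheduling policy.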
cat /usr/lib/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/data/kubernetes/bin/kube-scheduler \
--address=0.0.0.0 \
--master=http://10.0.0.202:8080 \
--leader-elect=true \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Start kube-scheduler
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable kube-scheduler
[root@master ~]# systemctl start kube-scheduler
[root@master ~]# systemctl status kube-scheduler
[root@master ~]# ps -aux | grep kube-scheduler
root 27465 0.4 0.5 147088 22024 ? Ssl 22:10 0:01 /data/kubernetes/bin/kube-scheduler --address=0.0.0.0 --master=http://10.0.0.202:8080 --leader-elect=true --v=2 --logtostderr=false --log-dir=/data/kubernetes/log
root 27829 0.0 0.0 112712 964 pts/0 S+ 22:14 0:00 grep --color=auto kube-scheduler
[root@master ~]# netstat -tulnp |grep kube-sched
tcp6 0 0 :::10251 :::* LISTEN 18081/kube-schedule
tcp6 0 0 :::10259 :::* LISTEN 18081/kube-schedule
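The listening ports show kube-scheduler on 10251 (and 10259). Since --address=0.0.0.0 is set, netstat reports the port bound on all interfaces; cluster operations still go through the API server rather than this port.
6.4 Deploy kube-controller-manager
- controller-manager is made up of a set of controllers; it watches the state of the whole cluster through the apiserver and keeps the cluster in the desired state.
6.4.1 Create the kube-controller-manager unit file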
cat /usr/lib/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/data/kubernetes/bin/kube-controller-manager \
--address=0.0.0.0 \
--master=http://10.0.0.202:8080 \
--allocate-node-cidrs=true \
--service-cluster-ip-range=10.1.0.0/16 \
--cluster-cidr=10.2.0.0/16 \
--cluster-name=kubernetes \
--cluster-signing-cert-file=/data/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/data/kubernetes/ssl/ca-key.pem \
--service-account-private-key-file=/data/kubernetes/ssl/ca-key.pem \
--root-ca-file=/data/kubernetes/ssl/ca.pem \
--leader-elect=true \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
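The unit above uses 10.1.0.0/16 for Services and 10.2.0.0/16 for Pods, the latter matching the flannel "Network" written to etcd earlier. The two ranges should not overlap. The sketch below checks that in plain bash arithmetic; the function names ip_to_int and cidr_overlap are made up for illustration, and only IPv4 CIDR notation is handled.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_overlap CIDR1 CIDR2: exit 0 if the two ranges share any address
cidr_overlap() {
  local n1 n2 l1="${1#*/}" l2="${2#*/}" len mask
  n1="$(ip_to_int "${1%/*}")"
  n2="$(ip_to_int "${2%/*}")"
  # Two CIDR blocks overlap exactly when they agree on the shorter prefix
  len=$(( l1 < l2 ? l1 : l2 ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( n1 & mask )) -eq $(( n2 & mask )) ]
}

SERVICE_CIDR="10.1.0.0/16"   # --service-cluster-ip-range
POD_CIDR="10.2.0.0/16"       # --cluster-cidr and the flannel "Network"

if cidr_overlap "$SERVICE_CIDR" "$POD_CIDR"; then
  echo "overlap: $SERVICE_CIDR and $POD_CIDR"
else
  echo "ok: $SERVICE_CIDR and $POD_CIDR are disjoint"
fi
```

The same check can be pointed at the node network (10.0.0.0/24 would be an assumption about the hosts in section 1) to confirm that neither range collides with it.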
Start kube-controller-manager
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable kube-controller-manager
[root@master ~]# systemctl start kube-controller-manager
[root@master ~]# systemctl status kube-controller-manager
[root@master ~]# ps -aux | grep kube-controller-manager
root 28587 3.6 1.3 221388 51892 ? Ssl 22:24 0:02 /data/kubernetes/bin/kube-controller-manager --address=0.0.0.0 --master=http://10.0.0.202:8080 --allocate-node-cidrs=true --service-cluster-ip-range=10.1.0.0/16 --cluster-cidr=10.2.0.0/16 --cluster-name=kubernetes --cluster-signing-cert-file=/data/kubernetes/ssl/ca.pem --cluster-signing-key-file=/data/kubernetes/ssl/ca-key.pem --service-account-private-key-file=/data/kubernetes/ssl/ca-key.pem --root-ca-file=/data/kubernetes/ssl/ca.pem --leader-elect=true --v=2 --logtostderr=false --log-dir=/data/kubernetes/log
root 28675 0.0 0.0 112712 968 pts/0 S+ 22:25 0:00 grep --color=auto kube-controller-manager
[root@master ~]# netstat -tulnp | grep kube-control
tcp6 0 0 :::10252 :::* LISTEN 18137/kube-controll
tcp6 0 0 :::10257 :::* LISTEN 18137/kube-controll
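The listening ports show kube-controller-manager on 10252 (and 10257); as with the scheduler, cluster operations go through the API server rather than this port.
6.5 Deploy kubectl
kubectl is the day-to-day tool for managing the K8S cluster. To do that it has to talk to the cluster components, which requires a certificate. That is why kubectl is set up separately: it needs a client certificate, whereas kube-scheduler and kube-controller-manager above connect to the API server over the insecure port 8080 and could be started directly as services.
First copy the kubectl binary to every master node, then create the certificate signing request for admin.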
[root@master ssl]# cat admin-csr.json
{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:masters",
"OU": "System"
}
]
}
- Generate the admin certificate and key
cfssl gencert -ca=/data/kubernetes/ssl/ca.pem \
-ca-key=/data/kubernetes/ssl/ca-key.pem \
-config=/data/kubernetes/ssl/ca-config.json \
-profile=kubernetes admin-csr.json | cfssljson -bare admin
- Set the cluster parameters. In an HA setup the master address here would be the VIP. The certificates are embedded into the generated config file.
kubectl config set-cluster kubernetes \
--certificate-authority=/data/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=https://10.0.0.202:6443
- Set the client credentials
kubectl config set-credentials admin \
--client-certificate=/data/kubernetes/ssl/admin.pem \
--embed-certs=true \
--client-key=/data/kubernetes/ssl/admin-key.pem
- Set the context
kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=admin
- Set the default context
kubectl config use-context kubernetes
- View the current config
[root@master ssl]# kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://10.0.0.202:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: admin
name: kubernetes
current-context: kubernetes
kind: Config
preferences: {}
users:
- name: admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
- The steps above generate the config file under the home directory (.kube/config). kubectl needs this file whenever it talks to the API, so to use kubectl on another node, copy the file there.
[root@master ~]# cat .kube/config
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSU
...
- Use kubectl
[root@master ssl]# kubectl get cs
NAME AGE
controller-manager <unknown>
scheduler <unknown>
etcd-1 <unknown>
etcd-2 <unknown>
etcd-0 <unknown>
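Compared with walkthroughs written for earlier releases, the output here looks different. According to the material I found, this is only a display issue and does not affect use; it is expected to be fixed in releases after 1.17.0.
7. Node deployment
A Kubernetes node runs the following components: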
docker
kubelet
kube-proxy
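Prepare the node binaries and distribute kubelet and kube-proxy to /data/kubernetes/bin on every node.
7.1 Deploy the kubelet component
The authentication workflow is roughly as shown in the figure:
kubelet runs on every node; it receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run and logs.
Distribute the binaries to the nodes: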
scp kube-proxy kubelet node1:/data/kubernetes/bin/
scp kube-proxy kubelet node2:/data/kubernetes/bin/
- Create the role binding
kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
- Create the kubelet bootstrapping kubeconfig file; set the cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/data/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=https://10.0.0.202:6443 \
--kubeconfig=bootstrap.kubeconfig
- Set the client credentials
Note: the token here is the one from the token file that kube-apiserver was started with.
kubectl config set-credentials kubelet-bootstrap \
--token=7b54ba2ddce122d1784ac6a243be7fde \
--kubeconfig=bootstrap.kubeconfig
- Set the context
kubectl config set-context default \
--cluster=kubernetes \
--user=kubelet-bootstrap \
--kubeconfig=bootstrap.kubeconfig
- Select the default context
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig
- The net result of all the steps above is a bootstrap.kubeconfig file generated by the kubectl commands; distribute this file to the other nodes.
cp bootstrap.kubeconfig /data/kubernetes/cfg/
scp bootstrap.kubeconfig node1:/data/kubernetes/cfg/
scp bootstrap.kubeconfig node2:/data/kubernetes/cfg/
7.1.1 Set up CNI support
mkdir -p /etc/cni/net.d
cat /etc/cni/net.d/10-default.conf
{
"name": "flannel",
"type": "flannel",
"delegate": {
"bridge": "docker0",
"isDefaultGateway": true,
"mtu": 1400
}
}
- Create the kubelet directory
mkdir /var/lib/kubelet
[root@node1 ~]# cat /usr/lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/data/kubernetes/bin/kubelet \
--address=10.0.0.197 \
--hostname-override=10.0.0.197 \
--pod-infra-container-image=mirrorgooglecontainers/pause-amd64:3.0 \
--experimental-bootstrap-kubeconfig=/data/kubernetes/cfg/bootstrap.kubeconfig \
--kubeconfig=/data/kubernetes/cfg/kubelet.kubeconfig \
--cert-dir=/data/kubernetes/ssl \
--network-plugin=cni \
--cni-conf-dir=/etc/cni/net.d \
--cni-bin-dir=/data/kubernetes/bin/cni \
--cluster-dns=10.1.0.2 \
--cluster-domain=cluster.local. \
--hairpin-mode hairpin-veth \
--allow-privileged=true \
--fail-swap-on=false \
--logtostderr=true \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
7.1.2 Troubleshooting
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl start kubelet
[root@node1 ~]# systemctl status kubelet
kubelet fails to start at this point with the following error:
[root@node1 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; static; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2020-01-10 15:34:37 CST; 651ms ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 31057 ExecStart=/data/kubernetes/bin/kubelet --address=10.0.0.197 --hostname-override=10.0.0.197 --pod-infra-container-im=/data/kubernetes/cfg/bootstrap.kubeconfig --kubeconfig=/data/kubernetes/cfg/kubelet.kubeconfig --cert-dir=/data/kubernetes/ssl --nebin/cni --cluster-dns=10.1.0.2 --cluster-domain=cluster.local. --hairpin-mode hairpin-veth --allow-privileged=true --fail-swap-on=fag (code=exited, status=255)
Main PID: 31057 (code=exited, status=255)
Jan 10 15:34:37 node1 kubelet[31057]: --tls-cipher-suites strings , the default Go cipher suites will be used. Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_C_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_S_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBis parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/admini
Jan 10 15:34:37 node1 kubelet[31057]: --tls-min-version string rsionTLS11, VersionTLS12, VersionTLS13 (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --cofig-file/ for more information.)
Jan 10 15:34:37 node1 kubelet[31057]: --tls-private-key-file string ECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/ta
Jan 10 15:34:37 node1 kubelet[31057]: --topology-manager-policy string ffort', 'restricted', 'single-numa-node'. (default "none") (DEPRECATED: This parameter should be set via the config file specified b-cluster/kubelet-config-file/ for more information.)
Jan 10 15:34:37 node1 kubelet[31057]: -v, --v Level
Jan 10 15:34:37 node1 kubelet[31057]: --version version[=true]
Jan 10 15:34:37 node1 kubelet[31057]: --vmodule moduleSpec ging
Jan 10 15:34:37 node1 kubelet[31057]: --volume-plugin-dir string third party volume plugins (default "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/")
Jan 10 15:34:37 node1 kubelet[31057]: --volume-stats-agg-period duration disk usage for all pods and volumes. To disable volume calculations, set to 0. (default 1m0s) (DEPRECATED: This parameter should b//kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
Jan 10 15:34:37 node1 kubelet[31057]: F0110 15:34:37.575781 31057 server.go:154] unknown flag: --allow-privileged
- Fix:
[root@node1 ~]# vim /usr/lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/data/kubernetes/bin/kubelet \
--address=10.0.0.197 \
--hostname-override=10.0.0.197 \
--pod-infra-container-image=mirrorgooglecontainers/pause-amd64:3.0 \
--experimental-bootstrap-kubeconfig=/data/kubernetes/cfg/bootstrap.kubeconfig \
--kubeconfig=/data/kubernetes/cfg/kubelet.kubeconfig \
--cert-dir=/data/kubernetes/ssl \
--network-plugin=cni \
--cni-conf-dir=/etc/cni/net.d \
--cni-bin-dir=/data/kubernetes/bin/cni \
--cluster-dns=10.1.0.2 \
--cluster-domain=cluster.local. \
--hairpin-mode hairpin-veth \
--fail-swap-on=false \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
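The only change is to delete the following flag, which this kubelet version rejects with "unknown flag: --allow-privileged":
--allow-privileged=true \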
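A minimal sketch of the same edit done non-interactively, which may help when it has to be repeated on every node. The function name strip_flag is hypothetical, and the sketch assumes each flag sits on its own line and that the removed flag is not the last line of ExecStart (otherwise the previous line would keep a dangling backslash):

```shell
#!/bin/sh
# strip_flag FLAG FILE
# Remove every line whose first token is FLAG (with or without "=value")
# from FILE, keeping the original as FILE.bak.
strip_flag() {
    flag=$1
    unit=$2
    cp "$unit" "$unit.bak"
    # grep -v exits non-zero when no line survives; that is not an error here.
    grep -v -E "^[[:space:]]*${flag}([[:space:]=]|\$)" "$unit.bak" > "$unit" || true
}

# Intended use on a node (not run here):
#   strip_flag --allow-privileged /usr/lib/systemd/system/kubelet.service
```

After the edit, systemd still has to be reloaded and kubelet restarted before the change takes effect.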
References:
https://github.com/wk8/SDN/commit/8db8f06cd5ccdc91eb74ce1d00041597881cd0c1
https://github.com/microsoft/SDN/issues/379
http://www.mamicode.com/info-detail-2738983.html
- Reload systemd and start kubelet again
[root@node2 ~]# systemctl daemon-reload && systemctl start kubelet && systemctl status kubelet
Reference for kubelet startup flags:
https://my.oschina.net/u/3797264/blog/2222877
7.1.3 View the CSR requests
[root@master ssl]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-Sfja1KrymePmfnaQJ9Nh3ZQuL07i9F_2IVoevDOTXm4 15s kubelet-bootstrap Pending
node-csr-Tje055FJKPKKgubNuHL5MSEhyrU-RZ1e0pfXVxSz-dw 13s kubelet-bootstrap Pending
7.1.3.1 Approve the kubelet TLS certificate requests
[root@master ssl]# kubectl get csr | grep 'Pend' | awk 'NR>0{print $1}'
node-csr-Sfja1KrymePmfnaQJ9Nh3ZQuL07i9F_2IVoevDOTXm4
node-csr-Tje055FJKPKKgubNuHL5MSEhyrU-RZ1e0pfXVxSz-dw
[root@master ssl]# kubectl get csr | grep 'Pend' | awk 'NR>0{print $1}' | xargs kubectl certificate approve
certificatesigningrequest.certificates.k8s.io/node-csr-Sfja1KrymePmfnaQJ9Nh3ZQuL07i9F_2IVoevDOTXm4 approved
certificatesigningrequest.certificates.k8s.io/node-csr-Tje055FJKPKKgubNuHL5MSEhyrU-RZ1e0pfXVxSz-dw approved
[root@master ssl]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-Sfja1KrymePmfnaQJ9Nh3ZQuL07i9F_2IVoevDOTXm4 4m9s kubelet-bootstrap Approved,Issued
node-csr-Tje055FJKPKKgubNuHL5MSEhyrU-RZ1e0pfXVxSz-dw 4m7s kubelet-bootstrap Approved,Issued
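The grep 'Pend' filter above matches that string anywhere on the line. A slightly stricter variant, sketched here under the assumption that CONDITION is the last column of the kubectl get csr output, compares that column exactly. The function name pending_csrs is made up for this example:

```shell
#!/bin/sh
# pending_csrs: read `kubectl get csr` output on stdin and print the NAME of
# every request whose last column is exactly "Pending". The header is skipped.
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Intended use on the master (not run here; -r is the GNU xargs option that
# skips the command when there is no input):
#   kubectl get csr | pending_csrs | xargs -r kubectl certificate approve
```

Requests that are already Approved,Issued are left alone, so the command is safe to run again after a new node joins.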
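7.1.3.2 Notes on resolving kubectl get csr showing No Resources Found
- Check that the User in the bootstrap.kubeconfig file used by kubelet is kubelet-bootstrap, and that the file contains the token;
- Check that the token is present in the token.csv file used by kube-apiserver.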
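The two checks above can be sketched as a small script. It assumes a kubeconfig laid out the way kubectl config writes it (a "user: NAME" line under contexts and a "token: VALUE" line under users) and a token.csv whose first comma-separated field is the token. The function name check_bootstrap is hypothetical, and the token.csv path in the usage comment is a placeholder for wherever kube-apiserver's file actually lives:

```shell
#!/bin/sh
# check_bootstrap KUBECONFIG TOKEN_CSV
# Print "ok" if the kubeconfig user is kubelet-bootstrap and its token is
# listed in TOKEN_CSV; otherwise print the failed check and return 1.
check_bootstrap() {
    kubeconfig=$1
    token_csv=$2
    user=$(awk '$1 == "user:" && NF == 2 { print $2; exit }' "$kubeconfig")
    token=$(awk '$1 == "token:" { print $2; exit }' "$kubeconfig")
    if [ "$user" != "kubelet-bootstrap" ]; then
        echo "unexpected user: ${user:-none}"
        return 1
    fi
    if [ -z "$token" ]; then
        echo "no token in kubeconfig"
        return 1
    fi
    # Compare against the first CSV field only, as a fixed whole-line string.
    if ! cut -d, -f1 "$token_csv" | grep -F -x -q "$token"; then
        echo "token not found in token.csv"
        return 1
    fi
    echo "ok"
}

# Intended use (paths are assumptions; adjust to the real locations):
#   check_bootstrap /data/kubernetes/cfg/bootstrap.kubeconfig /path/to/token.csv
```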
7.1.3.3 Check the node status
[root@master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
10.0.0.163 NotReady <none> 63m v1.17.0
10.0.0.197 NotReady <none> 63m v1.17.0
The nodes are stuck in NotReady and I could not find the cause at first.
However, describing a node gives a hint:
[root@master ~]# kubectl describe node 10.0.0.163
....
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 14 Jan 2020 04:01:04 -0500 Tue, 14 Jan 2020 02:31:35 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 14 Jan 2020 04:01:04 -0500 Tue, 14 Jan 2020 02:31:35 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 14 Jan 2020 04:01:04 -0500 Tue, 14 Jan 2020 02:31:35 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Tue, 14 Jan 2020 04:01:04 -0500 Tue, 14 Jan 2020 02:31:35 -0500 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
...
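This shows that the CNI plugin has not been initialized, but I do not know how to get CNI running properly.
I did find a workaround I am not proud of, though: delete the CNI setting --network-plugin=cni from the kubelet unit file, after which the nodes register successfully and report Ready.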
[root@master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
10.0.0.163 Ready <none> 117m v1.17.0
10.0.0.197 Ready <none> 117m v1.17.0
This is where I got stuck: from the v1.16.4 install through to the current v1.17.0, it was this same spot again.
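For context, "cni config uninitialized" is what kubelet reports when it finds no usable network configuration in the directory given by --cni-conf-dir (/etc/cni/net.d in the unit file above). Such a file is normally put there by whichever network add-on the cluster uses, so the workaround above hides the symptom rather than configuring pod networking. A quick way to confirm the diagnosis on a node is to check whether that directory holds any configuration file at all. The sketch below assumes the loader accepts the .conf, .conflist and .json extensions; the function names are hypothetical:

```shell
#!/bin/sh
# cni_conf_files DIR: list candidate CNI network configuration files in DIR.
cni_conf_files() {
    dir=$1
    for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
        # An unmatched glob stays literal, so test that the file really exists.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

# cni_status DIR: print "configured" if at least one candidate file exists,
# otherwise "uninitialized".
cni_status() {
    if [ -n "$(cni_conf_files "$1")" ]; then
        echo "configured"
    else
        echo "uninitialized"
    fi
}

# Intended use on a node (not run here):
#   cni_status /etc/cni/net.d
```

A file being present does not prove it is valid; this only distinguishes an empty directory from a populated one.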
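7.2 Deploying kube-proxy
kube-proxy is likewise deployed only on the node machines; its binary was already distributed to the nodes in an earlier step.
7.2.1 Prepare the JSON file for the certificate (kube-proxy-csr.json)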
{
  "CN": "system:kube-proxy",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
7.2.2 Generate the certificate
cfssl gencert -ca=/data/kubernetes/ssl/ca.pem \
-ca-key=/data/kubernetes/ssl/ca-key.pem \
-config=/data/kubernetes/ssl/ca-config.json \
-profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
7.2.3 Distribute the certificate to all node machines
[root@master ssl]# scp kube-proxy*.pem node1:/data/kubernetes/ssl/
[root@master ssl]# scp kube-proxy*.pem node2:/data/kubernetes/ssl/
7.2.4 Create the kube-proxy configuration file (on the master node)
kubectl config set-cluster kubernetes \
--certificate-authority=/data/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=https://10.0.0.202:6443 \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
--client-certificate=/data/kubernetes/ssl/kube-proxy.pem \
--client-key=/data/kubernetes/ssl/kube-proxy-key.pem \
--embed-certs=true \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
--cluster=kubernetes \
--user=kube-proxy \
--kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
As with kubelet, the commands above produce a kubeconfig file for kube-proxy. Distribute this kubeconfig file to all node machines:
scp kube-proxy.kubeconfig node1:/data/kubernetes/cfg/
scp kube-proxy.kubeconfig node2:/data/kubernetes/cfg/
7.2.5 Create the systemd service unit
[root@node2 ~]# cat /usr/lib/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/data/kubernetes/bin/kube-proxy \
--bind-address=10.0.0.163 \
--hostname-override=10.0.0.163 \
--kubeconfig=/data/kubernetes/cfg/kube-proxy.kubeconfig \
--masquerade-all \
--feature-gates=SupportIPVSProxyMode=true \
--proxy-mode=ipvs \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--v=2 \
--logtostderr=false \
--log-dir=/data/kubernetes/log
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
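With this many flags in one ExecStart, it is easy to list the same flag twice with different values, and for a single-valued flag only one of the two can take effect. A small sketch that reports repeated flag names in a unit file follows; it assumes each flag starts its own line, and the function name duplicate_flags is made up:

```shell
#!/bin/sh
# duplicate_flags FILE: print every --flag name that starts more than one line.
duplicate_flags() {
    awk '$1 ~ /^--/ { split($1, kv, "="); print kv[1] }' "$1" | sort | uniq -d
}

# Intended use on a node (not run here):
#   duplicate_flags /usr/lib/systemd/system/kube-proxy.service
```

An empty result means no flag name is repeated.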
7.2.6 Install the dependencies and start the service
yum install -y ipvsadm ipset conntrack
systemctl daemon-reload
systemctl enable kube-proxy
systemctl start kube-proxy
systemctl status kube-proxy
7.2.7 Check the kube-proxy service state on the node
ipvsadm -L -n
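In the ipvsadm -L -n listing, each virtual service appears as a line starting with TCP or UDP, followed by indented "->" lines for its backends. With --ipvs-scheduler=rr as configured above, the scheduler column of each service should read rr. A hedged sketch that reduces the listing to one line per virtual service (the function name ipvs_services is made up):

```shell
#!/bin/sh
# ipvs_services: read `ipvsadm -L -n` output on stdin and print, for each
# virtual service, its protocol, address:port and scheduler.
ipvs_services() {
    awk '$1 == "TCP" || $1 == "UDP" { print $1, $2, $3 }'
}

# Intended use on a node (not run here):
#   ipvsadm -L -n | ipvs_services
```

If the result is empty while kube-proxy is running, kube-proxy may not be running in ipvs mode, and its log under /data/kubernetes/log is the next place to look.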
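8. Create an nginx application on K8S
8.1 Create a pod
- Use the master as a private registry
Set the following kubelet flag so that kubelet no longer tries to pull the pause-amd64:3.0 image from the k8s registry outside the firewall when it starts a pod: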
--pod-infra-container-image=10.0.0.202:5000/mirrorgooglecontainers/pause-amd64:3.0 \
8.2 Configure the local Docker registry
[root@master ~]# cat /etc/sysconfig/docker
OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false --registry-mirror=https://vtbf99sa.mirror.aliyuncs.com --insecure-registry=10.0.0.202:5000'
[root@master ~]# cat /etc/docker/daemon.json
{
"registry-mirrors": ["https://vtbf99sa.mirror.aliyuncs.com"],
"insecure-registries": ["10.0.0.202:5000"]
}
[root@master ~]# docker run -d -p 5000:5000 --restart=always --name registry -v /opt/myregistry:/var/lib/registry registry
- Create the YAML file
[root@master pod]# cat k8s_pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: web
spec:
  containers:
  - name: nginx
    image: 10.0.0.202:5000/nginx:latest
    ports:
    - containerPort: 80
- Create the pod
[root@master pod]# kubectl create -f k8s_pod.yaml
[root@master pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 0/1 ContainerCreating 0 8s
[root@master pod]# kubectl describe pod nginx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/nginx to 10.0.0.163
Warning FailedCreatePodSandBox 12s (x4 over 8m34s) kubelet, 10.0.0.163 Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "mirrorgooglecontainers/pause-amd64:3.0": context canceled
...
Fix for the error above (the nginx image also has to be uploaded to the private registry beforehand):
[root@master ~]# docker pull pupudaye/pause-amd64
[root@master ~]# docker tag pupudaye/pause-amd64 10.0.0.202:5000/mirrorgooglecontainers/pause-amd64:3.0
[root@master pod]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
10.0.0.202:5000/nginx latest c7460dfcab50 5 days ago 126MB
nginx latest c7460dfcab50 5 days ago 126MB
registry latest f32a97de94e1 10 months ago 25.8MB
pupudaye/pause-amd64 latest a9e33c9ff5e5 2 years ago 747kB
10.0.0.202:5000/mirrorgooglecontainers/pause-amd64 3.0 a9e33c9ff5e5 2 years ago 747kB
[root@master pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 3h21m
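The transcript above shows the pull and tag steps. For the nodes to fetch the image from 10.0.0.202:5000, it presumably also has to be pushed to that registry, which the transcript does not show; treat the push line below as an assumption. The sketch only prints the commands so they can be reviewed first, and the function names are made up:

```shell
#!/bin/sh
# registry_ref REGISTRY NAME:TAG -> REGISTRY/NAME:TAG
registry_ref() {
    printf '%s/%s\n' "$1" "$2"
}

# mirror_cmds SOURCE REGISTRY NAME:TAG
# Print (do not run) the docker commands that copy SOURCE into REGISTRY
# under NAME:TAG.
mirror_cmds() {
    target=$(registry_ref "$2" "$3")
    echo "docker pull $1"
    echo "docker tag $1 $target"
    echo "docker push $target"
}

# Example: print the commands for the pause image used above.
#   mirror_cmds pupudaye/pause-amd64 10.0.0.202:5000 mirrorgooglecontainers/pause-amd64:3.0
```

Piping the output to sh would execute the three commands in order.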
8.3 Create the Nginx service from a Deployment YAML
[root@master deploy]# cat k8s_deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: 10.0.0.202:5000/nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 100m
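① apiVersion is the version of the configuration format in use. Run kubectl api-resources to list all resource types, then kubectl explain deploy to get the version and kind for a Deployment.
② kind is the type of resource to create, here a Deployment.
③ metadata is the metadata of the resource; name is a required metadata item.
④ spec is the specification of this Deployment.
⑤ replicas sets the number of replicas; the default is 1.
⑥ template defines the Pod template; this is the important part of the configuration file.
⑦ metadata defines the Pod's metadata; at least one label must be defined. The label key and value can be chosen freely, but must match spec.selector.matchLabels.
⑧ spec describes the Pod; it defines the properties of every container in the Pod, of which name and image are required.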
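To make the relationship between these fields concrete, here is a minimal sketch that prints a Deployment manifest from three parameters. It is an illustration rather than the method used in this walkthrough: the function name render_deployment is made up, and it reuses the name as the value of the app label in all three places (metadata, selector, Pod template) so that the selector and the template cannot drift apart:

```shell
#!/bin/sh
# render_deployment NAME IMAGE [REPLICAS]
# Print a minimal apps/v1 Deployment manifest on stdout (REPLICAS defaults to 1).
render_deployment() {
    name=$1
    image=$2
    replicas=${3:-1}
    cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  labels:
    app: ${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: ${image}
        ports:
        - containerPort: 80
EOF
}

# Intended use on the master (not run here):
#   render_deployment nginx 10.0.0.202:5000/nginx:1.14.2 3 | kubectl create -f -
```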
[root@master deploy]# kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3 3 3 107s
- Expose nginx-deployment on a node port:
[root@master deploy]# kubectl expose deployment nginx-deployment --type=NodePort --port=80
service/nginx-deployment exposed
[root@master deploy]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 2d
myweb NodePort 10.1.143.89 <none> 80:30000/TCP 3h40m
nginx-deployment NodePort 10.1.248.23 <none> 80:28250/TCP 8s
- Test:
[root@master deploy]# curl -I 10.0.0.197:28250
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Thu, 16 Jan 2020 06:26:29 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
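The port used in the curl test (28250) is the node port that kubectl expose assigned, shown after the colon in the PORT(S) column. A small sketch that pulls it out of the kubectl get svc listing follows; it assumes the default column order shown above, and the function name node_port is made up:

```shell
#!/bin/sh
# node_port SERVICE: read `kubectl get svc` output on stdin and print the
# first node port of the NodePort service named SERVICE.
node_port() {
    awk -v svc="$1" '$1 == svc && $2 == "NodePort" {
        # PORT(S) looks like "80:28250/TCP": port, node port, protocol.
        split($5, parts, /[:\/]/)
        print parts[2]
    }'
}

# Intended use on the master (not run here):
#   port=$(kubectl get svc | node_port nginx-deployment)
#   curl -I "10.0.0.197:$port"
# kubectl can also print the field directly:
#   kubectl get svc nginx-deployment -o jsonpath='{.spec.ports[0].nodePort}'
```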
8.3.1 Upgrade and rollback
8.3.1.1 Upgrade
- Change the image in the deployment configuration to latest
[root@master deploy]# kubectl edit deployment nginx-deployment
deployment.apps/nginx-deployment edited
....
spec:
containers:
- image: 10.0.0.202:5000/nginx:latest
imagePullPolicy: IfNotPresent
name: nginx
....
[root@master deploy]# curl -I 10.0.0.197:28250
HTTP/1.1 200 OK
Server: nginx/1.17.7
Date: Thu, 16 Jan 2020 06:36:00 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 24 Dec 2019 13:07:53 GMT
Connection: keep-alive
ETag: "5e020da9-264"
Accept-Ranges: bytes
8.3.1.2 Rollback
- Method 1: roll back to the previous revision, or to a specific revision number:
[root@master deploy]# kubectl rollout history deployment nginx-deployment
deployment.apps/nginx-deployment
REVISION CHANGE-CAUSE
1 <none>
2 <none>
[root@master deploy]# kubectl rollout undo deployment nginx-deployment
deployment.apps/nginx-deployment rolled back
[root@master deploy]# kubectl rollout history deployment nginx-deployment
deployment.apps/nginx-deployment
REVISION CHANGE-CAUSE
2 <none>
3 <none>
[root@master deploy]# curl -I 10.0.0.197:28250
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Thu, 16 Jan 2020 06:37:35 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
[root@master deploy]# kubectl rollout undo deployment nginx-deployment --to-revision=3
deployment.apps/nginx-deployment rolled back
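- Method 2: the history in method 1 does not show which image each revision uses, so delete the deployment and recreate it from the command line with --record; the history then shows details such as the image version for each revision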
[root@master ~]# kubectl delete deploy nginx
deployment.apps "nginx" deleted
[root@master ~]# kubectl run nginx --image=10.0.0.202:5000/nginx:latest --replicas=3 --record
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created
[root@master ~]# kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.16.1 --record
deployment.apps/nginx image updated
[root@master ~]# kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.14.2 --record
deployment.apps/nginx image updated
[root@master ~]# kubectl rollout history deployment nginx
deployment.apps/nginx
REVISION CHANGE-CAUSE
1 kubectl run nginx --image=10.0.0.202:5000/nginx:latest --replicas=3 --record=true
2 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.16.1 --record=true
3 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.14.2 --record=true
[root@master ~]# kubectl expose deployment nginx --type=NodePort --port=80
service/nginx exposed
[root@master ~]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 2d2h <none>
myweb NodePort 10.1.143.89 <none> 80:30000/TCP 6h32m app=myweb2
nginx NodePort 10.1.246.22 <none> 80:38003/TCP 4s run=nginx
[root@master ~]# curl -I 10.0.0.197:38003
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Thu, 16 Jan 2020 09:17:28 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
[root@master ~]# kubectl rollout history deployment nginx
deployment.apps/nginx
REVISION CHANGE-CAUSE
1 kubectl run nginx --image=10.0.0.202:5000/nginx:latest --replicas=3 --record=true
2 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.16.1 --record=true
3 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.14.2 --record=true
[root@master ~]# kubectl rollout undo deployment nginx
deployment.apps/nginx rolled back
[root@master ~]# curl -I 10.0.0.197:38003
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Thu, 16 Jan 2020 09:18:16 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 10:05:00 GMT
Connection: keep-alive
ETag: "5d528b4c-264"
Accept-Ranges: bytes
[root@master ~]# kubectl rollout history deployment nginx
deployment.apps/nginx
REVISION CHANGE-CAUSE
1 kubectl run nginx --image=10.0.0.202:5000/nginx:latest --replicas=3 --record=true
3 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.14.2 --record=true
4 kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.16.1 --record=true
[root@master ~]# kubectl rollout undo deployment nginx --to-revision=1
deployment.apps/nginx rolled back
[root@master ~]# curl -I 10.0.0.197:38003
HTTP/1.1 200 OK
Server: nginx/1.17.7
Date: Thu, 16 Jan 2020 09:19:25 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 24 Dec 2019 13:07:53 GMT
Connection: keep-alive
ETag: "5e020da9-264"
Accept-Ranges: bytes
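The history listings above show that a rollback does not restore the old revision number: the revision that is rolled back to disappears from the list and reappears with the next free number (1, 2, 3 becomes 1, 3, 4 after rolling back to revision 2). The toy function below only models that renumbering so the listings are easier to follow. It does not talk to a cluster, the name rollout_undo is made up, and it assumes the target is an older revision than the current one:

```shell
#!/bin/sh
# rollout_undo "REVISIONS" TARGET
# Given the space-separated revision numbers in the history and the revision
# being rolled back to, print the history as it looks after the rollback.
rollout_undo() {
    revs=$1
    target=$2
    max=0
    found=no
    kept=""
    for r in $revs; do
        if [ "$r" -gt "$max" ]; then
            max=$r
        fi
        if [ "$r" -eq "$target" ]; then
            found=yes
        else
            kept="${kept:+$kept }$r"
        fi
    done
    if [ "$found" = no ]; then
        echo "revision $target is not in the history" >&2
        return 1
    fi
    # The target revision is re-recorded under the next free number.
    echo "${kept:+$kept }$((max + 1))"
}

# Example matching the listing above: history 1 2 3, roll back to revision 2.
#   rollout_undo "1 2 3" 2
```

This is why a later --to-revision has to use the numbers from a fresh kubectl rollout history rather than the ones seen before the previous rollback.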
Deployment upgrade and rollback, in summary:
- Create a deployment from the command line
kubectl run nginx --image=10.0.0.202:5000/nginx:1.14.2 --replicas=3 --record
- Upgrade the image version from the command line
kubectl set image deploy nginx nginx=10.0.0.202:5000/nginx:1.16.1 --record
- View all revisions of the deployment
kubectl rollout history deployment nginx
- Roll the deployment back to the previous revision
kubectl rollout undo deployment nginx
- Roll the deployment back to a specific revision
kubectl rollout undo deployment nginx --to-revision=2
- Expose a port for the deployment; this creates a new SVC
kubectl expose deployment nginx --type=NodePort --port=80
[root@master ~]# kubectl run nginx --image=10.0.0.202:5000/nginx:latest --replicas=3 --record
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created
Note the warning: kubectl run --generator=deployment/apps.v1 is deprecated and will be removed in a future version; use kubectl run --generator=run-pod/v1 or kubectl create instead.
9. Create a MySQL service
9.1 Create the MySQL RC and SVC files
RC file:
[root@master ~]# cat k8s/rc/k8s_mysql.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: 10.0.0.202:5000/mysql:5.7
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: '598941324'
SVC file:
[root@master ~]# cat k8s/svc/k8s_mysql_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: NodePort #ClusterIP
  ports:
  - port: 3306
    nodePort: 30006
    targetPort: 3306 #pod port
  selector:
    app: mysql
Check the result:
[root@node2 ~]# netstat -antup | grep 30006
tcp6 0 0 :::30006 :::* LISTEN 9501/kube-proxy
Test:
[root@master ~]# mysql -uroot -p598941324 -h192.168.50.175 -P30006
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
MySQL [(none)]>