Kubernetes v1.20.8 Binary Installation (Part 2): etcd
etcd is a distributed key-value store built on the Raft consensus algorithm and developed by CoreOS. It is commonly used for service discovery, shared configuration, and concurrency control (such as leader election and distributed locks).
Kubernetes uses an etcd cluster to persistently store all API objects and runtime data.
Note:
- Unless otherwise specified, all operations in this document are executed on the OPS node.
1. etcd Deployment
Since the number of nodes is limited, etcd is deployed on the three master nodes. For easier management, the etcd-related binaries and configuration files are kept together with the other Kubernetes master files: they are staged in the master directory on OPS, and the whole master directory is later copied to the corresponding directory on each node, where the services are started.
The working directory on OPS is /opt/software/kubernetes; adjust it if necessary (this article uses absolute or relative paths rather than variable paths).
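The commands below assume the following directory layout already exists on OPS. If it was not prepared in an earlier part, a minimal sketch to create it:
# Directories used in this section (run on OPS)
mkdir -p /opt/software/kubernetes/{bin,certs,tools}
mkdir -p /opt/software/kubernetes/master/{bin,certs,config,service}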
1.1. Create the etcd Certificate
Run on OPS.
cfssl supports SAN (Subject Alternative Name), an X.509 extension that allows one certificate to cover multiple domain names. The wildcard entry *.k8s-host.com below makes it easy to add nodes later, without having to write every node IP into the certificate's hosts list in advance, as older deployment guides require.
cat > /opt/software/kubernetes/certs/etcd-csr.json <<EOF
{
    "CN": "etcd",
    "hosts": [
        "127.0.0.1",
        "*.k8s-host.com"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [{
        "C": "CN",
        "ST": "GuangZhou",
        "L": "TianHe",
        "O": "k8s",
        "OU": "ops"
    }]
}
EOF
1.2. Sign the etcd Certificate
cd /opt/software/kubernetes/certs
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssl-json -bare etcd
$ ll etcd*.pem
-rw-r--r-- 1 root root 1436 Jun  7 17:21 etcd.pem
-rw------- 1 root root 1679 Jun  7 17:21 etcd-key.pem
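Optionally, confirm that the wildcard SAN was actually written into the signed certificate (using openssl, which should be available on most hosts):
# Show the Subject Alternative Name extension of the signed certificate
openssl x509 -in etcd.pem -noout -text | grep -A1 'Subject Alternative Name'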
Copy the signed certificates into the master role directory:
cp etcd.pem etcd-key.pem /opt/software/kubernetes/master/certs/
1.3. Download the etcd Package
The version used here is 3.4.16.
ETCD_VERSION=v3.4.16
wget -P /opt/software/kubernetes/tools https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
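Optionally, verify the download against the SHA256SUMS file that accompanies the release (assuming that asset exists for this version):
# Verify the tarball checksum against the published SHA256SUMS
cd /opt/software/kubernetes/tools
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/SHA256SUMS
grep linux-amd64.tar.gz SHA256SUMS | sha256sum -c -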
1.4. Extract the etcd Package
cd /opt/software/kubernetes/master/bin
tar --strip-components=1 -xvpf /opt/software/kubernetes/tools/etcd-v3.4.16-linux-amd64.tar.gz etcd-v3.4.16-linux-amd64/{etcd,etcdctl}
chmod +x etcd*
# Copy etcdctl into the OPS bin directory as well, for convenient cluster checks (optional)
cp etcdctl /opt/software/kubernetes/bin
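Optionally, confirm that the extracted binaries report the expected version:
# Check the versions of the freshly extracted binaries
./etcd --version
./etcdctl version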
1.5. Generate the etcd Configuration File
The etcd parameters differ between nodes; adjust them after the file has been copied, according to the actual environment.
The options that change per node are:
name, listen-peer-urls, listen-client-urls,
initial-advertise-peer-urls, advertise-client-urls
Note that the paths in the configuration file must match the actual paths on the node after copying.
cat >/opt/software/kubernetes/master/config/etcd-config.yaml <<EOF
name: etcd01
data-dir: /data/etcd
listen-peer-urls: https://10.0.0.10:2380
listen-client-urls: https://10.0.0.10:2379,http://127.0.0.1:2379
quota-backend-bytes: 8000000000
initial-advertise-peer-urls: https://etcd01.k8s-host.com:2380
advertise-client-urls: https://etcd01.k8s-host.com:2379,http://127.0.0.1:2379
initial-cluster: etcd01=https://etcd01.k8s-host.com:2380,etcd02=https://etcd02.k8s-host.com:2380,etcd03=https://etcd03.k8s-host.com:2380
initial-cluster-token: etcd-cluster
initial-cluster-state: new
debug: false
log-outputs: [stderr]
logger: zap
client-transport-security:
  client-cert-auth: false
  auto-tls: false
  trusted-ca-file: /opt/software/kubernetes/certs/ca.pem
  cert-file: /opt/software/kubernetes/certs/etcd.pem
  key-file: /opt/software/kubernetes/certs/etcd-key.pem
peer-transport-security:
  client-cert-auth: false
  auto-tls: false
  trusted-ca-file: /opt/software/kubernetes/certs/ca.pem
  cert-file: /opt/software/kubernetes/certs/etcd.pem
  key-file: /opt/software/kubernetes/certs/etcd-key.pem
EOF
- --data-dir: the working and data directory; create it before starting the service.
- --name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list.
- --cert-file, --key-file: the certificate and private key etcd uses when communicating with clients.
- --trusted-ca-file: the CA certificate that signed the client certificates, used to verify them.
- --peer-cert-file, --peer-key-file: the certificate and private key etcd uses when communicating with its peers.
- --peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them.
1.6. Create the systemd Service File
Note that the paths in the unit file must match the actual paths on the node after copying.
cat >/opt/software/kubernetes/master/service/etcd.service<<EOF
[Unit]
Description=Etcd Server
Documentation=https://github.com/etcd-io/etcd
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
LimitNOFILE=65536
Restart=on-failure
RestartSec=5s
TimeoutStartSec=0
ExecStart=/opt/software/kubernetes/bin/etcd --config-file /opt/software/kubernetes/config/etcd-config.yaml
[Install]
WantedBy=multi-user.target
EOF
1.7. Distribution
Modify the configuration on each node after copying.
Two separate loops are used here because the files under the config directory differ between nodes; if a later step copied the whole config directory again, it would overwrite the modified configuration, so the config files are handled on their own (a sed-based alternative for rendering per-node configs is sketched after the script below). In my automation scripts I use the copy and template modules, which prevents configs from being overwritten while still allowing them to be adjusted per node (see the script).
hosts=(master01 master02 master03)
domain='k8s-host.com'
config_files=('config/etcd-config.yaml')
cd /opt/software/kubernetes
for host in ${hosts[*]}
do
scp -r master/{bin,certs,service} ${host}.${domain}:/opt/software/kubernetes/
done
# Do not run this loop again, or it will overwrite the modified configuration
for host in ${hosts[*]}
do
for file in ${config_files[*]}
do
scp -r master/${file} ${host}.${domain}:/opt/software/kubernetes/${file}
done
done
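If you prefer not to edit each node's configuration by hand, a hedged alternative is to render the per-node values on OPS with sed and push the result. The node/IP mapping below is an assumption for illustration only and must be adapted to your environment:
# Hypothetical node-to-IP mapping; replace with your real addresses
declare -A node_ip=([etcd01]=10.0.0.10 [etcd02]=10.0.0.11 [etcd03]=10.0.0.12)
declare -A node_host=([etcd01]=master01 [etcd02]=master02 [etcd03]=master03)
domain='k8s-host.com'
cd /opt/software/kubernetes
for name in etcd01 etcd02 etcd03
do
    ip=${node_ip[$name]}
    # Rewrite only the per-node fields; everything else stays identical
    sed -e "s|^name: .*|name: ${name}|" \
        -e "s|^listen-peer-urls: .*|listen-peer-urls: https://${ip}:2380|" \
        -e "s|^listen-client-urls: .*|listen-client-urls: https://${ip}:2379,http://127.0.0.1:2379|" \
        -e "s|^initial-advertise-peer-urls: .*|initial-advertise-peer-urls: https://${name}.${domain}:2380|" \
        -e "s|^advertise-client-urls: .*|advertise-client-urls: https://${name}.${domain}:2379,http://127.0.0.1:2379|" \
        master/config/etcd-config.yaml > /tmp/etcd-config-${name}.yaml
    scp /tmp/etcd-config-${name}.yaml ${node_host[$name]}.${domain}:/opt/software/kubernetes/config/etcd-config.yaml
done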
1.8. Start on Each Node
Symlink etcd.service into /usr/lib/systemd/system/ so the unit file remains easy to maintain in one place.
Symlinking into /etc/systemd/system may fail with: Failed to execute operation: Too many levels of symbolic links
hosts=(master01 master02 master03)
domain='k8s-host.com'
for host in ${hosts[*]}
do
ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/service/etcd.service /usr/lib/systemd/system/etcd.service"
done
The cluster only forms once several etcd members are running, so execute the start command on each node (a parallel variant run from OPS is sketched below):
systemctl daemon-reload && systemctl enable etcd --now
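Because the unit uses Type=notify, each member may block until the cluster reaches quorum, so a convenient approach (a sketch, assuming the same hosts as above) is to trigger the start on all three masters in parallel from OPS:
# Start etcd on all masters in parallel so the members can find each other
hosts=(master01 master02 master03)
domain='k8s-host.com'
for host in ${hosts[*]}
do
    ssh root@${host}.${domain} "systemctl daemon-reload && systemctl enable etcd --now" &
done
wait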
1.9. Verify Cluster Status
source /etc/profile
cd /opt/software/kubernetes/certs/
etcd_endpoints="https://etcd01.k8s-host.com:2379,https://etcd02.k8s-host.com:2379,https://etcd03.k8s-host.com:2379"
etcdctl --endpoints=${etcd_endpoints} --cacert=./ca.pem --cert=./etcd.pem --key=./etcd-key.pem --write-out=table endpoint status
+----------------------------------+------------------+---------+---------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER |
+----------------------------------+------------------+---------+---------+-----------+------------+
| https://etcd01.k8s-host.com:2379 | fb29bd4eb2954a8e | 3.4.16 | 20 kB | true | false |
| https://etcd02.k8s-host.com:2379 | 9fe2e4f759b4a0f6 | 3.4.16 | 20 kB | false | false |
| https://etcd03.k8s-host.com:2379 | af64d84cb227445f | 3.4.16 | 33 kB | false | false |
+----------------------------------+------------------+---------+---------+-----------+------------+
The full output contains more columns; some are omitted here. If a node cannot be reached, an error like the following is printed:
{"level":"warn","ts":"2021-09-03T16:52:41.766+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://etcd01.k8s-host:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.10:2379: connect: connection refused\""} Failed to get the status of endpoint https://etcd01.k8s-host:2379 (context deadline exceeded)
2. Troubleshooting
discovery failed
Startup fails because the information recorded in the data-dir no longer matches what the etcd startup options describe.
Fix 1: delete the data-dir on every etcd node and start again.
Fix 2: copy the data-dir contents from another node, forcibly bring up a single-member cluster from it with --force-new-cluster, and then restore the cluster by adding the remaining nodes back as new members.
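A hedged sketch of fix 1, run from OPS; only do this when the cluster holds no data that must be kept:
# Stop every member and wipe its data-dir, then start them again in parallel
hosts=(master01 master02 master03)
domain='k8s-host.com'
for host in ${hosts[*]}
do
    ssh root@${host}.${domain} "systemctl stop etcd && rm -rf /data/etcd"
done
for host in ${hosts[*]}
do
    ssh root@${host}.${domain} "systemctl start etcd" &
done
wait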