kubernetes-v1.20.8 binary installation (part 2): etcd

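etcd is a Raft-based distributed key-value store developed by CoreOS. It is commonly used for service discovery, shared configuration and concurrency control (for example leader election and distributed locks).

Kubernetes uses the etcd cluster to persist all API objects and runtime data.

Note:

  • Unless otherwise specified, all operations in this document are performed on the OPS node;

1. etcd deployment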

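Because the number of nodes is limited, etcd is deployed on the three master nodes. To keep management simple, the etcd files live alongside the Kubernetes master files.

The etcd binaries and configuration files are kept in the master directory on OPS. At the end, the whole master directory is copied to the corresponding directory on each node, and etcd is started there.

The OPS working directory is /opt/software/kubernetes. Change it if needed. (The article always writes paths out as absolute or relative paths and never uses variables for them.)

1.1. Create the etcd certificate

On OPS:

cfssl supports SAN (Subject Alternative Name), an extension defined in X.509. With the SAN field, one certificate can be valid for several different domain names, such as *.k8s-host.com below. This makes adding nodes easier: unlike earlier deployments, the required IPs no longer have to be written into the certificate's hosts list in advance. A wildcard entry matches exactly one DNS label and does not cover connections made by IP address, so clients and peers should reach etcd by domain name.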
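The bash sketch below makes the wildcard rule concrete. `san_matches` is a hypothetical helper and is not part of cfssl. It models, in simplified form, how a TLS client compares a hostname against one DNS SAN entry: a leading `*.` stands for exactly one label. Case folding and IP-address SANs are deliberately left out.

```shell
# Hypothetical helper: return 0 if hostname $1 would be accepted by SAN entry $2.
# Simplified model: a leading "*." covers exactly one DNS label; everything else is an exact match.
san_matches() {
  local host=$1 pattern=$2
  if [[ $pattern == '*.'* ]]; then
    local suffix=${pattern#\*}        # "*.k8s-host.com" -> ".k8s-host.com"
    local label=${host%"$suffix"}     # "etcd01.k8s-host.com" -> "etcd01"
    # host must end with the suffix, and what remains must be one non-empty label
    [[ $host == *"$suffix" && -n $label && $label != *.* ]]
  else
    [[ $host == "$pattern" ]]         # non-wildcard entries (e.g. 127.0.0.1) must match exactly
  fi
}

for h in etcd01.k8s-host.com etcd04.k8s-host.com k8s-host.com a.b.k8s-host.com; do
  if san_matches "$h" '*.k8s-host.com'; then echo "$h: covered"; else echo "$h: NOT covered"; fi
done
```

This is why a new node such as etcd04.k8s-host.com needs no new certificate. A two-level name, or the bare domain, would still need its own SAN entry.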

cat > /opt/software/kubernetes/certs/etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "*.k8s-host.com"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{
    "C": "CN",
    "ST": "GuangZhou",
    "L": "TianHe",
    "O": "k8s",
    "OU": "ops"
  }]
}
EOF

1.2. Issue the etcd certificate files

cd /opt/software/kubernetes/certs

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssl-json  -bare etcd

$ ll etcd*.pem
-rw-r--r-- 1 root root 1436 6月   7 17:21 etcd.pem
-rw------- 1 root root 1679 6月   7 17:21 etcd-key.pem

Put the issued certificate and key into the master role directory:

cp etcd.pem etcd-key.pem /opt/software/kubernetes/master/certs/

1.3. Download the etcd release package

The version used here is 3.4.16.

ETCD_VERSION=v3.4.16
wget -P /opt/software/kubernetes/tools https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
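You may also want to verify the tarball before extracting it. This sketch assumes that the GitHub release publishes a `SHA256SUMS` file next to the tarballs, which recent etcd releases do; check the v3.4.16 release page to confirm. `verify_release` is a hypothetical helper. It looks up the exact file name in that list and checks it with coreutils `sha256sum`.

```shell
# Hypothetical helper: verify_release DIR TARBALL
# Finds the line for TARBALL in DIR/SHA256SUMS (exact name match) and verifies it.
verify_release() {
  local dir=$1 tarball=$2 line
  line=$(awk -v f="$tarball" '$2 == f' "$dir/SHA256SUMS") || return 1
  [[ -n $line ]] || { echo "no checksum listed for $tarball" >&2; return 1; }
  # sha256sum -c resolves the file name relative to the current directory
  (cd "$dir" && printf '%s\n' "$line" | sha256sum -c -)
}

# Usage (the SHA256SUMS asset name is an assumption):
# wget -P /opt/software/kubernetes/tools https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/SHA256SUMS
# verify_release /opt/software/kubernetes/tools "etcd-${ETCD_VERSION}-linux-amd64.tar.gz"
```

The function matches the file name exactly instead of using grep. That way, a similarly named asset such as a signature file cannot be picked up by mistake.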

1.4. Extract the etcd package

cd /opt/software/kubernetes/master/bin
tar --strip-components=1   -xvpf /opt/software/kubernetes/tools/etcd-v3.4.16-linux-amd64.tar.gz  etcd-v3.4.16-linux-amd64/{etcd,etcdctl}

chmod +x etcd*

# (Optional) copy etcdctl into the OPS bin directory for cluster checks
cp etcdctl /opt/software/kubernetes/bin

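1.5. Generate the etcd configuration file

Some etcd settings differ between nodes. After copying, adjust them for each node. The per-node options are:
name, listen-peer-urls, listen-client-urls, initial-advertise-peer-urls, advertise-client-urls

Note: the paths in the configuration file refer to the actual paths on the node after copying.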

cat >/opt/software/kubernetes/master/config/etcd-config.yaml <<EOF
name: etcd01
data-dir: /data/etcd
listen-peer-urls: https://10.0.0.10:2380
listen-client-urls: https://10.0.0.10:2379,http://127.0.0.1:2379
quota-backend-bytes: 8000000000
initial-advertise-peer-urls: https://etcd01.k8s-host.com:2380
advertise-client-urls: https://etcd01.k8s-host.com:2379,http://127.0.0.1:2379
initial-cluster: etcd01=https://etcd01.k8s-host.com:2380,etcd02=https://etcd02.k8s-host.com:2380,etcd03=https://etcd03.k8s-host.com:2380
initial-cluster-token: etcd-cluster
initial-cluster-state: new

debug: false
log-outputs: [stderr]
logger: zap

client-transport-security:
  client-cert-auth: false
  auto-tls: false
  trusted-ca-file: /opt/software/kubernetes/certs/ca.pem
  cert-file: /opt/software/kubernetes/certs/etcd.pem
  key-file: /opt/software/kubernetes/certs/etcd-key.pem
  
peer-transport-security:
  client-cert-auth: false
  auto-tls: false
  trusted-ca-file: /opt/software/kubernetes/certs/ca.pem
  cert-file: /opt/software/kubernetes/certs/etcd.pem
  key-file: /opt/software/kubernetes/certs/etcd-key.pem
EOF
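The options below are written as command-line flags. In the YAML file above, --cert-file, --key-file and --trusted-ca-file correspond to the keys under client-transport-security, and the --peer-* flags correspond to the keys under peer-transport-security.
  • --data-dir: the working and data directory; create it before starting the service;
  • --name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
  • --cert-file, --key-file: certificate and private key used by the etcd server when communicating with clients;
  • --trusted-ca-file: CA certificate that signed the client certificates, used to verify them;
  • --peer-cert-file, --peer-key-file: certificate and private key used by etcd when communicating with peers;
  • --peer-trusted-ca-file: CA certificate that signed the peer certificates, used to verify them;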
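Only name and the four listen/advertise URLs differ between nodes, so the per-node edits can be scripted from the etcd01 template. This is a minimal sketch. `render_etcd_config` is a hypothetical helper, and 10.0.0.11 below is a placeholder IP for master02 that does not come from this article. The helper rewrites the five per-node keys with sed and leaves initial-cluster, which must be identical on every member, untouched.

```shell
# Hypothetical helper: render_etcd_config TEMPLATE NAME IP > OUTPUT
# Rewrites only the per-node keys of the etcd01 template; all other lines pass through unchanged.
render_etcd_config() {
  local template=$1 name=$2 ip=$3
  sed \
    -e "s|^name: .*|name: ${name}|" \
    -e "s|^listen-peer-urls: .*|listen-peer-urls: https://${ip}:2380|" \
    -e "s|^listen-client-urls: .*|listen-client-urls: https://${ip}:2379,http://127.0.0.1:2379|" \
    -e "s|^initial-advertise-peer-urls: .*|initial-advertise-peer-urls: https://${name}.k8s-host.com:2380|" \
    -e "s|^advertise-client-urls: .*|advertise-client-urls: https://${name}.k8s-host.com:2379,http://127.0.0.1:2379|" \
    "$template"
}

# Usage (10.0.0.11 is a placeholder):
# render_etcd_config /opt/software/kubernetes/master/config/etcd-config.yaml etcd02 10.0.0.11 > /tmp/etcd02-config.yaml
```

Each pattern is anchored with `^`, so `^name:` cannot accidentally rewrite a key that merely contains "name".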

1.6. Create the systemd service file

Note: the paths in the unit file refer to the actual paths on the node after copying.

cat >/opt/software/kubernetes/master/service/etcd.service<<EOF
[Unit]
Description=Etcd Server
Documentation=https://github.com/etcd-io/etcd
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
LimitNOFILE=65536
Restart=on-failure
RestartSec=5s
TimeoutStartSec=0
ExecStart=/opt/software/kubernetes/bin/etcd --config-file /opt/software/kubernetes/config/etcd-config.yaml

[Install]
WantedBy=multi-user.target
EOF

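1.7. Distribute

After copying, adjust the configuration on each node.

The copy is split into two for loops because the configuration files in the config directory differ between nodes. If later services copied the whole config directory again, they would overwrite the per-node edits, so the config files are handled separately. In my own script I use copy and template modules so that configs are not overwritten and the configuration files can be edited efficiently.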

hosts=(master01 master02 master03)
domain='k8s-host.com'
config_files=('config/etcd-config.yaml')
cd /opt/software/kubernetes
for host in "${hosts[@]}"
do
    scp -r master/{bin,certs,service} "${host}.${domain}":/opt/software/kubernetes/
done


# Do not run this loop more than once: it overwrites the per-node edits to the config files
for host in "${hosts[@]}"
do
    for file in "${config_files[@]}"
    do
        scp -r master/${file} "${host}.${domain}":/opt/software/kubernetes/${file}
    done
done
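The config loop above is not safe to rerun. One hedged alternative is to copy a config file only when it does not yet exist on the target, so a rerun leaves hand-edited files alone. `copy_config_once` is a hypothetical helper. It assumes root SSH access to the nodes, as in 1.8.

```shell
# Hypothetical helper: copy_config_once HOST FILE
# Copies master/FILE to HOST only if /opt/software/kubernetes/FILE is absent there,
# so per-node edits are never overwritten by a second run.
copy_config_once() {
  local host=$1 file=$2
  local dest="/opt/software/kubernetes/${file}"
  if ssh "root@${host}" "test -e '${dest}'"; then
    echo "skip ${host}:${dest} (already present)"
  else
    # make sure the target directory exists before copying
    ssh "root@${host}" "mkdir -p '$(dirname "${dest}")'" &&
      scp "master/${file}" "root@${host}:${dest}"
  fi
}

# Usage, run from /opt/software/kubernetes:
# for host in master01 master02 master03; do
#   copy_config_once "${host}.k8s-host.com" config/etcd-config.yaml
# done
```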

1.8. Start etcd on the nodes

Symlink etcd.service into /usr/lib/systemd/system/ so that the unit file is easier to maintain.

Symlinking it into /etc/systemd/system instead may fail with: Failed to execute operation: Too many levels of symbolic links

hosts=(master01 master02 master03)
domain='k8s-host.com'
for host in ${hosts[*]}
do
	ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/service/etcd.service /usr/lib/systemd/system/etcd.service"
done

The cluster only comes up once a majority of the members (2 of 3 here) are running, so run the following on every node:

systemctl daemon-reload && systemctl enable etcd --now
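The unit is Type=notify, so a plain start on the first node can block while that member waits for its peers. An alternative is to trigger all three nodes from OPS. This is a sketch under the same root SSH assumption as 1.8; `start_etcd_all` is a hypothetical helper. `systemctl start --no-block` queues the start job and returns at once, so the loop never stalls on a member that is still waiting for quorum.

```shell
# Hypothetical helper: start_etcd_all HOST...
# Enables etcd and queues its start on each host without waiting for the job to finish.
start_etcd_all() {
  local host
  for host in "$@"; do
    ssh "root@${host}" "systemctl daemon-reload && systemctl enable etcd && systemctl start --no-block etcd" ||
      echo "failed to trigger etcd on ${host}" >&2
  done
}

# Usage:
# start_etcd_all master01.k8s-host.com master02.k8s-host.com master03.k8s-host.com
# then, after a few seconds, check each node with: systemctl is-active etcd
```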

1.9. Verify the cluster

source /etc/profile
cd /opt/software/kubernetes/certs/
etcd_endpoints="https://etcd01.k8s-host.com:2379,https://etcd02.k8s-host.com:2379,https://etcd03.k8s-host.com:2379"
etcdctl --endpoints=${etcd_endpoints} --cacert=./ca.pem --cert=./etcd.pem  --key=./etcd-key.pem  --write-out=table endpoint status
+----------------------------------+------------------+---------+---------+-----------+------------+
|             ENDPOINT             |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER |
+----------------------------------+------------------+---------+---------+-----------+------------+
| https://etcd01.k8s-host.com:2379 | fb29bd4eb2954a8e |  3.4.16 |   20 kB |      true |      false |
| https://etcd02.k8s-host.com:2379 | 9fe2e4f759b4a0f6 |  3.4.16 |   20 kB |     false |      false |
| https://etcd03.k8s-host.com:2379 | af64d84cb227445f |  3.4.16 |   33 kB |     false |      false |
+----------------------------------+------------------+---------+---------+-----------+------------+
  • Output trimmed for brevity. If a node cannot be reached, an error like the following is reported:

    {"level":"warn","ts":"2021-09-03T16:52:41.766+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://etcd01.k8s-host:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.10:2379: connect: connection refused\""}
    Failed to get the status of endpoint https://etcd01.k8s-host:2379 (context deadline exceeded)
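Besides `endpoint status`, `etcdctl endpoint health` reports whether each member can serve requests. The sketch below builds the `--endpoints` list from host names instead of typing it by hand. `build_endpoints` is a hypothetical helper, and the certificate paths follow 1.9.

```shell
# Hypothetical helper: build_endpoints PORT HOST... -> comma-separated https URLs
build_endpoints() {
  local port=$1; shift
  local out="" host
  for host in "$@"; do
    out+="${out:+,}https://${host}:${port}"   # add a comma only between entries
  done
  printf '%s\n' "$out"
}

# Usage (run in /opt/software/kubernetes/certs):
# etcd_endpoints=$(build_endpoints 2379 etcd01.k8s-host.com etcd02.k8s-host.com etcd03.k8s-host.com)
# etcdctl --endpoints="${etcd_endpoints}" --cacert=./ca.pem --cert=./etcd.pem --key=./etcd-key.pem endpoint health
```

Calling it with port 2380 yields the peer URLs instead.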
    

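2. Troubleshooting

discovery failed

This startup failure occurs when the information recorded in data-dir does not match the cluster information given by etcd's startup options.

Solution 1: delete the data-dir on all etcd nodes. This erases all etcd data, so it is only appropriate for a newly bootstrapped cluster.

Solution 2: copy the data-dir contents from another node, use them to force-start a single member with --force-new-cluster, then restore the cluster by adding the remaining members back as new members.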
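Solution 1 can be sketched as follows. The helper stops etcd on every member and moves the data-dir aside rather than deleting it outright. It then recreates an empty data-dir before all members are started again as in 1.8. `reset_etcd_data` is a hypothetical helper; /data/etcd is the data-dir from 1.5, and the backup naming is an arbitrary choice.

```shell
# Hypothetical helper: reset_etcd_data HOST...
# Stops etcd, renames /data/etcd to a timestamped backup, and recreates an empty data-dir (mode 700).
reset_etcd_data() {
  local ts host
  ts=$(date +%Y%m%d%H%M%S)
  for host in "$@"; do
    ssh "root@${host}" "systemctl stop etcd; if [ -d /data/etcd ]; then mv /data/etcd /data/etcd.bak-${ts}; fi; install -d -m 700 /data/etcd" ||
      echo "reset failed on ${host}" >&2
  done
}

# Usage: reset_etcd_data master01.k8s-host.com master02.k8s-host.com master03.k8s-host.com
```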


posted @ 2021-08-19 15:03  风吹蛋生丶