CentOS 7 etcd Cluster Configuration
Preface

etcd is a distributed key-value store developed by CoreOS on top of the Raft consensus algorithm. It can be used for service discovery, shared configuration, and consistency guarantees (for example database leader election and distributed locks).

The environment here backs the etcd cluster of a Kubernetes deployment. While deploying Kubernetes from binaries, the etcd cluster caused all sorts of problems, so I set aside time to study etcd clustering on its own.
There are three ways to bootstrap an etcd cluster:

- Static discovery
- etcd dynamic discovery
- DNS dynamic discovery: the members are discovered dynamically via DNS SRV lookups

This document mainly uses static discovery and DNS dynamic discovery, in both cases combined with self-signed TLS certificates to create the cluster.
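As background for the DNS method: etcd looks up SRV records named `_etcd-server._tcp` (or `_etcd-server-ssl._tcp` for TLS) for peer discovery and `_etcd-client._tcp` for client discovery under the discovery domain. A zone for the `k8s.cn` domain used here might contain records like the following sketch (illustrative only; this record set is an assumption, not something configured in this environment yet):

```
; peer (server-to-server) discovery records, one per member
_etcd-server._tcp.k8s.cn. 300 IN SRV 0 0 2380 docker01.k8s.cn.
_etcd-server._tcp.k8s.cn. 300 IN SRV 0 0 2380 docker02.k8s.cn.
_etcd-server._tcp.k8s.cn. 300 IN SRV 0 0 2380 docker03.k8s.cn.
; client discovery records
_etcd-client._tcp.k8s.cn. 300 IN SRV 0 0 2379 docker01.k8s.cn.
_etcd-client._tcp.k8s.cn. 300 IN SRV 0 0 2379 docker02.k8s.cn.
_etcd-client._tcp.k8s.cn. 300 IN SRV 0 0 2379 docker03.k8s.cn.
```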
Environment Preparation

This is the actual environment used for the Kubernetes etcd cluster described in this document:
Hostname | Role | IP | OS Version | Kernel Version |
---|---|---|---|---|
docker01.k8s.cn | docker-01 | 192.168.1.222 | CentOS Linux release 7.4.1708 (Core) | 3.10.0-693.el7.x86_64 |
docker02.k8s.cn | docker-02 | 192.168.1.221 | CentOS Linux release 7.4.1708 (Core) | 3.10.0-693.el7.x86_64 |
docker03.k8s.cn | docker-03 | 192.168.1.223 | CentOS Linux release 7.4.1708 (Core) | 3.10.0-693.el7.x86_64 |
Installation

Run on all three machines:
```
[root@docker-01 ~]# yum install etcd -y
[root@docker-01 ~]# rpm -qa etcd
etcd-3.3.11-2.el7.centos.x86_64
```
Create the directories etcd needs, again on all three machines:
```
[root@docker-01 ~]# mkdir -p /data/k8s/etcd/{data,wal}
[root@docker-01 ~]# mkdir -p /etc/kubernetes/cert
[root@docker-01 ~]# chown -R etcd.etcd /data/k8s/etcd
```
Open ports 2379 and 2380 in the firewall on all three machines (ideally, also synchronize their clocks):
```
[root@docker-01 ~]# firewall-cmd --zone=public --add-port=2379/tcp --permanent
success
[root@docker-01 ~]# firewall-cmd --zone=public --add-port=2380/tcp --permanent
success
[root@docker-01 ~]# firewall-cmd --reload
success
[root@docker-01 ~]# firewall-cmd --list-ports
2379/tcp 2380/tcp
```
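etcd is sensitive to clock skew between members (the status output later in this document actually warns about a clock difference), so it is worth syncing time now. A minimal sketch using chrony with the stock CentOS 7 package and its default time sources:

```
[root@docker-01 ~]# yum install chrony -y
[root@docker-01 ~]# systemctl start chronyd
[root@docker-01 ~]# systemctl enable chronyd
[root@docker-01 ~]# chronyc sources    # confirm at least one reachable time source
```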
Static Cluster

Configuration

docker-01 configuration file:
```
[root@docker-01 ~]# cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/data/k8s/etcd/data"
ETCD_WAL_DIR="/data/k8s/etcd/wal"
ETCD_LISTEN_PEER_URLS="http://192.168.1.222:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.222:2379,http://127.0.0.1:2379"
ETCD_MAX_SNAPSHOTS="5"
ETCD_MAX_WALS="5"
ETCD_NAME="etcd1"
ETCD_SNAPSHOT_COUNT="100000"
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.222:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.222:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.1.222:2380,etcd2=http://192.168.1.221:2380,etcd3=http://192.168.1.223:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
```
docker-02 configuration file:
```
[root@docker-02 ~]# cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/data/k8s/etcd/data"
ETCD_WAL_DIR="/data/k8s/etcd/wal"
ETCD_LISTEN_PEER_URLS="http://192.168.1.221:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.221:2379,http://127.0.0.1:2379"
ETCD_MAX_SNAPSHOTS="5"
ETCD_MAX_WALS="5"
ETCD_NAME="etcd2"
ETCD_SNAPSHOT_COUNT="100000"
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.221:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.221:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.1.222:2380,etcd2=http://192.168.1.221:2380,etcd3=http://192.168.1.223:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
```
docker-03 configuration file:
```
[root@docker-03 ~]# cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/data/k8s/etcd/data"
ETCD_WAL_DIR="/data/k8s/etcd/wal"
ETCD_LISTEN_PEER_URLS="http://192.168.1.223:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.223:2379,http://127.0.0.1:2379"
ETCD_MAX_SNAPSHOTS="5"
ETCD_MAX_WALS="5"
ETCD_NAME="etcd3"
ETCD_SNAPSHOT_COUNT="100000"
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.223:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.223:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.1.222:2380,etcd2=http://192.168.1.221:2380,etcd3=http://192.168.1.223:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
```
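The three files differ only in the member name and host IP. If you would rather not edit them by hand, a small script like the following sketch can generate each node's file (the script name `gen-etcd-conf.sh` is hypothetical, and the name-to-IP mapping is taken from the table above):

```bash
#!/usr/bin/env bash
# gen-etcd-conf.sh - write /etc/etcd/etcd.conf for one member of the static cluster.
# Usage: ./gen-etcd-conf.sh etcd1 192.168.1.222
set -eu
NAME="$1"   # member name, e.g. etcd1
IP="$2"     # this host's IP address

cat > /etc/etcd/etcd.conf <<EOF
ETCD_DATA_DIR="/data/k8s/etcd/data"
ETCD_WAL_DIR="/data/k8s/etcd/wal"
ETCD_LISTEN_PEER_URLS="http://${IP}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${IP}:2379,http://127.0.0.1:2379"
ETCD_MAX_SNAPSHOTS="5"
ETCD_MAX_WALS="5"
ETCD_NAME="${NAME}"
ETCD_SNAPSHOT_COUNT="100000"
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${IP}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${IP}:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.1.222:2380,etcd2=http://192.168.1.221:2380,etcd3=http://192.168.1.223:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
```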
Start and Test
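Start and enable etcd on all three nodes. The first member started will keep retrying until enough peers are up to form a quorum, so connection errors in its log before the other nodes come up are expected:

```
[root@docker-01 ~]# systemctl start etcd
[root@docker-01 ~]# systemctl enable etcd
```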
```
[root@docker-01 ~]# systemctl stop firewalld
[root@docker-01 ~]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-01-13 15:43:24 CST; 2min 1s ago
 Main PID: 2298 (etcd)
   Memory: 31.9M
   CGroup: /system.slice/etcd.service
           └─2298 /usr/bin/etcd --name=etcd1 --data-dir=/data/k8s/etcd/data --listen-client-urls=http://192.168.1.222:2379

Jan 13 15:45:05 docker-01 etcd[2298]: raft.node: 164a311aff833bc1 elected leader c36e0ffc3c8f0b6 at term 70
Jan 13 15:45:10 docker-01 etcd[2298]: health check for peer b1eeb25e6baf68e0 could not connect: dial tcp 192.168.1.221:2380: connect: no route to host (pr..._MESSAGE")
Jan 13 15:45:10 docker-01 etcd[2298]: health check for peer b1eeb25e6baf68e0 could not connect: dial tcp 192.168.1.221:2380: connect: no route to host (pr...SNAPSHOT")
Jan 13 15:45:11 docker-01 etcd[2298]: peer b1eeb25e6baf68e0 became active
Jan 13 15:45:11 docker-01 etcd[2298]: established a TCP streaming connection with peer b1eeb25e6baf68e0 (stream Message reader)
Jan 13 15:45:11 docker-01 etcd[2298]: established a TCP streaming connection with peer b1eeb25e6baf68e0 (stream MsgApp v2 reader)
Jan 13 15:45:15 docker-01 etcd[2298]: the clock difference against peer b1eeb25e6baf68e0 is too high [2m1.808827774s > 1s] (prober "ROUND_TRIPPER_SNAPSHOT")
Jan 13 15:45:15 docker-01 etcd[2298]: the clock difference against peer b1eeb25e6baf68e0 is too high [2m1.808608709s > 1s] (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Jan 13 15:45:15 docker-01 etcd[2298]: updated the cluster version from 3.0 to 3.3
Jan 13 15:45:15 docker-01 etcd[2298]: enabled capabilities for version 3.3
Hint: Some lines were ellipsized, use -l to show in full.
```
Check Cluster Status
```
[root@docker-01 ~]# ETCDCTL_API=3 etcdctl --endpoints=http://192.168.1.222:2379,http://192.168.1.221:2379,http://192.168.1.223:2379 endpoint health
http://192.168.1.221:2379 is healthy: successfully committed proposal: took = 4.237397ms
http://192.168.1.223:2379 is healthy: successfully committed proposal: took = 6.593361ms
http://192.168.1.222:2379 is healthy: successfully committed proposal: took = 6.935029ms
[root@docker-01 ~]# etcdctl --endpoints=http://192.168.1.222:2379,http://192.168.1.221:2379,http://192.168.1.223:2379 cluster-health
member c36e0ffc3c8f0b6 is healthy: got healthy result from http://192.168.1.223:2379
member 164a311aff833bc1 is healthy: got healthy result from http://192.168.1.222:2379
member b1eeb25e6baf68e0 is healthy: got healthy result from http://192.168.1.221:2379
cluster is healthy
```
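Beyond the health checks, a quick read/write smoke test through the v3 API confirms the cluster actually commits data; writing on one member and reading from another proves replication. The key name `foo` is arbitrary:

```
[root@docker-01 ~]# ETCDCTL_API=3 etcdctl --endpoints=http://192.168.1.222:2379 put foo bar
OK
[root@docker-01 ~]# ETCDCTL_API=3 etcdctl --endpoints=http://192.168.1.221:2379 get foo
foo
bar
```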
Verify cluster health with etcdctl from any node:
```
[root@docker-01 ~]# etcdctl cluster-health
member 164a311aff833bc1 is healthy: got healthy result from http://192.168.1.222:2379
member b1eeb25e6baf68e0 is healthy: got healthy result from http://192.168.1.221:2379
member e7c8f1a60e57abe4 is healthy: got healthy result from http://192.168.1.220:2379
cluster is healthy
```
How cluster communication works

Communication in an etcd cluster falls into two categories:

- Client traffic, between a client outside the cluster and some member of it; etcd serves this on port 2379 by default. etcdctl, for example, is such a client.
- Peer traffic, between any two members of the cluster; etcd uses port 2380 by default.

As installed, the configuration files all use plain http, which is insecure. To harden cluster communication we need HTTPS; the rest of this document shows how to access the cluster over HTTPS.
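On any member you can confirm both listeners with ss; 2379 is the client port and 2380 the peer port:

```
[root@docker-01 ~]# ss -tlnp | grep etcd
```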
Create the Root CA
Install the PKI certificate management tool cfssl
Installing cfssl is just a matter of renaming the downloaded release binaries, moving them into /usr/local/bin/, and making them executable.
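If the binaries are not already on the machine, they can be fetched first; the URLs below assume the cfssl R1.2 release and are given for illustration, so check the current cfssl release page:

```
[root@docker-01 ~]# curl -LO https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
[root@docker-01 ~]# curl -LO https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
[root@docker-01 ~]# curl -LO https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
```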
```
[root@docker-01 ~]# mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
[root@docker-01 ~]# mv cfssl_linux-amd64 /usr/local/bin/cfssl
[root@docker-01 ~]# mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
[root@docker-01 ~]# chmod +x /usr/local/bin/cfssl*
```
Configure the PKI

Certificates are needed for two distinct cases:

- Server-to-client communication: the server certificate is used only for server authentication, and the client certificate only for client authentication.
- Server-to-server communication: every etcd member is both a server and a client, so its certificate must be valid for both server and client authentication.
Create the PKI configuration file
```
[root@docker-01 ~]# mkdir /etc/etcd/pki
[root@docker-01 ~]# cd /etc/etcd/pki
[root@docker-01 pki]# cfssl print-defaults config > ca-config.json
[root@docker-01 pki]# cat > ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "168h"
    },
    "profiles": {
      "server": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },
      "client": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
```
This defines three profiles:

- server: the server certificate for server-client communication
- client: the client certificate for server-client communication
- peer: the certificate for member-to-member communication, authenticating both the server and the client side
Create the root CA certificate
```
[root@docker-01 pki]# cfssl print-defaults csr > rootca-csr.json
[root@docker-01 pki]# cat > rootca-csr.json << EOF
{
  "CN": "ETCD Root CA",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "US",
      "L": "San Francisco",
      "ST": "CA"
    }
  ]
}
EOF
```
```
[root@docker-01 pki]# cfssl gencert -initca rootca-csr.json | cfssljson -bare rootca
2020/01/13 21:23:35 [INFO] generating a new CA key and certificate from CSR
2020/01/13 21:23:35 [INFO] generate received request
2020/01/13 21:23:35 [INFO] received CSR
2020/01/13 21:23:35 [INFO] generating key: ecdsa-256
2020/01/13 21:23:35 [INFO] encoded CSR
2020/01/13 21:23:35 [INFO] signed certificate with serial number 314506356009985722822001466953482490446626166553
[root@docker-01 pki]# ls rootca*
rootca.csr  rootca-csr.json  rootca-key.pem  rootca.pem
```
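Optionally, inspect the generated certificate to confirm the subject and validity period look right:

```
[root@docker-01 pki]# cfssl-certinfo -cert rootca.pem
```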
Copy the root CA certificate to all nodes in the cluster:
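A minimal sketch for the distribution step, assuming root SSH access to the other two nodes. Only the certificate is copied; the private key rootca-key.pem stays on docker-01, which acts as the CA:

```
[root@docker-01 pki]# ssh root@192.168.1.221 "mkdir -p /etc/etcd/pki"
[root@docker-01 pki]# ssh root@192.168.1.223 "mkdir -p /etc/etcd/pki"
[root@docker-01 pki]# scp rootca.pem root@192.168.1.221:/etc/etcd/pki/
[root@docker-01 pki]# scp rootca.pem root@192.168.1.223:/etc/etcd/pki/
```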