etcd cluster setup and day-to-day operations

Introduction to etcd

System requirements

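Because etcd writes data to disk, its performance depends heavily on disk performance, so SSDs are strongly recommended. One way to check whether a disk is fast enough for etcd is to run a disk benchmarking tool such as fio; the etcd documentation links to a worked example. To prevent performance degradation or accidentally overloading the key-value store, etcd enforces a configurable storage size quota, 2GB by default. To avoid swapping or running out of memory, the machine should have at least as much RAM as the quota. 8GB is the suggested maximum for normal environments, and etcd warns at startup if the configured value exceeds it. At CoreOS, etcd clusters are usually deployed on dedicated CoreOS Container Linux machines with dual-core processors, 2GB of RAM, and at least 80GB of SSD. Note that performance is inherently workload-dependent, so test before deploying to production. For more recommendations, see the hardware guide in the etcd documentation.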
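The storage quota described above is set with etcd's `--quota-backend-bytes` flag, which takes a value in bytes. As a minimal sketch, the value for a given size can be computed like this; the helper name `quota_bytes` is made up for illustration, and the 8 passed to it is simply the suggested maximum from the text:

```shell
#!/bin/bash
# quota_bytes GIB: print GIB gibibytes expressed in bytes.
# (Hypothetical helper, not part of etcd.)
quota_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the flag for the 8GB suggested maximum.
echo "--quota-backend-bytes=$(quota_bytes 8)"
```

The printed flag can be appended to the etcd command line; whether a quota that large is appropriate depends on the RAM available on the machine.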

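Why an odd number of cluster members?

An etcd cluster needs a majority of its nodes, a quorum, to agree on updates to the cluster state. For a cluster with n members, the quorum is (n/2)+1, with the division rounded down. For any odd-sized cluster, adding one node always increases the number of nodes required for quorum. Adding a node to an odd-sized cluster may look better because there are more machines, but fault tolerance is actually worse: exactly the same number of nodes can fail without losing quorum, yet there are more nodes that can fail. If the cluster is in a state where it cannot tolerate any more failures, adding a node before removing nodes is dangerous, because if the new node fails to register with the cluster (for example, because its address is misconfigured), quorum is permanently lost.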
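The quorum arithmetic can be checked with a few lines of shell. This is only an illustration of the (n/2)+1 formula; the function names `quorum` and `tolerance` are chosen here and are not etcd commands:

```shell
#!/bin/bash
# quorum N: members that must agree in an N-member cluster,
# (N/2)+1 with integer (rounded-down) division.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerance N: members that can fail while a quorum still remains.
tolerance() { echo $(( $1 - $(quorum "$1") )); }

for n in 1 2 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerance "$n")"
done
```

Sizes 3 and 4 both tolerate one failure, and sizes 5 and 6 both tolerate two, which is the argument for odd cluster sizes.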

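What is the maximum cluster size?

In theory there is no hard limit. In practice, an etcd cluster probably should not have more than seven nodes. Google's Chubby lock service, which is similar to etcd and has been widely deployed inside Google for many years, suggests running five nodes. A 5-member etcd cluster can tolerate the failure of two members, which is enough in most cases. Larger clusters provide better fault tolerance, but write performance suffers because data must be replicated to more machines.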

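Should I add a member before removing an unhealthy member?

When replacing an etcd node, it is important to remove the member first and then add its replacement.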
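A minimal sketch of that ordering using `etcdctl member remove` and `etcdctl member add`. The `run` and `replace_member` helpers are hypothetical wrappers written for this example. By default they only print the commands; set DRY_RUN=0 to execute them. The member ID comes from the sample member list later in this article, while the new member name and peer URL are invented:

```shell
#!/bin/bash
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_member ENDPOINTS OLD_ID NEW_NAME NEW_PEER_URL
replace_member() {
  local endpoints=$1 old_id=$2 new_name=$3 new_peer_url=$4
  # 1. Remove the unhealthy member first.
  run etcdctl --endpoints="$endpoints" member remove "$old_id" || return 1
  # 2. Only then register its replacement.
  run etcdctl --endpoints="$endpoints" member add "$new_name" --peer-urls="$new_peer_url"
}

# Illustrative values only (etcd-4 and port 2680 are assumptions).
replace_member "127.0.0.1:2379,127.0.0.1:2479" 470f778210a711ed etcd-4 http://127.0.0.1:2680
```

After `member add` succeeds, the replacement etcd process still has to be started so that it joins the existing cluster rather than bootstrapping a new one.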

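Why does etcd lose its leader because of disk latency spikes?

This is intentional; disk latency is part of leader liveness. Suppose the cluster leader takes a minute to fsync a raft log update to disk, but the etcd cluster has a one-second election timeout. Even though the leader can process network messages within the election interval (for example, send heartbeats), it is effectively unavailable because it cannot commit any new proposals; it is waiting on the slow disk. If the cluster frequently loses its leader because of disk latency, try tuning the disk settings or etcd's timing parameters.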

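Setting up the etcd cluster

Environment: one physical machine running a 3-node etcd cluster, with each node on its own ports. An odd number of nodes is recommended, since growing to an even size raises the quorum without tolerating any more failures.

Step 1: download the etcd release package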

wget -c https://github.com/etcd-io/etcd/releases/download/v3.5.2/etcd-v3.5.2-linux-amd64.tar.gz

Step 2: unpack it, then create the config file etcd1.conf

name: etcd-1
data-dir: /root/etcd1/data 
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://127.0.0.1:2379
listen-peer-urls: http://0.0.0.0:2380
initial-advertise-peer-urls: http://127.0.0.1:2380
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new

Config file etcd2.conf

name: etcd-2
data-dir: /root/etcd2/data
listen-client-urls: http://0.0.0.0:2479
advertise-client-urls: http://127.0.0.1:2479
listen-peer-urls: http://0.0.0.0:2480
initial-advertise-peer-urls: http://127.0.0.1:2480
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new

Config file etcd3.conf

name: etcd-3
data-dir: /root/etcd3/data 
listen-client-urls: http://0.0.0.0:2579
advertise-client-urls: http://127.0.0.1:2579
listen-peer-urls: http://0.0.0.0:2580
initial-advertise-peer-urls: http://127.0.0.1:2580
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
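The three files differ only in the member name, the data directory, and the ports, so they can also be generated with a loop. This is a sketch under assumptions: it writes BASE_DIR/etcdN/etcdN.conf, and BASE_DIR defaults to a fresh temporary directory so nothing is overwritten. Set BASE_DIR=/root to match the paths used in this article.

```shell
#!/bin/bash
# Generate etcd1.conf .. etcd3.conf, one per member directory.
BASE_DIR=${BASE_DIR:-$(mktemp -d)}
CLUSTER="etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580"

for i in 1 2 3; do
  client_port=$(( 2279 + i * 100 ))   # 2379, 2479, 2579
  peer_port=$(( client_port + 1 ))    # 2380, 2480, 2580
  mkdir -p "${BASE_DIR}/etcd${i}"
  cat > "${BASE_DIR}/etcd${i}/etcd${i}.conf" <<EOF
name: etcd-${i}
data-dir: ${BASE_DIR}/etcd${i}/data
listen-client-urls: http://0.0.0.0:${client_port}
advertise-client-urls: http://127.0.0.1:${client_port}
listen-peer-urls: http://0.0.0.0:${peer_port}
initial-advertise-peer-urls: http://127.0.0.1:${peer_port}
initial-cluster: ${CLUSTER}
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
EOF
done

echo "wrote configs under ${BASE_DIR}"
```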

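Write a startup script

#!/bin/bash
# Start the three etcd members in the background, one per directory.
CRTDIR=$(pwd)
servers=("etcd1" "etcd2" "etcd3")
for server in "${servers[@]}"
do
  cd "${CRTDIR}/${server}" || exit 1
  # Each directory holds the etcd binary and its own config file (etcd1.conf, ...).
  nohup ./etcd --config-file="${server}.conf" &
  # $? is always 0 right after backgrounding a job, so report the PID instead.
  echo "started ${server}, pid $!"
done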

After running the script, check the processes and ports

Check the cluster status
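A minimal sketch of this check, assuming bash (for its /dev/tcp redirection) and the `pgrep` tool are available. The port list is taken from the three config files above, and `port_open` is a helper defined only for this example:

```shell
#!/bin/bash
# Client and peer ports from etcd1.conf, etcd2.conf and etcd3.conf.
ports=(2379 2380 2479 2480 2579 2580)

# port_open PORT: succeed if something accepts TCP connections on 127.0.0.1:PORT.
port_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# List running etcd processes with their command lines.
pgrep -a etcd || echo "no etcd process found"

for p in "${ports[@]}"; do
  if port_open "$p"; then
    echo "port $p: listening"
  else
    echo "port $p: closed"
  fi
done
```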

[root@VM-0-15-centos etcd1]# ./etcdctl --write-out=table --endpoints=127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579 endpoint status
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 47a42fb96a975854 |   3.5.2 |   20 kB |     false |      false |         4 |         31 |                 31 |        |
| 127.0.0.1:2479 | 72ab37cc61e2023b |   3.5.2 |   20 kB |     false |      false |         4 |         31 |                 31 |        |
| 127.0.0.1:2579 | 470f778210a711ed |   3.5.2 |   20 kB |      true |      false |         4 |         31 |                 31 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

[root@VM-0-15-centos etcd1]# ./etcdctl --endpoints=$ENDPOINTS endpoint health
127.0.0.1:2579 is healthy: successfully committed proposal: took = 5.033785ms
127.0.0.1:2479 is healthy: successfully committed proposal: took = 5.003254ms
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.990036ms

[root@VM-0-15-centos etcd1]# ./etcdctl -w table member list
+------------------+---------+--------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |  NAME  |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+-----------------------+-----------------------+------------+
| 470f778210a711ed | started | etcd-3 | http://127.0.0.1:2580 | http://127.0.0.1:2579 |      false |
| 47a42fb96a975854 | started | etcd-1 | http://127.0.0.1:2380 | http://127.0.0.1:2379 |      false |
| 72ab37cc61e2023b | started | etcd-2 | http://127.0.0.1:2480 | http://127.0.0.1:2479 |      false |
+------------------+---------+--------+-----------------------+-----------------------+------------+

Day-to-day operations

In the commands below, ENDPOINTS="127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579"
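One way to set this up in the current shell is sketched below. It builds the list from the client ports in the config files; the array name `client_ports` is chosen for this example. It also exports ETCDCTL_ENDPOINTS, which etcdctl (v3 API) reads as the default for `--endpoints`:

```shell
#!/bin/bash
# Client ports from the three config files.
client_ports=(2379 2479 2579)

ENDPOINTS=""
for p in "${client_ports[@]}"; do
  # Append with a comma separator, except before the first entry.
  ENDPOINTS="${ENDPOINTS:+${ENDPOINTS},}127.0.0.1:${p}"
done
export ENDPOINTS

# With this set, --endpoints can be omitted from etcdctl commands.
export ETCDCTL_ENDPOINTS="$ENDPOINTS"

echo "$ENDPOINTS"
```

Commands that omit `--endpoints` and have no ETCDCTL_ENDPOINTS set talk only to the default endpoint, 127.0.0.1:2379.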

# Put a key
[root@VM-0-15-centos etcd1]# ./etcdctl put /etc/password 123456

# Delete a key
[root@VM-0-15-centos etcd1]# ./etcdctl del /etc/password

# Get a key
[root@VM-0-15-centos etcd1]# ./etcdctl --endpoints=$ENDPOINTS get /etc/password

# List all keys under the / prefix
[root@VM-0-15-centos etcd1]# ./etcdctl --endpoints=$ENDPOINTS get / --prefix --keys-only

# Add a role
etcdctl --endpoints=${ENDPOINTS} role add root

# Grant read/write permission on key foo
etcdctl --endpoints=${ENDPOINTS} role grant-permission root readwrite foo

# Show the root role
etcdctl --endpoints=${ENDPOINTS} role get root


# Add the user root (prompts for a password)
etcdctl --endpoints=${ENDPOINTS} user add root

# Grant the root role to the root user
etcdctl --endpoints=${ENDPOINTS} user grant-role root root


# Show the root user
etcdctl --endpoints=${ENDPOINTS} user get root


# Enable authentication
etcdctl --endpoints=${ENDPOINTS} auth enable


# Put a key once authentication is enabled
etcdctl --endpoints=${ENDPOINTS} --user=root:123 put foo bar

# Without credentials, reading key foo fails
etcdctl --endpoints=${ENDPOINTS} get foo

# Read key foo with credentials
etcdctl --endpoints=${ENDPOINTS} --user=root:123 get foo

# Watch a key
./etcdctl watch /aaa/name

# List alarms
./etcdctl alarm list

# Clear alarms
./etcdctl alarm disarm

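# Defragmentation
Overview: DEFRAG defragments the backend database file for a set of given endpoints while etcd is running. When an etcd member reclaims storage space from deleted and compacted keys, the space is kept in a free list and the database file stays the same size. Defragmenting the database makes the etcd member release this free space back to the file system.
Note: to defragment offline (with the --data-dir flag), use etcdutl defrag instead.
Note that defragmenting a live member blocks the system from reading and writing data while it rebuilds its state.
Note that a defragmentation request is not replicated across the cluster; it is applied only to the local node. Either list all members in the --endpoints flag, or use the --cluster flag to discover all cluster members automatically.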

./etcdctl --endpoints=localhost:2379,badendpoint:2379 defrag

./etcdctl defrag --cluster
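Because defragmenting a live member blocks its reads and writes, one option is to go through the members one at a time and check health after each. The sketch below assumes the ENDPOINTS list from this article. The `run` and `defrag_each` helpers are hypothetical wrappers written for this example; by default they only print the commands, and DRY_RUN=0 executes them:

```shell
#!/bin/bash
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ENDPOINTS=${ENDPOINTS:-"127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579"}

# defrag_each LIST: defragment each endpoint in a comma-separated list, in order.
defrag_each() {
  local eps ep
  # Split on commas; IFS is changed only for this read.
  IFS=',' read -r -a eps <<< "$1"
  for ep in "${eps[@]}"; do
    run etcdctl --endpoints="$ep" defrag || return 1
    # Confirm the member is healthy again before moving on.
    run etcdctl --endpoints="$ep" endpoint health || return 1
  done
}

defrag_each "$ENDPOINTS"
```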

Migrating from etcd v2 to etcd v3

# write key in etcd version 2 store
export ETCDCTL_API=2
etcdctl --endpoints=http://$ENDPOINT set foo bar

# read key in etcd v2
etcdctl --endpoints=$ENDPOINTS --output="json" get foo

# stop etcd node to migrate, one by one

# migrate v2 data
export ETCDCTL_API=3
etcdctl --endpoints=$ENDPOINT migrate --data-dir="default.etcd" --wal-dir="default.etcd/member/wal"

# restart etcd node after migrate, one by one

# confirm that the key got migrated
etcdctl --endpoints=$ENDPOINTS get /foo

posted @ 2022-04-07 11:10 力王7314
