etcd API
Basic operations API: https://github.com/coreos/etcd/blob/6acb3d67fbe131b3b2d5d010e00ec80182be4628/Documentation/v2/api.md
Cluster membership API: https://github.com/coreos/etcd/blob/6acb3d67fbe131b3b2d5d010e00ec80182be4628/Documentation/v2/members_api.md
Authentication and authorization API: https://github.com/coreos/etcd/blob/6acb3d67fbe131b3b2d5d010e00ec80182be4628/Documentation/v2/auth_api.md
Configuration options: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md
https://coreos.com/etcd/docs/latest/runtime-configuration.html
https://coreos.com/etcd/docs/latest/clustering.html
https://coreos.com/etcd/docs/latest/
https://coreos.com/etcd/docs/latest/admin_guide.html#disaster-recovery
etcd exposes a standard RESTful interface and supports both HTTP and HTTPS.
Running a standalone etcd server
./bin/etcd
etcd listens on localhost on the IANA-assigned ports: 2379 for communication with clients and 2380 for server-to-server communication.
Get the version: /version
[root@vStack ~]# curl http://127.0.0.1:2379/version | python -m json.tool
{
    "etcdcluster": "2.3.0",
    "etcdserver": "2.3.7"
}
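The basic etcd API is a hierarchical key space. The key space consists of keys and directories, which are generally referred to as "nodes".
The datastore, i.e. the key space, is accessed through the /<version>/keys endpoint, which is /v2/keys for the v2 API used here.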
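As a small illustration of how a key maps onto the endpoint, the sketch below builds the URL for a key. The function name key_url and its defaults (local server, version v2) are assumptions made for this example, not part of etcd.

```python
from urllib.parse import quote


def key_url(key, base="http://127.0.0.1:2379", version="v2"):
    """Return the keys-endpoint URL for an etcd key such as "/message"."""
    # Every key starts with "/"; add it if the caller left it out.
    if not key.startswith("/"):
        key = "/" + key
    # Percent-encode the path but keep "/" as the separator between levels.
    return f"{base}/{version}/keys{quote(key, safe='/')}"


print(key_url("/message"))
print(key_url("fst/sec/thr"))
```

The resulting string can be passed to curl or to any HTTP client.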
1. PUT: set the value of a key. The following creates the key message with the value "Hello world".
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message -X PUT -d value="Hello world" | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 30,
        "key": "/message",
        "modifiedIndex": 30,
        "value": "Hello world"
    }
}
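Fields in the response body:
action: the name of the action performed by the request. Changing the value of node.key with an HTTP PUT gives the action "set". If the PUT request body contains prevExist=true the action is "update"; with prevExist=false it is "create"; otherwise it is "set".
node.createdIndex: a unique, monotonically increasing integer index that etcd creates for every change. This particular index reflects the point in the etcd state machine at which the given key was created. Besides user requests, internal etcd operations (starting or restarting the service, cluster changes such as adding, removing or syncing members) can also change nodes and advance the index, so the value does not start at 1 even for the very first request. The update and get actions do not change node.createdIndex.
node.key: the key being operated on, taken from the HTTP path of the request. etcd presents the key-value store in a file-system-like way, so every key starts with '/'.
node.modifiedIndex: like node.createdIndex, this attribute is an etcd index. Actions that change it are set, delete, update, create, compareAndSwap and compareAndDelete. Because get and watch do not modify state in the store, they change neither node.modifiedIndex nor node.createdIndex. Internal operations such as a service restart can advance the global index, so later modifications receive correspondingly larger values.
node.value: the value of the key after the request has been processed. In the example above, the successful request set the value of the node to Hello world.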
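The fields described above can be read from a response with a few lines of Python. This is only a sketch: the helper name summarize is made up for the example, and the body is the one returned by the PUT example above.

```python
import json

# Response body copied from the PUT example above.
body = '''{"action": "set",
           "node": {"createdIndex": 30, "key": "/message",
                    "modifiedIndex": 30, "value": "Hello world"}}'''


def summarize(response_text):
    """Return (action, key, value, createdIndex, modifiedIndex) of a keys response."""
    data = json.loads(response_text)
    node = data["node"]
    # "value" is missing for directories and for deleted or expired nodes.
    return (data["action"], node["key"], node.get("value"),
            node["createdIndex"], node["modifiedIndex"])


print(summarize(body))
```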
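Fields in the response headers:
Responses also include several HTTP headers that provide global information about the etcd cluster that served the request:
X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 93
X-Raft-Index: 223696
X-Raft-Term: 8
X-Etcd-Cluster-Id: the ID of the etcd cluster.
X-Etcd-Index: the current etcd index, as explained above. For a watch on the key space, X-Etcd-Index is the etcd index at the moment the watch started, which means the watched event may happen after X-Etcd-Index.
X-Raft-Index: similar to X-Etcd-Index, but it is the index of the Raft protocol.
X-Raft-Term: an integer that increases whenever a leader (master) election happens in the cluster. If this value grows very quickly, the election timeout needs tuning; see the tuning section.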
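A client usually wants these headers as numbers. The following sketch converts a header mapping into a small dictionary; parse_etcd_headers and the dictionary keys are names chosen for this example only.

```python
def parse_etcd_headers(headers):
    """Pick the etcd headers out of a header mapping and convert the counters to int."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "cluster_id": lowered["x-etcd-cluster-id"],  # hexadecimal text, kept as a string
        "etcd_index": int(lowered["x-etcd-index"]),
        "raft_index": int(lowered["x-raft-index"]),
        "raft_term": int(lowered["x-raft-term"]),
    }


# Header values from the example above.
sample = {
    "X-Etcd-Cluster-Id": "7e27652122e8b2ae",
    "X-Etcd-Index": "93",
    "X-Raft-Index": "223696",
    "X-Raft-Term": "8",
}
info = parse_etcd_headers(sample)
print(info)
```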
2. GET: read the value stored under a key
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message | python -m json.tool
{
    "action": "get",
    "node": {
        "createdIndex": 19,
        "key": "/message",
        "modifiedIndex": 19,
        "value": "Hello world"
    }
}
3. PUT: change the value of a key. This is almost the same as creating a new value, but the response additionally contains a prevNode that reflects what was stored before the change.
-d value=xxxx
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message -X PUT -d value="RECREATE" | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 33,
        "key": "/message",
        "modifiedIndex": 33,
        "value": "RECREATE"
    },
    "prevNode": {
        "createdIndex": 32,
        "key": "/message",
        "modifiedIndex": 32,
        "value": "Hello world"
    }
}
The response contains a new field, "prevNode", which represents the state of the node before the current request completed. prevNode has the same format as node and is omitted when the node has no previous state.
4. DELETE: delete a key
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message -X DELETE | python -m json.tool
{
    "action": "delete",
    "node": {
        "createdIndex": 19,
        "key": "/message",
        "modifiedIndex": 29
    },
    "prevNode": {
        "createdIndex": 19,
        "key": "/message",
        "modifiedIndex": 28,
        "value": "test createIndex"
    }
}
5. PUT: expire a key automatically. A key can be given a TTL (time to live) when it is set; once the TTL has elapsed, the key is deleted automatically. The response contains an expiration field with the time at which the key expires and a ttl field with the configured duration.
-d ttl=xxx
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl=5 | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 34,
        "expiration": "2016-04-23T12:01:57.992249507Z",
        "key": "/foo",
        "modifiedIndex": 34,
        "ttl": 5,
        "value": "bar"
    }
}
The response has two new fields:
expiration: the time at which the key expires and is deleted.
ttl: the TTL of the key, in seconds.
Note:
Keys can only be expired by the cluster leader. If a member gets disconnected from the cluster, its keys do not expire until it rejoins.
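The expiration field carries nanoseconds, which Python's datetime cannot store. A minimal sketch for working out how long a key has left, assuming the timestamp always ends in Z as in the example above (parse_expiration and seconds_left are names invented here):

```python
import re
from datetime import datetime, timezone


def parse_expiration(text):
    """Parse an etcd expiration timestamp such as 2016-04-23T12:01:57.992249507Z."""
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", text)
    if match is None:
        raise ValueError(f"unexpected timestamp: {text!r}")
    base, fraction = match.groups()
    # datetime keeps microseconds only, so cut the fraction down to six digits.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return moment.replace(microsecond=micros, tzinfo=timezone.utc)


def seconds_left(expiration, now):
    """Seconds until the key expires (negative once the time has passed)."""
    return (parse_expiration(expiration) - now).total_seconds()


expires = "2016-04-23T12:01:57.992249507Z"
now = datetime(2016, 4, 23, 12, 1, 55, tzinfo=timezone.utc)
print(seconds_left(expires, now))
```

In real use now would be datetime.now(timezone.utc); a fixed time is used here so the example is reproducible.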
6. PUT: cancel the automatic expiration by sending an empty ttl
-d ttl=
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl= -d prevExist=true | python -m json.tool
{
    "action": "update",
    "node": {
        "createdIndex": 38,
        "key": "/foo",
        "modifiedIndex": 39,
        "value": "bar"
    },
    "prevNode": {
        "createdIndex": 38,
        "expiration": "2016-04-23T12:07:05.415596297Z",
        "key": "/foo",
        "modifiedIndex": 38,
        "ttl": 78,
        "value": "bar"
    }
}
7. PUT: refresh the TTL of a key
Both the deletion of a key when its TTL elapses and a reset of the TTL notify watchers. Adding refresh=true to the request body updates the TTL of a key (which must already have one) without notifying watchers.
-d refresh=true
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message -XPUT -d ttl=100 -d refresh=true | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 145,
        "expiration": "2016-12-28T06:58:20.426383304Z",
        "key": "/message",
        "modifiedIndex": 145,
        "ttl": 100,
        "value": ""
    },
    "prevNode": {
        "createdIndex": 144,
        "expiration": "2016-12-28T06:57:55.628682326Z",
        "key": "/message",
        "modifiedIndex": 144,
        "ttl": 76,
        "value": ""
    }
}
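The PUT variants shown so far differ only in the form fields they send. The sketch below collects them in one helper; put_body and its parameter names are assumptions for illustration, and the fields are the ones curl sends with -d in the examples above.

```python
from urllib.parse import urlencode


def put_body(value=None, ttl=None, clear_ttl=False, prev_exist=None,
             refresh=False, is_dir=False):
    """Build the form-encoded body of a PUT request to /v2/keys/<key>."""
    fields = []
    if is_dir:
        fields.append(("dir", "true"))      # a directory carries no value
    elif value is not None:
        fields.append(("value", value))
    if clear_ttl:
        fields.append(("ttl", ""))          # an empty ttl cancels the expiration
    elif ttl is not None:
        fields.append(("ttl", str(ttl)))
    if prev_exist is not None:
        fields.append(("prevExist", "true" if prev_exist else "false"))
    if refresh:
        fields.append(("refresh", "true"))
    return urlencode(fields)


print(put_body(value="bar", ttl=5))                           # item 5
print(put_body(value="bar", clear_ttl=True, prev_exist=True)) # item 6
print(put_body(ttl=100, refresh=True))                        # item 7
```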
8. GET: watch for changes to a key. This API uses long polling to let a client watch a single key or, with recursive=true as a query parameter in the URL, a directory together with everything below it. etcd answers as soon as the key or directory changes.
?wait=true watch the current node
?recursive=true watch the current node and its subdirectories recursively
?waitIndex=xxx watch from an index in the past, i.e. query or watch events that have already happened. It must be used together with wait.
[root@vStack ~]# curl 'http://127.0.0.1:2379/v2/keys/message?wait=true&waitIndex=2230' | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 2230,
        "key": "/message",
        "modifiedIndex": 2230,
        "value": "123"
    },
    "prevNode": {
        "createdIndex": 2229,
        "key": "/message",
        "modifiedIndex": 2229,
        "value": "123"
    }
}
When the watched key is deleted because its TTL elapsed, the following "expire" action is received.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/message?wait=true | python -m json.tool
{
    "action": "expire",
    "node": {
        "createdIndex": 2223,
        "key": "/message",
        "modifiedIndex": 2224
    },
    "prevNode": {
        "createdIndex": 2223,
        "expiration": "2016-12-28T09:25:00.028597482Z",
        "key": "/message",
        "modifiedIndex": 2223,
        "value": ""
    }
}
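9. GET: query past changes to a key. This works like the watch described above, but additionally specifies the index of a past modification, which makes it possible to retrieve historical events. By default the history holds the most recent 1000 events.
?waitIndex=xxx watch from an index in the past. This is very useful for making sure no events are missed between watch commands: for example, watch repeatedly, each time from the modifiedIndex of the node just received plus 1.
Because the modifiedIndex values of a node are not contiguous, waitIndex does not need to match a modifiedIndex exactly: the first event for the node whose modifiedIndex is greater than or equal to waitIndex is returned. If waitIndex is larger than every modifiedIndex of the node, the request waits until an event with a modifiedIndex greater than or equal to waitIndex occurs.
The history can be queried even after the key has been deleted.
The store has a global currentIndex that is incremented by 1 on every change, and every event is associated with the currentIndex.
When a client calls the watch interface (a request with the wait parameter) and the request carries a waitIndex that is smaller than currentIndex, the EventHistory table is searched for events whose index is greater than or equal to waitIndex and that match the watched key; if one is found it is returned immediately. If the history contains no such event, or the request carries no waitIndex, the watcher is put into the watcher hub, where each key is associated with a list of watchers. When a change happens, the resulting event is added to the EventHistory table and the watchers associated with that key are notified.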
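The lookup described above can be pictured with a toy model. This is not etcd's actual implementation; the class EventHistory below, its capacity argument and the use of LookupError for the "history cleared" case are assumptions made purely to illustrate the idea of a bounded history scanned from waitIndex.

```python
from collections import deque


class EventHistory:
    """Toy model of a bounded event history (etcd keeps the latest 1000 events)."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently drops the oldest event when it is full.
        self.events = deque(maxlen=capacity)

    def add(self, index, key, action):
        self.events.append({"index": index, "key": key, "action": action})

    def scan(self, key, wait_index):
        """Return the first event for key at or after wait_index, or None to keep waiting."""
        if self.events and wait_index < self.events[0]["index"]:
            # The requested index has been dropped; etcd reports errorCode 401 here.
            raise LookupError("the requested history has been cleared")
        for event in self.events:
            if event["index"] >= wait_index and event["key"] == key:
                return event
        return None  # nothing yet: a real server would park the request as a watcher


history = EventHistory(capacity=3)
history.add(1, "/foo", "set")
history.add(2, "/bar", "set")
history.add(3, "/foo", "delete")
history.add(4, "/foo", "set")      # pushes the event with index 1 out of the history

print(history.scan("/foo", 2))     # first /foo event at or after index 2
print(history.scan("/foo", 5))     # no such event yet
```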
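Note:
1. waitIndex must be used together with wait.
2. With curl, the URL must be put in quotes.
3. etcd only keeps the most recent 1000 events across all keys in the system. It is recommended to hand each received response to another thread for processing instead of blocking the watch while the response is processed.
4. If the watch has fallen behind the most recent 1000 events kept by etcd, it is recommended to do a get and then watch again from the X-Etcd-Index response header + 1, not from the modifiedIndex of the node + 1. X-Etcd-Index is always greater than or equal to modifiedIndex, so using modifiedIndex may again fall outside the history and return error code 401.
5. The long-polling connection may be closed by the server, for example on a timeout or a server shutdown. The client then receives only the 200 OK header with an empty body and should watch again.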
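Notes 4 and 5 amount to a small decision rule for the next watch request. A sketch of that rule follows; next_wait_index and its parameters are hypothetical names, and the function only computes the index, it does not talk to etcd.

```python
def next_wait_index(body, etcd_index, previous=None):
    """Choose the waitIndex for the next watch request.

    body       -- parsed JSON body of the last watch response, None if it was empty
    etcd_index -- X-Etcd-Index (as int) from a fresh GET, used after a 401 error
    previous   -- waitIndex of the request that just finished, if any
    """
    if body is None:
        # The server closed the long poll without an event: repeat the same watch.
        return previous
    if body.get("errorCode") == 401:
        # The history was cleared: restart from the current etcd index (note 4).
        return etcd_index + 1
    # Normal event: continue directly after the change that was just seen.
    return body["node"]["modifiedIndex"] + 1


event = {"action": "set", "node": {"key": "/message", "modifiedIndex": 2230}}
cleared = {"errorCode": 401, "index": 2185}

print(next_wait_index(event, etcd_index=2230))
print(next_wait_index(cleared, etcd_index=2185))
print(next_wait_index(None, etcd_index=2185, previous=8))
```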
[root@vStack ~]# curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2' | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 34,
        "expiration": "2016-04-23T12:01:57.992249507Z",
        "key": "/foo",
        "modifiedIndex": 34,
        "ttl": 5,
        "value": "bar"
    }
}
If the requested index is older than the most recent 1000 events kept by etcd, error code 401 is returned.
[root@vStack ~]# curl 'http://127.0.0.1:2379/v2/keys/message?wait=true&waitIndex=8' | python -m json.tool
{
    "cause": "the requested history has been cleared [1186/8]",
    "errorCode": 401,
    "index": 2185,
    "message": "The event in requested index is outdated and cleared"
}
10. PUT: create a directory
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d dir=true | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 63,
        "dir": true,
        "key": "/dir",
        "modifiedIndex": 63
    }
}
11. GET: list all nodes in a directory. The path may end with / (this is not required). The recursive parameter additionally lists all subdirectories recursively; without recursive only the direct children are returned and nothing below them.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/dir1/ | python -m json.tool
{
    "action": "get",
    "node": {
        "createdIndex": 67,
        "dir": true,
        "key": "/dir1",
        "modifiedIndex": 67,
        "nodes": [
            {
                "createdIndex": 67,
                "dir": true,
                "key": "/dir1/dir2",
                "modifiedIndex": 67
            }
        ]
    }
}
12. POST: automatically create in-order keys in a directory. Using the POST method on a directory creates a key in that directory whose name is the global etcd index at creation time, so the keys are strictly ordered by creation time. This API is very useful for scenarios such as distributed queues.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job1 | python -m json.tool
{
    "action": "create",
    "node": {
        "createdIndex": 47,
        "key": "/queue/00000000000000000047",
        "modifiedIndex": 47,
        "value": "Job1"
    }
}
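In the output above the key name is the etcd index padded with zeros to 20 digits. The sketch below reproduces that naming and shows why plain string sorting then matches creation order. The 20-digit width is inferred from this sample output (etcd 2.3.x) and may differ in other versions; in_order_key is a name chosen for the example.

```python
def in_order_key(directory, index):
    """Name of the key that a POST creates in directory at the given etcd index."""
    # Zero padding makes lexicographic order identical to numeric order.
    return f"{directory.rstrip('/')}/{index:020d}"


print(in_order_key("/queue", 47))

names = [in_order_key("/queue", i) for i in (100, 9, 47)]
print(sorted(names))   # sorted as strings, yet in index order 9, 47, 100
```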
13. GET: list the in-order keys in the order they were created
?sorted=true
?recursive=true
[root@vStack ~]# curl -s 'http://127.0.0.1:2379/v2/keys/queue?sorted=true' | python -m json.tool
{
    "action": "get",
    "node": {
        "createdIndex": 46,
        "dir": true,
        "key": "/queue",
        "modifiedIndex": 46,
        "nodes": [
            {
                "createdIndex": 46,
                "key": "/queue/00000000000000000046",
                "modifiedIndex": 46,
                "value": ""
            },
            {
                "createdIndex": 47,
                "key": "/queue/00000000000000000047",
                "modifiedIndex": 47,
                "value": "Job1"
            },
            {
                "createdIndex": 48,
                "key": "/queue/00000000000000000048",
                "modifiedIndex": 48,
                "value": "aaaa"
            },
            {
                "createdIndex": 49,
                "key": "/queue/00000000000000000049",
                "modifiedIndex": 49,
                "value": "aaaa"
            },
            {
                "createdIndex": 50,
                "key": "/queue/00000000000000000050",
                "modifiedIndex": 50,
                "value": "aaaa"
            },
            {
                "createdIndex": 51,
                "key": "/queue/00000000000000000051",
                "modifiedIndex": 51,
                "value": "aaaa"
            }
        ]
    }
}
14. DELETE: delete a directory. By default only empty directories can be deleted; to delete a directory with contents, the recursive=true parameter must be added.
?dir=true delete a directory
?recursive=true delete a non-empty directory
Deleting a non-empty directory requires recursive=true; deleting an empty directory requires at least one of dir=true and recursive=true.
[root@vStack ~]# curl 'http://127.0.0.1:2379/v2/keys/dir1?dir=true' -XDELETE | python -m json.tool
{
    "cause": "/dir1",
    "errorCode": 108,
    "index": 67,
    "message": "Directory not empty"
}
[root@vStack ~]# curl 'http://127.0.0.1:2379/v2/keys/dir1?dir=true&recursive=true' -XDELETE | python -m json.tool
{
    "action": "delete",
    "node": {
        "createdIndex": 67,
        "dir": true,
        "key": "/dir1",
        "modifiedIndex": 68
    },
    "prevNode": {
        "createdIndex": 67,
        "dir": true,
        "key": "/dir1",
        "modifiedIndex": 67
    }
}
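The rules for deleting a directory can be written down as a small predicate. This is a sketch of the behaviour described in this section, not code taken from etcd: the function name is made up, error 108 is the one shown in the example above, and error 102 for a request carrying neither parameter follows the notes at the end of this section.

```python
def delete_dir_outcome(has_children, dir_flag=False, recursive=False):
    """Return None if deleting the directory should succeed, else the expected errorCode."""
    if not (dir_flag or recursive):
        return 102   # "Not a file": the target is a directory but was addressed like a key
    if has_children and not recursive:
        return 108   # "Directory not empty"
    return None


# The two requests from the example above, against the non-empty /dir1.
print(delete_dir_outcome(has_children=True, dir_flag=True))
print(delete_dir_outcome(has_children=True, dir_flag=True, recursive=True))
```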
15. PUT: create a directory that expires. This works like expiring a key. When the directory is deleted because its TTL elapsed, everything in it is deleted as well.
If the directory already exists, creating it returns error code 102.
-d ttl=xx
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 52,
        "dir": true,
        "expiration": "2016-04-23T13:37:51.502289114Z",
        "key": "/dir",
        "modifiedIndex": 52,
        "ttl": 30
    }
}
16. PUT: set or refresh the TTL of a directory, either for a directory that was created without a TTL or to refresh the TTL of a directory that already has one.
-d ttl=xxx the TTL to set or refresh. An empty ttl cancels the TTL.
-d prevExist=true required; otherwise error code 102 is returned.
This notifies watchers.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true -d prevExist=true | python -m json.tool
{
    "action": "update",
    "node": {
        "createdIndex": 56,
        "dir": true,
        "expiration": "2016-04-23T13:42:56.395923381Z",
        "key": "/dir",
        "modifiedIndex": 61,
        "ttl": 30
    },
    "prevNode": {
        "createdIndex": 56,
        "dir": true,
        "expiration": "2016-04-23T13:42:46.225222674Z",
        "key": "/dir",
        "modifiedIndex": 56,
        "ttl": 20
    }
}
When the TTL elapses, watchers receive an "expire" action.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/dir?wait=true | python -m json.tool
{
    "action": "expire",
    "node": {
        "createdIndex": 2219,
        "key": "/dir",
        "modifiedIndex": 2220
    },
    "prevNode": {
        "createdIndex": 2219,
        "dir": true,
        "expiration": "2016-12-28T09:22:35.853484071Z",
        "key": "/dir",
        "modifiedIndex": 2219
    }
}
17. Create a hidden node: a key or directory whose name starts with an underscore (_) is hidden by default.
It is not shown when its parent directory is listed, but it can be used by naming it explicitly.
[root@vStack ~]# curl http://127.0.0.1:2379/v2/keys/_message -XPUT -d value="Hello hidden world" | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 69,
        "key": "/_message",
        "modifiedIndex": 69,
        "value": "Hello hidden world"
    }
}
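Note:
1. API URLs are case-sensitive, including their parameters.
2. If a key already exists, curl http://IP:PORT/v2/keys/message001 -XPUT -d dir=true turns the key into a directory and its value becomes None. Adding -d prevExist=false returns error code 105 instead. Once changed into a directory, it cannot be turned back into a key.
3. A directory cannot be assigned a value: when message001 is a directory, curl http://127.0.0.1:2379/v2/keys/message001 -XPUT -d value=123 returns error code 102, "Not a file".
4. A key corresponds to a file in a file system: assigning a value is like writing to the file. A directory corresponds to a file system directory or path; it contains directories and keys, just as a file system directory contains directories and files.
5. The path in the API URL reflects the storage structure. Missing directories are created automatically. For example, with curl http://127.0.0.1:2379/v2/keys/fst/sec/thr -XPUT -d value=123, fst and sec are created as directories.
6. Whether a directory or a key is created depends on whether the curl body contains dir=true: with it a directory is created, otherwise a key. When dir is present, value is ignored. A key can be created without a value.
7. A directory or key cannot be created beneath a key; this returns error code 104, "Not a directory".
8. A directory cannot be created twice: if the message directory already exists, curl -v http://127.0.0.1:2379/v2/keys/message -XPUT -d dir=true returns error code 102, "Not a file".
9. Deleting a non-empty directory without recursive=true returns an error: code 102, or code 108 ("Directory not empty") when dir=true is given, as in the example of item 14. Adding recursive=true to the URL deletes a non-empty directory.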
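Error responses share one JSON shape (errorCode, message, cause, index), so a client can turn them into exceptions in one place. The sketch below does that; EtcdError and check are hypothetical names, and the table only lists codes whose messages appear in this section.

```python
import json

# Codes and messages that appear in the examples and notes of this section.
KNOWN_ERRORS = {
    102: "Not a file",
    104: "Not a directory",
    108: "Directory not empty",
    401: "The event in requested index is outdated and cleared",
}


class EtcdError(Exception):
    """Raised when a response body carries an errorCode."""

    def __init__(self, code, message, cause):
        super().__init__(f"etcd error {code}: {message} ({cause})")
        self.code = code
        self.message = message
        self.cause = cause


def check(body_text):
    """Parse a response body; raise EtcdError if it describes an error."""
    data = json.loads(body_text)
    if "errorCode" in data:
        code = data["errorCode"]
        message = data.get("message", KNOWN_ERRORS.get(code, "unknown error"))
        raise EtcdError(code, message, data.get("cause", ""))
    return data


# Error body from the DELETE example in item 14.
failed = '{"cause": "/dir1", "errorCode": 108, "index": 67, "message": "Directory not empty"}'
try:
    check(failed)
except EtcdError as error:
    print(error.code, error.message, error.cause)
```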
Statistics API
An etcd cluster keeps track of a number of statistics, including latency, bandwidth and uptime. They are exposed through the statistics endpoint (/v2/stats) to help understand the internal health of a cluster.
Self and Leader Statistics (/v2/stats/self, /v2/stats/leader)
[root@localhost testectd]# curl http://127.0.0.1:2379/v2/stats/self | python -m json.tool
{
    "id": "45b967575ff25cb2",
    "leaderInfo": {
        "leader": "45b967575ff25cb2",
        "startTime": "2016-12-29T20:15:13.811259537+08:00",
        "uptime": "8m19.603722077s"
    },
    "name": "infra0",
    "recvAppendRequestCnt": 18,
    "sendAppendRequestCnt": 3670,
    "sendBandwidthRate": 123950.52498801574,
    "sendPkgRate": 7.5456304767920797,
    "startTime": "2016-12-29T20:14:29.300999352+08:00",
    "state": "StateLeader"
}
[root@localhost testectd]# curl http://127.0.0.1:2379/v2/stats/leader | python -m json.tool
{
    "followers": {
        "3c828782a67e0043": {
            "counts": {
                "fail": 1211,
                "success": 0
            },
            "latency": {
                "average": 0,
                "current": 0,
                "maximum": 0,
                "minimum": 9.2233720368547758e+18,
                "standardDeviation": 0
            }
        },
        "b26f1b9a6c735437": {
            "counts": {
                "fail": 0,
                "success": 3231
            },
            "latency": {
                "average": 0.0073246419065304607,
                "current": 0.0032520000000000001,
                "maximum": 1.713633,
                "minimum": 0.0012520000000000001,
                "standardDeviation": 0.035654606550540036
            }
        }
    },
    "leader": "45b967575ff25cb2"
}
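In the leader statistics above, one follower has only failed requests, which suggests the leader cannot reach it. A minimal sketch that picks such followers out of the response; the function name is invented, and "no successes but some failures" is a heuristic chosen for this example, not an official health check.

```python
def unreachable_followers(leader_stats):
    """IDs of followers that have never answered the leader successfully."""
    return sorted(
        follower_id
        for follower_id, stats in leader_stats["followers"].items()
        if stats["counts"]["success"] == 0 and stats["counts"]["fail"] > 0
    )


# Counts taken from the /v2/stats/leader output above (latency omitted).
stats = {
    "followers": {
        "3c828782a67e0043": {"counts": {"fail": 1211, "success": 0}},
        "b26f1b9a6c735437": {"counts": {"fail": 0, "success": 3231}},
    },
    "leader": "45b967575ff25cb2",
}
print(unreachable_followers(stats))
```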
Members API
1. List members
Returns an HTTP 200 OK response listing all members of the etcd cluster.
[root@vStack ~]# curl http://192.168.10.150:2379/v2/members | python -m json.tool
{
    "members": [
        {
            "clientURLs": [
                "http://192.168.10.150:2379"
            ],
            "id": "8e9e05c52164694d",
            "name": "default",
            "peerURLs": [
                "http://localhost:2380"
            ]
        }
    ]
}
[root@vStack ~]# curl http://127.0.0.1:2379/v2/members | python -m json.tool
{
    "members": [
        {
            "clientURLs": [
                "http://localhost:2379",
                "http://localhost:4001"
            ],
            "id": "ce2a822cea30bfca",
            "name": "default",
            "peerURLs": [
                "http://localhost:2380",
                "http://localhost:7001"
            ]
        }
    ]
}
[root@vStack ~]# curl http://192.168.10.150:2379/v2/members | python -m json.tool
{
    "members": [
        {
            "clientURLs": [],
            "id": "755ef544f1926e2e",
            "name": "",
            "peerURLs": [
                "http://127.0.0.1:2380"
            ]
        },
        {
            "clientURLs": [
                "http://192.168.10.150:2379"
            ],
            "id": "8e9e05c52164694d",
            "name": "default",
            "peerURLs": [
                "http://localhost:2380"
            ]
        }
    ]
}
2. Add a member
Returns an HTTP 201 response code and the representation of the added member, with a newly generated member ID, when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member exists in the cluster or existed in the cluster at some point in the past an HTTP 409 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
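curl http://10.0.0.10:2379/v2/members -XPOST \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
1. The header Content-Type: application/json must be set; otherwise the request fails with 415 Unsupported Media Type.
2. If the same peerURLs already exist, the existing member with those peerURLs is returned directly (note that the official description above specifies HTTP 409 for this case).
3. Adding a member with unusable peerURLs can bring the service down so that it can no longer be operated, and a restart does not help. The workaround is to delete the data files on disk, but this also deletes the recorded data, so persistent data is lost. A better solution still needs to be found.
The cluster information is recorded in the persistent data files, so the problem remains after a restart unless a different name or a different data directory is used.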
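The same add-member request can be prepared with Python's standard library. The sketch below only builds the request object and does not send it; add_member_request is a name chosen here, and the status table restates the codes given in the description above.

```python
import json
from urllib.request import Request

# Meaning of the status codes listed in the description above.
ADD_MEMBER_STATUS = {
    201: "member added",
    400: "malformed POST body",
    409: "member or peerURLs already exist in the cluster",
    500: "cluster failed to process the request within the timeout",
}


def add_member_request(base_url, peer_urls):
    """Build (but do not send) the POST request that adds a member."""
    body = json.dumps({"peerURLs": list(peer_urls)}).encode("utf-8")
    return Request(
        f"{base_url}/v2/members",
        data=body,
        method="POST",
        # Without this header the server rejects the request (note 1).
        headers={"Content-Type": "application/json"},
    )


request = add_member_request("http://10.0.0.10:2379", ["http://10.0.0.10:2380"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the object to urllib.request.urlopen would send it to a running cluster.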
3. Delete a member
Removes a member from the cluster. The member ID must be a hex-encoded string of a 64-bit integer. On success, returns a 204 status code with no content. On failure, returns a 404 status code and a string describing the failure.
Removing a member that does not exist in the cluster returns a 500 error. If the cluster fails to process the request, including on a timeout, a 500 error is returned as well, even though the request may still be processed later.
[root@localhost testectd]# curl http://192.168.10.150:2379/v2/members/2ae1ee131894262b -XDELETE | python -m json.tool
No JSON object could be decoded
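Since the member ID in the URL is the hexadecimal text of a 64-bit integer, it can be validated before a request is sent. A short sketch, with function names made up for the example; the IDs in this document are printed without leading zeros, so the formatter does not pad.

```python
import re

HEX_ID = re.compile(r"[0-9a-fA-F]{1,16}")   # at most 16 hex digits = 64 bits


def parse_member_id(text):
    """Convert a member ID such as "2ae1ee131894262b" to an integer."""
    if HEX_ID.fullmatch(text) is None:
        raise ValueError(f"not a 64-bit hexadecimal ID: {text!r}")
    return int(text, 16)


def format_member_id(value):
    """Convert an integer back to the lower-case hexadecimal form used in URLs."""
    if not 0 <= value < 2 ** 64:
        raise ValueError("member ID must fit in 64 bits")
    return f"{value:x}"


member = parse_member_id("2ae1ee131894262b")
print(member, format_member_id(member))
```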
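1. After a member has been removed, the data-dir used by etcd must be deleted. Below is the output etcd printed when the last member was removed, after which the service exited.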
2016-12-29 16:10:59.544409 E | etcdserver: the member has been permanently removed from the cluster
2016-12-29 16:10:59.544480 I | etcdserver: the data-dir used by this member must be removed.
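2. After a member has been removed through etcdctl, its service exits. When the member is added again through etcdctl, it is shown as unstarted.
To rejoin the cluster, first add the member with the command and then start it; otherwise startup fails with: the member has been permanently removed from the cluster
After adding the member and before starting it, its stored data must be deleted (the member ID has changed, so the ID recorded on disk no longer matches the newly assigned ID), and --initial-cluster-state must be set to existing, not new.
Note that the versions in the cluster should be consistent. {"etcdserver":"3.0.15","etcdcluster":"3.0.0"} is fine,
and so is {"etcdserver":"2.3.7","etcdcluster":"2.3.0"}.
If {"etcdserver":"3.0.15","etcdcluster":"2.3.0"} appears, the cluster contains members of different versions.
The remove-and-rejoin procedure above works for the higher version but is not supported by the lower version, which reports failed to find member 3c828782a67e0043 in cluster 34b660d543ad1445, i.e. it cannot find the other members.
In other words, when members of a cluster run different versions, a lower-version member that fails and exits can rejoin the cluster simply by being restarted. If it was removed from the cluster through the API, however, it can only join again after all members are stopped, their on-disk data is deleted and the cluster is rebuilt, which loses data; how to recover that data is an open question. Higher-version members do not have this problem.
Also, when members run different versions, the cluster version drops to the lower version, e.g. {"etcdserver":"3.0.15","etcdcluster":"2.3.0"}. After the lower-version members have left, the cluster version is upgraded to the higher version: {"etcdserver":"3.0.15","etcdcluster":"3.0.0"}.
When the lower-version members are started first with --initial-cluster-state new, starting a higher-version member with --initial-cluster-state existing reports that the cluster version is incompatible; it has to be started with --initial-cluster-state new. On later starts the higher-version member can use existing.
When a cluster is built from the lower version, a higher-version member that joins must use --initial-cluster-state new or leave the option unset; with existing it reports that the cluster is incompatible.
[root@centos7mini etcd]# ./etcdctl member remove 45b967575ff25cb2
Removed member 45b967575ff25cb2 from cluster

[root@centos7mini etcd]# ./etcdctl member add infra0 http://192.168.10.150:2380
Added member named infra0 with ID 700fb7bf97791e71 to cluster

ETCD_NAME="infra0"
ETCD_INITIAL_CLUSTER="infra3=http://192.168.10.184:2380,infra0=http://192.168.10.150:2380,infra1=http://192.168.10.231:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

[root@centos7mini etcd]# ./etcdctl member list
3c828782a67e0043: name=infra3 peerURLs=http://192.168.10.184:2380 clientURLs=http://192.168.10.184:2379 isLeader=true
700fb7bf97791e71[unstarted]: peerURLs=http://192.168.10.150:2380
b26f1b9a6c735437: name=infra1 peerURLs=http://192.168.10.231:2380 clientURLs=http://192.168.10.231:2379 isLeader=false
[root@centos7mini etcd]#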
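When scripting the rejoin procedure it helps to detect members that are still unstarted. The sketch below parses the text printed by etcdctl member list in the session above; the parser is written against exactly this output format (v2 etcdctl) and is an illustration, not a stable interface.

```python
import re

# "<hex id>[unstarted]: key=value key=value ..." as printed in the session above.
MEMBER_LINE = re.compile(r"^(?P<id>[0-9a-f]+)(?P<unstarted>\[unstarted\])?: (?P<fields>.*)$")


def parse_member_list(output):
    """Turn the output of "etcdctl member list" into a list of dictionaries."""
    members = []
    for line in output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match is None:
            continue  # shell prompts and other lines are ignored
        fields = dict(item.split("=", 1)
                      for item in match.group("fields").split() if "=" in item)
        members.append({"id": match.group("id"),
                        "started": match.group("unstarted") is None,
                        **fields})
    return members


output = """\
3c828782a67e0043: name=infra3 peerURLs=http://192.168.10.184:2380 clientURLs=http://192.168.10.184:2379 isLeader=true
700fb7bf97791e71[unstarted]: peerURLs=http://192.168.10.150:2380
b26f1b9a6c735437: name=infra1 peerURLs=http://192.168.10.231:2380 clientURLs=http://192.168.10.231:2379 isLeader=false
"""
members = parse_member_list(output)
print([m["id"] for m in members if not m["started"]])
```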
4. Change the peer urls of a member
Changes the peer URLs of an existing member. The member ID must be a hex-encoded string of a 64-bit integer. On success, returns a 204 status code with empty content. On failure, returns a string describing the failure condition.
If the member to change does not exist, HTTP 404 is returned (400 indicates a malformed request body). If any of the given peerURLs already exists in the cluster, 409 is returned. 500 means the cluster timed out while processing the request.
[root@localhost etcd-v3.0.15-linux-amd64]# ./etcdctl member list
3c828782a67e0043: name=infra3 peerURLs=http://192.168.10.184:2380 clientURLs=http://192.168.10.184:2379 isLeader=false
45b967575ff25cb2: name=infra0 peerURLs=http://192.168.10.150:2380 clientURLs=http://192.168.10.150:2379 isLeader=true
b26f1b9a6c735437: name=infra1 peerURLs=http://192.168.10.231:2380 clientURLs=http://192.168.10.231:2379 isLeader=false
[root@localhost etcd-v3.0.15-linux-amd64]#
[root@localhost etcd-v3.0.15-linux-amd64]# curl http://192.168.10.150:2379/v2/members/b26f1b9a6c735437 -XPUT -H "Content-Type: application/json" -d '{"peerURLs":["http://127.0.0.1:2380"]}'
[root@localhost etcd-v3.0.15-linux-amd64]#
[root@localhost etcd-v3.0.15-linux-amd64]# ./etcdctl member list
3c828782a67e0043: name=infra3 peerURLs=http://192.168.10.184:2380 clientURLs=http://192.168.10.184:2379 isLeader=false
45b967575ff25cb2: name=infra0 peerURLs=http://192.168.10.150:2380 clientURLs=http://192.168.10.150:2379 isLeader=true
b26f1b9a6c735437: name=infra1 peerURLs=http://127.0.0.1:2380 clientURLs=http://192.168.10.231:2379 isLeader=false
[root@localhost etcd-v3.0.15-linux-amd64]#
When setting --listen-client-urls at etcd startup, include localhost:2379 or 127.0.0.1:2379; otherwise the local etcdctl fails with the following error.
[root@centos7mini etcd]# ./etcdctl member list
Error: client: etcd cluster is unavailable or misconfigured
error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: dial tcp 127.0.0.1:2379: getsockopt: connection refused
After a node is disconnected it becomes a candidate and requests votes from the other members in order to elect a new master/leader.
2016-12-29 19:43:15.767905 I | raft: b26f1b9a6c735437 became candidate at term 146
2016-12-29 19:43:15.767932 I | raft: b26f1b9a6c735437 received vote from b26f1b9a6c735437 at term 146
2016-12-29 19:43:15.767961 I | raft: b26f1b9a6c735437 [logterm: 101, index: 688] sent vote request to 45b967575ff25cb2 at term 146
2016-12-29 19:43:17.266905 I | raft: b26f1b9a6c735437 is starting a new election at term 146
When one member is added to two clusters, a cluster ID mismatch occurs. The clusters below are created statically.
# etcd --name infra1 --initial-advertise-peer-urls http://192.168.10.231:2380 \
--listen-peer-urls http://192.168.10.231:2380 \
--listen-client-urls http://192.168.10.231:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.10.231:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://192.168.10.150:2380,infra1=http://192.168.10.231:2380,infra3=http://192.168.10.184:2380 \
--initial-cluster-state new

# ./etcd --debug --name infra3 --initial-advertise-peer-urls http://192.168.10.184:2380 \
--listen-peer-urls http://192.168.10.184:2380 --initial-cluster infra3=http://192.168.10.184:2380 \
--listen-client-urls http://192.168.10.184:2379 --advertise-client-urls http://192.168.10.184:2379 \
--initial-cluster-state new --initial-cluster-token etcd-cluster-1
2016-12-30 11:47:30.939730 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[3c828782a67e0043]=625ac7f9082c643, local=34b660d543ad1445)
2016-12-30 11:47:30.977766 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[3c828782a67e0043]=625ac7f9082c643, local=34b660d543ad1445)
After the cluster was set up, --debug printed the output below. It was caused by the iptables firewall on host 192.168.10.150.
2016-12-30 12:07:19.479241 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:19.914614 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:20.781345 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:21.216792 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:21.885187 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:22.518689 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:22.620358 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:23.187832 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:23.925031 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:24.490547 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:24.592188 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:25.226673 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:25.696521 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:26.528616 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:26.630548 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 12:07:27.000087 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:27.932728 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:28.302774 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 12:07:28.404591 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no r
The following debug output is from the leader. It is also caused by the iptables firewall.
2016-12-30 14:03:10.838829 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:11.353601 W | etcdserver: failed to reach the peerURL(http://192.168.10.150:2380) of member 45b967575ff25cb2 (Get http://192.168.10.150:2380/version: dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 14:03:11.353640 W | etcdserver: cannot get the version of member 45b967575ff25cb2 (Get http://192.168.10.150:2380/version: dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 14:03:11.672262 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:12.140697 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:12.974912 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:13.445167 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:13.547340 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 14:03:14.278497 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:14.380259 D | rafthttp: failed to dial 45b967575ff25cb2 on stream MsgApp v2 (dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 14:03:14.850132 D | rafthttp: failed to dial 45b967575ff25cb2 on stream Message (dial tcp 192.168.10.150:2380: i/o timeout)
2016-12-30 14:03:15.358565 W | etcdserver: failed to reach the peerURL(http://192.168.10.150:2380) of member 45b967575ff25cb2 (Get http://192.168.10.150:2380/version: dial tcp 192.168.10.150:2380: getsockopt: no route to host)
2016-12-30 14:03:15.358613 W | etcdserver: cannot get the version of member 45b967575ff25cb2 (Get http://192.168.10.150:2380/version: dial tcp 192.168.10.150:2380: getsockopt: no route to host)
The following debug output appears because 192.168.10.231:2380 is unreachable; b26f1b9a6c735437 is the member ID.
2016-12-30 21:38:43.130791 D | rafthttp: failed to dial b26f1b9a6c735437 on stream Message (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
2016-12-30 21:38:43.191227 D | rafthttp: failed to dial b26f1b9a6c735437 on stream MsgApp v2 (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
2016-12-30 21:38:43.232280 D | rafthttp: failed to dial b26f1b9a6c735437 on stream Message (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
2016-12-30 21:38:43.292289 D | rafthttp: failed to dial b26f1b9a6c735437 on stream MsgApp v2 (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
2016-12-30 21:38:43.334129 D | rafthttp: failed to dial b26f1b9a6c735437 on stream Message (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
2016-12-30 21:38:43.393576 D | rafthttp: failed to dial b26f1b9a6c735437 on stream MsgApp v2 (dial tcp 192.168.10.231:2380: getsockopt: connection refused)
The following warnings are printed when the clock difference between members is too high.
2016-12-30 14:18:36.328628 W | rafthttp: the clock difference against peer 3c828782a67e0043 is too high [2.00405135s > 1s]
2016-12-30 14:19:06.329559 W | rafthttp: the clock difference against peer 3c828782a67e0043 is too high [2.003973758s > 1s]
2016-12-30 14:19:36.331189 W | rafthttp: the clock difference against peer 3c828782a67e0043 is too high [2.004098356s > 1s]
2016-12-30 21:38:22.857546 W | rafthttp: the clock difference against peer 3c828782a67e0043 is too high [7h59m58.924003117s > 1s]
2016-12-30 21:38:22.892541 W | rafthttp: health check for peer b26f1b9a6c735437 failed
2016-12-30 21:38:22.892848 W | rafthttp: the clock difference against peer b26f1b9a6c735437 is too high [7h59m56.920465483s > 1s]
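To monitor such warnings, the clock difference can be extracted from the log line and converted to seconds. The sketch below handles the duration format seen in these logs (for example 7h59m58.924003117s); the function names are invented for this example, and the regular expression is written against the log lines above only.

```python
import re

UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Multi-letter units come first so that "ms" is not read as "m" followed by "s".
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
CLOCK = re.compile(r"clock difference against peer (\w+) is too high \[(\S+) > (\S+)\]")


def duration_seconds(text):
    """Convert a duration such as "7h59m58.924003117s" to seconds."""
    position, total = 0, 0.0
    for match in TOKEN.finditer(text):
        if match.start() != position:
            break                      # unexpected characters between two parts
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"cannot parse duration: {text!r}")
    return total


def clock_skew(line):
    """Return (peer id, measured difference, allowed difference) or None."""
    match = CLOCK.search(line)
    if match is None:
        return None
    peer, measured, limit = match.groups()
    return peer, duration_seconds(measured), duration_seconds(limit)


line = ("2016-12-30 21:38:22.857546 W | rafthttp: the clock difference against "
        "peer 3c828782a67e0043 is too high [7h59m58.924003117s > 1s]")
print(clock_skew(line))
```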