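Fixing an etcd Disk That Has Run Out of Space
Handling etcd disk alarms
etcd's default space quota is 2 GB. Once the backend database exceeds the quota, etcd raises a NOSPACE alarm and stops accepting writes, so old data has to be cleaned up regularly.
Check the etcd logs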
8月 04 17:00:04 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:1 size:3775" took too long (1.354750458s) to execute
8月 04 17:00:05 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" range_end:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:2303873 size:1274241272" took too long (11.31986XXXXXXXXXXXXXXXXXXXXX
8月 04 17:05:09 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:1 size:3775" took too long (1.136787261s) to execute
8月 04 17:05:10 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" range_end:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:2303873 size:1274241272" took too long (11.68081XXXXXXXXXXXXXXXXXXXXX
8月 04 17:05:11 1.novalocal etcd[24848]: WARNING: 2020/08/04 17:05:11 grpc: Server.processUnaryRPC failed to write status connection error: desc = "transport is closing"
8月 04 17:10:14 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:1 size:3775" took too long (1.173390639s) to execute
8月 04 17:10:15 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" range_end:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:2303873 size:1274241272" took too long (11.42705XXXXXXXXXXXXXXXXXXXXX
8月 04 17:15:19 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:1 size:3775" took too long (1.311071626s) to execute
8月 04 17:15:20 1.novalocal etcd[24848]: read-only range request "key:\"XXXXXXXXXXXXXXXXXXXXX" range_end:\"XXXXXXXXXXXXXXXXXXXXX" " with result "range_response_count:2303873 size:1274241272" took too long (11.22721XXXXXXXXXXXXXXXXXXXXX
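The log contains a large number of "took too long (11.42705XXXXXXXXXXXXXXXXXXXXX" entries. The slowest ones are range requests that return 2303873 keys (about 1.27 GB) and take more than 11 seconds each.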
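To gauge how widespread the slow requests are, one option is to count the matching log lines. The sketch below assumes the log text arrives on standard input, for example from `journalctl -u rio-etcd` (the unit name is inferred from the service file edited later in this post and may differ on your host). `count_slow_requests` is a hypothetical helper name, not part of etcd.

```shell
#!/bin/sh
# Count etcd log lines that report a slow request.
# Reads log text from stdin and prints the number of matching lines.
count_slow_requests() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'took too long' || true
}

# Example with two sample lines, shortened from the log above.
printf '%s\n' \
    'etcd[24848]: read-only range request ... took too long (1.354750458s) to execute' \
    'etcd[24848]: WARNING: grpc: Server.processUnaryRPC failed to write status' \
    | count_slow_requests
```

Against a live host this would be used as `journalctl -u rio-etcd | count_slow_requests`, again assuming that unit name.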
Check the etcd cluster status
- Check the endpoint status
ETCDCTL_API=3 ./etcdctl --endpoints=$ip:$port --write-out=table endpoint status
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://127.0.0.1:2379 | 728d3145169b227d | 3.3.10 | 2.1 GB | false | 6 | 3616392 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
- List the etcd cluster alarms
ETCDCTL_API=3 ./etcdctl --endpoints=$ip:$port alarm list
memberID:XXXXXXXXXXXXXXX alarm:NOSPACE
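The alarm here reports NOSPACE. Either raise the etcd space quota (2 GB by default) or compact away old data. After freeing or adding space you must also disarm the alarm with etcdctl, otherwise the cluster remains unusable.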
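A monitoring script could detect this condition from the alarm list output. The following is a minimal sketch that assumes the output format shown above (`memberID:... alarm:NOSPACE`); `has_nospace_alarm` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the alarm list read from stdin reports NOSPACE.
has_nospace_alarm() {
    grep -q 'alarm:NOSPACE'
}

# Example with a sample line in the format shown above.
if printf '%s\n' 'memberID:XXXXXXXXXXXXXXX alarm:NOSPACE' | has_nospace_alarm; then
    echo "NOSPACE alarm is active"
fi
```

On a real cluster the input would come from `ETCDCTL_API=3 etcdctl --endpoints=$ip:$port alarm list`.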
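Raise the etcd quota from 2 GB to 8 GB by adding the following three parameters to the service unit (etcd must be restarted for them to take effect)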
vi /etc/systemd/system/rio-etcd.service
## auto-compaction-retention is in hours in periodic mode; in revision mode it is a number of revisions
--auto-compaction-mode=periodic --auto-compaction-retention=24 --quota-backend-bytes=8589934592
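The quota flag takes a value in bytes, and 8589934592 is 8 x 1024 x 1024 x 1024. A small sketch of that arithmetic, in case a different size is needed (`gib_to_bytes` is a hypothetical helper name, and it assumes a shell with 64-bit arithmetic):

```shell
#!/bin/sh
# Convert a size in GiB to bytes for --quota-backend-bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8   # prints 8589934592
```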
Get the current revision of the etcd data
rev=$(ETCDCTL_API=3 etcdctl --endpoints=$ip:$port endpoint status --write-out="json" | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9]+')
echo $rev
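To see what the extraction pipeline does without touching a cluster, it can be run against a sample document. The JSON below is a made-up, heavily shortened stand-in for the `endpoint status` output, and `extract_revision` is a hypothetical helper name. The `head -n 1` is an addition for the case where several endpoints are queried and more than one revision is printed.

```shell
#!/bin/sh
# Print the first "revision" value found in JSON read from stdin.
extract_revision() {
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Sample input (illustrative values only, not real cluster output).
printf '%s\n' '[{"Status":{"header":{"revision":3616,"raft_term":6}}}]' \
    | extract_revision
```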
- Compact away old revisions
ETCDCTL_API=3 etcdctl --endpoints=$ip:$port compact $rev
- Defragment to release the freed space
ETCDCTL_API=3 etcdctl --endpoints=$ip:$port defrag
Disarm the alarm
ETCDCTL_API=3 etcdctl --endpoints=$ip:$port alarm disarm
Verify that new data can be written
ETCDCTL_API=3 etcdctl --endpoints=$ip:$port put newkeytestfornospace 123
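The compact, defrag and disarm steps could be chained so that a failure stops the sequence before the alarm is cleared. This is only a sketch under stated assumptions: `ENDPOINT`, `DRY_RUN`, `etcd_cmd` and `reclaim_space` are hypothetical names introduced here, and dry-run mode (the default) prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: compact, defragment, then disarm, stopping at the first failure.
# ENDPOINT and DRY_RUN are illustrative variables, not etcd settings.
ENDPOINT="${ENDPOINT:-127.0.0.1:2379}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Run (or, in dry-run mode, just print) one etcdctl v3 command.
etcd_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ETCDCTL_API=3 etcdctl --endpoints=$ENDPOINT $*"
    else
        ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINT" "$@"
    fi
}

# $1: revision to compact to, obtained as in the revision step above.
reclaim_space() {
    etcd_cmd compact "$1" &&
        etcd_cmd defrag &&
        etcd_cmd alarm disarm
}

# Usage, once rev has been set: reclaim_space "$rev"
```

Note that defrag acts only on the endpoint it is sent to, so in a multi-member cluster it would need to be repeated for each member.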
References
https://www.cnblogs.com/lvcisco/p/10775021.html