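High disk usage on an Elasticsearch node leaves the cluster's indices without replicas
I. Problem
While checking our production es cluster recently, I found that the indices created in the last 2 days had no replicas and that the cluster status was yellow.
II. Cause
The server hosting es still had free disk space, but disk usage was at roughly 89%. In principle es should have kept using the disk and created replicas for the indices, so we checked the official documentation, which describes the following settings: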
cluster.routing.allocation.disk.watermark.low
Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices or, specifically, any shards that have never previously been allocated.
cluster.routing.allocation.disk.watermark.high
Controls the high watermark. It defaults to 90%, meaning that Elasticsearch will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.
cluster.routing.allocation.disk.watermark.flood_stage
Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
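From this we can see that, with the default configuration, once disk usage on a node exceeds 85% es stops allocating new shards to that node. The primary shards of newly created indices are exempt, which is why only the replicas were missing. Once disk usage exceeds 90%, es tries to relocate shards from that node to other nodes. Once disk usage exceeds 95%, every index that has a shard on that node is set to read-only.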
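The three thresholds can be summarized in a small sketch. The function name `classify_disk_usage` and the stage labels are made up for illustration and are not part of Elasticsearch; the default values are the ones quoted from the documentation above.

```python
def classify_disk_usage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Return the watermark stage for a node's disk usage.

    Illustrative helper only: the stage names are hypothetical labels,
    and the defaults mirror the documented 85% / 90% / 95% watermarks.
    """
    if used_percent > flood_stage:
        return "flood_stage"  # indices with shards on the node become read-only
    if used_percent > high:
        return "high"         # es tries to relocate shards away from the node
    if used_percent > low:
        return "low"          # no new shards (e.g. replicas) are allocated here
    return "ok"


# The node in this article was at roughly 89% disk usage.
stage_with_defaults = classify_disk_usage(89)
stage_after_change = classify_disk_usage(89, low=90, high=95, flood_stage=98)
```

With the defaults, 89% lands in the "low" stage, which matches the missing replicas. With the low watermark raised to 90%, the same node is classified as "ok" again.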
III. Solutions
1. Add disk space
……
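2. Delete some historical indices
3. Change the es settings
- Edit the configuration file (requires restarting es)
- Change the settings dynamically (via the API, no restart needed)
The es defaults are 85% and 90%; we change them to 90% and 95%.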
3.1 Edit the configuration file (requires restarting es)
Add the following to elasticsearch.yml:
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.routing.allocation.disk.watermark.flood_stage: 98%
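The three values have to stay ordered: low at or below high, and high at or below flood_stage. As far as I know es rejects settings that violate this, so it can be worth checking the values before editing the file. The sketch below is a hypothetical pre-check that handles percentage values only; `parse_percent` and `check_watermark_order` are illustrative names, not es APIs.

```python
def parse_percent(value):
    """Convert a string such as '90%' to the float 90.0."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage such as '90%', got {value!r}")
    return float(text[:-1])


def check_watermark_order(low, high, flood_stage):
    """Return True if 0 <= low <= high <= flood_stage <= 100, else raise ValueError."""
    low_p = parse_percent(low)
    high_p = parse_percent(high)
    flood_p = parse_percent(flood_stage)
    if not (0 <= low_p <= high_p <= flood_p <= 100):
        raise ValueError(
            f"watermarks out of order: low={low}, high={high}, flood_stage={flood_stage}"
        )
    return True


# The values used in the elasticsearch.yml snippet above.
values_are_ordered = check_watermark_order("90%", "95%", "98%")
```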
3.2 Change the settings dynamically
A dynamic change is one made through the es API. A transient change is temporary, while a persistent change is permanent.
API endpoint: /_cluster/settings
Note that the cluster.routing.allocation.disk.watermark.flood_stage setting only exists from version 6.0 onward. Version 5 does not have it and does not support it. When I added this setting while changing a 5.6 cluster, es returned an error: "reason":"persistent setting [cluster.routing.allocation.disk.watermark.flood_stage], not dynamically updateable"},"status":400
Official documentation for version 5.6: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html
View the current es settings
To view the current es settings, send a GET request to /_cluster/settings.
curl 172.1.2.208:9200/_cluster/settings
{
  "persistent": {
    "xpack": {
      "monitoring": {
        "collection": {
          "enabled": "true"
        }
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "disk": {
            "watermark": {
              "low": "90%",
              "high": "95%"
            }
          }
        }
      },
      "info": {
        "update": {
          "interval": "1m"
        }
      }
    }
  }
}
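The response nests each setting by its dotted name, so cluster.routing.allocation.disk.watermark.low appears as cluster → routing → allocation → disk → watermark → low. The following sketch flattens such a response back into dotted keys for easier comparison with the setting names used elsewhere in this post. `flatten_settings` is an illustrative helper, and the sample data is copied from the transient section above. The API also accepts a flat_settings=true query parameter that should give a similar result on the server side.

```python
def flatten_settings(nested, prefix=""):
    """Flatten a nested settings dict into {'a.b.c': value} form."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted prefix.
            flat.update(flatten_settings(value, dotted))
        else:
            flat[dotted] = value
    return flat


# The "transient" part of the response shown above.
transient = {
    "cluster": {
        "routing": {
            "allocation": {"disk": {"watermark": {"low": "90%", "high": "95%"}}}
        },
        "info": {"update": {"interval": "1m"}},
    }
}

flat = flatten_settings(transient)
for name in sorted(flat):
    print(name, "=", flat[name])
```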
Permanent change: persistent
The setting survives a restart.
{"persistent":
{
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.high": "95%",
"cluster.info.update.interval": "1m"
}
}
Temporary change: transient
The setting is lost after the cluster restarts.
{"transient":
{
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.high": "95%",
"cluster.info.update.interval": "1m"
}
}
Example:
root@111:~# curl -H "Content-Type: application/json" -XPUT 172.1.2.208:9200/_cluster/settings -d '{"transient": { "cluster.routing.allocation.disk.watermark.low": "90%", "cluster.routing.allocation.disk.watermark.high": "95%", "cluster.info.update.interval": "1m"}}'
{"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"disk":{"watermark":{"low":"90%","high":"95%"}}}},"info":{"update":{"interval":"1m"}}}}}
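The same request can be sent from a script. Below is a minimal sketch using only the Python standard library; the host is the one from the curl example, while `build_settings_request` and `apply_settings` are hypothetical helper names. Building the request is separated from sending it so the payload can be inspected first.

```python
import json
import urllib.request

ES_URL = "http://172.1.2.208:9200"  # host taken from the curl example above


def build_settings_request(scope, settings, base_url=ES_URL):
    """Build a PUT /_cluster/settings request; scope is 'transient' or 'persistent'."""
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unknown scope: {scope!r}")
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_settings(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_settings_request(
    "transient",
    {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.info.update.interval": "1m",
    },
)
# Uncomment to actually send the change to the cluster:
# print(apply_settings(request))
```

Passing "persistent" instead of "transient" produces the permanent variant of the same change.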
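IV. Further notes
As the official documentation shows, the watermarks do not have to be percentages. They can also be set as an absolute amount of disk space, such as 500mb, in which case the value is the amount of free space that must remain on the node.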
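The difference between the two forms is easy to get backwards: a percentage is compared against used space, while an absolute value is compared against free space. The sketch below illustrates this under stated assumptions. `parse_byte_size` and `over_watermark` are hypothetical helpers, not es code, and the units are assumed to be 1024-based.

```python
import re

# Assumption: byte units are 1024-based (1kb = 1024 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_byte_size(value):
    """Convert a string such as '500mb' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgtp]?b)\s*", value.lower())
    if match is None:
        raise ValueError(f"not a byte size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def over_watermark(total_bytes, free_bytes, watermark):
    """Return True if the disk has crossed the given watermark.

    '85%'   -> crossed when used space is above 85% of the disk.
    '500mb' -> crossed when free space is below 500mb.
    """
    text = watermark.strip()
    if text.endswith("%"):
        used_percent = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_percent > float(text[:-1])
    return free_bytes < parse_byte_size(text)


GIB = 1024**3
# A 100 GiB disk with 11 GiB free is 89% used.
crossed_percent = over_watermark(100 * GIB, 11 * GIB, "85%")
crossed_absolute = over_watermark(100 * GIB, 11 * GIB, "500mb")
```

For a disk at 89% usage, the 85% watermark is crossed, but a 500mb watermark is not, because 11 GiB is still free.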