Elasticsearch 2.2+: enabling compression with index.codec: best_compression
The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:
index.codec
The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.
Note: in 2.1 and earlier this is an experimental feature; it is only stable as of 2.2+.
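Because index.codec is a static index setting, the usual place to set it is at index creation time. A minimal sketch (not from the quoted docs) of doing that through the REST API with Python, assuming a local node at http://localhost:9200 and a placeholder index name logs-archive:

```python
# Sketch: create an index whose stored fields use DEFLATE (best_compression).
# Assumptions: Elasticsearch reachable at http://localhost:9200, index name
# "logs-archive" is a placeholder chosen for this example.
import requests

settings = {
    "settings": {
        "index": {
            "codec": "best_compression"  # static setting, applied at creation
        }
    }
}

resp = requests.put("http://localhost:9200/logs-archive", json=settings)
print(resp.status_code, resp.json())
```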
Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file, in order to be able to archive more data with the same amount of disk space.
Quoted from: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
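Besides the per-node elasticsearch.yml approach mentioned above, an already-existing index can also be switched to best_compression through the settings API; since the codec is a static setting, the index has to be closed first. A minimal sketch, with the index name logs-2015.12 purely as a placeholder:

```python
# Sketch: change the codec of an existing index to best_compression.
# Assumptions: node at http://localhost:9200, placeholder index "logs-2015.12".
import requests

ES = "http://localhost:9200"
INDEX = "logs-2015.12"

requests.post(f"{ES}/{INDEX}/_close")                    # static settings require a closed index
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index.codec": "best_compression"})   # switch stored-fields codec
requests.post(f"{ES}/{INDEX}/_open")                     # reopen for reads/writes

# Note: the new codec only applies to segments written after the change;
# already-written segments keep their old compression until they are merged.
```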
The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0
The test methodology hasn’t changed, so you can check out the old blog post or the README in the GitHub repo for the details.
Structured data file. Original file size: 67644119 bytes.

| Test | String fields | _all | Index size w/ LZ4 (bytes) | Index size w/ DEFLATE (bytes) | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|------|---------------|------|---------------------------|-------------------------------|------------------------|----------------------------|-------------------|
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |

Semi-structured data file. Original file size: 75037027 bytes.

| Test | String fields | _all | Index size w/ LZ4 (bytes) | Index size w/ DEFLATE (bytes) | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|------|---------------|------|---------------------------|-------------------------------|------------------------|----------------------------|-------------------|
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |
With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.
As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
Conclusion
There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.
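Following the blog's suggestion to run similar tests on your own data, the expansion ratio in the tables above is simply the on-disk index size divided by the raw source file size. A minimal sketch of that calculation, assuming a node at http://localhost:9200, a placeholder index logs-test, and a placeholder raw input file logs.json:

```python
# Sketch: compute index-size / raw-size (the "expansion ratio" above).
# Assumptions: node at http://localhost:9200, index "logs-test" and raw file
# "logs.json" are placeholders for your own test data.
import os
import requests

ES = "http://localhost:9200"
INDEX = "logs-test"
RAW_FILE = "logs.json"

stats = requests.get(f"{ES}/{INDEX}/_stats/store").json()
index_bytes = stats["indices"][INDEX]["total"]["store"]["size_in_bytes"]
raw_bytes = os.path.getsize(RAW_FILE)

print(f"index size: {index_bytes} bytes, raw size: {raw_bytes} bytes")
print(f"expansion ratio: {index_bytes / raw_bytes:.3f}")
```

Running this once with the default LZ4 codec and once after reindexing with best_compression gives the same before/after comparison as the tables.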