LSM Tree 学习笔记——本质是将随机的写放在内存里形成有序的小memtable,然后定期合并成大的table flush到磁盘
The Sorted String Table (SSTable
) is one of the most popular outputs for storing, processing, and exchanging datasets.
An SSTable
is a simple abstraction to efficiently store large numbers of key-value
pairs while optimizing for high throughput, sequential read/write
workloads.
Unfortunately, the SSTable name itself has also been overloaded by the industry to refer to services that go well beyond just the sorted table, which has only added unnecessary confusion to what is a very simple and a useful data structure on its own. Let's take a closer look under the hood of an SSTable and how LevelDB makes use of it.
SSTable: Sorted String Table
SSTable本身是个简单而有用的数据结构, 而往往由于工业界对于它的overload, 导致大家的误解
它本身就像他的名字一样, 就是a set of sorted key-value pairs
如下图左, 当文件比较大的时候, 也可以建立key:offset的index, 用于快速分段定位, 但这个是可选的.
这个结构和普通的key-value pairs的区别, 可以support range query和random r/w
A "Sorted String Table" then is exactly what it sounds like, it is a file which contains a set of arbitrary, sorted key-value pairs inside.
Duplicate
keys are fine, there is no need for "padding" for keys or values, and
keys and values are arbitrary blobs. Read in the entire file
sequentially and you have a sorted index. Optionally, if the file is
very large, we can also prepend, or create a standalone key:offset
index for fast access.
That's all an SSTable is: very simple, but also a very useful way to exchange large, sorted data segments.
SSTables and Log Structured Merge Trees
仅仅SSTable数据结构本身仍然无法support高效的range query和random r/w的场景
还需要一整套的机制来完成从memory sort, flush to disk, compaction以及快速读取……这样的一个完成的机制和架构称为,"The Log-Structured Merge-Tree" (LSM Tree)
名字很形象, 首先是基于log的, 不断产生SSTable结构的log文件, 并且是需要不断merge以提高效率的
下图很好的描绘了LSM Tree的结构和大部分操作
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· DeepSeek 开源周回顾「GitHub 热点速览」