Note: Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication
Ideas / Methods
Measuring restore speed
Proposes the speed factor, a metric for restore speed.
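A minimal sketch of how the speed factor could be computed from a restore trace, assuming the definition 1 / (mean containers read per MB of data restored). The function name and its arguments are hypothetical, chosen only for illustration.

```python
def speed_factor(containers_read: int, bytes_restored: int) -> float:
    """Speed factor = 1 / (mean containers read per MB restored).

    Assumed definition; both parameters are illustrative names.
    """
    if containers_read <= 0:
        raise ValueError("at least one container must be read")
    mb_restored = bytes_restored / (1024 * 1024)
    # 1 / (containers_read / mb_restored) simplifies to this ratio.
    return mb_restored / containers_read
```

A higher value means fewer container reads per MB restored, which should correspond to a faster restore.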
Container capping
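Limits the number of containers used when restoring a file. To keep the number of containers accessed under the cap, some deduplication is occasionally given up (a chunk is stored a second time in a new container).
Capping requires grouping the data into segments (about 20 MB, i.e. roughly 5000 chunks of 4 KB each).
Segment processing
- Read one segment's worth of chunks into a buffer and determine, for each chunk, whether it is already stored and in which container (any lookup structure works, e.g. a Bloom filter in front of an index)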
- Set a cap of T containers (so that the segment can be restored from T containers)
- Write the "new" chunks and build the index
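The segment steps above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes the containers to keep are the T that hold the most chunks of the segment, and the names `cap_segment`, `location`, and `cap` are hypothetical.

```python
from collections import Counter

def cap_segment(chunks, location, cap):
    """Split one segment into deduplicated chunks and chunks to (re)write.

    chunks:   chunk fingerprints of the segment, in stream order
    location: dict fingerprint -> id of the old container already holding it
    cap:      T, the maximum number of old containers the segment may use
    """
    # Count how many of the segment's chunks each old container holds.
    usage = Counter(location[c] for c in chunks if c in location)
    # Assumption: keep the T containers that contribute the most chunks.
    kept = {cid for cid, _ in usage.most_common(cap)}
    deduped, to_write, written = [], [], set()
    for c in chunks:
        if location.get(c) in kept or c in written:
            deduped.append(c)      # reference an existing copy
        else:
            to_write.append(c)     # new chunk, or duplicate outside the cap
            written.add(c)
    return kept, deduped, to_write
```

Chunks in `to_write` include both genuinely new chunks and duplicates whose containers fell outside the cap, which is where the deduplication loss comes from.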
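Assembly
Proposes a new restore algorithm, motivated by the large size gap between the unit of I/O (a container) and the unit of use (a chunk).
Chunks from containers are cached in a buffer, reducing I/O when frequently used chunks are fetched.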
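A rough sketch of an assembly-style restore, under stated assumptions: the recipe lists chunks in stream order with their container ids, and each container is read at most once per buffer fill, with all of its needed chunks copied into place in one pass. The function name, the recipe layout, and `area_size` are hypothetical; the paper's actual algorithm may differ in detail.

```python
def restore_with_assembly(recipe, read_container, area_size):
    """Restore a stream, reading each container at most once per area fill.

    recipe:         list of (fingerprint, container_id, length), stream order
    read_container: callable container_id -> dict fingerprint -> bytes
    area_size:      size of the assembly buffer in bytes (hypothetical knob)
    Returns (restored_bytes, number_of_container_reads).
    """
    out = bytearray()
    reads = 0
    i = 0
    while i < len(recipe):
        # Take the next run of recipe entries that fits in the assembly area
        # (always at least one entry, even if it is larger than the area).
        window, used = [], 0
        while i < len(recipe) and (not window or used + recipe[i][2] <= area_size):
            window.append((used, recipe[i]))
            used += recipe[i][2]
            i += 1
        area = bytearray(used)
        filled = [False] * len(window)
        for k, (_, (_, cid, _)) in enumerate(window):
            if filled[k]:
                continue
            # Read the container once and place every chunk it supplies.
            container = read_container(cid)
            reads += 1
            for j, (off, (fp, c, length)) in enumerate(window):
                if c == cid:
                    area[off:off + length] = container[fp]
                    filled[j] = True
        out += area
    return bytes(out), reads
```

A larger `area_size` lets more chunks from the same container be served by a single read, at the cost of more RAM.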
Amount of work
- 9000+ lines of C++ code.
- Data sets: 2
- Workgroup: Created from a semi-regular series of backups of the desktop PCs of a group of 20 engineers taken over a period of four months.
- 2year: a synthetic data set provided to us by HP Storage that they have designed to mimic the important characteristics of the data from a past customer escalation involving high fragmentation.
- Experiments
- RAM usage (both data sets)
- Baseline LRU cache
- Capping (varying segment size and cap T, both data sets)
- Assembly (speed factor and RAM usage, both data sets)
- Varying container size: speed test
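For comparison with the assembly approach, the baseline LRU container cache from the experiment list can be modelled roughly as below. This is only a sketch that counts container reads for a given access sequence; the function name and the slot-based cache size are assumptions, not details taken from the paper.

```python
from collections import OrderedDict

def lru_container_reads(container_sequence, cache_slots):
    """Count container reads when whole containers are cached with LRU.

    container_sequence: container ids in the order the restore needs them
    cache_slots:        number of containers the cache can hold (assumed unit)
    """
    cache = OrderedDict()
    reads = 0
    for cid in container_sequence:
        if cid in cache:
            cache.move_to_end(cid)         # hit: mark as most recently used
        else:
            reads += 1                     # miss: read the container from disk
            cache[cid] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used
    return reads
```

Dividing the restored size in MB by the returned count gives a speed factor for the baseline under the same assumed definition.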