Note: Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication

Ideas / Methods

Measuring restore speed

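Proposes the speed factor as a metric for restore speed: the inverse of the mean number of containers read per MB of data restored.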
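As a small illustration of the speed factor, here is a minimal Python sketch. The function name `speed_factor`, its parameters, and the choice of 1 MB = 2^20 bytes are assumptions made for this example, not something the paper or these notes specify.

```python
def speed_factor(containers_read: int, bytes_restored: int) -> float:
    """Return 1 / (mean containers read per MB restored).

    Assumption for this sketch: 1 MB = 2**20 bytes.
    """
    if containers_read <= 0:
        raise ValueError("at least one container read is required")
    mb_restored = bytes_restored / (1024 * 1024)
    # 1 / (containers / MB) is the same as MB / containers.
    return mb_restored / containers_read


# Restoring 400 MB while reading 100 containers.
print(speed_factor(100, 400 * 1024 * 1024))  # 4.0
```

A higher value means fewer container reads per MB restored, so restores that jump between many containers score lower.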

Container capping

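Limit the number of containers that have to be read when restoring a file. To keep the number of containers referenced below the cap, part of the deduplication sometimes has to be given up (a chunk that is already stored is written again into a new container).
Capping works on segments, so the data is first grouped into segments (about 20 MB each, roughly 5000 chunks of 4 KB).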

Segment processing

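  1. Read one segment's worth of chunks into a buffer and determine, for each chunk, whether it is already stored and in which container (any lookup method works, e.g. a Bloom filter in front of a chunk index).
  2. Set a cap of T containers, so that the segment can be restored by reading at most T previously written containers; duplicate chunks located in any other container are treated as “new”.
  3. Write the “new” chunks and update the index.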
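The capping decision for one segment can be sketched as follows. This is a minimal Python sketch, not the paper's C++ implementation: the function name `cap_segment`, the use of integer container ids, and the tie-breaking rule (a higher id is assumed to be a more recently written container) are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def cap_segment(chunk_locations: List[Optional[int]], cap: int) -> List[bool]:
    """Decide, per chunk of a segment, whether to write it as a new chunk.

    chunk_locations[i] is the id of the container that already holds
    chunk i, or None if the chunk has not been stored yet.
    """
    # Count how many of this segment's chunks each old container supplies.
    usage = Counter(loc for loc in chunk_locations if loc is not None)
    # Keep the `cap` most useful containers; ties go to the higher id
    # (assumed here to be the more recently written container).
    ranked = sorted(usage, key=lambda cid: (usage[cid], cid), reverse=True)
    kept = set(ranked[:cap])
    # Chunks outside the kept containers, and unseen chunks, become "new".
    return [loc not in kept for loc in chunk_locations]


locations = [1, 1, 1, 2, None, 3, 3, 2, 2]
print(cap_segment(locations, cap=2))
# [False, False, False, False, True, True, True, False, False]
```

In this example containers 1 and 2 each supply three chunks and are kept, so the two chunks held by container 3 are stored again together with the one unseen chunk.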

Assembly

Proposes a new restore algorithm, motivated by the large difference in size between the unit of I/O (a container) and the unit of use (a chunk).

The chunks read from a container are copied into a buffer, which reduces I/O when frequently referenced chunks are fetched.
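One way to picture this is a fixed-size assembly buffer that is filled container by container. The sketch below is a simplified Python illustration under stated assumptions: the recipe format `(container id, fingerprint, chunk length)`, the in-memory `containers` dictionary standing in for on-disk containers, and the name `restore_with_assembly` are all hypothetical.

```python
from typing import Dict, List, Tuple

RecipeEntry = Tuple[int, str, int]  # (container id, fingerprint, chunk length)


def restore_with_assembly(recipe: List[RecipeEntry],
                          containers: Dict[int, Dict[str, bytes]],
                          area_size: int) -> Tuple[bytes, int]:
    """Restore a recipe; return (restored data, number of container reads)."""
    restored = bytearray()
    container_reads = 0
    pos = 0
    while pos < len(recipe):
        # 1. Take as many upcoming recipe entries as fit into the buffer
        #    (always at least one, so an oversized chunk cannot stall us).
        window: List[RecipeEntry] = []
        used = 0
        while pos < len(recipe) and (not window or used + recipe[pos][2] <= area_size):
            window.append(recipe[pos])
            used += recipe[pos][2]
            pos += 1
        # 2. Fill the buffer: each needed container is read once, and every
        #    chunk the window needs from it is copied out in the same pass.
        slots: List[object] = [None] * len(window)
        for idx, (cid, _, _) in enumerate(window):
            if slots[idx] is not None:
                continue
            chunks = containers[cid]  # stands in for one container read
            container_reads += 1
            for j in range(idx, len(window)):
                if window[j][0] == cid:
                    slots[j] = chunks[window[j][1]]
        # 3. The buffer is complete: emit it and move to the next window.
        for data in slots:
            restored += data
    return bytes(restored), container_reads


containers = {1: {"a": b"AA", "b": b"BB"}, 2: {"c": b"CC"}}
recipe = [(1, "a", 2), (2, "c", 2), (1, "b", 2), (1, "a", 2)]
print(restore_with_assembly(recipe, containers, area_size=8))  # (b'AACCBBAA', 2)
print(restore_with_assembly(recipe, containers, area_size=4))  # (b'AACCBBAA', 3)
```

With the larger buffer, container 1 is read only once even though three recipe entries refer to it; halving the buffer forces a second read of container 1.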

Work involved

  1. 9000+ lines of C++ code.
  2. Data sets: 2
    • Workgroup: Created from a semi-regular series of backups of the desktop PCs of a group of 20 engineers taken over a period of four months.
    • 2year: a synthetic data set provided to us by HP Storage that they have designed to mimic the important characteristics of the data from a past customer escalation involving high fragmentation.
  3. Experiments
    • RAM usage (both data sets)
    • Baseline LRU cache
    • Capping (varying segment size and cap T, both data sets)
    • Assembly (speed factor and RAM usage, both data sets)
    • Varying container size: speed test
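For reference, the LRU container cache used as the baseline in the experiments can be approximated by counting cache misses over the sequence of containers a restore touches. This is a minimal Python sketch; the function name `count_container_reads` and the idea of feeding it a plain list of container ids are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Iterable


def count_container_reads(container_ids: Iterable[int], cache_slots: int) -> int:
    """Count container reads for a restore that uses an LRU container cache."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    reads = 0
    for cid in container_ids:
        if cid in cache:
            cache.move_to_end(cid)      # hit: mark as most recently used
            continue
        reads += 1                      # miss: the container must be read
        cache[cid] = True
        if len(cache) > cache_slots:
            cache.popitem(last=False)   # evict the least recently used entry
    return reads


sequence = [1, 2, 1, 3, 2, 1]
print(count_container_reads(sequence, cache_slots=2))  # 5
print(count_container_reads(sequence, cache_slots=3))  # 3
```

Dividing the amount of data restored (in MB) by the returned count gives the speed factor for that cache size.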
posted @ 2017-12-21 17:08  tino_ryj