LevelDB Snapshots Explained

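  To understand LevelDB snapshots you first need to understand the SequenceNumber. Every write advances the SequenceNumber: inserting key1, key2, key3 and key4 one at a time gives them SequenceNumbers 1, 2, 3 and 4. Batched (merged) writes are slightly different. Suppose key1 is written alone, then key2, key3 and key4 are written as one batch, then key5 is written alone. key1 gets SequenceNumber 1; the batch is stamped with the starting SequenceNumber 2 and its entries take 2, 3 and 4 in order; key5 gets SequenceNumber 5. The global last sequence advances to 4 only after the whole batch has been applied, so the batch becomes visible to readers as a unit.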
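  The numbering described above can be illustrated with a small stand-alone model. This is only a sketch: SequenceAllocator and Reserve are hypothetical names, not LevelDB code, and the model assumes that a write group of N entries consumes N consecutive sequence numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not LevelDB code: hands out sequence numbers
// for single writes and for merged write groups.
class SequenceAllocator {
 public:
  // Reserves `count` consecutive sequence numbers for one write (or one
  // merged group of writes) and returns them in order.
  std::vector<uint64_t> Reserve(uint64_t count) {
    assert(count > 0);
    std::vector<uint64_t> assigned;
    for (uint64_t i = 0; i < count; ++i) {
      assigned.push_back(last_sequence_ + 1 + i);
    }
    // The visible "last sequence" advances only after the whole group
    // has been numbered, so a reader never observes half a group.
    last_sequence_ += count;
    return assigned;
  }

  uint64_t last_sequence() const { return last_sequence_; }

 private:
  uint64_t last_sequence_ = 0;
};
```

  Calling Reserve(1), Reserve(3), Reserve(1) on a fresh allocator models key1, the batch key2..key4, and key5.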

  A key/value pair is inserted into the memtable in the following format:

  internal_key_size | internal_key | value_size | value
  ------------------|--------------|------------|-------

  The internal_key is where the SequenceNumber lives. Its format is:

  key | SequenceNumber | type (value type)
  ----|----------------|------------------

  In other words, the SequenceNumber is stored together with every key/value pair.
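  To make the two layouts concrete, here is a stand-alone sketch of the encoding. The helpers are simplified reimplementations written for this example (they borrow the names of LevelDB's own helpers but do not call into the library). The sketch assumes the sequence and type are packed into one 64-bit little-endian tag, with the sequence in the upper 56 bits and the type in the low byte, and that the two size fields are varint32 values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Value types stored in the low byte of the internal key's tag.
enum ValueType : uint8_t { kTypeDeletion = 0x0, kTypeValue = 0x1 };

// Packs a 56-bit sequence number and an 8-bit type into one 64-bit tag.
uint64_t PackSequenceAndType(uint64_t seq, ValueType t) {
  assert(seq <= ((1ull << 56) - 1));
  return (seq << 8) | t;
}

// Appends `value` as 8 little-endian bytes.
void PutFixed64(std::string* dst, uint64_t value) {
  for (int i = 0; i < 8; ++i) {
    dst->push_back(static_cast<char>((value >> (8 * i)) & 0xff));
  }
}

// Appends `value` as a base-128 varint (7 payload bits per byte).
void PutVarint32(std::string* dst, uint32_t value) {
  while (value >= 0x80) {
    dst->push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  dst->push_back(static_cast<char>(value));
}

// internal_key = user_key | fixed64((sequence << 8) | type)
std::string MakeInternalKey(const std::string& user_key, uint64_t seq,
                            ValueType t) {
  std::string out = user_key;
  PutFixed64(&out, PackSequenceAndType(seq, t));
  return out;
}

// entry = varint32(internal_key_size) | internal_key
//       | varint32(value_size)        | value
std::string MakeMemTableEntry(const std::string& user_key, uint64_t seq,
                              ValueType t, const std::string& value) {
  const std::string ikey = MakeInternalKey(user_key, seq, t);
  std::string out;
  PutVarint32(&out, static_cast<uint32_t>(ikey.size()));
  out += ikey;
  PutVarint32(&out, static_cast<uint32_t>(value.size()));
  out += value;
  return out;
}
```

  Under these assumptions the internal key is always 8 bytes longer than the user key, which is why internal_key_size is user_key.size() + 8.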

  

  Next, let's look at the snapshot API. The interface and its implementation are as follows:

const Snapshot* DBImpl::GetSnapshot() {
  MutexLock l(&mutex_);
  return snapshots_.New(versions_->LastSequence());
}

void DBImpl::ReleaseSnapshot(const Snapshot* s) {
  MutexLock l(&mutex_);
  snapshots_.Delete(reinterpret_cast<const SnapshotImpl*>(s));
}

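  snapshots_ is a doubly linked list that tracks the live snapshots. Each GetSnapshot call creates a new snapshot stamped with the current last SequenceNumber and inserts it into the list; ReleaseSnapshot removes that snapshot from the list.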
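  The bookkeeping can be modeled in a few lines. This is a sketch only: ToySnapshotList and SmallestSnapshot are hypothetical names, and std::list stands in for LevelDB's hand-written doubly linked list. It also shows why the oldest snapshot is cheap to find: snapshots are created with non-decreasing sequence numbers, so the front of the list is always the oldest.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

// Hypothetical stand-in for the snapshot list, storing only the
// sequence number of each live snapshot.
class ToySnapshotList {
 public:
  typedef std::list<uint64_t>::iterator Handle;

  bool empty() const { return seqs_.empty(); }

  // Appending keeps the list ordered oldest-first because sequence
  // numbers never decrease.
  Handle New(uint64_t sequence) {
    assert(seqs_.empty() || seqs_.back() <= sequence);
    seqs_.push_back(sequence);
    return std::prev(seqs_.end());
  }

  // Unlinks one snapshot; other handles stay valid.
  void Delete(Handle h) { seqs_.erase(h); }

  uint64_t oldest() const {
    assert(!seqs_.empty());
    return seqs_.front();
  }

 private:
  std::list<uint64_t> seqs_;
};

// The lower bound a compaction has to respect: the oldest live snapshot,
// or the current last sequence when no snapshot exists.
uint64_t SmallestSnapshot(const ToySnapshotList& snapshots,
                          uint64_t last_sequence) {
  return snapshots.empty() ? last_sequence : snapshots.oldest();
}
```

  Releasing the oldest snapshot immediately raises the bound that SmallestSnapshot returns, which lets later compactions discard more old entries.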

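  So how is the data that a snapshot needs kept from being deleted? In LevelDB the only place that actually discards data is compaction, so let's look at the core of DBImpl::DoCompactionWork: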

 1 Status DBImpl::DoCompactionWork(CompactionState* compact) {
 2   //...................
 3   if (snapshots_.empty()) {
 4     compact->smallest_snapshot = versions_->LastSequence();
 5   } else {
 6     compact->smallest_snapshot = snapshots_.oldest()->number_;
 7   }
 8 
 9   // Release mutex while we're actually doing the compaction work
10   mutex_.Unlock();
11 
12   Iterator* input = versions_->MakeInputIterator(compact->compaction);
13   input->SeekToFirst();
14   Status status;
15   ParsedInternalKey ikey;
16   std::string current_user_key;
17   bool has_current_user_key = false;
18   SequenceNumber last_sequence_for_key = kMaxSequenceNumber;
19   for (; input->Valid() && !shutting_down_.Acquire_Load(); ) {
20     //..............................
21     // Handle key/value, add to state, etc.
22     bool drop = false;
23     if (!ParseInternalKey(key, &ikey)) {
24       // Do not hide error keys
25       current_user_key.clear();
26       has_current_user_key = false;
27       last_sequence_for_key = kMaxSequenceNumber;
28     } else {
29       if (!has_current_user_key ||
30           user_comparator()->Compare(ikey.user_key,
31                                      Slice(current_user_key)) != 0) {
32         // First occurrence of this user key
33         current_user_key.assign(ikey.user_key.data(), ikey.user_key.size());
34         has_current_user_key = true;
35         last_sequence_for_key = kMaxSequenceNumber;
36       }
37 
38       if (last_sequence_for_key <= compact->smallest_snapshot) {
39         // Hidden by an newer entry for same user key
40         drop = true;    // (A)
41       } else if (ikey.type == kTypeDeletion &&
42                  ikey.sequence <= compact->smallest_snapshot &&
43                  compact->compaction->IsBaseLevelForKey(ikey.user_key)) {
44         // For this user key:
45         // (1) there is no data in higher levels
46         // (2) data in lower levels will have larger sequence numbers
47         // (3) data in layers that are being compacted here and have
48         //     smaller sequence numbers will be dropped in the next
49         //     few iterations of this loop (by rule (A) above).
50         // Therefore this deletion marker is obsolete and can be dropped.
51         drop = true;
52       }
53 
54       last_sequence_for_key = ikey.sequence;
55     }
56 
57     if (!drop) {
58     //..............................
59     }
60 
61     input->Next();
62   }
63 }

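  On line 6, compact->smallest_snapshot is set to the SequenceNumber of the oldest snapshot (or, on line 4, to the current last sequence when no snapshot exists). An iterator over the compaction inputs is then created. For a single user key key_a, the iteration may visit the entries in this order, newest first:

  (key_a, value5) -> (key_a, value4) -> (key_a, value3) -> (key_a, value2) -> (key_a, value1)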

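  When the iterator reaches (key_a, value5), lines 33-35 run because this is the first occurrence of key_a, and last_sequence_for_key is reset to kMaxSequenceNumber, so rule (A) cannot fire for this entry. Line 54 then sets last_sequence_for_key to the SequenceNumber of (key_a, value5). On the next iteration, at (key_a, value4), last_sequence_for_key is compared with compact->smallest_snapshot. If last_sequence_for_key <= compact->smallest_snapshot, the newer entry (key_a, value5) is already visible to the oldest snapshot, and therefore to every later snapshot and to current readers, so nobody can ever read (key_a, value4) and it can be dropped during compaction. Otherwise the entry can still be dropped when all three of the following hold: it is a deletion marker, its sequence is <= the SequenceNumber of the oldest snapshot, and IsBaseLevelForKey reports that no deeper (higher-numbered) level contains the same user key. In every other case the entry must be kept.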

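  This compaction logic exists so that an old snapshot keeps reading the old value and is unaffected by later updates, which is exactly what a snapshot is for.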
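  The drop rules can be tried out in isolation with the following sketch. It is a simplified model, not the real function: Entry and FilterForCompaction are hypothetical names, the input is assumed to be sorted by user key ascending and then by sequence descending, and the single is_base_level flag is a stand-in for the per-key IsBaseLevelForKey check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const uint64_t kMaxSequenceNumber = (1ull << 56) - 1;
enum ValueType { kTypeDeletion = 0, kTypeValue = 1 };

struct Entry {
  std::string user_key;
  uint64_t sequence;
  ValueType type;
};

// Returns the entries that survive compaction. `input` must be sorted by
// user_key ascending, then sequence descending (newest first per key).
std::vector<Entry> FilterForCompaction(const std::vector<Entry>& input,
                                       uint64_t smallest_snapshot,
                                       bool is_base_level) {
  std::vector<Entry> kept;
  std::string current_user_key;
  bool has_current_user_key = false;
  uint64_t last_sequence_for_key = kMaxSequenceNumber;
  for (std::size_t i = 0; i < input.size(); ++i) {
    const Entry& e = input[i];
    if (!has_current_user_key || e.user_key != current_user_key) {
      // First occurrence of this user key: forget the previous key.
      current_user_key = e.user_key;
      has_current_user_key = true;
      last_sequence_for_key = kMaxSequenceNumber;
    }
    bool drop = false;
    if (last_sequence_for_key <= smallest_snapshot) {
      // Rule (A): a newer entry is already visible to every snapshot.
      drop = true;
    } else if (e.type == kTypeDeletion && e.sequence <= smallest_snapshot &&
               is_base_level) {
      // Obsolete deletion marker: nothing deeper is left to hide.
      drop = true;
    }
    last_sequence_for_key = e.sequence;
    if (!drop) kept.push_back(e);
  }
  return kept;
}
```

  With five versions of key_a (sequences 5 down to 1) and smallest_snapshot = 3, the sketch keeps sequences 5, 4 and 3. Sequence 4 survives even though no snapshot reads it, because the rule only consults the oldest snapshot.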

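  On the read side, a snapshot can be passed to Get through the read options (ReadOptions::snapshot). Inside Get, the seek skips every key/value entry whose SequenceNumber is greater than the snapshot's, so the read returns the value as of the snapshot rather than any newer one.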
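  The effect of that seek can be reproduced with an ordered map. The sketch below is a toy model with hypothetical names (ToyTable, Add, GetAt), not LevelDB's memtable or table code. It assumes entries are ordered by user key ascending and then by sequence descending, so a lower-bound seek on (user_key, snapshot sequence) lands on the newest entry the snapshot is allowed to see.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum ValueType { kTypeDeletion = 0, kTypeValue = 1 };

struct InternalKey {
  std::string user_key;
  uint64_t sequence;
};

// Orders by user_key ascending, then by sequence DESCENDING, so the
// newest entry of a key comes first.
struct InternalKeyLess {
  bool operator()(const InternalKey& a, const InternalKey& b) const {
    if (a.user_key != b.user_key) return a.user_key < b.user_key;
    return a.sequence > b.sequence;
  }
};

struct Record {
  ValueType type;
  std::string value;
};

typedef std::map<InternalKey, Record, InternalKeyLess> ToyTable;

// Inserts one versioned entry into the toy table.
void Add(ToyTable* table, const std::string& user_key, uint64_t seq,
         ValueType type, const std::string& value) {
  InternalKey k = {user_key, seq};
  Record r = {type, value};
  (*table)[k] = r;
}

// Returns true and fills *value if `user_key` has a live value as of
// `snapshot_seq`; returns false if it is absent or deleted at that point.
bool GetAt(const ToyTable& table, const std::string& user_key,
           uint64_t snapshot_seq, std::string* value) {
  InternalKey lookup = {user_key, snapshot_seq};
  // Entries newer than the snapshot sort before `lookup` and are skipped.
  ToyTable::const_iterator it = table.lower_bound(lookup);
  if (it == table.end() || it->first.user_key != user_key) return false;
  if (it->second.type == kTypeDeletion) return false;
  *value = it->second.value;
  return true;
}
```

  For example, after writing value1 at sequence 1, value3 at sequence 3 and a deletion at sequence 5, a read at snapshot 2 sees value1, a read at snapshot 4 sees value3, and a read at snapshot 5 finds nothing.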

posted on 2014-04-04 00:02  ewouldblock7
