CMU_15445_P4_Part1

CMU_15445_Project4_Task1-2

At this point the overall shape of the BusTub database engine starts to come into view. In src/include/common/bustub_instance.h we can see which components make up a BusTub instance when one is created:

  // Currently the followings are directly referenced by recovery test, so
  // we cannot do anything on them until someone decides to refactor the recovery test.
  // Disk manager
  std::unique_ptr<DiskManager> disk_manager_;
  // Buffer pool manager (BPM), which manages the in-memory page buffer
  std::unique_ptr<BufferPoolManager> buffer_pool_manager_;
  // Lock manager
  std::unique_ptr<LockManager> lock_manager_;
  // Transaction manager, the part we need to complete in Project 4
  std::unique_ptr<TransactionManager> txn_manager_;
  // Log manager
  std::unique_ptr<LogManager> log_manager_;
  // Creates checkpoints for transaction management
  std::unique_ptr<CheckpointManager> checkpoint_manager_;
  // Catalog, which keeps track of the tables stored in the database
  std::unique_ptr<Catalog> catalog_;
  // Execution engine, which runs our Executors
  std::unique_ptr<ExecutionEngine> execution_engine_;
  /** Coordination for catalog */
  std::shared_mutex catalog_lock_;

Together these can be seen as the core parts of a database.

Task #1 - Timestamps

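For concurrency control, MVCC is usually paired with a timestamp-based scheme: timestamps define the order in which transactions take effect and serve as the basis for versioning.

Each database object stores, for each physical version, the timestamp of that version; this is the object's version information.
Read timestamp: every transaction has a read timestamp, assigned when the transaction starts. Because it is fixed at start time, the read timestamp bounds which versions the transaction can see: for each logical object, the transaction reads the latest version whose timestamp is less than or equal to its read timestamp. For example, if \(\mathbf{Txn_X}\) in the figure below has read timestamp 3, the physical versions of the logical objects it sees are \((A,3), (B,3), (C,2), (D,3)\), where A, B, C, D are the values of the first column and 3, 3, 2, 3 are the values of the second column.
Write timestamp: the write timestamp is the timestamp at which a transaction commits. In MVCC the DBMS controls when transactions commit, and the system-wide commit_ts is a monotonically increasing counter.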
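
The read-timestamp rule can be illustrated with a minimal sketch. It is not BusTub code: the Version struct and the VisibleVersion function are hypothetical names introduced here, and the versions of one object are assumed to be listed newest first.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using timestamp_t = int64_t;

// Hypothetical, simplified version record: a value plus the commit timestamp that wrote it.
struct Version {
  timestamp_t ts;
  int value;
};

// Returns the newest version with ts <= read_ts, or std::nullopt if the object
// did not exist yet at read_ts. The input must be ordered newest first.
std::optional<Version> VisibleVersion(const std::vector<Version> &newest_first, timestamp_t read_ts) {
  for (const auto &version : newest_first) {
    if (version.ts <= read_ts) {
      return version;
    }
  }
  return std::nullopt;
}
```

With versions written at timestamps 5, 3 and 1, a transaction with read timestamp 4 gets the version written at 3, and a transaction with read timestamp 0 gets nothing.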

Watermark

The version chain, transaction reads, and transaction commits all carry timestamps. How do these timestamps end up on one timeline, that is, how are they related to each other?

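Merging read and write timestamps onto one timeline: start with an axis that carries only write timestamps, where each write timestamp is the point at which a transaction commits. The transaction manager remembers the most recent commit timestamp (last_commit_ts_ in BusTub). When a transaction starts, it takes this value as its read timestamp, meaning every version at or before that timestamp is readable. The watermark is then the lowest read timestamp among all transactions that are still running; if no transaction is running, it equals the latest commit timestamp.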
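Merging version timestamps onto the commit-timestamp axis: BusTub maintains multiple versions of a logical object with UndoLogs. When a transaction modifies a tuple, it generates an UndoLog that records the state before the modification. As we will see later, if an UndoLog carries timestamp 3, we have to apply that UndoLog to see the physical state of the logical object as of timestamp 3. So the order of events is that the UndoLog for the previous version exists before the transaction commits. For example, if the previous version has timestamp 3 and the transaction commits at timestamp 4, then an UndoLog with timestamp 3 is generated before the commit.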
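
To make the timeline concrete, here is a minimal sketch of watermark bookkeeping. It assumes the watermark is the lowest read timestamp among running transactions, falling back to the latest commit timestamp when none is running. The class WatermarkSketch and its methods are hypothetical names for illustration, not BusTub's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using timestamp_t = int64_t;

// Hypothetical tracker: counts how many running transactions hold each read timestamp.
class WatermarkSketch {
 public:
  explicit WatermarkSketch(timestamp_t commit_ts) : commit_ts_(commit_ts) {}

  // A transaction starts with the given read timestamp.
  void AddTxn(timestamp_t read_ts) { ++running_[read_ts]; }

  // A transaction with the given read timestamp commits or aborts.
  void RemoveTxn(timestamp_t read_ts) {
    auto it = running_.find(read_ts);
    assert(it != running_.end());
    if (--it->second == 0) {
      running_.erase(it);
    }
  }

  // Record the latest commit timestamp.
  void UpdateCommitTs(timestamp_t commit_ts) { commit_ts_ = commit_ts; }

  // Lowest read timestamp among running transactions, or the latest commit
  // timestamp when no transaction is running.
  timestamp_t GetWatermark() const { return running_.empty() ? commit_ts_ : running_.begin()->first; }

 private:
  timestamp_t commit_ts_;
  std::map<timestamp_t, int> running_;  // ordered, so begin() is the minimum
};
```

An ordered map keeps the minimum at begin(), which makes GetWatermark cheap at the cost of logarithmic insert and erase.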

Task #2 - Storage Format and Sequential Scan

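In an earlier post I described how BusTub stores data objects, that is, the in-memory layout of a Tuple. Under MVCC the data has to be stored by version, which changes how Tuples are stored underneath. How does BusTub build the version chain, and how does a transaction find the version it is allowed to read?
In BusTub, the data of a Tuple actually lives in the following three places: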

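TableHeap: the TableHeap stores the latest version of the data.
Transaction Manager: the Transaction Manager does not store data directly. It stores a mapping from a Tuple to that Tuple's version chain. Given the Tuple's RID, the mapping yields the version chain, which can then be traversed to find the data of a particular version.
Transaction: each transaction stores its modifications to Tuples as UndoLogs. In BusTub an UndoLog stores only the fields that were modified. When UndoLogs are linked into a version chain, each one also records the previous transaction that modified the Tuple, so the earlier change can be found in that transaction.
The figure below shows an example of how these three parts store Tuple data.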

To find what a Tuple contained at a given timestamp, we use the data stored in these three places to roll the latest version of the Tuple back to its state at that timestamp.

Tuple Reconstruction

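After fetching the latest version from the TableHeap, we go through the TransactionManager and the UndoLogs stored in Transactions to reconstruct the physical data of the Tuple at a given timestamp. Let us look at the first step: going from the TableHeap to the TransactionManager to obtain the head of the version chain.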

From TableHeap to TransactionManager(VersionUndoLink)

The relevant code is shown below.
In the TransactionManager, version_info_ and PageVersionInfo give us the current VersionUndoLink of the Tuple with a given RID. The RID comes from the latest version of the Tuple.

struct PageVersionInfo {
  /** protects the map */
  std::shared_mutex mutex_;
  /** Stores previous version info for all slots. Note: DO NOT use `[x]` to access it because
    * it will create new elements even if it does not exist. Use `find` instead. The page_id_t key of version_info_
    * together with the slot_offset_t key here forms the RID of a Tuple, which is used to reach that Tuple's version chain.
    */
  std::unordered_map<slot_offset_t, VersionUndoLink> prev_version_;
};

/** protects version info */
std::shared_mutex version_info_mutex_;
/** Stores the previous version of each tuple in the table heap. Do not directly access this field. Use the helper
  * functions in `transaction_manager_impl.cpp`. */
std::unordered_map<page_id_t, std::shared_ptr<PageVersionInfo>> version_info_;

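The following figure makes this flow concrete:

The figure shows clearly how, after reading the latest version of a Tuple from the TableHeap, we find that Tuple's version chain. Note that version_info_ and prev_version_ are each protected by their own lock (a std::shared_mutex). Once the PageVersionInfo that holds the prev_version_ map has been found, we acquire its mutex_ and release version_info_mutex_, so that other transactions or threads are not blocked from accessing version_info_.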
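
The lock hand-off described above can be sketched as follows. This is a simplified, self-contained illustration rather than BusTub's helper functions: the types with the Sketch suffix and the Lookup method are hypothetical stand-ins, and only the locking pattern and the use of find are the point.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

using page_id_t = int32_t;
using slot_offset_t = uint32_t;

// Hypothetical stand-in for an undo link: which transaction, which log index.
struct UndoLinkSketch {
  int64_t prev_txn;
  int prev_log_idx;
};

struct PageVersionInfoSketch {
  std::shared_mutex mutex_;  // protects prev_version_
  std::unordered_map<slot_offset_t, UndoLinkSketch> prev_version_;
};

struct VersionStoreSketch {
  std::shared_mutex version_info_mutex_;  // protects version_info_
  std::unordered_map<page_id_t, std::shared_ptr<PageVersionInfoSketch>> version_info_;

  // Looks up the head of the version chain for (page_id, slot) without inserting anything.
  std::optional<UndoLinkSketch> Lookup(page_id_t page_id, slot_offset_t slot) {
    std::shared_lock<std::shared_mutex> outer(version_info_mutex_);
    auto page_it = version_info_.find(page_id);  // find, not operator[], so no entry is created
    if (page_it == version_info_.end()) {
      return std::nullopt;
    }
    std::shared_ptr<PageVersionInfoSketch> page_info = page_it->second;
    std::shared_lock<std::shared_mutex> inner(page_info->mutex_);
    outer.unlock();  // hand-off: the outer map is free again while we read the per-page map
    auto slot_it = page_info->prev_version_.find(slot);
    if (slot_it == page_info->prev_version_.end()) {
      return std::nullopt;
    }
    return slot_it->second;
  }
};
```

Holding a std::shared_ptr to the per-page entry keeps it alive after the outer lock is released, which is what makes the early unlock safe in this sketch.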

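Reconstructing a Tuple from the UndoLog version chain

After obtaining a VersionUndoLink from the TransactionManager, we can take from it the UndoLink to the first entry of this Tuple's version chain. This first UndoLink comes from the TransactionManager, but the UndoLinks and UndoLogs that form the rest of the chain are stored in the transactions. The chain is formed as follows:

How UndoLinks and UndoLogs form the version chain

An UndoLink gives the ID of the transaction that produced the previous version (prev_txn_). We look that ID up in the transaction map txn_map_ to get the transaction pre_txn. The UndoLink also records, in prev_log_idx_, the index of the previous version's UndoLog inside that transaction, so pre_txn->GetUndoLog(UndoLink.prev_log_idx_); returns that UndoLog. That UndoLog in turn holds the UndoLink to the version before it. This is how a Tuple's version chain is formed, and the whole chain can be reached starting from the TransactionManager.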
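
The hop from link to transaction to log to next link can be sketched with simplified stand-in types. LinkSketch, LogSketch, TxnSketch, ChainTimestamps and the INVALID_TXN sentinel are all hypothetical names for illustration; the sketch only collects the timestamps met along the chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using txn_id_t = int64_t;
constexpr txn_id_t INVALID_TXN = -1;  // hypothetical sentinel meaning "no previous version"

struct LinkSketch {
  txn_id_t prev_txn{INVALID_TXN};  // transaction that owns the previous version's log
  std::size_t prev_log_idx{0};     // index of that log inside the transaction
  bool IsValid() const { return prev_txn != INVALID_TXN; }
};

struct LogSketch {
  int64_t ts;
  LinkSketch prev_version;  // link to the next older version
};

struct TxnSketch {
  std::vector<LogSketch> undo_logs;
};

// Walks the chain from `link` and returns the timestamps of the logs, newest first.
std::vector<int64_t> ChainTimestamps(const std::unordered_map<txn_id_t, TxnSketch> &txn_map, LinkSketch link) {
  std::vector<int64_t> result;
  while (link.IsValid()) {
    auto it = txn_map.find(link.prev_txn);
    if (it == txn_map.end()) {
      break;  // the owning transaction is no longer in the map, so the chain ends here
    }
    const LogSketch &log = it->second.undo_logs.at(link.prev_log_idx);
    result.push_back(log.ts);
    link = log.prev_version;  // follow the link stored inside the log
  }
  return result;
}
```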

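The following figure illustrates how the version chain is formed:

Reconstructing versions with UndoLogs

The figure above shows what an UndoLog does: an UndoLog with timestamp \(n\) changes some fields of the newer version of the Tuple back to what the Tuple contained at timestamp \(n\), that is, it rolls the Tuple back to that version.
Each UndoLog in the figure has the structure shown below:

UndoLog structure

In code, the UndoLog struct looks like this: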

struct UndoLog {
  /* Whether this log is a deletion marker */
  bool is_deleted_;
  /** The fields modified by this undo log
   * modified_fields is a vector of bool that has the same length as the table schema. If one of the fields is set to
   * true, it indicates that the field is updated. The tuple field contains the partial tuple.
   */
  std::vector<bool> modified_fields_;
  /* The modified fields */
  Tuple tuple_;
  /* Timestamp of this undo log */
  timestamp_t ts_{INVALID_TS};
  /* Undo log prev version */
  UndoLink prev_version_{};
};

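Some notes on UndoLog:

As we saw when building the version chain, to obtain the physical version of a Tuple at timestamp \(\mathbf{T}\) we walk the chain from its head and apply the UndoLogs in order, up to and including the first one whose ts_ is less than or equal to \(\mathbf{T}\).
is_deleted_: an UndoLog always records the state that preceded it. is_deleted_ == true means that earlier state was "deleted", so tuple_ holds no data in this case. Deleting a Tuple only sets is_deleted_ in its TupleMeta to true and leaves the stored data untouched, so the UndoLog only needs to record a flag; no data was actually changed.
modified_fields_: as the figure above shows, this field records which fields of the Tuple were modified. The position of each modified field holds true, and all other positions hold false.
tuple_: the content of the Tuple before the modification that this UndoLog corresponds to, but only the modified part, matching modified_fields_ above.
ts_: the timestamp of this UndoLog.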

ReconstructTuple 的实现

ReconstructTuple is fairly simple to implement, because it receives an array of undo_logs: iterate over the array and apply the logs in order to roll the Tuple back. Part of my implementation is shown below:

std::vector<Value> temp_values;

/** Build the partial schema that covers only the columns this undo log modified */
auto temp_schema = GetUndoLogSchema(&undo_log, schema);

/** Assemble the new tuple: modified columns come from the undo log, the rest from the current result */
uint32_t modified_index = 0;
for (uint32_t index = 0; index < undo_log.modified_fields_.size(); index++) {
  if (undo_log.modified_fields_[index]) {
    temp_values.emplace_back(undo_log.tuple_.GetValue(temp_schema.get(), modified_index));
    modified_index++;
  } else {
    temp_values.emplace_back(tuple_result.GetValue(schema, index));
  }
}

/** Produce the new tuple_result */
tuple_result = Tuple(temp_values, schema);

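Pay particular attention to checking the is_deleted_ flag of each UndoLog, as well as the is_deleted_ flag in the initial TupleMeta: if the Tuple ends up deleted, it must be filtered out.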
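
The deletion handling can be shown in a self-contained sketch that replaces BusTub's Tuple and Schema with a plain vector of integers. Row, UndoLogSketch and Reconstruct are hypothetical names; the sketch assumes the partial values are stored in column order, as described for tuple_ above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Row = std::vector<int>;  // hypothetical stand-in for a Tuple with integer columns

struct UndoLogSketch {
  bool is_deleted;
  std::vector<bool> modified_fields;  // same length as the full row
  Row partial;                        // values of the modified columns only, in column order
};

// Applies the undo logs in order. Returns std::nullopt if the resulting version is deleted.
std::optional<Row> Reconstruct(const Row &base, bool base_deleted, const std::vector<UndoLogSketch> &undo_logs) {
  Row row = base;
  bool deleted = base_deleted;
  for (const auto &log : undo_logs) {
    if (log.is_deleted) {
      deleted = true;  // this version is a deletion; an older log may bring the row back
      continue;
    }
    deleted = false;
    assert(log.modified_fields.size() == row.size());
    std::size_t partial_idx = 0;
    for (std::size_t col = 0; col < row.size(); ++col) {
      if (log.modified_fields[col]) {
        row[col] = log.partial[partial_idx++];
      }
    }
  }
  if (deleted) {
    return std::nullopt;
  }
  return row;
}
```

Tracking a single deleted flag across the whole loop covers both checks: the initial TupleMeta state and the is_deleted_ flag of the last log applied.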

Sequential Scan / Tuple Retrieval

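In SeqScanExecutor we implemented reading Tuples from the database. Before MVCC we read straight from the TableHeap, that is, the latest version of each Tuple. Under MVCC, other committed transactions may already have modified the Tuple, so we have to walk the version chain to find the version this transaction should read. There are generally three cases.
Before classifying them, we first need to look at how BusTub assigns transaction IDs.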

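In BusTub the commit timestamp is a monotonically increasing counter that ranges from 0 to TXN_START_ID - 1. Timestamps have type int64_t. When TupleMeta.ts_ of a Tuple in the TableHeap is greater than or equal to TXN_START_ID (BusTub defines it as 1 << 62, so a single high bit is set), the Tuple is being modified by a transaction that has not committed yet. In other words, committed transactions have a commit_ts between 0 and TXN_START_ID - 1, while before commit both the transaction's txn_id and the TupleMeta.ts_ of any Tuple it is modifying equal TXN_START_ID + txn_human_readable_id. The first transaction has txn_id TXN_START_ID, and the txn_id grows with each new transaction. These numbers are very large and do not show at a glance which transaction is which, so txn_human_readable_id is used as a readable ID for a running transaction.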
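
The encoding can be written down as a few small helpers. This is a sketch: IsUncommitted, HumanReadable and FormatTs are hypothetical helper names, and TXN_START_ID is taken to be 1 << 62 as in BusTub's configuration header.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using timestamp_t = int64_t;
using txn_id_t = int64_t;
constexpr txn_id_t TXN_START_ID = 1LL << 62;  // assumed to match BusTub's definition

// A timestamp at or above TXN_START_ID is really the ID of an uncommitted transaction.
inline bool IsUncommitted(timestamp_t ts) { return ts >= TXN_START_ID; }

// Strips the TXN_START_ID offset to get the small, readable transaction number.
inline txn_id_t HumanReadable(txn_id_t txn_id) {
  assert(txn_id >= TXN_START_ID);
  return txn_id - TXN_START_ID;
}

// Formats a TupleMeta timestamp as "txnN" for uncommitted writers, or as the plain commit timestamp.
inline std::string FormatTs(timestamp_t ts) {
  return IsUncommitted(ts) ? "txn" + std::to_string(HumanReadable(ts)) : std::to_string(ts);
}
```

For example, FormatTs(3) yields "3", while FormatTs(TXN_START_ID + 8) yields "txn8".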

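With the ID scheme explained, the relationship between the Tuple version stored in the TableHeap and the transaction falls into the following three cases:

The Tuple in the TableHeap is exactly the latest version the current transaction may read. There are two sub-cases. One is tuple_meta.ts_ < TXN_START_ID && tuple_meta.ts_ <= txn_read_ts_: no transaction is modifying the Tuple, and the current transaction's read timestamp is at or after the timestamp of the latest data, so the data in the TableHeap can be read directly. The other is tuple_meta.ts_ == TXN_START_ID + exec_ctx_->GetTransaction()->GetTransactionIdHumanReadable(): the data in the TableHeap is being modified by the current transaction itself, so it can also be read directly.
tuple_meta.ts_ >= TXN_START_ID (and not equal to the current transaction's ID): the Tuple in the TableHeap has been modified by another transaction that has not committed yet, so the Tuple must be reconstructed from the UndoLog version chain to the version valid at the right timestamp.
tuple_meta.ts_ > txn_read_ts_: here tuple_meta.ts_ < TXN_START_ID, but the ts_ in the TableHeap is greater than the transaction's read timestamp txn_read_ts_. The version in the TableHeap is newer than the start of the transaction, so the UndoLog version chain must again be walked to reconstruct the Tuple at the right timestamp.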
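
The three cases can be condensed into one classification function. This is an illustrative sketch: ReadPath and Classify are hypothetical names, and my_txn_id stands for the full ID of the current transaction, that is, TXN_START_ID plus its readable ID.

```cpp
#include <cassert>
#include <cstdint>

using timestamp_t = int64_t;
using txn_id_t = int64_t;
constexpr txn_id_t TXN_START_ID = 1LL << 62;  // assumed to match BusTub's definition

enum class ReadPath {
  kTableHeap,              // read the TableHeap version directly
  kChainOtherUncommitted,  // another transaction is modifying the tuple: walk the chain
  kChainNewerCommitted     // committed after our read timestamp: walk the chain
};

ReadPath Classify(timestamp_t tuple_ts, timestamp_t read_ts, txn_id_t my_txn_id) {
  assert(my_txn_id >= TXN_START_ID);
  if (tuple_ts == my_txn_id) {
    return ReadPath::kTableHeap;  // our own uncommitted write
  }
  if (tuple_ts >= TXN_START_ID) {
    return ReadPath::kChainOtherUncommitted;
  }
  if (tuple_ts > read_ts) {
    return ReadPath::kChainNewerCommitted;
  }
  return ReadPath::kTableHeap;  // committed at or before our read timestamp
}
```

Checking the "own write" case first matters, because the current transaction's ID also satisfies tuple_ts >= TXN_START_ID.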

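The example in the following figure illustrates this process:

In the figure, _ means that this column of the Tuple was not modified and keeps its earlier value.

In the example above:

Txn9 is a transaction that has modified Tuples but has not committed. A scan of the table by Txn9 should output (A, 9), (B, 9), (C, 2), (D, 9).
A transaction with read timestamp 4, for example, should get (A, 3), (B, 3), (C, 4) from a scan: Txn9's modifications have not been committed, so it must not read the latest data.

Implementation

The tricky part of Sequential Scan is that a tuple read from the TableHeap is the latest version, and the UndoLog version chain must be used to find the version the transaction should actually read. When walking the version chain, pay attention to the following details:

Let the current transaction be Txn. We need to find the first undo log in the version chain with ts_ <= Txn.read_ts_; it gives the version of the tuple this transaction reads, and none of the undo logs after it are read.
If the read_ts of Txn is earlier than the ts_ of every UndoLog, the Tuple did not exist yet when Txn started.
If the Tuple is being modified by another transaction and undo_logs is empty, the Tuple was newly inserted by that other transaction and this transaction cannot see it.

The implementation checks each of the three points above in turn: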

std::vector<UndoLog> undo_logs;
auto version_undo_link = exec_ctx_->GetTransactionManager()->GetUndoLink(*rid);
/** Whether the first undo log with ts_ <= txn_read_ts_ has been collected */
bool threshold_ts_undolog = false;
/** Walk the version chain from head to tail */
while (version_undo_link.has_value() && version_undo_link->IsValid()) {
  auto undo_log = exec_ctx_->GetTransactionManager()->GetUndoLog(version_undo_link.value());
  version_undo_link = undo_log.prev_version_;
  undo_logs.emplace_back(undo_log);
  /**
    * The first undo log whose timestamp is less than or equal to the read timestamp of the
    * current transaction is still collected and applied; every undo log after it is skipped.
    */
  if (undo_log.ts_ <= txn_read_ts_) {
    threshold_ts_undolog = true;
    break;
  }
}

Then decide the cases in which the Tuple is not read, that is, skipped:

// ! If the transaction's read_ts is earlier than every UndoLog, the Tuple did not exist yet at read_ts
if (!undo_logs.empty() && undo_logs.back().ts_ > txn_read_ts_) {
  ++(*table_iterator_);
  continue;
}
// ! If the Tuple is being modified by another transaction and undo_logs is empty,
// ! the Tuple was newly inserted by that transaction and this transaction cannot see it
if ((tuple_meta.ts_ >= TXN_START_ID || tuple_meta.ts_ > txn_read_ts_) && undo_logs.empty()) {
  ++(*table_iterator_);
  continue;
}

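One last point: even when an older version of a tuple has to be reached through the version chain, Sequential Scan still returns the rid of the tuple in the TableHeap. The rid identifies the tuple's position; under MVCC it does not identify where a particular version's data is stored.

TxnMgrDbg: inspecting version chains at runtime

This is a tool you will call often while debugging later tasks. It prints the current state of every tuple in a table, including the tuple's version chain.
The comments in the code give the output format, shown below.
The first line is the tuple's latest state in the TableHeap; the lines from the second onward are the tuple's version chain.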

RID=0/0 ts=txn8 tuple=(1, <NULL>, <NULL>)
txn8@0 (2, _, _) ts=1

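Here is what each part means:

RID=0/0: the identifier of a tuple in the TableHeap.
ts=txn8: the timestamp of the latest version of the tuple in the TableHeap. ts=txn8 means the tuple has been modified by transaction txn8, which has not committed yet.
tuple: for example, tuple=(1, <NULL>, <NULL>) above is the latest data stored for this tuple in the TableHeap.
txn8@0: this undo log was generated by transaction txn8. In practice the transaction modifying the tuple and the transaction in the undo log would not be the same as they are in this sample output, where both are txn8; it is only a format example. The @0 means this is the first undo log of txn8, at index 0, that is, the undo_log_index.
(2, _, _): the tuple recorded in the undo log. An undo log stores only the modified part, but the output shows every column, with _ standing for the columns that were not modified.
ts=1: the ts_ of this undo log.

My implementation is as follows: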
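
The (2, _, _) notation can be produced with a small formatter. This is a sketch under simplifying assumptions: FormatPartialTuple is a hypothetical helper, and the modified column values are passed in as strings that are already formatted, whereas a real implementation would read them from the undo log's partial tuple through a partial schema.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Prints one entry per column of the full schema: the stored value for a modified column,
// and "_" for a column the undo log does not touch.
std::string FormatPartialTuple(const std::vector<bool> &modified_fields,
                               const std::vector<std::string> &partial_values) {
  std::string out = "(";
  std::size_t partial_idx = 0;
  for (std::size_t col = 0; col < modified_fields.size(); ++col) {
    if (col > 0) {
      out += ", ";
    }
    if (modified_fields[col]) {
      assert(partial_idx < partial_values.size());
      out += partial_values[partial_idx++];
    } else {
      out += "_";
    }
  }
  out += ")";
  return out;
}
```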

void TxnMgrDbg(const std::string &info, TransactionManager *txn_mgr, const TableInfo *table_info,
               TableHeap *table_heap) {
  // always use stderr for printing logs...
  fmt::println(stderr, "debug_hook: {}", info);

  /**
  fmt::println(
      stderr,
      "You see this line of text because you have not implemented `TxnMgrDbg`. You should do this once you have "
      "finished task 2. Implementing this helper function will save you a lot of time for debugging in later tasks.");
  */
  // We recommend implementing this function as traversing the table heap and print the version chain. An example
  // output of our reference solution:
  //
  // debug_hook: before verify scan
  // RID=0/0 ts=txn8 tuple=(1, <NULL>, <NULL>)
  //   txn8@0 (2, _, _) ts=1
  // RID=0/1 ts=3 tuple=(3, <NULL>, <NULL>)
  //   txn5@0 <del> ts=2
  //   txn3@0 (4, <NULL>, <NULL>) ts=1
  // RID=0/2 ts=4 <del marker> tuple=(<NULL>, <NULL>, <NULL>)
  //   txn7@0 (5, <NULL>, <NULL>) ts=3
  // RID=0/3 ts=txn6 <del marker> tuple=(<NULL>, <NULL>, <NULL>)
  //   txn6@0 (6, <NULL>, <NULL>) ts=2
  //   txn3@1 (7, _, _) ts=1
  /** Iterate over every Tuple in table_heap */
  auto smallest_read_ts = txn_mgr->GetWatermark();
  auto table_iterator_ = std::make_unique<TableIterator>(table_heap->MakeIterator());
  while (!table_iterator_->IsEnd()) {
    /** Fetch the current Tuple */
    auto [tuple_meta, current_tuple] = table_iterator_->GetTuple();
    auto rid = current_tuple.GetRid();
    /** Print the Tuple's current state: the deletion flag from tuple_meta, and the timestamp, shown as the
     * readable transaction ID when it belongs to an uncommitted transaction. The first transaction has
     * ID TXN_START_ID, hence >= rather than > */
    fmt::println(stderr, "RID={}/{} ts={}{} tuple={}", rid.GetPageId(), rid.GetSlotNum(),
                 (tuple_meta.ts_ >= TXN_START_ID) ? fmt::format("txn{}", tuple_meta.ts_ - TXN_START_ID)
                                                  : fmt::format("{}", tuple_meta.ts_),
                 tuple_meta.is_deleted_ ? " <del marker>" : "", current_tuple.ToString(&(table_info->schema_)));

    /** Get the head of the version chain */
    auto version_undo_link = txn_mgr->GetUndoLink(rid);
    /** Versions below the watermark can no longer be seen by any transaction, so they are not printed */
    bool undolog_start_drop = tuple_meta.ts_ <= smallest_read_ts;
    while (!undolog_start_drop && version_undo_link.has_value() && version_undo_link->IsValid()) {
      /** Look up the transaction that owns the current link of the version chain */
      auto iter = txn_mgr->txn_map_.find(version_undo_link->prev_txn_);
      if (iter == txn_mgr->txn_map_.end()) {
        /** The owning transaction is gone: stop here, since `continue` would retry the same link forever */
        break;
      }
      auto undo_log = txn_mgr->GetUndoLog(version_undo_link.value());
      auto txn_id = iter->second->GetTransactionIdHumanReadable();
      /** Index of the current undo log in the transaction's undo log array */
      auto undo_log_index = version_undo_link->prev_log_idx_;
      /** For printing, build the Schema of the current undo_log and a formatted Tuple */
      auto format_schema = GenerateFormatSchema(undo_log, &table_info->schema_);
      auto format_tuple = GenerateFormatTuple(undo_log, &table_info->schema_, format_schema.get());
      /** Print the current undo_log of the version chain */
      fmt::println(stderr, "txn{}@{} {} ts={}", txn_id, undo_log_index,
                   undo_log.is_deleted_ ? "<del>" : format_tuple->ToString(format_schema.get()), undo_log.ts_);
      version_undo_link = undo_log.prev_version_;

      if (undo_log.ts_ < smallest_read_ts) {
        undolog_start_drop = true;
      }
    }
    ++(*table_iterator_);
  }
}

posted @ 2025-02-25 18:21 虾野百鹤
