CMU_15445_P3_Part4

Sort + Limit Executors + Window Functions + Top-N Optimization

这里的实现比较简单, 就不赘述了, 后续补充一下这部分的优先队列的使用, 以及选择的方式

Window Functions

窗口函数的实现才是这部分的重点

窗口函数介绍

MySQL 窗口函数是一类特殊的 SQL 函数, 用于在查询结果中对数据进行分组和排序的基础上执行计算, 而无需将结果分组到单独的行中. 与聚合函数(如 SUM、AVG 等)相同, 窗口函数会新增通过聚合函数计算新增的列. 与聚合函数(如 SUM、AVG 等)不同, 窗口函数不会减少结果集的行数, 而是为每一行生成一个计算值.

窗口函数的特点

保留行数：窗口函数计算后的结果不会减少数据行, 而是为每一行添加计算结果.
支持分组和排序：通过 PARTITION BY 和 ORDER BY 子句, 可以按组或按顺序计算结果.
与聚合函数结合使用：窗口函数通常与聚合函数结合使用, 为每行提供基于“窗口”的聚合结果.

窗口函数语法

窗口函数在使用时, 通用语法如下:

function_name ( [参数] )
OVER (
    [PARTITION BY column_name]
    [ORDER BY column_name]
    [ROWS | RANGE BETWEEN frame_start AND frame_end]
)

语法说明：

上述语法说明了窗口函数主要由四部分组成, 分别是, 聚合函数, 分组, 排序, 以及窗口帧

function_name：窗口函数名, 如 ROW_NUMBER、RANK、SUM 等, 通常是聚合函数, 用来生成新的 column 的函数
PARTITION BY：将数据分组为窗口, 根据 column value 将表中的数据进行分组.
ORDER BY：定义窗口内的排序方式.
窗口帧：定义了窗口的范围, 即对于每一行计算的上下文数据行的范围, ROWS: 基于行的物理位置定义窗口帧, RANGE: 基于排序列的值范围定义窗口帧.

常见的窗口函数

1. 排名函数

ROW_NUMBER()：为每组数据的每行生成唯一的连续编号.
RANK()：为每组数据生成排名, 具有相同值的行具有相同排名, 后续排名会跳过.
DENSE_RANK()：为每组数据生成排名, 具有相同值的行具有相同排名, 后续排名不会跳过.

示例：

sqlCopyEditSELECT
    employee_id,
    department_id,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_number,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM employees;

2. 聚合函数

SUM()、AVG()、COUNT() 等支持窗口计算.

示例：

sqlCopyEditSELECT
    department_id,
    employee_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id) AS total_salary,
    AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM employees;

3. 平均值计算

例如计算每个销售员的滑动平均销售额, 范围是当前行前一行到当前行后一行, 使用 ROWS BETWEEN AND限定范围.

示例：

SELECT 
    salesman,
    department,
    sales_amount,
    AVG(sales_amount) OVER (
        PARTITION BY department
        ORDER BY id
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_avg
FROM sales;

当没有显示的使用窗口帧时, 默认的窗口帧的定义以及范围如下：

对于带 ORDER BY 的窗口函数, 默认帧是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. 表示从分区的开头到当前行.
对于不带 ORDER BY 的窗口函数, 默认帧是整个分区(等价于 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 表示整个分区的正题.

BUSTUB 中的窗口函数

BUSTUB 中为了实现起来更加简单, 对窗口函数进行了下面的限制:

省略了窗口帧, 无需处理窗口帧, 只需要处理 PARTITION BY 以及 ORDER BY 两种条件.
在同一个使用窗口函数的 SQL 语句中, 多个窗口函数的 ORDER BY 语句一定完全一致. 例如, 下面的 SQL 查询语句不支持

-- ORDER BY 的条件不一致
SELECT SUM(v1) OVER (ORDER BY v1), SUM(v1) OVER (ORDER BY v2) FROM t1;
-- 一个存在 ORDER BY, 一个不存在 ORDER BY
SELECT SUM(v1) OVER (ORDER BY v1), SUM(v2) OVER () FROM t1;

并且测试场景中也不会测试输出的tuples 的顺序, 但是请注意, 当有 ORDER BY 时, 排序是窗口函数正确计算的前提, 因此你必须对元组进行排序. 但最终输出的行顺序无关紧要, 测试用例只关心每一行的值是否正确.
我们可以使用下面的顺序执行窗口函数:
将 tuples 按照 PARTITION BY 进行分组.
根据 ORDER BY 的信息对 tuples 进行排序.
对每一个窗口函数的聚合函数生成一个初始值.
对每一个分组, 将值进行聚合, 类似于 Aggregation .

开始的一些思路与想法

SORT 需要将所有的 tuples 读出来后才可以排序, 但是 Aggregate 可以每读一个 tuple 做一次 Combine(), 那么这里是先将所有的都读出来还是每次读一个呢---> 涉及到分组与排序, 一定是将所有孩子节点的 tuples 全部读出来放到一个数组中用于排序
按道理应该是先分组, 所以一定需要先读取, 然后排序, 并且必须要在分组内进行排序, 应该怎么做呢
需要用到哪些数据结构, 分组应该用到 HashTable, 是和 HashJoin 那样的 HashTable 还是和 Aggregate 那样的 HashTable 呢, 分组之后需要排序, 排序之后又需要计算 Aggregate 的值
每一个窗口函数的 ORDER BY 是相同的, 但是 PARTITION BY 不同, 最大的问题是排序会打乱顺序, 导致不同窗口函数的最后结果的每一行之间无法对齐.
如何处理 PARTITION BY 与 ORDER BY 不同的两个窗口函数, 分组与排序的不同会导致 tuples 的顺序不同, 排序不同后应该如何将两个tuple的结果对应起来, 例如, 在窗口函数A中 tuple_1 的排序后的位置与窗口函数B 中 tuple_1 的位置不同, 每一个窗口函数都维护一个HashTable吗, 相同条件的窗口函数使用同一个HashTable即可, 然后 HashTable 中的 Value 用来记录计算结果

整体思路以及做法

我们以一个测试案例来解释我的 Window Functions 的实现方式, 测试案例如下:

select count(*) over (partition by v1 order by v2), min(v2) over (partition by v1 order by v2), max(v2) over (partition by v1 order by v2), count(v2) over (partition by v1 order by v2), sum(v2) over (partition by v1 order by v2) from t2;

将 window_function 与分组方式进行对应, 因为 BUSTUB 限制了所有 Window Functions 中的 ORDER BY 语句是一样的, 因此可以只按照 PARTITION BY 进行分组对应, 这一步的目的是后续将每个 Windows Functions 需要处理的 Tuples 数组进行分类, 并且找到 Window Function 的 Index 与这些分类的对应关系, 例如, 在上面的例子中, 每个 Windows Functions 的 ORDER BY 与 PARTITION BY 部分都是一样的, 因此只有一个 partition_by_expression_, 并且 window_function_sort_index_[window_functions.first] 均为 0. 我们只需要存储并且排序一个 Tuples 数组即可.
从孩子节点中读取所有的孩子节点的 children_tuples, 使用一个 std::vector<Tuple> 数组存储, 如果这条 SQL 语句中的 Window Functions 中有 ORDER BY 的话, 这些 ORDER BY 都是一样的, 因此选择一个 ORDER BY 进行排序即可. 如果没有 ORDER BY 则不排序. 将排序后的 Tuples 数组记录为 window_function_tuples_, 排序的目的是使最后的输出结果的顺序对齐.
计算非 Window Functions 部分, 也就是 SELECT 部分的 Column 的值, 并且临时存储下来, 上述例子中没有, 例如测试案例 select v1, rank() over (order by v1) from t1; 中的 v1 部分.
将排序后的 Tuple 数组, 按照 partition_by_expression_ 进行排序, 先按照 PARTITION BY 的 Column 排序, 再按照 ORDER BY 的 Column 进行排序, 上述的例子就是先按照 v1 进行排序, 再按照 v2 进行排序, 因为按照 v1 排序, 实际上就可以实现分区的功能了, 再按照 v2 排序, 是按照顺序进行排序的.
使用 tuple_align_index_ 记录 Tuples 数组使用 partition_by_expression_ 排序前后的对应位置, 这是因为 partition_by_expression_ 可能不同, 因此在按照分区以及排序顺序计算的时候, 计算结果的顺序不同, 但是最后的输出需要同一 ROW 的输出应该保持对齐, 因此需要记录排序前后的对应关系, 最后使用对齐的 Tuple Index 下标输出.
对上述 #5 中 partition_by_expression_ 排序后的每一个 Tuple 数组进行分区处理, 因为按照 PARTITION BY 进行排序了, 因此, 可以记录每个分区的起始位置作为分区信息. 这一步是为了提前记录分区信息, 避免后续的重复计算.
对于每一个窗口函数, 使用上述 #5 排序后的 Tuples 数组, 以及 #6 中的分区信息, 计算每一个窗口函数在排序后的值, 计算的结果存储在一个 HashTable 中, 这个 HashTable 的 key 是, {Window Functions Index, Tuple Index}, 其中 Window Function Index 就是窗口函数自身的下标, 也就是 window_function.first, 而 Tuple Index 是这个 Tuple 在 #5 中排序后的 Tuples 数组的下标, 注意是排序后的下标.
根据是否对 Tuples 进行排序, 计算每一个窗口函数的 Aggregate 函数的结果 Values, 因为不排序与排序的计算窗口是不同的, 不排序的时候, 整个分区是一个窗口, 而排序的时候窗口帧是 UNBOUNDED FOLLOWING AND CURRENT ROW. 将计算的结果存储到 #7 中的 HashTable 中. 这一步的目的是计算每个窗口函数对应的 Aggregate 函数的结果.
对于每一个窗口函数可以使用 window_function_sort_index_ 找到对应的排序后的 Tuples 数组, 然后使用 tuple_align_index_ 找到排序前后对应的 Tuple 的下标, 使用这个下标可以对齐每一个窗口函数的 Aggregate 计算结果在未排序的 Tuples 数组的 ROW 号(行号), 然后将前面 #3 中的计算结果与 #8 中 HashTable 中存储的结果进行合并, 得到最终的结果. 存储到一个结果的 Tuples 数组中
在 Next() 中顺序的输出所有的计算结果.

部分过程的具体实现

计算每个 partition_by_expression_ 的分区位置信息

这一步是上述 #7 步骤的实现, 对每一种分区信息提前计算, 一开始我是将这个步骤直接写进了步骤 #8 中的, 拿出来后, 其实只需要计算一次, 让我的 Leaderboard 排名从 66 上升到了 57.

/**
* 计算每个 partition_by_expression_ 对应的分区信息, 对于每个窗口函数可以检索到对应的分区信息
* 对每一个排序之后的 tuple 处理, 获取 aggregate 函数计算的结果, 需要根据窗口和分区进行计算
*  */
std::vector<uint32_t> partition_index;
std::vector<std::vector<uint32_t>> partition_indexs;
for (size_t i = 0; i < partition_by_expression_.size(); i++) {
partition_index.clear();
// 初始化 pre_partition_value 和 partition_value 为 NULL
Value pre_partition_value = ValueFactory::GetNullValueByType(TypeId::INTEGER);
Value partition_value = ValueFactory::GetNullValueByType(TypeId::INTEGER);
for (uint32_t index = 0; index < window_function_tuples_[i].size(); index++) {
    if (index == 0) {
    partition_index.push_back(index);
    if (partition_by_expression_[i].empty()) {
        break;
    }
    // 使用 partition_by_expression_ 计算当前行的 Tuple 对应 Value
    partition_value =
        partition_by_expression_[i][0]->Evaluate(&window_function_tuples_[i][index], this->GetOutputSchema());
    } else {
    if (partition_by_expression_[i].empty()) {
        break;
    }
    // 使用 partition_by_expression_ 计算当前行的 Tuple 对应 Value
    partition_value =
        partition_by_expression_[i][0]->Evaluate(&window_function_tuples_[i][index], this->GetOutputSchema());
    if (partition_value.CompareEquals(pre_partition_value) == CmpBool::CmpFalse) {
        partition_index.push_back(index);
    }
    }
    pre_partition_value = partition_value;
}
partition_indexs.emplace_back(partition_index);
}

计算 Window Functions 的每一个 Aggregate Value 的值

使用 ORDER BY 与未使用 ORDER BY 的窗口函数在计算 Aggregate 函数的时候, 计算窗口是不同的.

// 如果使用了 ORDER_BY, 在每一个分区内计算
for (uint32_t index = 0; index < partition_index.size(); index++) {
    std::vector<Value> order_by_result;
    // 记录分区区间, 在分区区间内计算
    uint32_t start_index = partition_index[index];
    uint32_t end_index = index == partition_index.size() - 1 ? window_function_tuples_[tuple_array_no].size()
                                                            : partition_index[index + 1];
    // 使用 ORDER_BY 的时候, 先计算每一个表达式的值
    for (uint32_t i = start_index; i < end_index; i++) {
    Value value;
    if (window_function.second.type_ == WindowFunctionType::Rank) {
        value = window_function.second.order_by_[0].second->Evaluate(&window_function_tuples_[tuple_array_no][i],
                                                                    this->GetOutputSchema());
    } else {
        value = window_function.second.function_->Evaluate(&window_function_tuples_[tuple_array_no][i],
                                                            this->GetOutputSchema());
    }
    order_by_result.push_back(value);
    }
    // 一个分区内的所有结果一次性计算完, 然后写入到 hash table 中
    aht_.InsertCombineOrderBy(order_by_result, function_index, start_index, end_index, window_function.second.type_);
    order_by_result.clear();
}

/**
* Inserts a partition's values into the hash table, the current aggregation is combined with the previous
* aggregation.
* @param values the values of the window function aggregate result values in this partition
* @param function_index the window function index
* @param start_index the start index of the partition
* @param end_index the end index of the partition
* @param function_type the type of the window function
*/
void InsertCombineOrderBy(const std::vector<Value> &values, uint32_t function_index, uint32_t start_index,
                        uint32_t end_index, WindowFunctionType function_type) {
Value value;
switch (function_type) {
    case WindowFunctionType::CountStarAggregate:
    case WindowFunctionType::CountAggregate:
    // 计算这个分区中 tuple 的个数
    for (uint32_t i = start_index; i < end_index; i++) {
        ht_.insert(
            {WindowFunctionAggregateKey{function_index, i}, {ValueFactory::GetIntegerValue(i - start_index + 1)}});
    }
    break;
    case WindowFunctionType::SumAggregate:
    // 计算这个分区中的窗口函数中的累计和
    value = ValueFactory::GetIntegerValue(0);
    for (uint32_t i = start_index; i < end_index; i++) {
        value = value.Add(values[i - start_index]);
        ht_.insert({WindowFunctionAggregateKey{function_index, i}, {value}});
    }
    break;
    case WindowFunctionType::MinAggregate:
    // 计算这个分区内的最小值
    value = values[0];
    for (uint32_t i = start_index; i < end_index; i++) {
        if (value.CompareGreaterThan(values[i - start_index]) == CmpBool::CmpTrue) {
        value = values[i - start_index];
        }
        ht_.insert({WindowFunctionAggregateKey{function_index, i}, {value}});
    }
    break;
    case WindowFunctionType::MaxAggregate:
    // 计算这个分区内的最大值
    value = values[0];
    for (uint32_t i = start_index; i < end_index; i++) {
        if (value.CompareLessThan(values[i - start_index]) == CmpBool::CmpTrue) {
        value = values[i - start_index];
        }
        ht_.insert({WindowFunctionAggregateKey{function_index, i}, {value}});
    }
    break;
    case WindowFunctionType::Rank:
    // 计算这个分区内的排名
    auto pre_value = values[0];
    uint32_t rank_count = 1;
    ht_.insert({WindowFunctionAggregateKey{function_index, start_index}, {ValueFactory::GetIntegerValue(1)}});
    for (uint32_t i = start_index + 1; i < end_index; i++) {
        // 如果前一个值比当前的值要小, 那么排名就是前一个值的排名
        if (pre_value.CompareLessThan(values[i - start_index]) == CmpBool::CmpTrue) {
        ht_.insert({WindowFunctionAggregateKey{function_index, i}, {ValueFactory::GetIntegerValue(++rank_count)}});
        } else {
        ht_.insert({WindowFunctionAggregateKey{function_index, i}, {ValueFactory::GetIntegerValue(rank_count)}});
        rank_count++;
        }
        pre_value = values[i - start_index];
    }
    break;
}
}

自己实现过程中的一些 BUG

对读入的 Tuples 数组进行排序的时候首先要判断 ORDER BY 语句是否存在, 并且找到对应的 window_function_index

int window_function_index;
for (const auto &window_function : plan_->window_functions_) {
    window_function_index = window_function.first;
}

RANK 类型的窗口函数中直接使用 RANK 的 window_function 表达式返回的结果是一个常数 1, 应该使用对应的 ORDER BY 表达式获取对应的 Column 的 Value, 然后在这个 Value 上进行排序.

// 使用 ORDER_BY 的时候, 先计算每一个表达式的值
for (uint32_t i = start_index; i < end_index; i++) {
Value value;
if (window_function.second.type_ == WindowFunctionType::Rank) {
    value = window_function.second.order_by_[0].second->Evaluate(&window_function_tuples_[tuple_array_no][i],
                                                                this->GetOutputSchema());
} else {
    value = window_function.second.function_->Evaluate(&window_function_tuples_[tuple_array_no][i],
                                                        this->GetOutputSchema());
}
order_by_result.push_back(value);
}

posted @ 2025-01-21 18:28 虾野百鹤阅读(75) 评论(0) 收藏举报

刷新页面返回顶部

行远自迩登高自卑

CMU_15445_P3_Part4

Sort + Limit Executors + Window Functions + Top-N Optimization

Window Functions

窗口函数介绍

窗口函数的特点

窗口函数语法

语法说明：

常见的窗口函数

1. 排名函数

示例：

2. 聚合函数

示例：

3. 平均值计算

示例：

BUSTUB 中的窗口函数

开始的一些思路与想法

整体思路以及做法

部分过程的具体实现

计算每个 partition_by_expression_ 的分区位置信息

计算 Window Functions 的每一个 Aggregate Value 的值

自己实现过程中的一些 BUG

公告

行远自迩 登高自卑

CMU_15445_P3_Part4

Sort + Limit Executors + Window Functions + Top-N Optimization

Window Functions

窗口函数介绍

窗口函数的特点

窗口函数语法

语法说明：

常见的窗口函数

1. 排名函数

示例：

2. 聚合函数

示例：

3. 平均值计算

示例：

BUSTUB 中的窗口函数

开始的一些思路与想法

整体思路以及做法

部分过程的具体实现

计算每个 partition_by_expression_ 的分区位置信息

计算 Window Functions 的每一个 Aggregate Value 的值

自己实现过程中的一些 BUG

公告

行远自迩登高自卑