MySQL Distinct 相关源码阅读笔记

Based on MySQL 8.0 community version

Aggregator_distinct

指聚合型DISTINCT,下面为一些example:

SELECT COUNT(DISTINCT)
-- 除了COUNT外,MySQL目前agg DISTINTCT只支持数值类型
SELECT SUM(DISTINCT)
SELECT AVG(DISTINCT)

以前的版本distinct会单独有Item_sum_distinct等这样的Item去实现,但这样的话每个agg function的Item实现都得加个distinct实现了,不大合理。MySQL8.0将非distinct和distinct逻辑抽出来,变成了Aggregator_simple和Aggregator_distinct,服务于继承了Item_sum的所有聚合函数。

class Aggregator_simple : public Aggregator {
 public:
  Aggregator_simple(Item_sum *sum) : Aggregator(sum) {}
  Aggregator_type Aggrtype() override { return Aggregator::SIMPLE_AGGREGATOR; }

  bool setup(THD *thd) override { return item_sum->setup(thd); }
  void clear() override { item_sum->clear(); }
  bool add() override { return item_sum->add(); }
  void endup() override{};
  my_decimal *arg_val_decimal(my_decimal *value) override;
  double arg_val_real() override;
  bool arg_is_null(bool use_null_value) override;
};

从上看 Aggregator_simple 基本只是个调用wrapper,表示非distinct的Item_sum处理,因此直接调用Item_sum的逻辑即可。

在MySQL的实现中一个聚合函数Item_sum的步骤简单就是三步:setup, add, endup。setup在处理之前初始化,add表示每条record的process,endup就是收尾最后计算聚合的结果。

对于distinct来说,去重有两种做法:第一种是维护索引结构存储相关的field作为索引来去重;第二种是持久化成一个磁盘的临时table,并且把相应的单个field或多个field的结果组合成一个新的field,并在上面做hash类型的主键索引(唯一索引)。MySQL主要依赖第一种方式,以索引树的方式存下key,重复的key则拒绝插入;第二种方式仅在COUNT DISTINCT且选择了临时表存储的时候才会采用。

MySQL的去重树的实现是Unique类,内部去重数据结构是红黑树。当tree的数量大于max_elements时会触发flush将内存的tree刷到磁盘的一个文件。单个文件内部可以保证unique,但是跨文件不能保证。最后会用一种merge_sort的方式,对多个文件进行归并遍历从而实现有序去重。

在实现中,COUNT DISTINCT被拎出来单独处理,一方面可能是因为COUNT比较常用,另一方面COUNT有可能被优化节省存储计算。非COUNT 的DISTINCT 在聚合这里主要指类似 SELECT SUM(DISTINCT field)的query。

Aggregator_distinct COUNT操作 setup步骤:

  1. 构建临时表用于持久化存储。临时表的存储引擎在如下setup_tmp_table_handler方法中。判断逻辑如下:

    • select_option若有配置强制指定TMP_TABLE_FORCE_MYISAM,则临时表使用MyISAM引擎持久化,会写盘。
    • use_tmp_disk_storage_engine判断究竟是用disk还是in memory的临时表,基本是force_disk_table参数、opt_initialize(初始化阶段heap engine not ready)、blob与memory table的不兼容等条件判断。
    bool setup_tmp_table_handler(TABLE *table, ulonglong select_options,
                                 bool force_disk_table, bool schema_table);
    
    bool use_tmp_disk_storage_engine(
        TABLE *table, ulonglong select_options, bool force_disk_table,
        enum_internal_tmp_mem_storage_engine mem_engine);
    
  2. 如果临时表是内存引擎(MEMORY/TEMPTABLE),则构建并直接使用去重树,去重树效率会被认为比临时表高。

Aggregator_distinct 非COUNT操作 setup 则是直接构建临时表和去重树。最后实际还是依靠去重树遍历去完成去重和Item_sum计算。

以上不论是否COUNT操作都存在临时表构建后不用来存数据的情况,但代码中依然会依赖临时表table对象来获取Field::is_null()的信息。

SELECT DISTINCT

// EXAMPLE
SELECT DISTINCT a FROM table;
SELECT DISTINCT a, b, c FROM table;
SELECT DISTINCT a.*  FROM tbl_applicant a

// related member From class JOIN
/**
    At construction time, set if SELECT DISTINCT. May be reset to false
    later, when we set up a temporary table operation that deduplicates for us.
   */
  bool select_distinct;

SELECT_DISTINCT有两种方式,以tpch lineitem表举例,假设我们在l_orderkey上有个非unique的二级索引,在l_returnflag字段上没有索引。

![image-20190509161446268](/Users/lhfcws/Library/Application Support/typora-user-images/image-20190509161446268.png)

![image-20190509161510410](/Users/lhfcws/Library/Application Support/typora-user-images/image-20190509161510410.png)

![image-20190509161537787](/Users/lhfcws/Library/Application Support/typora-user-images/image-20190509161537787.png)

如上图,如果distinct的列能被索引覆盖,则会走index;否则会创建临时表。这点和聚合DISTINCT里提到的DISTINCT的两种解决方法依然不违背,mysql index相当于提到的第一种索引的方法,只是无需在处理中额外维护。

在optimize阶段,会先提取select distinct的涉及字段,并尝试判断是否属于同一个索引:

static void add_loose_index_scan_and_skip_scan_keys(JOIN *join,
                                                    JOIN_TAB *join_tab) {
 	...
   else if (join->select_distinct) { /* Collect all query fields referenced in
                                         the SELECT clause. */
    List<Item> &select_items = join->fields_list;
    List_iterator<Item> select_items_it(select_items);
    Item *item;
    while ((item = select_items_it++))
      item->walk(&Item::collect_item_field_processor, Item::WALK_POSTFIX,
                 (uchar *)&indexed_fields);
    cause = "distinct";
  }
  ...
  
  Key_map possible_keys;
  possible_keys.set_all();

  /* Intersect the keys of all group fields. */
  while ((cur_item = indexed_fields_it++)) {
    if (cur_item->used_tables() != join_tab->table_ref->map()) {
      /*
        Doing GROUP BY or DISTINCT on a field in another table so no
        index in this table is usable
      */
      return;
    } else
      possible_keys.intersect(cur_item->field->part_of_key);
    // 此处如果possible_keys返回0,即交集为空,则distinct不能使用index优化。
  }
  ...  
}

单表的distinct操作会转成GROUP BY操作。而GROUP BY 的fields如果在上面的逻辑中没有判断成索引,则会设置need_tmp_before_win=true。该变量会在bool JOIN::make_tmp_tables_info() 函数中作为是否要创建distinct 临时表的开关。创建临时表的逻辑在bool JOIN::create_intermediate_table

// JOIN::test_skip_sort()
/*
        If we are going to use semi-join LooseScan, it will depend
        on the selected index scan to be used.  If index is not used
        for the GROUP BY, we risk that sorting is put on the LooseScan
        table.  In order to avoid this, force use of temporary table.
        TODO: Explain the quick_group part of the test below.
       */
      if ((m_ordered_index_usage != ORDERED_INDEX_GROUP_BY) &&
          (tmp_table_param.quick_group ||
           (tab->emb_sj_nest &&
            tab->position()->sj_strategy == SJ_OPT_LOOSE_SCAN))) {
        need_tmp_before_win = true;
        simple_order = simple_group = false;  // Force tmp table without sort
      }

临时表的创建核心函数是位于sql_tmp_table.cc的create_tmp_table。其中涉及distinct的逻辑如下代码。using_unique_constraint表示需要另外加一列key的hash值用来做distinct,用于distinct的field太大或太多无法做索引的情况。逻辑就是能用distinct field作key的就生成key,否则using_unique_constraint=true生成一个额外的hash_key字段。

// sql_tmp_table.cc:create_tmp_table
// ...
if (group) {
    if (!param->quick_group)
      group = 0;  // Can't use group key
    else
      for (ORDER *tmp = group; tmp; tmp = tmp->next) {
        /*
          marker == MARKER_BIT means two things:
          - store NULLs in the key, and
          - convert BIT fields to 64-bit long, needed because MEMORY tables
            can't index BIT fields.
        */
        (*tmp->item)->marker = Item::MARKER_BIT;
        const uint char_len = (*tmp->item)->max_length /
                              (*tmp->item)->collation.collation->mbmaxlen;
        if (char_len > CONVERT_IF_BIGGER_TO_BLOB)
          using_unique_constraint = true;
      }
    if (group) {
      if (param->group_length >= MAX_BLOB_WIDTH) using_unique_constraint = true;
      distinct = 0;  // Can't use distinct if group key is too large
    }
  }

// ...
 update_hidden:
    /*
      Calculate length of distinct key. The goal is to decide what to use -
      key or unique constraint. As blobs force unique constraint on their
      own due to their length, they aren't taken into account.
    */
	  
    if (distinct && !using_unique_constraint && hidden_field_count <= 0 &&
        new_field) {
      if (new_field->flags & BLOB_FLAG)
        // BLOB的不能做索引
        using_unique_constraint = true;
      else
        //计算总的distinct_key_length用于后面判断是否distinct的key过大。
        distinct_key_length += new_field->pack_length();
    } 

// ...
 /*
    To enforce unique constraint we need to add a field to hold key's hash
    A1) already detected unique constraint
    A2) distinct key is too long
    A3) number of keyparts in distinct key is too big
  */
  if (using_unique_constraint ||               // 1
      distinct_key_length > max_key_length ||  // 2
      (distinct &&                             // 3
       (fieldnr - param->hidden_field_count) > max_key_parts)) {
    using_unique_constraint = true;
  }

GROUPBY 执行

make_group_fields -> alloc_group_fields -> get_end_select_func的调用链中,get_end_select_func会决定end_send函数的实现。end_send函数是JOIN执行到最后一个表的时候的next_select函数。

/**
  @details
  Rows produced by a join sweep may end up in a temporary table or be sent
  to a client. Setup the function of the nested loop join algorithm which
  handles final fully constructed and matched records.
  @return
    end_select function to use. This function can't fail.
*/
Next_select_func JOIN::get_end_select_func() {
  DBUG_ENTER("get_end_select_func");
  /*
     Choose method for presenting result to user. Use end_send_group
     if the query requires grouping (has a GROUP BY clause and/or one or
     more aggregate functions). Use end_send if the query should not
     be grouped.
   */
  if (streaming_aggregation && !tmp_table_param.precomputed_group_by) {
    DBUG_PRINT("info", ("Using end_send_group"));
    DBUG_RETURN(end_send_group);
  }
  DBUG_PRINT("info", ("Using end_send"));
  DBUG_RETURN(end_send);
}

MySQL的JOIN是串行执行的,在evaluate_join_record路径上,每一条record都会递归式调用下一个table的next_select(一般是sub_select)。在最后一个表next_select则会设成end_send;对于非临时表 走索引的GROUPBY则会设成end_send_group

if (end_of_records) {  // 当前表读完 或 这一次读达到了buffer阈值
    enum_nested_loop_state nls =
        (*qep_tab->next_select)(join, qep_tab + 1, end_of_records);
    DBUG_RETURN(nls);
  }

end_send_group里最直白的逻辑是update_sum_func调用,表示聚合函数的add调用(如果有聚合函数的话)。整个思路大致是同一个group内的就直接调用相关的聚合函数的add操作;否则,设置join->seen_first_record = true并更新join->group_fields的值以表示进入了新的group。

临时表(不论是否GROUPBY)则是走sub_select_op->end_write写到一个temptable里(参见create_intermediate_table)。

SELECT DISTINCT的一些优化:
  • distinct on unique field

    bool JOIN::optimize_distinct_group_order() {
      ...
       if (select_distinct &&
            list_contains_unique_index(tab, find_field_in_item_list,
                                       (void *)&fields_list)) {
          select_distinct = 0;
          trace_opt.add("distinct_is_on_unique", true)
              .add("removed_distinct", true);
        }  
      ...
    }
    
  • remove const table(1 row) distinct

  • Optimize distinct when used on a subset of the tables.

    E.g.,: SELECT DISTINCT t1.a FROM t1,t2 WHERE t1.b=t2.b
    In this case we can stop scanning t2 when we have found one t1.a
    
    
    void JOIN::optimize_distinct() {
      // check const table only
      for (int i = primary_tables - 1; i >= 0; --i) {
        QEP_TAB *last_tab = qep_tab + i;
        /**
        select_list_tables: The set of those tables whose fields are referenced in the select list of this select level.
      */
        if (select_lex->select_list_tables & last_tab->table_ref->map()) break;
        last_tab->not_used_in_distinct = true;
      }
    
      /* Optimize "select distinct b from t1 order by key_part_1 limit #" */
      if (order && skip_sort_order) {
        /* Should already have been optimized away */
        DBUG_ASSERT(m_ordered_index_usage == ORDERED_INDEX_ORDER_BY);
        if (m_ordered_index_usage == ORDERED_INDEX_ORDER_BY) {
          order = NULL;
        }
      }
    }
    
posted @ 2019-06-20 00:34  Lhfcws  阅读(834)  评论(0编辑  收藏  举报