innodb索引统计信息

以下分析基于mysql5.6.10

统计信息相关字典表

information_schema.statistics

mysql.innodb_table_stats

mysql.innodb_index_stats

先初始化数据，我们看看这些表里存了些什么

drop table t1;
create table t1(c1 int,c2 int,c3 int,c4 int,
     primary key(c1),
     unique key idx1(c2),
     key idx2(c3,c4));
 
insert into t1 values(1,1,1,1);
insert into t1 values(2,2,1,2);
insert into t1 values(3,3,1,3);
insert into t1 values(4,4,1,4);
insert into t1 values(5,5,2,1);
insert into t1 values(6,6,2,1);

mysql> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.46 sec)

mysql> show index from t1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t1    |          0 | PRIMARY  |            1 | c1          | A         |           6 |     NULL | NULL   |      | BTREE      |         |               |
| t1    |          0 | idx1     |            1 | c2          | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
| t1    |          1 | idx2     |            1 | c3          | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
| t1    |          1 | idx2     |            2 | c4          | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name='t1';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| test          | t1         | 2013-08-22 21:23:07 |      6 |                    1 |                        2 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.00 sec)

mysql> select * from mysql.innodb_index_stats where table_name='t1';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | PRIMARY    | 2013-08-22 21:23:07 | n_diff_pfx01 |          6 |           1 | c1                                |
| test          | t1         | PRIMARY    | 2013-08-22 21:23:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | t1         | PRIMARY    | 2013-08-22 21:23:07 | size         |          1 |        NULL | Number of pages in the index      |
| test          | t1         | idx1       | 2013-08-22 21:23:07 | n_diff_pfx01 |          6 |           1 | c2                                |
| test          | t1         | idx1       | 2013-08-22 21:23:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | t1         | idx1       | 2013-08-22 21:23:07 | size         |          1 |        NULL | Number of pages in the index      |
| test          | t1         | idx2       | 2013-08-22 21:23:07 | n_diff_pfx01 |          2 |           1 | c3                                |
| test          | t1         | idx2       | 2013-08-22 21:23:07 | n_diff_pfx02 |          5 |           1 | c3,c4                             |
| test          | t1         | idx2       | 2013-08-22 21:23:07 | n_diff_pfx03 |          6 |           1 | c3,c4,c1                          |
| test          | t1         | idx2       | 2013-08-22 21:23:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | t1         | idx2       | 2013-08-22 21:23:07 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
11 rows in set (1.51 sec)

其中 show index from t1; 实际上访问的是information_schema.statistics表。

等价于select * from information_schema.statistics where table_name='t1';

统计信息项

我们来试图将统计字典表中的字段和源码中的统计项联系起来，以下是源码中的统计信息项

unsigned n_uniq:10;/*!< number of fields from the beginning
                which are enough to determine an index
                entry uniquely */

ib_uint64_t* stat_n_diff_key_vals;
                /*!< approximate number of different
                key values for this index, for each
                n-column prefix where 1 <= n <=
                dict_get_n_unique(index) (the array is
                indexed from 0 to n_uniq-1); we
                periodically calculate new
                estimates */

    ib_uint64_t*    stat_n_sample_sizes;
                /*!< number of pages that were sampled
                to calculate each of stat_n_diff_key_vals[],
                e.g. stat_n_sample_sizes[3] pages were sampled
                to get the number stat_n_diff_key_vals[3]. */

    ib_uint64_t*    stat_n_non_null_key_vals;
                /* approximate number of non-null key values
                for this index, for each column where
                1 <= n <= dict_get_n_unique(index) (the array
                is indexed from 0 to n_uniq-1); This
                is used when innodb_stats_method is
                "nulls_ignored". */

    ulint       stat_index_size;
                /*!< approximate index size in
                database pages */
    ulint       stat_n_leaf_pages;
                /*!< approximate number of leaf pages in the
                index tree */

根据注释，很容易看出联系，其中

对于 idx1(c1)

n_uniq=1 即c1

stat_n_diff_key_vals[0]=6

对于 idx2(c2,c3)

n_uniq=3 即(c2,c3,c1)

stat_n_diff_key_vals[0]=2 //idx2前缀c2不同的个数

stat_n_diff_key_vals[1]=5 //idx2前缀c2,c3不同的个数

stat_n_diff_key_vals[2]=6 //idx2前缀c2,c3,c1不同的个数

inndb统计信息相关参数

系统参数

innodb_stats_auto_recalc
innodb_stats_method
innodb_stats_on_metadata
innodb_stats_persistent
innodb_stats_persistent_sample_pages
innodb_stats_sample_pages
innodb_stats_transient_sample_pages

参考：http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html

表参数，建表是指定

| STATS_AUTO_RECALC [=] {DEFAULT|0|1}
  | STATS_PERSISTENT [=] {DEFAULT|0|1}

参考：http://docs.oracle.com/cd/E17952_01/refman-5.6-en/create-table.html

这里看到统计信息有两种类别persistent和transient，这里先大致介绍下区别，具体实现下见下章节

1 persistent 会将统计信息持久化到mysql.innodb_table_stats ，mysql.innodb_index_stats表中

2 persistent和transient统计算法不同，persistent比transient统计相对精确，当然也更耗时

一些说明：

Innodb有一个后台线程dict_stats_thread，专门用于更新persistent类型的统计信息

innodb_stats_auto_recalc 开启与否只会影响persistent类型的统计。

更新统计信息源码实现

1 transient

btr_cur_open_at_rnd_pos

从根节点页开始，随机取一个记录，再从此记录找到其指向的下层页，从下层也随机取一个记录，这样依次向下层取记录，直到叶子节点页。

每次采样一页读取的页数为B树的深度。

n_diff

通过cmp_rec_rec_with_match比较页内前后记录后，后面不匹配的列都认为diff，记入n_diff

total_external_size

见后面章节

2 persistent类型

相关函数dict_stats_update_persistent

Persistent和transient统计不同的地方在于stat_n_diff_key_vals的计算

dict_stats_analyze_index关键代码如下

   for (n_prefix = n_uniq; n_prefix >= 1; n_prefix--) {
       
        /* Commit the mtr to release the tree S lock to allow
        other threads to do some work too. */

        mtr_commit(&mtr);
        mtr_start(&mtr);
        mtr_s_lock(dict_index_get_lock(index), &mtr);

        if (root_level != btr_height_get(index, &mtr)) { 
            break;
        } 

        if (level_is_analyzed
            && (n_diff_on_level[n_prefix - 1] >= N_DIFF_REQUIRED(index)
            || level == 1)) {
            goto found_level;
        }
         /* search for a level that contains enough distinct records */

        if (level_is_analyzed && level > 1) {
            /* if this does not hold we should be on
            "found_level" instead of here */
            ut_ad(n_diff_on_level[n_prefix - 1]
                  < N_DIFF_REQUIRED(index));

            level--;
            level_is_analyzed = false;
        }
        /* descend into the tree, searching for "good enough" level */
        for (;;) {
            /* make sure we do not scan the leaf level
            accidentally, it may contain too many pages */
            ut_ad(level > 0);

            /* scanning the same level twice is an optimization bug */

            ut_ad(!level_is_analyzed);
 
            /* Do not scan if this would read too many pages.
            Here we use the following fact:
            the number of pages on level L equals the number
            of records on level L+1, thus we deduce that the
            following call would scan total_recs pages, because
            total_recs is left from the previous iteration when
            we scanned one level upper or we have not scanned any
            levels yet in which case total_recs is 1. */

            if (total_recs > N_SAMPLE_PAGES(index)) {
                /* if the above cond is true then we are
                not at the root level since on the root
                level total_recs == 1 (set before we
                enter the n-prefix loop) and cannot
                be > N_SAMPLE_PAGES(index) */

                ut_a(level != root_level); 

                /* step one level back and be satisfied with
                whatever it contains */

                level++;
                level_is_analyzed = true;

                break;
            }

            dict_stats_analyze_index_level(index,level,n_diff_on_level,&total_recs,&total_pages,n_diff_boundaries,&mtr); 

            level_is_analyzed = true; 

            if (n_diff_on_level[n_prefix - 1]
                >= N_DIFF_REQUIRED(index)
                || level == 1) {
                /* we found a good level with many distinct
                records or we have reached the last level we
                could scan */
                break;
            }
            level--;
            level_is_analyzed = false;
        }
found_level:         
        dict_stats_analyze_index_for_n_prefix(
            index, level, total_recs, n_prefix,
            n_diff_on_level[n_prefix - 1],
            &n_diff_boundaries[n_prefix - 1], &mtr);
    }

上面的逻辑分两个阶段

1 从根节点开始向下查找到合适的B树的某层L，条件为

if (total_recs > N_SAMPLE_PAGES(index))

L不可以是叶子层

2 从层L开始，将n_diff_boundaries分为N段，N为采样数。分别从N个段中，随机取N个记录，这N个记录依次向下层查找到合适的采样页。这个采样页不一定时叶子页。

分解一下：

1 dict_stats_analyze_index_level函数获取以下值

n_diff：本层不同值个数

total_recs：本层总记录数

total_pages：本层总页数

n_diff_boundaries：出现不同值的边界位置数组，数组长度为n_diff，0<= n_diff_boundaries<total_recs-1

2 为何是递减循环

for (n_prefix = n_uniq; n_prefix >= 1; n_prefix--)

为了dict_stats_analyze_index_level不从头根层开始向下分析，而是从当前层开始

3 分段采样

dict_stats_analyze_index_for_n_prefix

分段数

n_recs_to_dive_below = ut_min(N_SAMPLE_PAGES(index),

                      n_diff_for_this_prefix);

取分段中随机记录

left = n_diff_for_this_prefix * i / n_recs_to_dive_below;
        right = n_diff_for_this_prefix * (i + 1)
            / n_recs_to_dive_below - 1; 
rnd = ut_rnd_interval(0, (ulint) (right - left));

从随机记录开始向下取样，统计样页的n_diff,样页不一定是叶子页

dict_stats_analyze_index_below_cur

4 样页不一定是叶子页

当当前层的页的记录的n_pre都一致时，下层页也应一致。因此不需再向下层取样。

何时更新统计信息

1 create table/truncate table 会初始化统计信息

2 open table

如果设置innodb_stats_persistent，先从统计字典表中读取，如果读取不到则通过persistent方式更新统计信息

否则通过transient方式更新统计信息

3 analyze table

如果设置innodb_stats_persistent，则通过persistent方式更新统计信息，否则通过transient方式更新统计信息

4 数据发生变化

表距离上一次更新统计信息，发生变化的行数超过当前行数1/16时，通过transient方式更新统计信息

表距离上一次更新统计信息，发生变化的行数超过当前行数%10，且设置了innodb_stats_auto_recalc和innodb_stats_persistent，通过persistent方式更新统计信息

何时使用统计信息

MRR http://dev.mysql.com/doc/refman/5.6/en/mrr-optimization.html

ROR(RowidOrderedRetrieva)

GROUP

优化是会用到，相关函数如下，这块后续深入下

multi_range_read_info_const

ror_scan_selectivity

cost_group_min_max

需优化的地方

1 stat_n_leaf_pages，stat_index_size的统计

Btr_get_size取到的是ret，而不是used.

它们的区别是ret:segment中的所有页，包括一些free页

User: segment中已使用的页

ret >used

Free页即不用于B树页，也不用于externer页，因此stat_index_size不应包括free页。这应该算是个bug吧。应用used统计。

*used = mtr_read_ulint(inode + FSEG_NOT_FULL_N_USED, MLOG_4BYTES, mtr)
        + FSP_EXTENT_SIZE * flst_get_len(inode + FSEG_FULL, mtr)
        + fseg_get_n_frag_pages(inode, mtr);
 
 ret = fseg_get_n_frag_pages(inode, mtr)
        + FSP_EXTENT_SIZE * flst_get_len(inode + FSEG_FREE, mtr)
        + FSP_EXTENT_SIZE * flst_get_len(inode + FSEG_NOT_FULL, mtr)
        + FSP_EXTENT_SIZE * flst_get_len(inode + FSEG_FULL, mtr);

2 关于external page

读了这段代码发现，发现external page 属于leaf segment;

total_external_size
                += btr_rec_get_externally_stored_len(
                    rec, offsets_rec);

external page 和叶子页公用一个segment, external page和叶子页在同一个extent内混合出现，让叶子页在物理上更离散。

1 会影响只查非大字段查询

2 在某些情况下MRR显得很无力

3 统计信息计算external page导致统计偏差

解决方法：external page可以用单独的segment管理

3 关于persistent采样

transient 采样比较简单，但过于随机，极端情况下会出现采集到同一页的情况。

Persistent 方式做到了尽量采集不同的值，并且不会出现采集到同一页的情况。Persistent 方式会读取b树前N(N>0,即不会统计叶子层) 层所有页和记录。

假设100条记录的分布如下

91个0 加上 1 2 3 4 5 6 7 8 9

假设采样10页

按Persistent逻辑采样记录为 0 1 2 3 4 5 6 7 8 9 n_diff=9

按transient逻辑采样很大可能结果为 9个0 加 1 n_diff=2

显然 Persistent会统计比transient统计要精确。

n_recs_to_dive_below = ut_min(N_SAMPLE_PAGES(index),
                      n_diff_for_this_prefix);

对于n_diff_for_this_prefix< N_SAMPLE_PAGES(index),这时候的采样数为n_diff_for_this_prefix，采样数过少，会导致偏差。

此时应仍然采集N_SAMPLE_PAGES(index)个页，换以下统计方式

1 可以统计n_diff_boundaries之间的区间大小，因为n_diff_for_this_prefix较小，所以这个统计成本较小

2 按区间比例来分配，区间越小，采样的页数相对应更多

3 总共采集N_SAMPLE_PAGES(index)个页

4 以上纯属YY

posted @ 2013-08-22 21:21 sstrive 阅读(1140) 评论(0) 编辑收藏举报

刷新页面返回顶部