InnoDB的ibd数据文件为什么比data_length+index_length+data_free的总和还要大?

问题描述:

同事在给jiradb做mysqldump时,发现dump出来的文件只有10MB左右,而ibd文件占用磁盘空间100MB左右。

最初,我们猜测可能是delete操作导致了大量的磁盘碎片,以及二级索引占用了很多空间。

但是对比了data_length+index_length+data_free的总和,与du的输出结果对比,还是相差较多。

版本信息:Server version: 5.6.48-log MySQL Community Server (GPL)

 

概念解释:

data_length:聚集索引所占用的空间,单位是bytes

For MyISAM, DATA_LENGTH is the length of the data file, in bytes.
For InnoDB, DATA_LENGTH is the approximate amount of space allocated for the clustered index, in bytes. Specifically, it is the clustered index size, in pages, multiplied by the InnoDB page size.
Refer to the notes at the end of this section for information regarding other storage engines.

 

index_length:二级索引所占用的空间,单位是bytes

For MyISAM, INDEX_LENGTH is the length of the index file, in bytes.
For InnoDB, INDEX_LENGTH is the approximate amount of space allocated for non-clustered indexes, in bytes. Specifically, it is the sum of non-clustered index sizes, in pages, multiplied by the InnoDB page size.
Refer to the notes at the end of this section for information regarding other storage engines.

 

data_free:已分配但是未使用的空间,单位是bytes

The number of allocated but unused bytes.
InnoDB tables report the free space of the tablespace to which the table belongs. For a table located in the shared tablespace, this is the free space of the shared tablespace. If you are using multiple tablespaces and the table has its own tablespace, the free space is for only that table. Free space means the number of bytes in completely free extents minus a safety margin. Even if free space displays as 0, it may be possible to insert rows as long as new extents need not be allocated.
For NDB Cluster, DATA_FREE shows the space allocated on disk for, but not used by, a Disk Data table or fragment on disk. (In-memory data resource usage is reported by the DATA_LENGTH column.)

参考链接:https://dev.mysql.com/doc/refman/5.6/en/tables-table.html

 

分析过程:

1、首先,抽查占用空间最大的changeitem.ibd,du显示它占用磁盘11268KB

2、analyze table changeitem之后,查询information_schema.tables,得出2637824 + 589824 + 4194304 = 7,421,952 = 7248KB,与du显示的结果相差4020KB

mysql> select data_length,index_length,data_free,table_name from information_schema.tables where table_name='changeitem';
+-------------+--------------+-----------+------------+
| data_length | index_length | data_free | table_name |
+-------------+--------------+-----------+------------+
|     2637824 |       589824 |   4194304 | changeitem |
+-------------+--------------+-----------+------------+
1 row in set (0.00 sec)

3、使用py_innodb_page_info工具(原作者在https://code.google.com/archive/p/david-mysql-tools/中的原版已经找不到了,我在github上找到了其它版本,并做了一点小的修改,下载地址https://github.com/johnliu2008/py_innodb_page_info)分析ibd文件:

[root@localhost ~]# python py_innodb_page_info/py_innodb_page_info.py changeitem.ibd
Total number of page: 704:
Freshly Allocated Page: 527
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 174
File Segment inode: 1

可以看到,B-tree Node有174个,这个数目是包含了聚集索引页和二级索引页的数量;这个工具的原理,是通过逐个块地扫描ibd文件,通过每个块的page_type值来判断属于什么类型的块,其中用到的innodb_page_type字典,与源码storage/innobase/include/fil0fil.h中的定义一致(题外话:8.0版本新增了SDI页类型值,姜大原版的工具不能支持,我添加了SDI页类型值的字典信息)

Freshly Allocated Page有527个,约‭8,634,368字节,比上一步中查出的data_free‬值要大很多,通过参照官档关于data_free的描述(Free space means the number of bytes in completely free extents minus a safety margin.),知道data_free只是空闲空间的一部分。我好奇的是:InnoDB表的data_free值为什么都是1MB的整数倍?safety margin指的是什么?后面会通过源码分析来解释。

4、看看我们最初的猜测,是不是delete导致了很多磁盘碎片空间呢?

a.通过官档对data_free的解释,可以知道data_free≠碎片空间的容量,而是一种完全空闲的空间

b.实验证明,仅仅insert,也会导致本文标题所描述的现象

因此,关于delete造成碎片空间的假设不成立。

源码中关于data_free的计算方法:

storage/innobase/handler/i_s.cc:i_s_files_table_fill()中,avail_space = fsp_get_available_space_in_free_extents(space());

接着看看fsp_get_available_space_in_free_extents()函数,在storage\innobase\fsp\fsp0fsp.cc中:

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
/** Calculate how many KiB of new data we will be able to insert to the
tablespace without running out of space.
@param[in]  space_id    tablespace ID
@return available space in KiB
@retval UINTMAX_MAX if unknown */
uintmax_t
fsp_get_available_space_in_free_extents(
    ulint   space_id)
{
    FilSpace    space(space_id);
    if (space() == NULL) {
        return(UINTMAX_MAX);
    }
 
    return(fsp_get_available_space_in_free_extents(space));
}
 
/** Calculate how many KiB of new data we will be able to insert to the
tablespace without running out of space. Start with a space object that has
been acquired by the caller who holds it for the calculation,
@param[in]  space       tablespace object from fil_space_acquire()
@return available space in KiB */
uintmax_t
fsp_get_available_space_in_free_extents(
//虽然函数的返回值是以KiB为单位,但是从函数名,大概可以猜到它还是以extent(1MB)为单位的,后续会有其它佐证
    const fil_space_t*  space)
{
    ut_ad(space->n_pending_ops > 0);
 
    ulint   size_in_header = space->size_in_header;
    if (size_in_header < FSP_EXTENT_SIZE) {
        return(0);      /* TODO: count free frag pages and
                    return a value based on that */
    }
 
    /* Below we play safe when counting free extents above the free limit:
    some of them will contain extent descriptor pages, and therefore
    will not be free extents */
    ut_ad(size_in_header >= space->free_limit);
//FSP_FREE_LIMIT:当前尚未初始化的最小Page No。从该Page往后的都尚未加入到表空间的FREE LIST上。 http://mysql.taobao.org/monthly/2016/02/01/
    ulint   n_free_up =
        (size_in_header - space->free_limit) / FSP_EXTENT_SIZE;
 
    page_size_t page_size(space->flags);
    if (n_free_up > 0) {
        n_free_up--;
        n_free_up -= n_free_up / (page_size.physical()
                      / FSP_EXTENT_SIZE);
    }
 
    /* We reserve 1 extent + 0.5 % of the space size to undo logs
    and 1 extent + 0.5 % to cleaning operations; NOTE: this source
    code is duplicated in the function above!
    这就是上面官档中提到的safety margin的解释*/
 
    ulint   reserve = 2 + ((size_in_header / FSP_EXTENT_SIZE) * 2) / 200;
    ulint   n_free = space->free_len + n_free_up;
 
    if (reserve > n_free) {
        return(0);
    }
 
    return(static_cast<uintmax_t>(n_free - reserve)
           * FSP_EXTENT_SIZE * (page_size.physical() / 1024));
// 因为n_free和reserve变量的数据类型是ulint无符号整型,与FSP_EXTENT_SIZE(16K页大小的话是1048576,也就是1MB)相乘之后,得到的数一定会是1MB的整数倍。这就是data_free的值为什么是1MB的整数倍的原因。
}

 

5、通过搜索,在Percona博客找到一篇关于如果计算InnoDB表占用磁盘空间的博文,不过是针对5.7以上版本的,不适用于5.6版本,记录下链接https://www.percona.com/blog/2016/01/26/finding_mysql_table_size_on_disk/,以供参考。

6、查看官档https://dev.mysql.com/doc/refman/5.6/en/innodb-file-per-table-tablespaces.html,其中描述ibd文件增长的步长是4MB,但是实际验证发现,当有足够多空闲空间的时候,ibd文件以小于4MB的步长增长,因为InnoDB分配磁盘空间以extent为单位,所以步长一定是1MB的整数倍:

The innodb_autoextend_increment variable, which defines the increment size for extending the size of an auto-extending system tablespace file when it becomes full, does not apply to file-per-table tablespace files, which are auto-extending regardless of the innodb_autoextend_increment setting. Initial file-per-table tablespace extensions are by small amounts, after which extensions occur in increments of 4MB.

那么,ibd文件的自增长时,为什么会有那么多的空闲空间呢?去看看源码中的fsp_reserve_free_extents()函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
@param[in]  n_ext       number of extents to reserve
fsp_reserve_free_extents()
.............省略若干行.............
switch (alloc_type) {
case FSP_NORMAL:
        /* We reserve 1 extent + 0.5 % of the space size to undo logs
        and 1 extent + 0.5 % to cleaning operations; NOTE: this source
        code is duplicated in the function below! */
        reserve = 2 + ((size / FSP_EXTENT_SIZE) * 2) / 200;
        if (n_free <= reserve + n_ext) {
            goto try_to_extend;
        }
        break;
//从fseg_create_general()函数和fseg_alloc_free_page_general()函数调用上述函数发现,n_ext参数传的值都是2,reserve = 2 + ((size / FSP_EXTENT_SIZE) * 2) / 200; 说明reserve最少为2,只要当n_free <= 2+2时,就会try_to_extend
.............省略若干行.............

 

结论:

通过上面的分析,我们可以知道:

1、data_free≠碎片空间的容量,而是一种完全空闲的空间,大小是1MB的整数倍

2、ibd文件空闲空间<=4个extents也就是4MB时,就会尝试进行扩展

3、我在官档和源码中,没有找到ibd文件需要自动扩展的原因,但是结合工作经验,我猜测:表空间文件扩展时开销比较大,所以通过预先分配空间,以减少事务在写入时遇到空间不足而临时进行扩展的开销。以前的项目中使用Oracle数据库时,生产环境中的库都会预先把表空间设置得比较大,这样虽然会造成空间浪费,但是对性能友好,通俗地说,就是空间换时间。

 

posted @ 2020-05-12 16:00  Johnliu2008  阅读(1586)  评论(0编辑  收藏  举报