Facebook对MySQL全表扫描性能的改进
原文博客如下:
http://yoshinorimatsunobu.blogspot.com/2013/10/making-full-table-scan-10x-faster-in.html
如下是读博客做的笔记:
Facebook对全表扫描的需求场景:
1.logical backups(eg: mysqldump);
2.online schema changes(eg: SELECT... INTO OUTFILE);
Facebook 使用logical backups(vs physical backup)的原因:
1.size is mush smaller(3x ~10 x);
2.eaier to parse backups;
drawbacks of logical backup:
full backup and full restore are much slower than physical backup/restore.
如果数据库大时,表一般都是fragmented,此时InnoDB的全表扫描是随机读而非顺序读。
Logical Readahead即为实现InnoDB全表扫描时的顺序读。
给出的性能提升数据:
1. 9~10 times faster under usual production workloads
2. 15~20 times faster under heavy production workloads
在HDD上效果明显。
InnoDB已经支持的prefetching feature "Linear Read Ahead"对提升全表扫描的性能并没有什么帮助,因为Linear Read Ahead是按照磁盘物理的page顺序做prefetch(默认64个page,这个范围太小了)。而稍大些的fragmented tables,逻辑上相邻的两个page在物理上距离一般会更大(文章中举例 page 3 -> page 5230 -> page 4).
Logical Read Ahead基本原理:
根据primary key的B+树的branch节点记录的leaf page number信息,按照page的顺序(大部分转化为顺序读)将很多leaf节点的数据读入buffer pool,提升后续全表扫描时buffer pool的命中率,进而避免随机读。
实现思路(步骤):
1.Reading many branch pages of the primary key
2.Collecting leaf page numbers
3.Reading many (configurable amount of) leaf pages by page number order (mostly sequential disk reads)
4.Reading rows by primary key order (same as usual full table scan, but buffer pool hit rate should be very high)
提供两个新的session variables:
1. innodb_lra_size; (the size of prefetch leaf pages)
2. innodb_lra_sleep; (???做啥用的???)
在需要使用LRA时将这个特性打开。
Submitting multiple async I/O requests at once
In 5.6, InnoDB开始默认使用aio。将多个连续的16KB读取一次submit,可以利用Linux对多个请求的merge,提高效率。
在HDD Raid 1+0的条件下,作者每次提交64个连续页面的读取请求,将读取效率由160MB/s提升至1000MB/s.
测试数据如下:
1. Full table scan on pure HDD (basic benchmark, no other workload)
Table size | Without LRA | With LRA | Improvement |
10GB | 10 min 55.00 sec | 1 min 15.20 sec | 8.71x |
50GB | 52 min 44.42 sec | 6 min 36.98 sec | 7.97x |
2. Online schema change under heavy workload
Table size | Without LRA | With LRA | Improvement |
1GB | 7 min 33 sec | 24 sec | 18.8x |
关于上述改进的三处改动都给了patch。