定义：

联接算法是Mysql数据库用于处理联接的物理策略，目前MySQL只支持nest loop join算法，而MySQL的分支版本MariaDB除了支持nested-loops Join 还支持 Classic Hash Join算法。

根据B+树的特性，其联接的时间复杂度为 O(N),若没有索引，则视为最坏的情况，时间复杂度为 O(N*N) 。

根据不同的使用场合，支持两种 Nested-Loops-Join算法，一直是simple Nested-Loops-Join（NLJ），一种是 Block Nested-Loops-Join（BNL）。

（1）simple Nested-Loops-Join（NLJ）算法：

从第一张表中每次读取一条记录，然后将记录与嵌套表中的记录进行比较，算法如下：

SELECT * FROM R INNER JOIN S ON P(R,S) ;

For each row r in R do
        For each row s in S do
            if r and s satified the join condition
                then output the tuple<r,s>
 

R代表外部表，S代表内部表。时间复杂度O(R*S)。

但是S内部表还有索引时，simple Nested-Loops-Join 算法可以利用索引的特性来进行快速的匹配，此时算法如下：

For each row r in R do
        lookup r in S index
            if found s == r
                then output the tuple<r,s>

对于联接的列含有索引的情况，外部表不再需要扫描整张内部表，只需要扫描内部表上的索引即可得到联接的判断结果。如果内部表联接列索引高度为S，一般B+树的高度为3-4层，所以内部表有索引的情况下，时间复杂度为O(R)。S高度可忽略。

在INNER JOIN中，两张连接表的顺序是可以变换的，即R INNER JOIN S ON condition 等效于 S INNER JOIN R ON condition，根据前面描述的simple Nested-Loops-Join算法，优化器一般情况下，总是选择将联接列含有索引的表作为内部表，如果两张表联接列均有索引，并且索引高度相同，则选取记录数少的表作为外部表，这是因为内部表的扫描次数总是索引的高度，与记录数无关。外部表记录数少，减少循环次数。

SELECT * FROM R INNER JOIN S ON P1(R,S) WHERE P2(R,S);

 
For each row r in R do
        For each row s in S do
            if r and s satified the join condition
                if P2(R,S)
                    then output the tuple<r,s>

在5.6中，MySQL执行计划新增 index condition pushdown 特性来优化检索，算法如下：

P2(R,S) = C1(R) & C2(S)
For each row r in R such thar（C1）do
        lookup r in S index such thar（C2）
            if found s == r
                then output the tuple<r,s>

ICP使用限制：

查询是否启动ICP： show variables like 'optimizer_switch' ;

当sql需要全表访问时,ICP的优化策略可用于range, ref, eq_ref, ref_or_null 类型的访问数据方法。
支持InnoDB和MyISAM表。
ICP只能用于二级索引，不能用于主索引。
并非全部where条件都可以用ICP筛选。如果where条件的字段不在索引列中,还是要读取整表的记录到server端做where过滤。
ICP的加速效果取决于在存储引擎内通过ICP筛选掉的数据的比例。
5.6 版本的不支持分表的ICP 功能，5.7 版本的开始支持。
当sql 使用覆盖索引时，不支持ICP 优化方法。

（2）Block Nested-Loops Join(BNL) 算法:

simple Nested-Loops-Join算法在内部表循环时，有索引的情况，每行外部表记录只需读取内部表一次，所以时间复杂度为O（R），若没有索引的情况下为O（R*S），执行效率会非常差，而Block Nested-Loop 算法针对没有索引的连接情况设计的，其使用Join Buffer（联接缓冲）来减少内部表的扫描次数。

Block Nested-Loop 算法先把对外部表每次读取10行记录，放入Join buffer，然后在内部表中直接匹配这10行数据，因此读内部表扫描减少1/10。

MySQL使用 join Buffer的原则如下：

系统变量join_buffer_size 决定了Join Buffer的大小。
Join Buffer可被用于连接是ALL,index，range 的类型。
每次联接使用一个 Join Buffer，因此多表的联接可以使用多个 Join Buffer。
Join Buffer在联接发生之前进行分配，在SQL语句执行完后释放。
Join Buffer只存储需要进行查询操作的相关列数据，而不是整行记录。
Join Buffer在MySQL5.5不支持OUTER JOIN，5.6支持OUTER JOIN

算法伪代码如下：

for each row in R matching range
   store used columns from R in join buffer
   if buffer is full
        for each row in S
            for each S conbination in join buffer
                if row satisfied join conditions, send to client
       empty buffer
    if buffer is not empty
        for each row in S
            for each S conbination in join buffer
                if row satisfied join conditions, send to client 

验证Simple Nested-Loops：

CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `idx_employees_first_last_name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
 
CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;

root@localhost [employees]> explain select * from employees  e inner join titles t on e.emp_no = t.emp_no ;

+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
| id | select_type | table | type | possible_keys  | key     | key_len | ref                | rows   | Extra |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
|  1 | SIMPLE      | e     | ALL  | PRIMARY        | NULL    | NULL    | NULL               | 299423 | NULL  |
|  1 | SIMPLE      | t     | ref  | PRIMARY,emp_no | PRIMARY | 4       | employees.e.emp_no |       1| NULL  |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
2 rows in set (0.06 sec)
 
root@localhost [employees]> explain select * from titles t inner join employees e on t.emp_no = e.emp_no ;
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
| id | select_type | table | type | possible_keys  | key     | key_len | ref                | rows   | Extra |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
|  1 | SIMPLE      | e     | ALL  | PRIMARY        | NULL    | NULL    | NULL               | 299423 | NULL  |
|  1 | SIMPLE      | t     | ref  | PRIMARY,emp_no | PRIMARY | 4       | employees.e.emp_no |      1 | NULL  |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------+
2 rows in set (0.00 sec)
 
root@localhost [employees]> select count(*) from employees ;
+----------+
| count(*) |
+----------+
|   300025 |
+----------+
1 row in set (0.23 sec)
 
root@localhost [employees]> select count(*) from titles ;
+----------+
| count(*) |
+----------+
|   443308 |
+----------+
1 row in set (0.31 sec)

验证Block Nested-Loops：

--create table  emp_noindex
CREATE TABLE `emp_noindex` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
 
--create table  title_noindex
CREATE TABLE `title_noindex` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;

-- explain inner join
root@localhost [employees]> explain select * from emp_noindex e inner join title_noindex t on e.emp_no = t.emp_no ;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
|  1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL | 299088 | NULL                                               |
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 442806 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
2 rows in set (0.00 sec)
 
-- explain left outer join
root@localhost [employees]> explain select * from emp_noindex e left outer join title_noindex t on e.emp_no = t.emp_no ;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
|  1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL | 299088 | NULL                                               |
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 442806 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
2 rows in set (0.00 sec)

（3）Batched Key Access Join算法

从前两种Join算法，要想加快Join的执行速度，有如下两种方式：

加快每次search-for-match（查询比较）操作的速度。
search-for-match根据group（组）来进行，减少对内部表的访问次数。

第一种就是通过用户添加索引，让优化器选择索引来进行search-for=match操作，第二种方式就是采用Block Nested-Loops算法思想，使用组来进行匹配，减少内部表访问次数。在MySQL5.6中开始支持Batched Key Access Join算法（BKA），我们可以理解为group-index-lookup（组索引查询）。

工作步骤：

将外部表中相关列放入Join Buffer；
批量将Key发送到multi-range read 接口；
multi-range read（MRR）通过收到的Key，根据对应rowID进行排序，然后进行过数据的读取操作； --不是特别理解MRR所做的处理
返回数据给客户端。

（4）Classic Hash Join算法

MySQL不支持Hash Join，为MariaDB 5.3开始支持的算法。

classic hash join算法同样使用Join Buffer ，先将外部表中数据放入 join buffer中，然后根据键值产生一张散列表，这是第一阶段，称为build阶段。
随后读取内部表的一条数据，对其应用散列函数，将其和散列表的数据比较，这是第二阶段，称为probe阶段。

Hash Join 只能适用与等值联接，因为已通过散列函数生成新的联接值，不能将Hash Join 用于非等值的联接操作中。

posted on 2015-10-16 15:34 青尘居士阅读(319) 评论(0) 编辑收藏举报

刷新页面返回顶部

导航

定义：