Mysql优化_第十三篇（HashJoin篇）

Mysql优化_第十三篇（HashJoin篇）
- 1 适用场景

1 适用场景

纯等值查询，不能使用索引

从MYSQL 8.0.18开始，MYSQL实现了对于相等条件下的HASHJOIN，并且，join条件中无法使用任何索引，比如下面的语句：

SELECT *
    FROM t1
    JOIN t2
        ON t1.c1=t2.c1;

等值查询，使用到索引

当然，如果有一个或者多个索引可以适用于单表谓词，hash join也可以使用到。（这句话不是很懂？原句为：A hash join can also be used when there are one or more indexes that can be used for single-table predicates.

相对于Blocked Nested Loop Algorithm，以下简称BNL，hash join性能更高，并且两者的使用场景相同，所以从8.0.20开始，BNL已经被移除。使用hash join替代之。

通常在EXPLAIN的结果里面，在Extra列，会有如下描述：

Extra: Using where; Using join buffer (hash join)

说明使用到了hash join。

多个join条件中至少包含一个等值查询（可以包含非等值）

虽然hash join适用于等值join，但是，从原则上来讲，在多个join条件中，只要有每对join条件中，至少存在一个等值，Mysql就可以使用到hash join来提升速度，比如下面的语句：

SELECT * FROM t1
    JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  该语句包含非等值的join条件
    JOIN t3 ON (t2.c1 = t3.c1);

EXPLAIN FORMAT=TREE的结果如下：

EXPLAIN: -> Inner hash join (t3.c1 = t1.c1)  (cost=1.05 rows=1)
    -> Table scan on t3  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2)  (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)

多个join条件对中完全没有等值查询（从8.0.20开始）

在Mysql8.0.20之前，如果join条件中有任何一个条件没有包含等值，那么BNL就会被应用，但是从8.0.20开始，hash join也可以应用到下面的语句：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT * FROM t1
    ->     JOIN t2 ON (t1.c1 = t2.c1)
    ->     JOIN t3 ON (t2.c1 < t3.c1)\G   该join条件不包含等值，会作为filter来使用
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t3.c1)  (cost=1.05 rows=1)
    -> Inner hash join (no condition)  (cost=1.05 rows=1)
        -> Table scan on t3  (cost=0.35 rows=1)
        -> Hash
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)

笛卡尔积

当然，也可以适用于笛卡尔积（没有指定join条件）：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->     WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50)  (cost=0.35 rows=1)  where条件提早过滤
            -> Table scan on t1  (cost=0.35 rows=1)

普通inner join完全没有等值

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1)  (cost=4.70 rows=12)  //join条件变成了filter
    -> Inner hash join (no condition)  (cost=4.70 rows=12)
        -> Table scan on t2  (cost=0.08 rows=6)
        -> Hash
            -> Table scan on t1  (cost=0.85 rows=6)

Semijoin（Mysql文档EXPLAIN有误，这里更正下）

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 
    ->     WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
| -> Filter: (t1.c1 < t2.c1)  (cost=0.70 rows=1)
    -> Inner hash join (no condition)  (cost=0.70 rows=1)
        -> Table scan on t2  (cost=0.35 rows=1)
        -> Hash
            -> Table scan on t1  (cost=0.35 rows=1)
 |

Antijoin（Mysql文档EXPLAIN有误，这里更正下）

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2 
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
| -> Hash antijoin (t1.c1 = t2.c2)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)
 |

Left outer join

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1)  (cost=3.99 rows=36)
    -> Table scan on t1  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t2  (cost=0.14 rows=6)

Right outer join(MYSQL会把所有的右外连接转换为左外连接)：

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.c1 = t2.c1)  (cost=3.99 rows=36)
    -> Table scan on t2  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t1  (cost=0.14 rows=6)

秋风五丈原

Mysql优化_第十三篇（HashJoin篇）