MySQL 8.0 Reference Manual(读书笔记47节--Optimizing SELECT Statements(2))
3.Index Merge Optimization
The Index Merge access method retrieves rows with multiple range scans and merges【ˈmɜːrdʒɪz (使)合并,结合,并入;融入;相融;渐渐消失在某物中;】 their results into one. This access method merges index scans from a single table only, not scans across multiple tables. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans.
Example queries for which Index Merge may be used:
SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20; SELECT * FROM tbl_name WHERE (key1 = 10 OR key2 = 20) AND non_key = 30; SELECT * FROM t1, t2 WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%') AND t2.key1 = t1.some_col; SELECT * FROM t1, t2 WHERE t1.key1 = 1 AND (t2.key1 = t1.some_col OR t2.key2 = t1.some_col2);
【 The Index Merge optimization algorithm has the following known limitations:
• If your query has a complex WHERE clause with deep AND/OR nesting and MySQL does not choose the optimal plan, try distributing terms using the following identity transformations:
(x AND y) OR z => (x OR z) AND (y OR z) (x OR y) AND z => (x AND z) OR (y AND z)
• Index Merge is not applicable to full-text indexes. 】
In EXPLAIN output, the Index Merge method appears as index_merge in the type column. In this case, the key column contains a list of indexes used, and key_len contains a list of the longest key parts for those indexes.
The Index Merge access method has several algorithms, which are displayed in the Extra field of EXPLAIN output:
• Using intersect(...) • Using union(...) • Using sort_union(...)
The following sections describe these algorithms in greater detail. The optimizer chooses between different possible Index Merge algorithms and other access methods based on cost estimates of the various available options.
3.1 Index Merge Intersection Access Algorithm
This access algorithm is applicable when a WHERE clause is converted to several range conditions on different keys combined with AND, and each condition is one of the following:
• An N-part expression of this form, where the index has exactly N parts (that is, all index parts are covered):
key_part1 = const1 AND key_part2 = const2 ... AND key_partN = constN
• Any range condition over the primary key of an InnoDB table.
Examples:
SELECT * FROM innodb_table WHERE primary_key < 10 AND key_col1 = 20; SELECT * FROM tbl_name WHERE key1_part1 = 1 AND key1_part2 = 2 AND key2 = 2;
The Index Merge intersection【ˌɪntərˈsekʃn 十字路口;相交;交点;交叉;交叉路口;横断;】 algorithm performs simultaneous scans on all used indexes and produces the intersection of row sequences that it receives from the merged index scans.
If all columns used in the query are covered by the used indexes, full table rows are not retrieved (EXPLAIN output contains Using index in Extra field in this case). Here is an example of such a query:
SELECT COUNT(*) FROM t1 WHERE key1 = 1 AND key2 = 1;
If the used indexes do not cover all columns used in the query, full rows are retrieved only when the range conditions for all used keys are satisfied.
If one of the merged conditions is a condition over the primary key of an InnoDB table, it is not used for row retrieval, but is used to filter out rows retrieved using other conditions.
3.2 Index Merge Union Access Algorithm
The criteria【kraɪ'tɪriə 标准;(评判或作决定的)准则;原则;】 for this algorithm are similar to those for the Index Merge intersection algorithm. The algorithm is applicable when the table's WHERE clause is converted to several range conditions on different keys combined with OR, and each condition is one of the following:
• An N-part expression of this form, where the index has exactly N parts (that is, all index parts are covered):
key_part1 = const1 OR key_part2 = const2 ... OR key_partN = constN
• Any range condition over a primary key of an InnoDB table.
• A condition for which the Index Merge intersection algorithm is applicable.
Examples:
SELECT * FROM t1 WHERE key1 = 1 OR key2 = 2 OR key3 = 3; SELECT * FROM innodb_table WHERE (key1 = 1 AND key2 = 2) OR (key3 = 'foo' AND key4 = 'bar') AND key5 = 5;
3.3 Index Merge Sort-Union Access Algorithm
This access algorithm is applicable when the WHERE clause is converted to several range conditions combined by OR, but the Index Merge union algorithm is not applicable.
Examples:
SELECT * FROM tbl_name WHERE key_col1 < 10 OR key_col2 < 20; SELECT * FROM tbl_name WHERE (key_col1 > 10 OR key_col2 = 20) AND nonkey_col = 30;
The difference between the sort-union algorithm and the union algorithm is that the sort-union algorithm must first fetch row IDs for all rows and sort them before returning any rows.
3.4 Influencing Index Merge Optimization
Use of Index Merge is subject to the value of the index_merge, index_merge_intersection, index_merge_union, and index_merge_sort_union flags of the optimizer_switch system variable.By default, all those flags are on. To enable only certain algorithms, set index_merge to off, and enable only such of the others as should be permitted.
In addition to using the optimizer_switch system variable to control optimizer use of the Index Merge algorithms session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis.
4 Hash Join Optimization
By default, MySQL (8.0.18 and later) employs hash joins whenever possible. It is possible to control whether hash joins are employed using one of the BNL and NO_BNL optimizer hints, or by setting block_nested_loop=on or block_nested_loop=off as part of the setting for the optimizer_switch server system variable.
【MySQL 8.0.18 supported setting a hash_join flag in optimizer_switch, as well as the optimizer hints HASH_JOIN and NO_HASH_JOIN. In MySQL 8.0.19 and later, none of these have any effect any longer.】
Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition, and in which there are no indexes that can be applied to any join conditions, such as this one:
SELECT * FROM t1 JOIN t2 ON t1.c1=t2.c1;
A hash join can also be used when there are one or more indexes that can be used for single-table predicates.
A hash join is usually faster than and is intended to be used in such cases instead of the block nested loop algorithm employed in previous versions of MySQL. Beginning with MySQL 8.0.20, support for block nested loop is removed, and the server employs a hash join wherever a block nested loop would have been used previously.
In the example just shown and the remaining examples in this section, we assume that the three tables t1, t2, and t3 have been created using the following statements:
CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); CREATE TABLE t3 (c1 INT, c2 INT);
You can see that a hash join is being employed by using EXPLAIN, like this:
mysql> EXPLAIN -> SELECT * FROM t1 -> JOIN t2 ON t1.c1=t2.c1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: t2 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: Using where; Using join buffer (hash join)
(Prior to MySQL 8.0.20, it was necessary to include the FORMAT=TREE option to see whether hash joins were being used for a given join.)
EXPLAIN ANALYZE also displays information about hash joins used.
The hash join is used for queries involving multiple joins as well, as long as at least one join condition for each pair of tables is an equi-join, like the query shown here:
SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) JOIN t3 ON (t2.c1 = t3.c1);
In cases like the one just shown, which makes use of an inner join, any extra conditions which are not equi-joins are applied as filters after the join is executed. (For outer joins, such as left joins, semijoins, and antijoins, they are printed as part of the join.) This can be seen here in the output of EXPLAIN:
mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1) -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
As also can be seen from the output just shown, multiple hash joins can be (and are) used for joins having multiple equi-join conditions.
Prior to MySQL 8.0.20, a hash join could not be used if any pair of joined tables did not have at least one equi-join condition, and the slower block nested loop algorithm was employed. In MySQL 8.0.20 and later, the hash join is used in such cases, as shown here:
mysql> EXPLAIN FORMAT=TREE -> SELECT * FROM t1 -> JOIN t2 ON (t1.c1 = t2.c1) -> JOIN t3 ON (t2.c1 < t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Filter: (t1.c1 < t3.c1) (cost=1.05 rows=1) -> Inner hash join (no condition) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
A hash join is also applied for a Cartesian product—that is, when no join condition is specified, as shown here:
mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> WHERE t1.c2 > 50\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 > 50) (cost=0.35 rows=1) -> Table scan on t1 (cost=0.35 rows=1)
In MySQL 8.0.20 and later, it is no longer necessary for the join to contain at least one equi-join condition in order for a hash join to be used. This means that the types of queries which can be optimized using hash joins include those in the following list (with examples):
• Inner non-equi-join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Filter: (t1.c1 < t2.c1) (cost=4.70 rows=12) -> Inner hash join (no condition) (cost=4.70 rows=12) -> Table scan on t2 (cost=0.08 rows=6) -> Hash -> Table scan on t1 (cost=0.85 rows=6)
• Semijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 -> WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G *************************** 1. row *************************** EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1) (cost=0.70 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Hash -> Table scan on t2 (cost=0.35 rows=1)
• Antijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2 -> WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)\G *************************** 1. row *************************** EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 't3.t2.c1' of SELECT #2 was resolved in SELECT #1
• Left outer join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Left hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Hash -> Table scan on t2 (cost=0.35 rows=1)
• Right outer join (observe that MySQL rewrites all right outer joins as left outer joins):
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Left hash join (t1.c1 = t2.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
By default, MySQL 8.0.18 and later employs hash joins whenever possible. It is possible to control whether hash joins are employed using one of the BNL and NO_BNL optimizer hints.
(MySQL 8.0.18 supported hash_join=on or hash_join=off as part of the setting for the optimizer_switch server system variable as well as the optimizer hints HASH_JOIN or NO_HASH_JOIN. In MySQL 8.0.19 and later, these no longer have any effect.)
Memory usage by hash joins can be controlled using the join_buffer_size system variable; a hash join cannot use more memory than this amount. When the memory required for a hash join exceeds the amount available, MySQL handles this by using files on disk. If this happens, you should be aware that the join may not succeed if a hash join cannot fit into memory and it creates more files than set for open_files_limit. To avoid such problems, make either of the following changes:
• Increase join_buffer_size so that the hash join does not spill over to disk.
• Increase open_files_limit.
Beginning with MySQL 8.0.18, join buffers for hash joins are allocated incrementally; thus, you can set join_buffer_size higher without small queries allocating very large amounts of RAM, but outer joins allocate the entire buffer. In MySQL 8.0.20 and later, hash joins are used for outer joins (including antijoins and semijoins) as well, so this is no longer an issue.
5. Engine Condition Pushdown Optimization
This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is “pushed down” to the storage engine for evaluation. This optimization can be used only by the NDB storage engine.
For NDB Cluster, this optimization can eliminate the need to send nonmatching rows over the network between the cluster's data nodes and the MySQL server that issued the query, and can speed up queries where it is used by a factor of 5 to 10 times over cases where condition pushdown could be but is not used.
Suppose that an NDB Cluster table is defined as follows:
CREATE TABLE t1 ( a INT, b INT, KEY(a) ) ENGINE=NDB;
Engine condition pushdown【下推;叠加;下推操作;下推存储器;】 can be used with queries such as the one shown here, which includes a comparison between a nonindexed column and a constant:
SELECT a, b FROM t1 WHERE b = 10;
The use of engine condition pushdown can be seen in the output of EXPLAIN:
mysql> EXPLAIN SELECT a, b FROM t1 WHERE b = 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10 Extra: Using where with pushed condition
6 Index Condition Pushdown Optimization
Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry and only if this is satisfied is the row read from the table. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine.
Applicability of the Index Condition Pushdown optimization is subject to these conditions:
• ICP is used for the range, ref, eq_ref, and ref_or_null access methods when there is a need to access full table rows.
• ICP can be used for InnoDB and MyISAM tables, including partitioned InnoDB and MyISAM tables.
• For InnoDB tables, ICP is used only for secondary indexes. The goal of ICP is to reduce the number of full-row reads and thereby reduce I/O operations. For InnoDB clustered indexes, the complete record is already read into the InnoDB buffer. Using ICP in this case does not reduce I/O.
• ICP is not supported with secondary indexes created on virtual generated columns. InnoDB supports secondary indexes on virtual generated columns.
• Conditions that refer to subqueries cannot be pushed down.
• Conditions that refer to stored functions cannot be pushed down. Storage engines cannot invoke stored functions.
• Triggered conditions cannot be pushed down.
• (MySQL 8.0.30 and later:) Conditions cannot be pushed down to derived tables containing references to system variables.
To understand how this optimization works, first consider how an index scan proceeds when Index Condition Pushdown is not used:
1. Get the next row, first by reading the index tuple, and then by using the index tuple to locate and read the full table row.
2. Test the part of the WHERE condition that applies to this table. Accept or reject the row based on the test result.
Using Index Condition Pushdown, the scan proceeds like this instead:
1. Get the next row's index tuple (but not the full table row).
2. Test the part of the WHERE condition that applies to this table and can be checked using only index columns. If the condition is not satisfied, proceed to the index tuple for the next row.
3. If the condition is satisfied, use the index tuple to locate and read the full table row.
4. Test the remaining part of the WHERE condition that applies to this table. Accept or reject the row based on the test result.
EXPLAIN output shows Using index condition in the Extra column when Index Condition Pushdown is used. It does not show Using index because that does not apply when full table rows must be read.
Suppose that a table contains information about people and their addresses and that the table has an index defined as INDEX (zipcode, lastname, firstname). If we know a person's zipcode value but are not sure about the last name, we can search like this:
SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';
MySQL can use the index to scan through people with zipcode='95054'. The second part (lastname LIKE '%etrunia%') cannot be used to limit the number of rows that must be scanned, so without Index Condition Pushdown, this query must retrieve full table rows for all people who have zipcode='95054'.
With Index Condition Pushdown, MySQL checks the lastname LIKE '%etrunia%' part before reading the full table row. This avoids reading full rows corresponding to index tuples that match the zipcode condition but not the lastname condition.
Index Condition Pushdown is enabled by default. It can be controlled with the optimizer_switch system variable by setting the index_condition_pushdown flag:
SET optimizer_switch = 'index_condition_pushdown=off'; SET optimizer_switch = 'index_condition_pushdown=on';
7 Nested-Loop Join Algorithms
MySQL executes joins between tables using a nested-loop algorithm or variations【ˌvɛriˈeɪʃənz (数量、水平等的)变化,变更,变异;变体;变种;变奏曲;变奏;变异的东西;】 on it.
• Nested-Loop Join Algorithm • Block Nested-Loop Join Algorithm
7.1 Nested-Loop Join Algorithm
A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.
Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:
Table Join Type t1 range t2 ref t3 ALL
If a simple NLJ algorithm is used, the join is processed like this:
for each row in t1 matching range { for each row in t2 matching reference key { for each row in t3 { if row satisfies join conditions, send to client } } }
Because the NLJ algorithm passes rows one at a time from outer loops to inner loops, it typically reads tables processed in the inner loops many times.
7.2 Block Nested-Loop Join Algorithm
A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read. For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. This reduces by an order of magnitude【ˈmæɡnɪtuːd 巨大;震级;星等;重大;重要性;星的亮度;】 the number of times the inner table must be read.
Prior to MySQL 8.0.18, this algorithm was applied for equi-joins when no indexes could be used; in MySQL 8.0.18 and later, the hash join optimization is employed in such cases. Starting with MySQL 8.0.20, the block nested loop is no longer used by MySQL, and a hash join is employed for in all cases where the block nested loop was used previously.
MySQL join buffering has these characteristics【ˌkɛrəktəˈrɪstɪks 特征;特点;品质;】:
• Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively【rɪˈspektɪvli 分别;各自;分别地;依次为;顺序为;】), or range.
• A join buffer is never allocated for the first nonconstant table, even if it would be of type ALL or index.
• Only columns of interest to a join are stored in its join buffer, not whole rows.
• The join_buffer_size system variable determines the size of each join buffer used to process a query.
• One buffer is allocated for each join that can be buffered, so a given query might be processed using multiple join buffers.
• A join buffer is allocated prior to executing the join and freed after the query is done.
For the example join described previously for the NLJ algorithm (without buffering), the join is done as follows using join buffering:
for each row in t1 matching range { for each row in t2 matching reference key { store used columns from t1, t2 in join buffer if buffer is full { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } empty join buffer } } } if buffer is not empty { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } }
If S is the size of each stored t1, t2 combination in the join buffer and C is the number of combinations in the buffer, the number of times table t3 is scanned is:
(S * C)/join_buffer_size + 1
The number of t3 scans decreases【dɪˈkriːsɪz (使大小、数量等)减少,减小,降低;】 as the value of join_buffer_size increases, up to the point when join_buffer_size is large enough to hold all previous row combinations【ˌkɑmbəˈneɪʃənz 结合;联合;混合;结合体;联合体;混合体;(用于开密码锁的)数码组合,字码组合;】. At that point, no speed is gained by making it larger.