复合索引 multiple-column index Composite Indexes explain 字段 group by 索引优化 松散索引扫描 紧索引扫描
MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.17 GROUP BY Optimization https://dev.mysql.com/doc/refman/8.0/en/group-by-optimization.html
MySQL :: MySQL 8.0 Reference Manual :: 8.3.1 How MySQL Uses Indexes https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html
-
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example,
ORDER BY
). If all key parts are followed bykey_part1
,key_part2
DESC
, the key is read in reverse order. (Or, if the index is a descending index, the key is read in forward order.) See Section 8.2.1.16, “ORDER BY Optimization”, Section 8.2.1.17, “GROUP BY Optimization”, and Section 8.3.13, “Descending Indexes”.
The most important preconditions for using indexes for GROUP BY
are that all GROUP BY
columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE
index, but not for a HASH
index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
8.2.1.17 GROUP BY Optimization
The most general way to satisfy a GROUP BY
clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and avoid creation of temporary tables by using index access.
The most important preconditions for using indexes for GROUP BY
are that all GROUP BY
columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE
index, but not for a HASH
index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
There are two ways to execute a GROUP BY
query through index access, as detailed in the following sections. The first method applies the grouping operation together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples.
Loose Index Scan can also be used in the absence of GROUP BY
under some conditions. See Skip Scan Range Access Method.
The most efficient way to process GROUP BY
is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE
). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE
conditions. This access method considers only a fraction of the keys in an index, so it is called a Loose Index Scan. When there is no WHERE
clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the WHERE
clause contains range predicates (see the discussion of the range
join type in Section 8.8.1, “Optimizing Queries with EXPLAIN”), a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions:
-
The query is over a single table.
-
The
GROUP BY
names only columns that form a leftmost prefix of the index and no other columns. (If, instead ofGROUP BY
, the query has aDISTINCT
clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a tablet1
has an index on(c1,c2,c3)
, Loose Index Scan is applicable if the query hasGROUP BY c1, c2
. It is not applicable if the query hasGROUP BY c2, c3
(the columns are not a leftmost prefix) orGROUP BY c1, c2, c4
(c4
is not in the index). -
The only aggregate functions used in the select list (if any) are
MIN()
andMAX()
, and all of them refer to the same column. The column must be in the index and must immediately follow the columns in theGROUP BY
. -
Any other parts of the index than those from the
GROUP BY
referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument ofMIN()
orMAX()
functions. -
For columns in the index, full column values must be indexed, not just a prefix. For example, with
c1 VARCHAR(20), INDEX (c1(10))
, the index uses only a prefix ofc1
values and cannot be used for Loose Index Scan.
If Loose Index Scan is applicable to a query, the EXPLAIN
output shows Using index for group-by
in the Extra
column.
Assume that there is an index idx(c1,c2,c3)
on table t1(c1,c2,c3,c4)
. The Loose Index Scan access method can be used for the following queries:
SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
The following queries cannot be executed with this quick select method, for the reasons given:
-
There are aggregate functions other than
MIN()
orMAX()
:SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
-
The columns in the
GROUP BY
clause do not form a leftmost prefix of the index:SELECT c1, c2 FROM t1 GROUP BY c2, c3;
-
The query refers to a part of a key that comes after the
GROUP BY
part, and for which there is no equality with a constant:SELECT c1, c3 FROM t1 GROUP BY c1, c2;
Were the query to include
WHERE c3 =
, Loose Index Scan could be used.const
The Loose Index Scan access method can be applied to other forms of aggregate function references in the select list, in addition to the MIN()
and MAX()
references already supported:
-
AVG(DISTINCT)
,SUM(DISTINCT)
, andCOUNT(DISTINCT)
are supported.AVG(DISTINCT)
andSUM(DISTINCT)
take a single argument.COUNT(DISTINCT)
can have more than one column argument. -
There must be no
GROUP BY
orDISTINCT
clause in the query. -
The Loose Index Scan limitations described previously still apply.
Assume that there is an index idx(c1,c2,c3)
on table t1(c1,c2,c3,c4)
. The Loose Index Scan access method can be used for the following queries:
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;
SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.
When the conditions for a Loose Index Scan are not met, it still may be possible to avoid creation of temporary tables for GROUP BY
queries. If there are range conditions in the WHERE
clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the WHERE
clause, or scans the whole index if there are no range conditions, it is called a Tight Index Scan. With a Tight Index Scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.
For this method to work, it is sufficient that there be a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY
key. The constants from the equality conditions fill in any “gaps” in the search keys so that it is possible to form complete prefixes of the index. These index prefixes then can be used for index lookups. If the GROUP BY
result requires sorting, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.
Assume that there is an index idx(c1,c2,c3)
on table t1(c1,c2,c3,c4)
. The following queries do not work with the Loose Index Scan access method described previously, but still work with the Tight Index Scan access method.
-
There is a gap in the
GROUP BY
, but it is covered by the conditionc2 = 'a'
:SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
-
The
GROUP BY
does not begin with the first part of the key, but there is a condition that provides a constant for that part:SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;
explain 字段
https://dev.mysql.com/doc/refman/8.0/en/explain-output.html https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
MySQL :: MySQL 8.0 Reference Manual :: 8.3.6 Multiple-Column Indexes https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html
MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.3 Index Merge Optimization https://dev.mysql.com/doc/refman/8.0/en/index-merge-optimization.html
8.3.6 Multiple-Column Indexes
MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column (see Section 8.3.5, “Column Indexes”).
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.
As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns. In MySQL, it is very easy to use this extra column:
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
Suppose that a table has the following specification:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
The name
index is an index over the last_name
and first_name
columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name
and first_name
values. It can also be used for queries that specify just a last_name
value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name
index is used for lookups in the following queries:
SELECT * FROM test WHERE last_name='Jones';
SELECT * FROM test
WHERE last_name='Jones' AND first_name='John';
SELECT * FROM test
WHERE last_name='Jones'
AND (first_name='John' OR first_name='Jon');
SELECT * FROM test
WHERE last_name='Jones'
AND first_name >='M' AND first_name < 'N';
However, the name
index is not used for lookups in the following queries:
SELECT * FROM test WHERE first_name='John';
SELECT * FROM test
WHERE last_name='Jones' OR first_name='John';
Suppose that you issue the following SELECT
statement:
SELECT * FROM tbl_name
WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1
and col2
, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1
and col2
, the optimizer attempts to use the Index Merge optimization (see Section 8.2.1.3, “Index Merge Optimization”), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3)
, you have indexed search capabilities on (col1)
, (col1, col2)
, and (col1, col2, col3)
.
MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT
statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3)
, only the first two queries use the index. The third and fourth queries do involve indexed columns, but do not use an index to perform lookups because (col2)
and (col2, col3)
are not leftmost prefixes of (col1, col2, col3)
.