MySQL中的COUNT(*)和COUNT(col)

2023-01-12 18:09 abce 阅读(131) 评论(0) 编辑收藏举报

另一篇：difference between count(1) and count(*)

看看人们是如何使用COUNT(*)和COUNT(col)的，看起来大多数人都认为它们是同义词，只是使用他们喜欢的，而在性能甚至查询结果上都有实质性的差异。此外，我们还发现InnoDB和MyISAM引擎在执行上存在差异。

注意:所有测试都应用于MySQL 8.0.30版本，在后台，我运行每个查询三到五次，以确保所有查询都完全缓存在缓冲池中(对于InnoDB)或文件系统中(对于MyISAM)。

InnoDB引擎下的count()函数

来看看一个InnoDB的例子：

CREATE TABLE count_innodb (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  val_with_nulls int(11) default NULL,
  val_no_null int(10) unsigned NOT NULL,
  PRIMARY KEY idx (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

(mysql) > select count(*) from count_innodb;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.38 sec)

(mysql) > select count(val_no_null) from count_innodb;
+--------------------+
| count(val_no_null) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (0.38 sec)

在innodb引擎中，count(*)和count(val_no_null)耗费了相同的时间。我们继续往下看，相同的条件下，MyISAM执行COUNT(*)比innodb快多了。

但是，为什么我们不能直接缓存表的行记录数呢？innodb不会记录表的记录数，因为同一时间，并发事务看到的行数可能不一样。因此，select count(*)语句只能计算当前事务可以看到的记录数。我们可以借助information schema来获取表的大致数量：

(mysql) >  select table_rows from information_schema.tables where table_name='count_innodb';
+------------+
| TABLE_ROWS |
+------------+
|    9980586 |
+------------+
1 row in set (0.00 sec)

可以看到，这个记录数是不准确的。不过，有些场景只要一个粗略的统计就可以了。

再来看看COUNT(val_with_nulls)

(mysql) >  select count(val_with_nulls) from count_innodb;
+-----------------------+
| count(val_with_nulls) |
+-----------------------+
|               9990001 |
+-----------------------+
1 row in set (2.14 sec)

可以看到返回的值和count(*)是不同的。

这是为什么呢？因为val_with_nulls列没有定义成not null，可能会有null值，mysql必须通过扫描表来发现。这也是为什么查询的结果不同。

所以，COUNT(*)和COUNT(col)不仅有性能上的差异，还有功能上的差异。

我们再来看看另外的查询，研究一下innodb在相同的where条件下，是如何管理COUNT(*), COUNT(val_no_null), COUNT(val_with_nulls)的。

(mysql) >  select count(*) from count_innodb where id<1000000;
+----------+
| count(*) |
+----------+
|   980000 |
+----------+
1 row in set (0.30 sec)

(mysql) > explain select count(*) from count_innodb where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_innodb
         type: range
possible_keys: PRIMARY
          key: PRIMARY
         rows: 1955802
     filtered: 100.00
        Extra: Using where; Using index

(mysql) >  select count(val_no_null) from count_innodb where id<1000000;
+--------------------+
| count(val_no_null) |
+--------------------+
|             980000 |
+--------------------+
1 row in set (0.33 sec)

(mysql) >  explain select count(val_no_null) from count_innodb where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_innodb
         type: range
possible_keys: PRIMARY
          key: PRIMARY
         rows: 2013804
     filtered: 100.00
        Extra: Using where

我们可以看到，在这两种情况下，查询的性能是相等的，只有10%的差异，如果你更仔细地关注COUNT(*)查询的EXPLAIN，将注意到Using index。这意味着MySQL可以只使用索引，而不涉及其他表数据，这可能足以获得大型表的行数。

你也可能希望使用已经有索引的列来加快大型表的查询速度。

COUNT(val_with_nulls)会有什么惊喜吗?让我们来看看:

(mysql) > select count(val_with_nulls) from count_innodb where id<1000000;
+-----------------------+
| count(val_with_nulls) |
+-----------------------+
|                970001 |
+-----------------------+
1 row in set (0.33 sec)

(mysql) > explain select count(val_with_nulls) from count_innodb where id<1000000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: count_innodb
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1955802
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

我们可以看到，在所有COUNT(*)， COUNT(val_with_nulls)， COUNT(val_with_nulls)中，查询的性能相当均匀。

MyISAM引擎下的count()函数

来看看MyISAM引擎下的count函数：

CREATE TABLE count_myisam (
  id int(10) unsigned NOT NULL,
  val_with_nulls int(11) default NULL,
  val_no_null int(10) unsigned NOT NULL,
  KEY idx (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

(mysql) > select count(*) from count_myisam;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

(mysql) > select count(val_no_null) from count_myisam;
+--------------------+
| count(val_no_null) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (0.00 sec)

速度很快！

由于这是一个MyISAM表，我们已经缓存了表中的行数，这就是MyISAM引擎的工作方式。这就是为什么它可以立即回答COUNT(*)和COUNT(val_no_null)查询。

请注意引擎之间的区别:InnoDB是事务引擎，而MyISAM是非事务存储引擎。

(mysql) > select count(val_with_nulls) from count_myisam;
+-----------------------+
| count(val_with_nulls) |
+-----------------------+
|               9990001 |
+-----------------------+
1 row in set (14.18 sec)

但是当涉及到MyISAM表的COUNT(val_with_nulls)时，我们可以看到比InnoDB慢了40倍;差别太大了。

同样，我们可以看到COUNT(val_with_nulls)的行为，因为NULL值显然不会被考虑。MySQL Optimizer在这种情况下做得很好，只在需要时进行全表扫描，因为列可以为NULL。

现在让我们用WHERE子句对MyISAM表进行更多的查询:

(mysql) >  select count(*) from count_myisam where id<1000000;
+----------+
| count(*) |
+----------+
|  1001237 |
+----------+
1 row in set (0.41 sec)

(mysql) >  explain select count(*) from count_myisam where id<1000000 \G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_myisam
         type: range
possible_keys: idx
          key: idx
         rows: 1041561
     filtered: 100.00
        Extra: Using where; Using index

(mysql) > select count(val_no_null) from count_myisam where id<1000000;
+--------------------+
| count(val_no_null) |
+--------------------+
|            1001237 |
+--------------------+
1 row in set (2.55 sec)

(mysql) >  explain select count(val_no_null) from count_myisam where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_myisam
         type: range
possible_keys: idx
          key: idx
         rows: 1041561
     filtered: 100.00
        Extra: Using index condition; Using MRR
(mysql) >  select count(val_with_nulls) from count_myisam where id<1000000;
+-----------------------+
| count(val_with_nulls) |
+-----------------------+
|               1000281 |
+-----------------------+
1 row in set (2.55 sec)

(mysql) >  explain select count(val_with_nulls) from count_myisam where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_myisam
         type: range
possible_keys: idx
          key: idx
         rows: 1041561
     filtered: 100.00
        Extra: Using index condition; Using MRR

可以看到，即使使用WHERE子句，COUNT(*)和COUNT(col)的性能也会有显著不同。

事实上，这个示例显示了5倍的性能差异，因为所有数据都位于内存(请注意，由于它是MyISAM引擎，数据缓存发生在文件系统缓存级别)。对于io绑定的工作负载，在这种情况下，经常可以看到甚至是100倍的性能差异。

COUNT(*)查询可以使用索引覆盖，而COUNT(col)则不能。当然，可以将索引扩展为(id,val_with_nulls)，并使查询再次被索引覆盖，但我只在无法更改查询(即，它是第三方应用程序)或由于某种原因在查询中出现列名，并且需要非null值的计数时才会使用这种解决方法。

值得注意的是，在这种情况下，MySQL Optimizer并没有很好地优化查询。可能会注意到(val_no_null)列是not null，因此COUNT(val_no_null)与COUNT(*)相同，查询可以作为索引覆盖的查询运行。但是mysql没有，两个查询都执行行读取。

(mysql) >  alter table count_myisam drop key idx, add key idx (id,val_with_nulls);
Query OK, 10000000 rows affected (1 min 38.71 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

(mysql) >  select count(val_with_nulls) from count_myisam where id<1000000;
+-----------------------+
| count(val_with_nulls) |
+-----------------------+
|               1000281 |
+-----------------------+
1 row in set (0.42 sec)

(mysql) >  select count(*) from count_myisam where id<1000000;
+----------+
| count(*) |
+----------+
|  1000762 |
+----------+
1 row in set (0.56 sec)

正如所看到的，与没有索引的COUNT(val_with_nulls)查询相比，扩展索引有助于提高对空值的COUNT(val_with_nulls)查询大约7倍。但同时，你可以看到COUNT(*)变慢了0.6倍，可能是因为在这种情况下索引变长了两倍。

最后，我想消除一些关于COUNT(0)和COUNT(1)的错觉。

(mysql) >  select count(1) from count_innodb where id<1000000;
+----------+
| count(1) |
+----------+
|   980000 |
+----------+
1 row in set (0.30 sec)

(mysql) >  select count(0) from count_innodb where id<1000000;
+----------+
| count(0) |
+----------+
|   980000 |
+----------+
1 row in set (0.30 sec)

(mysql) > explain select count(1) from count_innodb where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_innodb
         type: range
possible_keys: PRIMARY
          key: PRIMARY
         rows: 1955802
     filtered: 100.00
        Extra: Using where; Using index

(mysql) >  explain select count(0) from count_innodb where id<1000000\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: count_innodb
         type: range
possible_keys: PRIMARY
          key: PRIMARY
         rows: 1955802
     filtered: 100.00
        Extra: Using where; Using index

如你所见，查询的性能和解释是相同的，在COUNT()函数的括号内放入什么数字并不重要。它可以是你想要的任何数字，根据性能和此查询的实际输出，它将完全等于COUNT(*)。

刷新页面返回顶部

abce don't dream it , be it

MySQL中的COUNT(*)和COUNT(col)

About