ClickHouse 语法优化细节(一)

ClickHouse 的 SQL 优化规则是基于RBO(Rule Based Optimization),下面是一些优化规则细节,

1、准备测试表

1、上传官方的数据集:将visits_v1.tar和hits_v1.tar上传到虚拟机,解压到clickhouse数据路径下:将visits_v1.tar和hits_v1.tar上传到虚拟机,解压到clickhouse数据路径下
/ 解压到clickhouse数据路径
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
 
//修改所属用户
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
2、重启clickhouse-server
[hui@hadoop201 ~]$ sudo clickhouse restart
3、执行查询
[root@hadoop201 metadata]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
8873898
[root@hadoop201 metadata]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
1676861
注意:官方的tar包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。
            hits_v1表有130多个字段,880多万条数据
            visits_v1表有180多个字段,160多万条数据

2、COUNT 优化

在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
 
XPLAIN SELECT count()FROM datasets.hits_v1;
 
Union
  Expression (Projection)
    Expression (Before ORDER BY and SELECT)
      MergingAggregated
        ReadNothing (Optimized trivial count)
注意 Optimized trivial count ,这是对 count 的优化。如果 count 具体的列字段,则不会使用此项优化:
 
XPLAIN SELECT count(CounterID) FROM datasets.hits_v1;
 
Union
  Expression (Projection)
    Expression (Before ORDER BY and SELECT)
      Aggregating
        Expression (Before GROUP BY)
          ReadFromStorage (Read from MergeTree)

说明:由于使用 count(*) 和 count(1) 底层会自动优化为 count() ,所有 count(*) 和 count(1)  也可以进行自动优化;

EXPLAIN syntax SELECT count(*) FROM datasets.hits_v1;
SELECT count()
FROM datasets.hits_v1;
EXPLAIN  syntax SELECT count(1) FROM datasets.hits_v1;
SELECT count()
FROM datasets.hits_v1;

3、消除重复字段

下面语句子查询中有两个重复的 id 字段,会被去重:
EXPLAIN SYNTAX
SELECT a.UserID,
       b.VisitID,
       a.URL,
       b.UserID
FROM datasets.hits_v1 AS a
         LEFT JOIN (
    SELECT UserID,
           UserID as HaHa,
           VisitID
    FROM datasets.visits_v1) AS b
                   USING (UserID)
limit 3;
//优化后结果
SELECT
    UserID,
    VisitID,
    URL,
    b.UserID
FROM datasets.hits_v1 AS a
ALL LEFT JOIN
(
    SELECT
        UserID,
        VisitID
    FROM datasets.visits_v1
) AS b USING (UserID)
LIMIT 3

4、谓词下推

当group by有having子句,但是没有with cube、with rollup 或者with totals修饰的时候,having过滤会下推到where提前过滤。例如下面的查询,HAVING name变成了WHERE name,在group by之前过滤:
EXPLAIN SYNTAX
SELECT UserID
FROM datasets.hits_v1
GROUP BY UserID
HAVING UserID = '8585742290196126178';
//优化后
SELECT UserID
FROM datasets.hits_v1
WHERE UserID = '8585742290196126178'
GROUP BY UserID
子查询也支持谓词下推:
EXPLAIN SYNTAX
SELECT *
FROM (SELECT UserID
      FROM datasets.visits_v1)
WHERE UserID = '8585742290196126178';
//优化后效果
SELECT UserID
FROM (SELECT UserID
      FROM datasets.visits_v1
      WHERE UserID = '8585742290196126178')
WHERE UserID = '8585742290196126178'

再看另外一个栗子

EXPLAIN SYNTAX
SELECT *
FROM (SELECT *
      FROM (SELECT UserID
            FROM datasets.visits_v1)
      UNION ALL
      SELECT *
      FROM (SELECT UserID
            FROM datasets.visits_v1))
WHERE UserID = '8585742290196126178';
//优化后效果
SELECT UserID
FROM
(
    SELECT UserID
    FROM
    (
        SELECT UserID
        FROM datasets.visits_v1
        WHERE UserID = '8585742290196126178'
    )
    WHERE UserID = '8585742290196126178'
    UNION ALL
    SELECT UserID
    FROM
    (
        SELECT UserID
        FROM datasets.visits_v1
        WHERE UserID = '8585742290196126178'
    )
    WHERE UserID = '8585742290196126178'
)
WHERE UserID = '8585742290196126178'
 
posted @ 2022-11-18 09:41  晓枫的春天  阅读(87)  评论(0编辑  收藏  举报