ClickHouse 语法优化细节(一)
ClickHouse 的 SQL 优化规则是基于RBO(Rule Based Optimization),下面是一些优化规则细节,
1、准备测试表
1、上传官方的数据集:将visits_v1.tar和hits_v1.tar上传到虚拟机,解压到clickhouse数据路径下:将visits_v1.tar和hits_v1.tar上传到虚拟机,解压到clickhouse数据路径下
/ 解压到clickhouse数据路径 sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse //修改所属用户 sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
2、重启clickhouse-server
[hui@hadoop201 ~]$ sudo clickhouse restart
3、执行查询
[root@hadoop201 metadata]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1" 8873898 [root@hadoop201 metadata]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1" 1676861
注意:官方的tar包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。
hits_v1表有130多个字段,880多万条数据
visits_v1表有180多个字段,160多万条数据
2、COUNT 优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
XPLAIN SELECT count()FROM datasets.hits_v1; Union Expression (Projection) Expression (Before ORDER BY and SELECT) MergingAggregated ReadNothing (Optimized trivial count)
注意 Optimized trivial count ,这是对 count 的优化。如果 count 具体的列字段,则不会使用此项优化:
XPLAIN SELECT count(CounterID) FROM datasets.hits_v1; Union Expression (Projection) Expression (Before ORDER BY and SELECT) Aggregating Expression (Before GROUP BY) ReadFromStorage (Read from MergeTree)
说明:由于使用 count(*) 和 count(1) 底层会自动优化为 count() ,所有 count(*) 和 count(1) 也可以进行自动优化;
EXPLAIN syntax SELECT count(*) FROM datasets.hits_v1; SELECT count() FROM datasets.hits_v1; EXPLAIN syntax SELECT count(1) FROM datasets.hits_v1; SELECT count() FROM datasets.hits_v1;
3、消除重复字段
下面语句子查询中有两个重复的 id 字段,会被去重:
EXPLAIN SYNTAX SELECT a.UserID, b.VisitID, a.URL, b.UserID FROM datasets.hits_v1 AS a LEFT JOIN ( SELECT UserID, UserID as HaHa, VisitID FROM datasets.visits_v1) AS b USING (UserID) limit 3; //优化后结果 SELECT UserID, VisitID, URL, b.UserID FROM datasets.hits_v1 AS a ALL LEFT JOIN ( SELECT UserID, VisitID FROM datasets.visits_v1 ) AS b USING (UserID) LIMIT 3
4、谓词下推
当group by有having子句,但是没有with cube、with rollup 或者with totals修饰的时候,having过滤会下推到where提前过滤。例如下面的查询,HAVING name变成了WHERE name,在group by之前过滤:
EXPLAIN SYNTAX SELECT UserID FROM datasets.hits_v1 GROUP BY UserID HAVING UserID = '8585742290196126178'; //优化后 SELECT UserID FROM datasets.hits_v1 WHERE UserID = '8585742290196126178' GROUP BY UserID
子查询也支持谓词下推:
EXPLAIN SYNTAX SELECT * FROM (SELECT UserID FROM datasets.visits_v1) WHERE UserID = '8585742290196126178'; //优化后效果 SELECT UserID FROM (SELECT UserID FROM datasets.visits_v1 WHERE UserID = '8585742290196126178') WHERE UserID = '8585742290196126178'
再看另外一个栗子
EXPLAIN SYNTAX SELECT * FROM (SELECT * FROM (SELECT UserID FROM datasets.visits_v1) UNION ALL SELECT * FROM (SELECT UserID FROM datasets.visits_v1)) WHERE UserID = '8585742290196126178'; //优化后效果 SELECT UserID FROM ( SELECT UserID FROM ( SELECT UserID FROM datasets.visits_v1 WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178' UNION ALL SELECT UserID FROM ( SELECT UserID FROM datasets.visits_v1 WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178';