ClickHouse-语法优化规则
ClickHouse 的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则
以下规则中使用了ClickHouse官方测试数据集,数据集使用步骤移步这里。
1.COUNT优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
date1002 :) EXPLAIN SELECT count() FROM datasets.hits_v1; EXPLAIN SELECT count() FROM datasets.hits_v1 Query id: 5bc13813-9749-47c2-876a-71816f306358 ┌─explain─────────────────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ ReadFromStorage (MergeTree(with Aggregate projection _minmax_count_projection)) │ └─────────────────────────────────────────────────────────────────────────────────────┘ 3 rows in set. Elapsed: 0.003 sec.
ReadFromStorage,直接从持久化文件中读取count,这是对count的优化
如果count某一具体列字段,则不会使用此项优化
date1002 :) EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1; EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1 Query id: 4a8a69d4-a8b1-478a-a0fb-fbad12a84fa5 ┌─explain───────────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ ReadFromMergeTree │ └───────────────────────────────────────────────────────────────────────────────┘ 5 rows in set. Elapsed: 0.002 sec.
执行SQL看区别
执行SELECT count() FROM datasets.hits_v1;
date1002 :) SELECT count() FROM datasets.hits_v1; SELECT count() FROM datasets.hits_v1 Query id: 96ee9fc8-29ee-4e45-9917-33b6c9b99574 ┌─count()─┐ │ 8873898 │ └───────── 1 rows in set. Elapsed: 0.002 sec.
执行指定某一列的SQL可看到扫描了所有数据
date1002 :) SELECT count(CounterID) FROM datasets.hits_v1; SELECT count(CounterID) FROM datasets.hits_v1 Query id: 1a8bb3fa-9864-4e33-a92e-55b89bd8e9e5 ┌─count(CounterID)─┐ │ 8873898 │ └──────────────────┘ 1 rows in set. Elapsed: 0.008 sec. Processed 8.87 million rows, 35.50 MB (1.09 billion rows/s., 4.34 GB/s.)
2.消除子查询重复字段
下面语句子查询中有两个重复的 id 字段,会被去重:
date1002 :) EXPLAIN SYNTAX SELECT a.UserID, b.VisitID, a.URL, b.UserID FROM datasets.hits_v1 AS a LEFT JOIN ( SELECT UserID, UserID AS HaHa, VisitID FROM datasets.visits_v1 ) AS b USING (UserID) LIMIT 3; EXPLAIN SYNTAX SELECT a.UserID, b.VisitID, a.URL, b.UserID FROM datasets.hits_v1 AS a LEFT JOIN ( SELECT UserID, UserID AS HaHa, VisitID FROM datasets.visits_v1 ) AS b USING (UserID) LIMIT 3 Query id: e6907eb2-abe9-4a39-95bd-cfdcc4b87939 ┌─explain─────────────────────┐ │ SELECT │ │ UserID, │ │ VisitID, │ │ URL, │ │ b.UserID │ │ FROM datasets.hits_v1 AS a │ │ ALL LEFT JOIN │ │ ( │ │ SELECT │ │ UserID, │ │ VisitID │ │ FROM datasets.visits_v1 │ │ ) AS b USING (UserID) │ │ LIMIT 3 │ └─────────────────────────────┘ 14 rows in set. Elapsed: 0.002 sec.
3.谓词下推
当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。例如下面的查询,HAVING name 变成了 WHERE name,在 group by 之前过滤:
date1002 :) EXPLAIN SYNTAX SELECT UserID FROM datasets.hits_v1 GROUP BY UserID HAVING UserID ='8585742290196126178'; EXPLAIN SYNTAX SELECT UserID FROM datasets.hits_v1 GROUP BY UserID HAVING UserID = '8585742290196126178' Query id: 87cd76aa-947a-4cf3-bd00-b21d2a5951cf ┌─explain──────────────────────────────┐ │ SELECT UserID │ │ FROM datasets.hits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ GROUP BY UserID │ └──────────────────────────────────────┘ 4 rows in set. Elapsed: 0.002 sec.
子查询也支持谓词下推:
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM datasets.visits_v1 ) WHERE UserID = '8585742290196126178' Query id: 4db29eee-5dc5-4bf5-84de-564e2f58a4f6 ┌─explain──────────────────────────────────┐ │ SELECT UserID │ │ FROM │ │ ( │ │ SELECT UserID │ │ FROM datasets.visits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ ) │ │ WHERE UserID = '8585742290196126178' │ └──────────────────────────────────────────┘ 8 rows in set. Elapsed: 0.002 sec.
另外一个例子
EXPLAIN SYNTAX SELECT * FROM ( SELECT * FROM ( SELECT UserID FROM datasets.visits_v1 ) UNION ALL SELECT * FROM ( SELECT UserID FROM datasets.visits_v1 ) ) WHERE UserID = '8585742290196126178' Query id: b317cb59-9594-4d4f-8e5e-6ddf2f3bd52f ┌─explain──────────────────────────────────────┐ │ SELECT UserID │ │ FROM │ │ ( │ │ SELECT UserID │ │ FROM │ │ ( │ │ SELECT UserID │ │ FROM datasets.visits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ ) │ │ WHERE UserID = '8585742290196126178' │ │ UNION ALL │ │ SELECT UserID │ │ FROM │ │ ( │ │ SELECT UserID │ │ FROM datasets.visits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ ) │ │ WHERE UserID = '8585742290196126178' │ │ ) │ │ WHERE UserID = '8585742290196126178' │ └──────────────────────────────────────────────┘ 22 rows in set. Elapsed: 0.005 sec.
4.聚合计算外推
聚合函数内的计算,会外推,例如:
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM datasets.visits_v1 Query id: a1fa9181-51b0-427f-96cc-dc106b8eade6 ┌─explain─────────────────┐ │ SELECT sum(UserID) * 2 │ │ FROM datasets.visits_v1 │ └─────────────────────────┘ 2 rows in set. Elapsed: 0.001 sec.
5.聚合函数消除
如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除,例如:
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM datasets.visits_v1 GROUP BY UserID Query id: df74bce8-e31b-4ff5-8b7e-83bcbb3b8849 ┌─explain─────────────────┐ │ SELECT │ │ sum(UserID) * 2, │ │ max(VisitID), │ │ UserID │ │ FROM datasets.visits_v1 │ │ GROUP BY UserID │ └─────────────────────────┘ 6 rows in set. Elapsed: 0.002 sec.
6.删除重复的order by key
例如下面的语句,重复的聚合键 id 字段会被去重:
EXPLAIN SYNTAX SELECT * FROM datasets.visits_v1 ORDER BY UserID ASC, UserID ASC, VisitID ASC, VisitID ASC Query id: 5c368e05-f17e-4600-8543-fb87c1035f7a ┌─explain───────────────────────────────────┐ │ SELECT │ │ CounterID, │ │ StartDate, │ │ Sign, │ │ IsNew, │ │ VisitID, │ │ UserID, │ │ StartTime, │ │ Duration, │ │ UTCStartTime, │ │ PageViews, │ │ Hits, │ │ IsBounce, │ │ Referer, │ │ StartURL, │ │ RefererDomain, │ │ StartURLDomain, │ │ EndURL, │ │ LinkURL, │ │ IsDownload, │ │ TraficSourceID, │ │ SearchEngineID, │ │ SearchPhrase, │ │ AdvEngineID, │ │ PlaceID, │ │ RefererCategories, │ │ URLCategories, │ │ URLRegions, │ │ RefererRegions, │ │ IsYandex, │ │ GoalReachesDepth, │ │ GoalReachesURL, │ │ GoalReachesAny, │ │ SocialSourceNetworkID, │ │ SocialSourcePage, │ │ MobilePhoneModel, │ │ ClientEventTime, │ │ RegionID, │ │ ClientIP, │ │ ClientIP6, │ │ RemoteIP, │ │ RemoteIP6, │ │ IPNetworkID, │ │ SilverlightVersion3, │ │ CodeVersion, │ │ ResolutionWidth, │ │ ResolutionHeight, │ │ UserAgentMajor, │ │ UserAgentMinor, │ │ WindowClientWidth, │ │ WindowClientHeight, │ │ SilverlightVersion2, │ │ SilverlightVersion4, │ │ FlashVersion3, │ │ FlashVersion4, │ │ ClientTimeZone, │ │ OS, │ │ UserAgent, │ │ ResolutionDepth, │ │ FlashMajor, │ │ FlashMinor, │ │ NetMajor, │ │ NetMinor, │ │ MobilePhone, │ │ SilverlightVersion1, │ │ Age, │ │ Sex, │ │ Income, │ │ JavaEnable, │ │ CookieEnable, │ │ JavascriptEnable, │ │ IsMobile, │ │ BrowserLanguage, │ │ BrowserCountry, │ │ Interests, │ │ Robotness, │ │ GeneralInterests, │ │ Params, │ │ `Goals.ID`, │ │ `Goals.Serial`, │ │ `Goals.EventTime`, │ │ `Goals.Price`, │ │ `Goals.OrderID`, │ │ `Goals.CurrencyID`, │ │ WatchIDs, │ │ ParamSumPrice, │ │ ParamCurrency, │ │ ParamCurrencyID, │ │ ClickLogID, │ │ ClickEventID, │ │ ClickGoodEvent, │ │ ClickEventTime, │ │ ClickPriorityID, │ │ ClickPhraseID, │ │ ClickPageID, │ │ ClickPlaceID, │ │ ClickTypeID, │ │ ClickResourceID, │ │ ClickCost, │ │ ClickClientIP, │ │ ClickDomainID, │ │ ClickURL, │ │ ClickAttempt, │ │ ClickOrderID, │ │ ClickBannerID, │ │ ClickMarketCategoryID, │ │ ClickMarketPP, │ │ ClickMarketCategoryName, │ │ ClickMarketPPName, │ │ ClickAWAPSCampaignName, │ │ ClickPageName, │ │ ClickTargetType, │ │ ClickTargetPhraseID, │ │ ClickContextType, │ │ ClickSelectType, │ │ ClickOptions, │ │ ClickGroupBannerID, │ │ OpenstatServiceName, │ │ OpenstatCampaignID, │ │ OpenstatAdID, │ │ OpenstatSourceID, │ │ UTMSource, │ │ UTMMedium, │ │ UTMCampaign, │ │ UTMContent, │ │ UTMTerm, │ │ FromTag, │ │ HasGCLID, │ │ FirstVisit, │ │ PredLastVisit, │ │ LastVisit, │ │ TotalVisits, │ │ `TraficSource.ID`, │ │ `TraficSource.SearchEngineID`, │ │ `TraficSource.AdvEngineID`, │ │ `TraficSource.PlaceID`, │ │ `TraficSource.SocialSourceNetworkID`, │ │ `TraficSource.Domain`, │ │ `TraficSource.SearchPhrase`, │ │ `TraficSource.SocialSourcePage`, │ │ Attendance, │ │ CLID, │ │ YCLID, │ │ NormalizedRefererHash, │ │ SearchPhraseHash, │ │ RefererDomainHash, │ │ NormalizedStartURLHash, │ │ StartURLDomainHash, │ │ NormalizedEndURLHash, │ │ TopLevelDomain, │ │ URLScheme, │ │ OpenstatServiceNameHash, │ │ OpenstatCampaignIDHash, │ │ OpenstatAdIDHash, │ │ OpenstatSourceIDHash, │ │ UTMSourceHash, │ │ UTMMediumHash, │ │ UTMCampaignHash, │ │ UTMContentHash, │ │ UTMTermHash, │ │ FromHash, │ │ WebVisorEnabled, │ │ WebVisorActivity, │ │ `ParsedParams.Key1`, │ │ `ParsedParams.Key2`, │ │ `ParsedParams.Key3`, │ │ `ParsedParams.Key4`, │ │ `ParsedParams.Key5`, │ │ `ParsedParams.ValueDouble`, │ │ `Market.Type`, │ │ `Market.GoalID`, │ │ `Market.OrderID`, │ │ `Market.OrderPrice`, │ │ `Market.PP`, │ │ `Market.DirectPlaceID`, │ │ `Market.DirectOrderID`, │ │ `Market.DirectBannerID`, │ │ `Market.GoodID`, │ │ `Market.GoodName`, │ │ `Market.GoodQuantity`, │ │ `Market.GoodPrice`, │ │ IslandID │ │ FROM datasets.visits_v1 │ │ ORDER BY │ │ UserID ASC, │ │ VisitID ASC │ └───────────────────────────────────────────┘ 186 rows in set. Elapsed: 0.004 sec.
7.删除重复的limit by key
例如下面的语句,重复声明的 name 字段会被去重:
EXPLAIN SYNTAX SELECT * FROM datasets.visits_v1 LIMIT 3 BY VisitID, VisitID LIMIT 10 Query id: cfb47dca-ebf5-48c2-a913-032d2f054cb9 ┌─explain───────────────────────────────────┐ │ SELECT │ │ CounterID, │ │ StartDate, │ │ Sign, │ │ IsNew, │ │ VisitID, │ │ UserID, │ │ StartTime, │ │ Duration, │ │ UTCStartTime, │ │ PageViews, │ │ Hits, │ │ IsBounce, │ │ Referer, │ │ StartURL, │ │ RefererDomain, │ │ StartURLDomain, │ │ EndURL, │ │ LinkURL, │ │ IsDownload, │ │ TraficSourceID, │ │ SearchEngineID, │ │ SearchPhrase, │ │ AdvEngineID, │ │ PlaceID, │ │ RefererCategories, │ │ URLCategories, │ │ URLRegions, │ │ RefererRegions, │ │ IsYandex, │ │ GoalReachesDepth, │ │ GoalReachesURL, │ │ GoalReachesAny, │ │ SocialSourceNetworkID, │ │ SocialSourcePage, │ │ MobilePhoneModel, │ │ ClientEventTime, │ │ RegionID, │ │ ClientIP, │ │ ClientIP6, │ │ RemoteIP, │ │ RemoteIP6, │ │ IPNetworkID, │ │ SilverlightVersion3, │ │ CodeVersion, │ │ ResolutionWidth, │ │ ResolutionHeight, │ │ UserAgentMajor, │ │ UserAgentMinor, │ │ WindowClientWidth, │ │ WindowClientHeight, │ │ SilverlightVersion2, │ │ SilverlightVersion4, │ │ FlashVersion3, │ │ FlashVersion4, │ │ ClientTimeZone, │ │ OS, │ │ UserAgent, │ │ ResolutionDepth, │ │ FlashMajor, │ │ FlashMinor, │ │ NetMajor, │ │ NetMinor, │ │ MobilePhone, │ │ SilverlightVersion1, │ │ Age, │ │ Sex, │ │ Income, │ │ JavaEnable, │ │ CookieEnable, │ │ JavascriptEnable, │ │ IsMobile, │ │ BrowserLanguage, │ │ BrowserCountry, │ │ Interests, │ │ Robotness, │ │ GeneralInterests, │ │ Params, │ │ `Goals.ID`, │ │ `Goals.Serial`, │ │ `Goals.EventTime`, │ │ `Goals.Price`, │ │ `Goals.OrderID`, │ │ `Goals.CurrencyID`, │ │ WatchIDs, │ │ ParamSumPrice, │ │ ParamCurrency, │ │ ParamCurrencyID, │ │ ClickLogID, │ │ ClickEventID, │ │ ClickGoodEvent, │ │ ClickEventTime, │ │ ClickPriorityID, │ │ ClickPhraseID, │ │ ClickPageID, │ │ ClickPlaceID, │ │ ClickTypeID, │ │ ClickResourceID, │ │ ClickCost, │ │ ClickClientIP, │ │ ClickDomainID, │ │ ClickURL, │ │ ClickAttempt, │ │ ClickOrderID, │ │ ClickBannerID, │ │ ClickMarketCategoryID, │ │ ClickMarketPP, │ │ ClickMarketCategoryName, │ │ ClickMarketPPName, │ │ ClickAWAPSCampaignName, │ │ ClickPageName, │ │ ClickTargetType, │ │ ClickTargetPhraseID, │ │ ClickContextType, │ │ ClickSelectType, │ │ ClickOptions, │ │ ClickGroupBannerID, │ │ OpenstatServiceName, │ │ OpenstatCampaignID, │ │ OpenstatAdID, │ │ OpenstatSourceID, │ │ UTMSource, │ │ UTMMedium, │ │ UTMCampaign, │ │ UTMContent, │ │ UTMTerm, │ │ FromTag, │ │ HasGCLID, │ │ FirstVisit, │ │ PredLastVisit, │ │ LastVisit, │ │ TotalVisits, │ │ `TraficSource.ID`, │ │ `TraficSource.SearchEngineID`, │ │ `TraficSource.AdvEngineID`, │ │ `TraficSource.PlaceID`, │ │ `TraficSource.SocialSourceNetworkID`, │ │ `TraficSource.Domain`, │ │ `TraficSource.SearchPhrase`, │ │ `TraficSource.SocialSourcePage`, │ │ Attendance, │ │ CLID, │ │ YCLID, │ │ NormalizedRefererHash, │ │ SearchPhraseHash, │ │ RefererDomainHash, │ │ NormalizedStartURLHash, │ │ StartURLDomainHash, │ │ NormalizedEndURLHash, │ │ TopLevelDomain, │ │ URLScheme, │ │ OpenstatServiceNameHash, │ │ OpenstatCampaignIDHash, │ │ OpenstatAdIDHash, │ │ OpenstatSourceIDHash, │ │ UTMSourceHash, │ │ UTMMediumHash, │ │ UTMCampaignHash, │ │ UTMContentHash, │ │ UTMTermHash, │ │ FromHash, │ │ WebVisorEnabled, │ │ WebVisorActivity, │ │ `ParsedParams.Key1`, │ │ `ParsedParams.Key2`, │ │ `ParsedParams.Key3`, │ │ `ParsedParams.Key4`, │ │ `ParsedParams.Key5`, │ │ `ParsedParams.ValueDouble`, │ │ `Market.Type`, │ │ `Market.GoalID`, │ │ `Market.OrderID`, │ │ `Market.OrderPrice`, │ │ `Market.PP`, │ │ `Market.DirectPlaceID`, │ │ `Market.DirectOrderID`, │ │ `Market.DirectBannerID`, │ │ `Market.GoodID`, │ │ `Market.GoodName`, │ │ `Market.GoodQuantity`, │ │ `Market.GoodPrice`, │ │ IslandID │ │ FROM datasets.visits_v1 │ │ LIMIT 3 BY VisitID │ │ LIMIT 10 │ └───────────────────────────────────────────┘ 185 rows in set. Elapsed: 0.004 sec.
8.删除重复的USING Key
例如下面的语句,重复的关联键 id 字段会被去重:
EXPLAIN SYNTAX SELECT a.UserID, a.UserID, b.VisitID, a.URL, b.UserID FROM datasets.hits_v1 AS a LEFT JOIN datasets.visits_v1 AS b USING (UserID, UserID) Query id: 83d2ea38-4d51-48fd-8674-618eb1379d33 ┌─explain──────────────────────────────────────────────┐ │ SELECT │ │ UserID, │ │ UserID, │ │ VisitID, │ │ URL, │ │ b.UserID │ │ FROM datasets.hits_v1 AS a │ │ ALL LEFT JOIN datasets.visits_v1 AS b USING (UserID) │ └──────────────────────────────────────────────────────┘ 8 rows in set. Elapsed: 0.003 sec.
9.标量替换
如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的total_disk_usage 字段:
EXPLAIN SYNTAX WITH ( SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10 Query id: 499e49b3-62e8-4a26-afd0-e3936d8c43f7 ┌─explain─────────────────────────────────────────────────────────────────────────┐ │ WITH identity(_CAST(0, 'Nullable(UInt64)')) AS total_disk_usage │ │ SELECT │ │ (sum(bytes_on_disk AS bytes) / total_disk_usage) * 100 AS table_disk_usage, │ │ table │ │ FROM system.parts │ │ GROUP BY table │ │ ORDER BY table_disk_usage DESC │ │ LIMIT 10 │ └─────────────────────────────────────────────────────────────────────────────────┘ 8 rows in set. Elapsed: 0.003 sec.
10.三元运算优化
如果开启了 optimize_if_chain_to_multiif 参数,三元运算符会被替换成 multiIf 函数,例如:
EXPLAIN SYNTAX SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen')) FROM numbers(10) Query id: 9601b406-349b-4af8-bccf-8ccfd62d8638 ┌─explain─────────────────────────────────────────────────────────────┐ │ SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen')) │ │ FROM numbers(10) │ └─────────────────────────────────────────────────────────────────────┘ 2 rows in set. Elapsed: 0.001 sec.
开启multiif:
set optimize_if_chain_to_multiif = 1;
EXPLAIN SYNTAX SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen')) FROM numbers(10) Query id: 36d98f1c-d0b1-4ecb-952f-1a7a8393d187 ┌─explain──────────────────────────────────────────────────────────────┐ │ SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'fengzhen') │ │ FROM numbers(10) │ └──────────────────────────────────────────────────────────────────────┘ 2 rows in set. Elapsed: 0.001 sec.
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 分享4款.NET开源、免费、实用的商城系统
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· 记一次.NET内存居高不下排查解决与启示