clickhouse数据库SQL关联查询优化一例
【背景说明】
业务反馈业务查询clickhouse报表视图报错:DB::Exception: Memory limit (total) exceeded: 12G
根据这个报错,业务SQL查询内存超过了限制,增加内存是个方法,如果多个会话查询占用内存的话还是会报错,同时业务查询也是超时,看看SQL层面能否优化
查看到业务多个SQL分组查询进行union all,并且格式都是差不多,以下是其中的一个,SQL部分屏蔽一些内容,使用客户端查询6分钟以上
select COUNT(DISTINCT v.INPATIENT_BO) BWZRC ,toYear(v.CURR_DATE) year1 ,toMonth(v.CURR_DATE) mounth1 FROM DWD.DWD_INPATIENT_xxx v,DWD.DWD_ITEM_xxx dis where v.INPATIENT_BO=dis.VISIT_ID and v.PAT_FLAG='0' and dis.CHARGE_ITEM_ID IN('99xxx','99xxx') and toYear(v.CURR_DATE) in ('2024') and toMonth(v.CURR_DATE)+1 in ('12') group by toYear(v.CURR_DATE),toMonth(v.CURR_DATE);
查看SQL的话,一般关联查询跟子查询是可以互相转换的,并且关联查询写法也修改
这个SQL中由于dis表数据过大,可以先查询后作为一个表来进行查询,效果明显,改写后如下:
SELECT COUNT(DISTINCT v.INPATIENT_BO) AS BWZRC, toYear(v.CURR_DATE) AS year1, toMonth(v.CURR_DATE) AS mounth1 FROM DWD.DWD_INPATIENT_xxx v,(SELECT VISIT_ID FROM DWD.DWD_ITEM_xxx WHERE CHARGE_ITEM_ID IN ('99xxx', '99xxx'))dis WHERE v.INPATIENT_BO=dis.VISIT_ID and v.PAT_FLAG='0' and toYear(v.CURR_DATE) in ('2024') and toMonth(v.CURR_DATE)+1 in ('12') group by toYear(v.CURR_DATE),toMonth(v.CURR_DATE);
改为子查询方式,也是非常快,毫秒级别
SELECT COUNT(DISTINCT v.INPATIENT_BO) AS BWZRC, toYear(v.CURR_DATE) AS year1, toMonth(v.CURR_DATE) AS mounth1 FROM DWD.DWD_INPATIENT_xxx v WHERE v.PAT_FLAG = '0' AND v.INPATIENT_BO IN ( SELECT dis.VISIT_ID FROM DWD.DWD_ITEM_xxx dis WHERE dis.CHARGE_ITEM_ID IN ('99xxx', '99xxx') ) AND toYear(v.CURR_DATE) = '2024' AND toMonth(v.CURR_DATE) = '12' GROUP BY toYear(v.CURR_DATE), toMonth(v.CURR_DATE);