Mysql-Orace-Impala分组求和求占比Sql对比
实现效果,求各年龄段用户数及占比
1)oracle版本
1 with temp as 2 (select t.birth_type, sum(t.user_cnt) as user_cnt 3 from dw_yq.bst_rpt_yq_pag_ratio_m t 4 where t.nc_date >= '20200726' --时间过滤 5 and t.nc_date <= '20200726' 6 group by t.birth_type) 7 select t2.type_name, 8 nvl(t1.user_cnt, 0) as user_cnt, 9 round(ratio_to_report(nvl(t1.user_cnt, 0)) over() * 100, 4) as tkt_cnt_ratio 10 from temp t1, dim_user_birth_type t2 11 where t2.type_name = t1.birth_type(+) 12 order by t2.sort_flag
2)mysql版本
1 SELECT 2 t1.type_name AS cntname, 3 ifnull(t2.user_cnt, 0) AS cnt, 4 5 IF ( 6 ifnull(t3.all_user_cnt, 0) = 0, 7 0, 8 ifnull(t2.user_cnt, 0) / ifnull(t3.all_user_cnt, 0) * 100 9 ) AS ratio 10 FROM 11 dw_center.dim_user_birth_type t1 12 LEFT JOIN ( 13 SELECT 14 t.birth_type, 15 sum(t.user_cnt) AS user_cnt 16 FROM 17 dw_yq.bst_rpt_yq_pag_ratio_d t 18 WHERE 19 t.nc_date >= '20200726' 20 AND t.nc_date <= '20200726' 21 GROUP BY 22 t.birth_type 23 ) t2 ON t1.type_name = t2.birth_type 24 JOIN ( 25 SELECT 26 sum(t.user_cnt) AS all_user_cnt 27 FROM 28 dw_yq.bst_rpt_yq_pag_ratio_d t 29 WHERE 30 t.nc_date >= '20200726' 31 AND t.nc_date <= '20200726' 32 ) t3 33 ORDER BY 34 t1.sort_flag
3)Impala版本
with t1 as (select tp.birth_type, tp.user_cnt, sum(tp.user_cnt) over() as all_user_cnt from (select t.birth_type, sum(t.user_cnt) as user_cnt from dw_yq.bst_rpt_yq_pag_ratio_d t where t.dt_date >= '20200726' --时间过滤 and t.dt_date <= '20200726' group by t.birth_type) tp) select t2.type_name AS cntname, ifnull(t1.user_cnt, 0) AS cnt, IF(ifnull(t1.all_user_cnt, 0) = 0, 0, round(ifnull(t1.user_cnt, 0) / ifnull(t1.all_user_cnt, 0) * 100, 4)) AS ratio from dw_dim_center.dim_user_birth_type t2 left join t1 on t2.type_name = t1.birth_type
上面dim_user_birth_type表用于补充维度
根据上面代码看来,oracle版本的比较简短,支持with子句,ratio_to_report() over() 函数很方便的就求出占比;Impala也支持with子句,是的代码结构更加清晰,但是需要自己计算占比;mysql使用效果最差。
简单记录下.....