maxcompute mapjoin

如果要join两张表,其中一张很大,另外一张很小,我们应该broadcast小表,把小表广播到各个节点。

 

还有另外一种,使用Cartesian product笛卡尔积,也就是直接join 不on

那么必须使用map join 

 

举例,如果要计算会员数量及所占比例,需要除以会员总数,而这个总数我们就需要进行广播


with
customer as
(select userid, customeruid from dwd_store_member_ticket_di where pt>=20210221 and customeruid is not null and customeruid!='0' group by userid,customeruid),
customer_cnt as
(select COUNT(userid) as cnt_total from customer)
SELECT
/*+ mapjoin(d) */
c.*,round(c.cnt/d.cnt_total,4) as ratio FROM
(select b.parentuserid,b.parent_account,b.parent_company,COUNT(a.userid) as cnt
FROM
customer a
INNER JOIN
(select userid, parentuserid, parent_account, parent_company from dim_usergroup_df where pt='20210221') b
on a.userid=b.userid
group by b.parentuserid,b.parent_account,b.parent_company
having COUNT(a.userid)>300
) c
left join
customer_cnt d
order by c.cnt desc
;
 
只要在select之后加上/*+ mapjoin(d) */, 不用逗号
 
left join 左边必须是大表,d是小表
posted @ 2021-02-22 15:31  yjy888  阅读(413)  评论(0编辑  收藏  举报