hive函数总结及应用场景
1. COALESCE
返回参数中的第一个非空值;如果所有值都为NULL,那么返回NULL
用途:
如果该表为商品表(维度表),以flink解析mysql binlog同步数据过来,需要将昨日的存量数据tb_sku_dim_all,今日增量tb_sku_dim_incre结合起来。
1)首先会对增量表tb_sku_dim_incre按skuOID进行分组,取最近时间的那条数据;
2)然后再与历史表tb_sku_dim_all,进行full outer join,如果增量表中有对应数据,则先使用增量表中的字段,没有则使用全量表中的字段,即COALESCE(t2.updateDT, t1.updateDT),举例如下:
SELECT COALESCE( t2.oid, t1.oid ) AS oid, COALESCE( t2.skuName, t1.skuName ) AS skuName, COALESCE( t2.stock, t1.stock ) AS stock, COALESCE( t2.cstatus, t1.cstatus ) AS cstatus, COALESCE( t2.updateDT, t1.updateDT ) AS updateDT FROM ods.tb_sku_dim_all t1 FULL OUTER JOIN ( SELECT oid, skuName, stock, cstatus, updateDT FROM ( SELECT oid, skuName, stock, cstatus, updateDT, row_number ( ) over ( PARTITION BY id ORDER BY event_time DESC ) AS rank FROM tb_sku_dim_incre WHERE dt = '20211209' ) temp WHERE rank = 1 ) t2 ON t1.oid = t2.oid;