HIVE增量同步方案2
SELECT student_temp.id ,coalesce(student_temp.age,student.age) as age ,student_temp.name ,coalesce(student_temp.dt,student.dt) as dt FROM student_temp FULL OUTER JOIN student ON student_temp.id = student.id ; ———————————————— 版权声明:本文为CSDN博主「混子风闲」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。 原文链接:https://blog.csdn.net/weixin_46905895/art
TMP表是增量表 , 增量表作为主表 ,full outer join 大表, 会返回两个表所有数据,两个表中其中一个表不存在的数据用null代替
获取数据的时候用
coalesce 函数 获取 temp字段值, 为空的话获取 b表数据。
COALESCE 案例:
方法三
先将 base_table 表和 incremental_table 表 left join,将未修改的数据覆盖写到 base_table 表,再将修改的数据插入到 base_table 表。
hive> select * from base_table;
OK
1 lijie chongqing 20191020
2 zhangshan sz 20191020
3 lisi shanghai 20191020
4 wangwu usa 20191020
hive> select * from incremental_table;
OK
1 lijie chongqing 20191020
2 zhangshan sz 20191020
3 lisi shanghai 20191020
4 wangwu usa 20191020
1 lijie chengdu 20191021
2 zhangshan huoxing 20191021
4 wangwu lalalala 20191021
5 xinzeng hehe 20191021
insert overwrite table base_table
select a.id,
a.name,
a.addr,
a.dt
from base_table a
left join (select * from incremental_table where dt='20191021') b
on a.id=b.id
where b.id is null
union all
select c.id,
c.name,
c.addr,
c.dt
from (select * from incremental_table where dt='20191021') c;
hive> select * from base_table;
OK
3 lisi shanghai 20191020
1 lijie chengdu 20191021
2 zhangshan huoxing 20191021
4 wangwu lalalala 20191021
5 xinzeng hehe 20191021