数据追溯

数据追溯:

数仓需要追溯,就是看以前的历史变化,比如一个月前的某一天的状态.

比如回溯2018-05-12
--query ".....where updated_time>=2018-05-12 00:00:00" ---->stage.tmp_a
#方法一,分区

每天保留一个快照.
insert overwirite table a partition (dt='2018-05-12')
select
coalesce(tb.id,ta.id) as id,
coalesce(tb.name,ta.name) as name,
.....
.....
from
(select * from a where dt='2018-05-11') as ta full join stage.tmp_a as tb on ta.id=tb.id
#方法二,全表

tmp表是昨天发生变化的,插入前天的并覆盖.所以只保存一张表,但要看历史的变化,不方便追溯.
insert overwirite table a
select
coalesce(tb.id,ta.id) as id,
coalesce(tb.name,ta.name) as name,
.....
.....
from
a as ta full join stage.tmp_a as tb on ta.id=tb.id

posted on 2020-01-31 11:35  锋锋2019  阅读(556)  评论(0编辑  收藏  举报

导航