数据追溯
数据追溯:
数仓需要追溯,就是看以前的历史变化,比如一个月前的某一天的状态.
比如回溯2018-05-12
--query ".....where updated_time>=2018-05-12 00:00:00" ---->stage.tmp_a
#方法一,分区
每天保留一个快照.
insert overwirite table a partition (dt='2018-05-12')
select
coalesce(tb.id,ta.id) as id,
coalesce(tb.name,ta.name) as name,
.....
.....
from
(select * from a where dt='2018-05-11') as ta full join stage.tmp_a as tb on ta.id=tb.id
#方法二,全表
tmp表是昨天发生变化的,插入前天的并覆盖.所以只保存一张表,但要看历史的变化,不方便追溯.
insert overwirite table a
select
coalesce(tb.id,ta.id) as id,
coalesce(tb.name,ta.name) as name,
.....
.....
from
a as ta full join stage.tmp_a as tb on ta.id=tb.id
有些人,自甘堕落;有些人,即便身处深渊,也依旧笑容灿烂。