hive 数据清理--数据去重


hive> select * from (select *,row_number() over (partition by id) num from t_link) t where t.num=1;

  

保留crt_time最新的一个数据

select * from (select *,row_number() over (partition by id order by crt_time desc) num from t_link) t where t.num=1;

将查询的去重数据保存到新表t_link2中,新表比源表t_link多一列

insert overwrite table t_link2 select * from (select *,row_number() over (partition by id order by crt_time desc) num from t_link) t where t.num=1;

  

posted @ 2016-11-23 22:05  OnTheWay_duking  阅读(13492)  评论(0编辑  收藏  举报