数据核对与验证标准(一)
所有表从大方向上有
1.表的验证:指标临时表的验证和 指标合并表的验证
2.数据验证:数据总量的验证,数据质量的验证(连续指标和离散指标)
1.临时表的总量的验证 show partitions app.xxx_t_xxx; select * from app.xxx_t_xxx where batch_date='批次日期' limit100; select batch_date ,count(1) from app.xxx_t_xxx group by batch_date; 2.合并表的总量验证 show partitions app.xxx_r_xxx; select * from app.xxx_r_xxx where batch_date='批次日期' limit100; 3.中间表各个指标的统计 a)非空填充率,最大值,最小值 select batch_date ,指标 ,count(1) ,sum(if trim(指标)<>'' and 指标 is not null,1,0) ,max(指标) ,min(指标) from app.xxx_t_xxx app.xxx_t_xxx group by batch_date,指标 order by batch_date,指标 ; b)枚举类型的各个类型的数量分布 select batch_date,指标,count(1) from app.xxx_r_xxx group by batch_date ,指标 order by batch_date,指标; 4.合并表各个指标的统计 a)非空填充率,最大值,最小值 select 指标 ,count(1) ,sum(if trim(指标)<>'' and 指标 is not null,1,0) ,max(指标) ,min(指标) from app.xxx_t_xxx app.xxx_t_xxx group by 指标 order by 指标 ; b)枚举类型的各个类型的数量分布 select 指标,count(1) from app.xxx_r_xxx group by 指标 order by 指标;
大多数人都以为是才智成就了科学家,他们错了,是品格。---爱因斯坦