2021/10/23
1. Import the data into Hive and do the processing there; clean the data by removing the parentheses.
Import the tables and strip the parentheses.
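A minimal sketch of the cleaning step. The notes do not say which table or column holds the parentheses, so the staging table nsrxx_raw and its column line are hypothetical names used only for illustration; regexp_replace is a built-in Hive string function.

```sql
-- Hypothetical staging table: each raw input row is kept as one string.
create table nsrxx_raw(line string) stored as textfile;

-- Once the raw file has been loaded into nsrxx_raw, remove every '(' and ')'.
-- The character class [()] avoids backslash escaping inside the Hive string literal.
select regexp_replace(line, '[()]', '') as cleaned_line
from nsrxx_raw;
```

The cleaned rows can then be written into the real tables (nsrxx, zzsfp) with an insert ... select of the same form.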
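Create test1 and test2 to hold the candidate enterprises for the two groups, sell-only (out, no in) and buy-only (in, no out): test1 collects taxpayers that never appear as a seller, test2 those that never appear as a buyer.
Create table test1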
create table test1(nsr_id String)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE ;
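Select the taxpayers that are in the taxpayer table (nsrxx) but whose ID never appears among the seller IDs (xf_id) in zzsfp: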
insert into test1(nsr_id) select distinct nsr_id from nsrxx where nsr_id not in (select xf_id from zzsfp);
This identifies the enterprises with no outgoing invoices.
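One caveat with not in: under standard SQL semantics, if the subquery returns even a single NULL xf_id, the not in predicate is never true and no rows are inserted. A sketch of an equivalent anti-join that sidesteps this is given here, meant as an alternative to the insert above rather than an addition to it. The aliases n and s are illustrative only.

```sql
-- Anti-join: keep taxpayers for which no matching seller ID exists in zzsfp.
insert into test1(nsr_id)
select distinct n.nsr_id
from nsrxx n
left join (
    -- distinct, non-NULL seller IDs
    select distinct xf_id
    from zzsfp
    where xf_id is not null
) s
  on n.nsr_id = s.xf_id
where s.xf_id is null          -- no invoice lists this taxpayer as seller
  and n.nsr_id is not null;    -- match not in, which also drops NULL taxpayer IDs
```

The same pattern applies to test2 by swapping xf_id for gf_id.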
Create table test2
create table test2(nsr_id String)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE ;
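Select the taxpayers that are in the taxpayer table (nsrxx) but whose ID never appears among the buyer IDs (gf_id) in zzsfp: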
insert into test2(nsr_id) select distinct nsr_id from nsrxx where nsr_id not in (select gf_id from zzsfp);
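This identifies the enterprises with no incoming invoices.
Combine the two tables to count the enterprises that only buy (in, no out) and those that only sell (out, no in):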
insert into data(nsr_id) select distinct nsr_id from yc3 where nsr_id not in (select nsr_id from yc2);
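The result is stored in data, which relates test1 and test2 (yc3 and yc2 in the statement above are not defined in these notes and presumably stand for those two tables).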
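As a sketch of the combining step, a single full outer join over test1 and test2 can label and count both groups at once. It assumes test1 and test2 were each filled exactly once as described above, so each nsr_id appears at most once per table. The category labels and the aliases t1, t2 and c are illustrative.

```sql
select category, count(*) as enterprise_count
from (
    select
        coalesce(t1.nsr_id, t2.nsr_id) as nsr_id,
        case
            when t2.nsr_id is null then 'buy_only'    -- never a seller, but does appear as buyer
            when t1.nsr_id is null then 'sell_only'   -- never a buyer, but does appear as seller
            else 'no_invoices'                        -- in both tables: neither buys nor sells
        end as category
    from test1 t1
    full outer join test2 t2
      on t1.nsr_id = t2.nsr_id
) c
group by category;
```

Taxpayers that appear in both tables have no invoices in either direction, so they are kept apart from the two groups of interest.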