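Data Cleaning
Clean the region dimension field against the standard dimension table.
(1) Skip the first (header) row of the diyu table. This property does not delete the row from the file; it tells Hive to ignore it when reading.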
alter table diyu set TBLPROPERTIES ('skip.header.line.count'='1');
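A quick way to check that the header is now skipped is sketched below. It only assumes the table name diyu from the statement above; the first statement lists the table properties, and the second shows the first few rows that Hive now reads.

```sql
-- List the table properties; skip.header.line.count should appear with value 1
SHOW TBLPROPERTIES diyu;

-- The first rows returned should be data rows, not the column header
SELECT * FROM diyu LIMIT 5;
```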
(2) Create table aa_19 to hold the data after the region dimension has been cleaned:
create table aa_19(
ID String, QA04 String, QA05 String,
QA07 String, QA15 String, QA19 String,
Hangye String, QB03 String, QB03ONE String,
QB03TWO String, QB03_1 String, QB06 String,
QB16 String, QB16V String, Gaoxin String,
QB16_1 String, QB16_1V String, QC02 String,
QC05_0 String, QC24 String, QC40 String,
QD01 String, QD28 String, QJ09 String,
QJ20 String, QJ55 String, QJ74 String,
Diyu String, SYEAR String
)ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
(3) Clean the data by joining aa_2019 with diyu on the region code and appending the region description (dmms) to the code (QA19):
insert into table aa_19
select
  aa_2019.ID as ID, aa_2019.QA04 as QA04, aa_2019.QA05 as QA05,
  aa_2019.QA07 as QA07, aa_2019.QA15 as QA15, aa_2019.QA19 as QA19,
  aa_2019.Hangye as Hangye, aa_2019.QB03 as QB03, aa_2019.QB03ONE as QB03ONE,
  aa_2019.QB03TWO as QB03TWO, aa_2019.QB03_1 as QB03_1, aa_2019.QB06 as QB06,
  aa_2019.QB16 as QB16, aa_2019.QB16V as QB16V, aa_2019.Gaoxin as Gaoxin,
  aa_2019.QB16_1 as QB16_1, aa_2019.QB16_1V as QB16_1V, aa_2019.QC02 as QC02,
  aa_2019.QC05_0 as QC05_0, aa_2019.QC24 as QC24, aa_2019.QC40 as QC40,
  aa_2019.QD01 as QD01, aa_2019.QD28 as QD28, aa_2019.QJ09 as QJ09,
  aa_2019.QJ20 as QJ20, aa_2019.QJ55 as QJ55, aa_2019.QJ74 as QJ74,
  concat(aa_2019.QA19, diyu.dmms) as Diyu,
  aa_2019.SYEAR as SYEAR
from aa_2019
join diyu on (aa_2019.QA19 = diyu.dm);
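Because the cleaning statement uses an inner join, any row of aa_2019 whose QA19 code has no match in diyu.dm is silently left out of aa_19. The query below is a minimal sketch for spotting such rows; it uses only the tables and columns from the statement above, and the aliases a, d, and unmatched_rows are illustrative.

```sql
-- Count source rows per region code that find no match in the dimension table
SELECT a.QA19, count(*) AS unmatched_rows
FROM aa_2019 a
LEFT JOIN diyu d ON (a.QA19 = d.dm)
WHERE d.dm IS NULL
GROUP BY a.QA19;
```

An empty result suggests every region code was matched; rows with a NULL QA19 would also show up here, since NULL never equals any dm value.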
(4) Check the cleaning result:
select * from aa_19 limit 10;