Hive实现WordCount的两种方式
1.数据准备
hadoop hadoop hadoop
spark spark spark
hive
impala
2.创建表
create table wordcount(rowdata string);
3.加载数据
load data local inpath '/opt/modules/data/word.txt' into table wordcount;
4.SQL编程实现WordCount
4.1explode函数
作用:为每个输入行生成零个或多个输出行,即行转列
例子:
select explode(split(rowdata," ")) word from wordcount;
--结果输出
hadoop
hadoop
hadoop
spark
spark
spark
hive
impala
4.2Lateral View 的使用
作用:Lateral View一般与用户自定义表生成函数(如explode())结合使用,Lateral View 首先将UDTF应用于基表的每一行,然后将结果输出行连接到输入行,以形成具有提供的表别名的虚拟表。
例子:
select
word
from wordcount lateral view explode(split(rowdata," ")) alaistable as word;
--结果输出
hadoop
hadoop
hadoop
spark
spark
spark
hive
impala
4.3实现WordCount
--方式1:
select
word ,
count(1) cnt
from wordcount lateral view explode(split(rowdata," ")) alaistable as word
group by word
order by cnt ;
--结果:
impala 1
hive 1
spark 3
hadoop 3
--方式2:
select
word ,
count(1) cnt
from
(select
explode(split(rowdata," ")) word
from
wordcount) vocu
group by word
order by cnt;
--结果:
impala 1
hive 1
spark 3
hadoop 3
公众号「大数据技术与数仓」
专注分享数据仓库与大数据技术