Hive实现WordCount的两种方式

1.数据准备

hadoop hadoop hadoop
spark spark spark
hive
impala

2.创建表

create table wordcount(rowdata string);

3.加载数据

load data local inpath '/opt/modules/data/word.txt' into table wordcount;

4.SQL编程实现WordCount

4.1explode函数

作用:为每个输入行生成零个或多个输出行,即行转列

例子:

select explode(split(rowdata," ")) word from wordcount;

--结果输出
hadoop
hadoop
hadoop
spark
spark
spark
hive
impala

4.2Lateral View 的使用

作用:Lateral View一般与用户自定义表生成函数(如explode())结合使用,Lateral View 首先将UDTF应用于基表的每一行,然后将结果输出行连接到输入行,以形成具有提供的表别名的虚拟表。

例子:

select 
     word 
from wordcount lateral view explode(split(rowdata," ")) alaistable as  word;
--结果输出
hadoop
hadoop
hadoop
spark
spark
spark
hive
impala

4.3实现WordCount

--方式1:
select 
     word ,
     count(1) cnt 
from wordcount lateral view explode(split(rowdata," ")) alaistable as  word
group by word 
order by cnt ;
--结果:
impala  1
hive    1
spark   3
hadoop  3


--方式2:
select 
     word ,
     count(1) cnt 
from 
    (select
        explode(split(rowdata," ")) word 
     from 
      wordcount) vocu 
group by word
order by cnt;
--结果:
impala  1
hive    1
spark   3
hadoop  3

 

posted @ 2019-03-21 10:08  大数据技术与数仓  阅读(388)  评论(0编辑  收藏  举报