hive_1218

Start the Hadoop cluster
```
start-dfs.sh
start-yarn.sh
jps
```
Locate the Hive tarball
```
find / -name "*hive*.tar.gz"
```

Extract the Hive tarball into /opt

tar -zxvf /root/experiment/file/apache-hive-2.1.1-bin.tar.gz -C /opt/
# Check that the extraction succeeded
ls /opt/
# Rename apache-hive-2.1.1-bin to hive
mv /opt/apache-hive-2.1.1-bin /opt/hive
# Check that the rename succeeded
ls /opt/

Configure the Hive environment variables in /etc/profile

vi /etc/profile
# Add the Hive variables below the Hadoop ones
export HIVE_HOME=/opt/hive
export PATH=$HIVE_HOME/bin:$PATH
# Reload the profile so the change takes effect
source /etc/profile
# Print PATH and check that it contains /opt/hive/bin
echo $PATH
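Reading a long PATH by eye is error-prone, so the check can also be scripted. The following is a minimal sketch that assumes the same /opt/hive location as the export lines above; the variable hive_on_path is a name made up for this example.

```shell
# Assumed install location, matching the export lines above.
HIVE_HOME=/opt/hive
PATH="$HIVE_HOME/bin:$PATH"

# Split PATH on ':' and look for an exact, whole-line match of $HIVE_HOME/bin.
if printf '%s\n' "$PATH" | tr ':' '\n' | grep -qxF "$HIVE_HOME/bin"; then
    hive_on_path=yes
else
    hive_on_path=no
fi
echo "hive bin on PATH: $hive_on_path"
```

If this prints "no" in a new shell after editing /etc/profile, the profile was probably not reloaded.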

Local-mode installation

Log in to MySQL

mysql
# Create the database
create database hive;
# Check that it was created
show databases;

MySQL grants

# GRANT ... IDENTIFIED BY is MySQL 5.x syntax; MySQL 8.0 removed it
grant all privileges on *.* to 'root'@'master' identified by 'root';
grant all privileges on *.* to 'root'@'%' identified by 'root';
# Reload the privilege tables
flush privileges;
# Inspect the privileges
show databases;
use mysql;
show tables;
desc user;
select Host,User,Super_priv from user;
# Leave MySQL
quit;

Copy the MySQL driver jar that Hive needs, mysql-connector-java-5.1.42.jar, into the hive/lib directory

# Find where the jar is
find / -name "mysql*.jar"
# Copy the jar
cp /root/experiment/file/mysql-connector-java-5.1.42.jar /opt/hive/lib
# Check that it is now in /opt/hive/lib
ls /opt/hive/lib

Go to Hive's conf directory and set up the Hive configuration files

cd /opt/hive/conf
# List the contents of the conf directory
ls

hive-site.xml

cp hive-default.xml.template hive-site.xml
# Check that hive-site.xml was created
ls
# Edit hive-site.xml
vi hive-site.xml
# Jump to ConnectionURL
:?ConnectionURL
# Show line numbers
:set nu
# Delete the unrelated content (these ranges depend on the template's exact line numbering)
:18,498d
:21,25d
:22,4862d
# Hide line numbers
:set nonu

Configuration file

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>
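Deleting thousands of lines in vi by line number is easy to get wrong, so one alternative is to write the four properties straight into a fresh file with a here-document. This is only a sketch: CONF_DIR is a variable introduced for this example and defaults to a scratch directory so that nothing real is overwritten; the XML declaration on the first line is also an addition.

```shell
# CONF_DIR is an assumption of this sketch. It defaults to a scratch directory;
# run with CONF_DIR=/opt/hive/conf to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>
EOF

# Count the <property> elements that were written.
grep -c '<property>' "$CONF_DIR/hive-site.xml"
```

The values are the same as in the configuration above; only the way the file is produced differs.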

Initialize the metastore database

schematool -dbType mysql -initSchema
hive
show tables;
show functions;

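Working with Hive

To check whether Hive created a table, query the TBLS table in MySQL. The column metadata of tables created in Hive is stored automatically in the COLUMNS_V2 table of the MySQL hive database.

I. Prepare the data

vim stu.txt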
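The post does not show the contents of stu.txt. As a sketch, the file can also be generated without an editor; the three rows below are made-up sample data, laid out as id, name, gender, age with tab separators to match the mytable definition used later. DATA_DIR is a variable introduced for this example.

```shell
# DATA_DIR is an assumption of this sketch. It defaults to a scratch directory;
# run with DATA_DIR=/root to match the path used by the load statement.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"

# printf reuses the format for every group of four arguments,
# so each group becomes one tab-separated line. The rows are invented.
printf '%s\t%s\t%s\t%s\n' \
    1 zhangsan male 20 \
    2 lisi female 21 \
    3 wangwu male 22 > "$DATA_DIR/stu.txt"

# Fail if any line does not have exactly four tab-separated fields.
awk -F'\t' 'NF != 4 { bad = 1 } END { exit bad }' "$DATA_DIR/stu.txt" && echo "stu.txt looks well-formed"
```

Typing tabs in an editor is a common source of silent load errors (columns come back as NULL), which is why the field count is checked here.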

II. Start the cluster

start-dfs.sh
start-yarn.sh
jps

III. Start Hive

hive

Create a table in Hive

create table mytable(id int,name string,gender string,age int) row format delimited fields terminated by '\t';

Load the prepared data into mytable

load data local inpath '/root/stu.txt' overwrite into table mytable;

Check that the data was loaded

select * from mytable;

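View the files in HDFS
Hive creates one directory per database, and each table is stored as a subdirectory of its database directory.
The default location can be changed by setting hive.metastore.warehouse.dir in hive-site.xml. The paths below assume it is set to /data/hive/warehouse; with the default setting the warehouse is /user/hive/warehouse.

dfs -ls -R /data/hive/warehouse;
dfs -cat /data/hive/warehouse/mytable/stu.txt;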

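Query the records in mytable

select * from mytable;

Browse the HDFS directories and files through the web UI

Create table t1 at a specified location (/<your_initials>)

create table t1(id int,name string) location '/<your_initials>/t1';

Verify the creation

dfs -ls -R /<your_initials>;

Create table t2 and load its data in one statement, with commas as the field delimiter

create table t2 row format delimited fields terminated by ',' as select * from mytable;

Verify with dfs -cat

dfs -cat /data/hive/warehouse/t2/*;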

2. Create a partitioned table in Hive

Create the partitioned table tb_part (partition column: gender string)

create table tb_part(id int,name string,age int) partitioned by(gender string) row format delimited fields terminated by ',';

Show the table description and check that the partition column was created

desc tb_part;

Insert the rows of mytable whose gender is male into the partition gender='M'

insert overwrite table tb_part partition(gender='M') select id,name,age from mytable where gender='male';

Query the tb_part table

select * from tb_part;

View the files on HDFS

dfs -ls -R /data/hive/warehouse/tb_part;
dfs -cat /data/hive/warehouse/tb_part/gender=M/*;
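The partition value is carried by the directory name gender=M rather than stored inside the data file, so the dfs -cat output should contain only id, name and age, separated by commas. The sketch below reproduces that filter and projection locally with awk as a cross-check; the input rows are made-up sample data, and WORK_DIR and expected_part are names introduced for this example.

```shell
# Scratch directory and invented sample rows: id, name, gender, age (tab-separated).
WORK_DIR="$(mktemp -d)"
printf '%s\t%s\t%s\t%s\n' \
    1 zhangsan male 20 \
    2 lisi female 21 \
    3 wangwu male 22 > "$WORK_DIR/stu.txt"

# Same logic as the INSERT above: keep rows where gender is "male",
# emit id,name,age joined by commas (the gender column is dropped).
expected_part="$(awk -F'\t' -v OFS=',' '$3 == "male" { print $1, $2, $4 }' "$WORK_DIR/stu.txt")"
printf '%s\n' "$expected_part"
```

For this sample the two lines printed are 1,zhangsan,20 and 3,wangwu,22, which is the shape to expect from the file under gender=M.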

--------------------------------------------------------------------------------------------------------------------

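Create mytable (the queries below assume it has a salary column)

Load mytable from a text file

Check that the load succeeded

1. Save the result into a new table

create table result as select avg(salary) as avg_salary from mytable;

2. Export the result to a given local path (a directory)

insert overwrite local directory '/root/res' select avg(salary) from mytable;

3. Save the result on the cluster (HDFS)

insert overwrite directory '/sjw/out' select avg(salary) from mytable;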

create table member(id int,name string,gender int,age int) row format delimited fields terminated by '\t';

load data local inpath '/root/member.txt' overwrite into table member;

select * from member;

insert overwrite directory '/sjw' row format delimited fields terminated by ',' select * from member order by age desc limit 3;

dfs -cat /sjw/*;
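To know what the exported file should contain before running the job, the same "order by age desc limit 3" can be approximated locally with sort and head. This is a sketch on made-up rows (id, name, gender, age, tab-separated); member.txt itself is not shown in the post, and WORK_DIR and top3 are names introduced here.

```shell
# Scratch directory and invented sample rows with distinct ages.
WORK_DIR="$(mktemp -d)"
printf '%s\t%s\t%s\t%s\n' \
    1 anna 0 31 \
    2 ben 1 25 \
    3 cara 0 42 \
    4 dave 1 19 > "$WORK_DIR/member.txt"

# Sort numerically on field 4 (age), descending; keep the first three rows;
# then turn tabs into commas, as the ',' delimiter in the INSERT does.
top3="$(sort -t "$(printf '\t')" -k4,4nr "$WORK_DIR/member.txt" | head -n 3 | tr '\t' ',')"
printf '%s\n' "$top3"
```

With these rows the output is cara (42), anna (31) and ben (25), in that order. If several members share an age, neither this pipeline nor the Hive query guarantees which of them is kept.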

Word count with Hive

Prepare the data

Create word.txt under /root
Enter a few lines of words, separated by spaces
Hello Hadoop
Hello HDFS
Hello MapReduce
Hello Hive
Hello HBase
Hello Pig
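The sample lines above can be written to the file in one step with a here-document instead of an editor. A minimal sketch: WORD_DIR is a variable introduced for this example and defaults to a scratch directory.

```shell
# WORD_DIR is an assumption of this sketch. It defaults to a scratch directory;
# run with WORD_DIR=/root to match the path used by the load statement.
WORD_DIR="${WORD_DIR:-$(mktemp -d)}"

# Quoted 'EOF' writes the lines exactly as typed.
cat > "$WORD_DIR/word.txt" <<'EOF'
Hello Hadoop
Hello HDFS
Hello MapReduce
Hello Hive
Hello HBase
Hello Pig
EOF

cat "$WORD_DIR/word.txt"
```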

Create the table text(line string) in Hive

create table text(line string);

Load word.txt into the table text

load data local inpath '/root/word.txt' overwrite into table text;

View the text table

select * from text;

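Word count

How do we count the words in each line of the text table?
A line holds several words, so each line first has to be split into individual words with the split function:

select split(line,' ') from text;

Each line is now split, but the result is an array, which group by cannot process directly, so we need another Hive function, explode.
explode turns one row into many: it emits one row for each element of the array produced above.

select explode(split(line,' ')) as word from text;

Use group by to count the exploded rows.
Treat the result above as another table t (a subquery) and aggregate over it.

select t.word,count(*) from (select explode(split(line,' ')) as word from text) as t group by t.word;

Sort the words by count in descending order and output the three most frequent ones

select t.word,count(*) as c from (select explode(split(line,' ')) as word from text) as t group by t.word order by c desc limit 3;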
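The same computation can be sketched as a plain shell pipeline, which gives a quick local cross-check of the Hive result: tr plays the role of split plus explode, sort | uniq -c plays the role of group by with count(*), and the final sort and head correspond to order by c desc limit 3. WORK_DIR, counts and top_word are names introduced for this example.

```shell
# Recreate the sample file in a scratch directory so the block runs on its own.
WORK_DIR="$(mktemp -d)"
cat > "$WORK_DIR/word.txt" <<'EOF'
Hello Hadoop
Hello HDFS
Hello MapReduce
Hello Hive
Hello HBase
Hello Pig
EOF

# One word per line -> group identical words -> count -> highest count first -> top 3.
counts="$(tr ' ' '\n' < "$WORK_DIR/word.txt" | sort | uniq -c | sort -k1,1nr | head -n 3)"

# uniq -c prints "count word"; swap the fields to mirror Hive's "word count" layout.
top_word="$(printf '%s\n' "$counts" | head -n 1 | awk '{ print $2, $1 }')"
echo "$top_word"
```

The first line is "Hello 6". Every other word occurs once, so which two of them fill the remaining places is arbitrary, both here and in the Hive query.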

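Store the query result in another table, wc.

create table wc as select t.word,count(*) c from (select explode(split(line,' ')) as word from text) as t group by t.word order by c desc limit 3;

View the wc table

select * from wc;

This exercise showed how to implement word count with Hive, as a way to get a better feel for Hive as a data warehouse built on HDFS.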

Posted 2020-12-18 12:01 by 小石小石摩西摩西
