Hive 组件安装配置
下载和解压安装文件 基础环境和安装准备
- Hive组件的部署规划和软件包路径如下:
(1)当前环境中已安装 Hadoop全分布系统
(2)本地安装 MySQL数据库(账号 root,密码 Password123$),软件包在/opt/software/mysql-5.7.18路径下
(3)MySQL端口号(3306)
(4)MySQL的 JDBC驱动包/opt/software/mysql-connector-java-5.1.47.jar,在此基础上更新 Hive元数据存储
(5)Hive软件包/opt/software/apache-hive-2.0.0-bin.tar.gz - 解压安装文件
(1)使用 root用户,将 Hive安装包
/opt/software/apache-hive-2.0.0-bin.tar.gz路解压到/usr/local/src路径下
[root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src
(2)将解压后的 apache-hive-2.0.0-bin文件夹更名为 hive
[root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin usr/local/src/hive
(3)修改 hive目录归属用户和用户组为 hadoop
[root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive
设置 Hive 环境
- 卸载 MariaDB 数据库
Hive元数据存储在 MySQL数据库中,因此在部署 Hive组件前需要首先在 Linux系统下安装 MySQL数据库,并进行 MySQL字符集、安全初始化、远程访问权限等相关配置。
需要使用 root用户登录,执行如下操作步骤:
(1)关闭 Linux系统防火墙,并将防火墙设定为系统开机并不自动启动。
关闭防火墙服务
[root@master ~]# systemctl stop firewalld
(2)设置防火墙服务开机不启动
[root@master ~]# systemctl disable firewalld
(3)卸载 Linux系统自带的 MariaDB。
首先查看 Linux系统中 MariaDB的安装情况
查询已安装的 mariadb软件包
[root@ master ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.52-2.el7.x86_64
以上结果显示 Linux系统中已经按照了 mariadb-libs-5.5.52-2.el7.x86_64软件包
需要将其卸载。
(4)卸载 MariaDB软件包
[root@master ~]# rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
安装 MySQL 数据库
按如下顺序依次按照 MySQL数据库的 mysql common、mysql libs、mysql client软件包
(1)MySQL软件包路径
yum -y install unzip
unzip mysql-5.7.18/
[root@master ~]# cd /opt/software/mysql-5.7.18/
[root@master ~]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
[root@master ~]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
[root@master ~]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
(2)安装 mysql server软件包
[root@master ~]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
(3)修改 MySQL数据库配置,在/etc/my.cnf文件中添加如表 6-1所示的 MySQL数据库配置项
将以下配置信息添加到/etc/my.cnf文件 symbolic-links=0配置信息的下方
default-storage-engine=innodb
innodb_file_per_table
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server=utf8
(4)启动 MySQL数据库
[root@master ~]# systemctl start mysqld
(5)查询 MySQL数据库状态。mysqld进程状态为 active (running),则表示 MySQL数据库正常运行
如果 mysqld 进程状态为 failed,则表示 MySQL 数据库启动异常。此时需要排查/etc/my.cnf文件
[root@master ~]# systemctl status mysqld
(6)查询 MySQL数据库默认密码
MySQL 数据库安装后的默认密码保存在/var/log/mysqld.log 文件中,在该文件中以password关键字搜索默认密码
[root@master ~]# cat /var/log/mysqld.log | grep password
2020-05-07T02:34:03.336724Z 1 [Note] A temporary password is generated for root@localhost: MPg5lhk4?>Ui
# 默 认 密 码 为:MPg5lhk4?>Ui
MySQL数据库是安装后随机生成的,所以每次安装后生成的默认密码不相同
(7)MySQL数据库初始化
执行mysql_secure_installation命令初始化 MySQL数据库,初始化过程中需要设定
数据库 root 用户登录密码,密码需符合安全规则,包括大小写字符、数字和特殊符号,
可设定密码为 Password123$
在进行 MySQL数据库初始化过程中会出现以下交互确认信息:
1)Change the password for root ? ((Press y|Y for Yes, any other key for No)
表示是否更改 root用户密码
2)Do you wish to continue with the password provided?(Press y|Y for Yes,any other key for No)
表示是否使用设定的密码继续,在键盘输入 y和回车。
3)Remove anonymous users? (Press y|Y for Yes, any other key for No)
表示是否删除匿名用户
4)Disallow root login remotely? (Press y|Y for Yes, any other key for No)
表示是否拒绝 root 用户远程登录,在键盘输入 n 和回车
5)Remove test database and access to it? (Press y|Y for Yes, any other key for No)
表示是否删除测试数据库
6)Reload privilege tables now? (Press y|Y for Yes, any other key for No)
表示是否重新加载授权表
#mysql_secure_installation命令执行过程如下:
[root@master ~]# mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root: # 输入/var/log/mysqld.log 文件中查询到的默认 root用户登录密码
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.
Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other
key for No) : y
New password: #输入新密码 Password123$
Re-enter new password: #再次输入新密码 Password123$
Estimated strength of the password: 100
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y #输入 y
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : y #输入 y
Success.
Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n #输入 n
... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y #输入 y
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y #输入
Success.
All done!
(8)添加 root用户从本地和远程访问 MySQL数据库表单的授权
[root@master ~]# mysql -uroot -p
Enter password: #输入新设定的密码 Password123$
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All
rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
#进入之后
mysql> grant all privileges on *.* to root@'localhost'
identified by 'Password123$'; # 添加 root用户本地访问授权
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> grant all privileges on *.* to root@'%' identified by 'Password123$'; # 添加 root用户远程访问授权
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges; # 刷新授权 Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user where user='root'; # 查询 root用户授权情况
+------+-----------+
| user | host |
+------+-----------+
| root | % |
| root | localhost |
+------+-----------+
2 rows in set (0.00 sec)
mysql> exit; # 退出 MySQL数据库
Bye
配置 Hive 组件
(1)设置 Hive环境变量并使其生效。
在文件末尾追加以下配置内容
[root@master ~]# vi /etc/profile
# set hive environment
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
使环境变量配置生效
[root@master ~]# source /etc/profile
(2)修改 Hive组件配置文件
切换到 hadoop用户执行以下对 Hive组件的配置操作
将/usr/local/src/hive/conf 文件夹下 hive-default.xml.template 文件,更名为hive-site.xml
[root@master ~]# su - hadoop
[hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml
(3)通过 vi 编辑器修改 hive-site.xml 文件实现 Hive 连接 MySQL 数据库,并设定Hive临时文件存储路径
[hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- jdbc连接的URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop001:3306/hive?useSSL=false</value>
</property>
<!-- jdbc连接的Driver-->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- jdbc连接的username-->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- jdbc连接的password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<!-- Hive默认在HDFS的工作目录 -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>
(4)在 Hive安装目录中创建临时文件夹 tmp
[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
至此,Hive组件安装和配置完成!
初始化 hive 元数据
#将 MySQL数据库驱动(/opt/software/mysql-connector-java-5.1.46.jar)拷贝到Hive安装目录的 lib下
[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
#重新启动 hadooop即可
[hadoop@master lib]$ stop-all.sh
[hadoop@master lib]$ start-all.sh #初始化数据库
[hadoop@master ~]$ schematool -initSchema -dbType mysql
看到 schemaTool completed 即为成功!
#启动 hive
[hadoop@master ~]$ hive
进入hive即为成功!
语句示例:
--创建库
create database 库名;
--使用库
use 库名;
--查看库里面的表
show tables;
--创建库
--语法create database [if not exists] databaseName [comment '注释'];
create database if not exists log;
create database if not exists logtemp comment 'this is log database';
--查看数据库信息
desc database 库名;
--创建表
create table if not exists jiying20105.teacher(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile;
--row format delimited fields terminated by '\t'
--在表里面显示是以\t 的形式显示出来
--stored as textfile;
--此表存储个是为textfile text文件
--显示当前所在库
set hive.cli.print.current.db=true;
--删除表
--语法: drop table 表名;
drop table ying20105.student;
drop 和 delete 区别?
--删除库
drop database log;
--修改表
--关键字 alter
--修改表名
alter table 原表名 rename to 要修改的表名;
--查看表结构
desc 表名;
每日固定执行:
create database ji20105;
create table if not exists ji20105.student(
id int,
name string
)
row format delimited fields terminated by ''
stored as textfile;
--加载 本地 数据
load data local inpath '/home/demo/student' into table ji20105.student;
-------数据------
001 易孟滔
002 贺曼雄
003 文盛华
004 周楚钟
005 梁子敬
--查看数据
select * from student;
--加载 本地 数据 覆盖
load data local inpath '/home/demo/student' overwrite into table ji20105.student;
--从hdfs上面添加数据 并覆盖
load data inpath '/input' overwrite into table ji20105.student;
--创建表
--like 只创建表结构:类型、字段
create table if not exists teacher like student;
--as 创建表结构 并且 复制数据
create table if not exists teacher01 as select * from student;
--增加字段名 在最后添加
alter table student add columns(
sex string,
age int comment 'this is student age'
);
基本数据类型:
int 4byte有符号整数
bigint 8byte有符号整数
boolean 布尔类型,true或false
float 单精度浮点型
double 双精度浮点型
string 字符序列
复杂数据类型
struct 通过对元素的first来指定数据的元素
map <key,value>以键值对的形式存储
array 下标从0开始存储
分隔符:
\n 换行符
^A 分隔符与\t 类似
^B 用来分割array和struct的元素
^C 用来map中的键值对之间的分隔
hive元数据的存储:
存储在关系型数据库中,默认存储在derby,内嵌在hive里面
实际工作中我们使用最多的是mysql
因为我们可以将hive的元数据存储在mysql里面不放在derby数据库中。
--创建新的表
create table ji20105.score(
name string,
kemu1 string,
cj1 double,
kmeu2 string,
cj2 double
)
row format delimited fields terminated by ' '
stored as textfile;
--从hdfs上面导入数据
load data inpath '/input/score' into table ji20105.score;
--数据--
易孟滔 hadoop 59 mysql 94
贺曼雄 hadoop 76 mysql 79
文胜华 hadoop 74 mysql 47
周楚钟 hadoop 73 mysql 89
梁子敬 hadoop 67 mysql 69
唐勇 hadoop 63 mysql 84
黄凯 hadoop 86 mysql 46
颜超斌 hadoop 95 mysql 76
廖书明 hadoop 78 mysql 78
朱广林 hadoop 48 mysql 100
--查看数据
select * from ji20105.score;
--需求1:求出mysql成绩在80分以上的学生姓名
select name
from score
where kmeu2='mysql' and cj2>=80;
--and 和
--需求2:求出成绩总和前三名学生
select name,
sum(cj1+cj2) as a
from score
group by name
order by a desc
limit 3;
--但凡没有出现在函数里面的字段都要在group by里面
--------------------------------------join练习-------------------------------------------
a表:(id int,name string)
b表:(id int,job_id int,num int)
c表:(job_id int,job string)
--建表语句
--use hadoop2;
create table if not exists ji20105.a(
id int,
name string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.b(
id int,
job_id int,
num int
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.c(
job_id int,
job string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
--加载数据
load data local inpath '/home/demo/a' into table ji20105.a;
load data local inpath '/home/demo/b' into table ji20105.b;
load data local inpath '/home/demo/c' into table ji20105.c;
--inner join 内连接 inner join
---- a.*,b.* * 代表所有,表里面所有的内容字段标签
select a.*,b.*
from a inner join b on a.id= b.id;
inner join on
--left join 左连接
select a.*,b.*
from a left join b on a.id=b.id;
select a.*,b.*
from b left join a on a.id=b.id;
--right join 右连接
select a.*,b.*
from a right join b on a.id=b.id;
--full join 全连接
select a.*,b.*
from a full join b on a.id=b.id;
--查出人名和工作
select t.name,t.job_id,c.job
from (select a.name,b.job_id
from hadoop2.a as a left join hadoop2.b as b
on a.id = b.id) as t left join hadoop2.c as c
on t.job_id = c.job_id;
a表:(id int,name string)
b表:(id int,job_id int,num int)
c表:(job_id int,job string)
--哪边是主表,连接哪边
select t.name,c.job
from(select a.name,b.job_id
from a left join b
on a.id=b.id)as t left join c
on t.job_id = c.job_id;
张3 开车
李四 开飞机
王武 开飞机
老张 NULL
select
a.name,
if(b.job_id is null,0,b.job_id),
if(c.job is not null,c.job,'无业')
from hadoop2.a as a
left join hadoop2.b as b on a.id = b.id
left join hadoop2.c as c on b.job_id=c.job_id;
select a.name,
if(b.job_id is null,'无业','有工作')
from a left join b
on a.id=b.id
left join c
on b.job_id=c.job_id;
--------------------------------
person:(name string,station int)
ticket:(station int,price int)
--建表
create table if not exists ji20105.person(
name string,
station int
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.ticket(
station int,
price int
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
--加载数据
load data local inpath '/home/demo/person' into table ji20105.person;
load data local inpath '/home/demo/ticket' into table ji20105.ticket;
--按照站数算对应价格 --每个人车票多少钱
select p.name,t.price
from person as p left join ticket as t
on p.station=t.station;
-- 10站以上打8折,15站7折
-- 一个判断 使用if
select p.name,
if(p.station>10,if(p.station>15,t.price*0.7,t.price*0.8),t.price)
from person as p left join ticket as t
on p.station=t.station;
--5站到10站打9折、11站以上打8折
--多个判断 case when ... then ... else ... end
select p.name,
case when t.station>11 then t.price*0.8 else(
case when t.station>=5 then t.price*0.9 else t.price end
)
end
from person as p left join ticket as t
on p.station=t.station;
---------------------练习3--------------------
class:(name string,class string)
score:(name string,chinese int,math int)
--创建表
create table if not exists ji20105.class(
name string,
class string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.score1(
name string,
chinese int,
math int
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
--加载数据
load data local inpath '/home/demo/class' into table ji20105.class;
load data local inpath '/home/demo/score1' into table ji20105.score1;
问题:计算每个班的语文总成绩和数学总成绩
select class,sum(chinese),sum(math)
from class as c left join score1 as s
on c.name=s.name
group by class;
计算每个班的语文总成绩
select class,sum(chinese)
from class as c left join score1 as s
on c.name=s.name
group by class;
问题:计算每个班的语文总成绩和数学总成绩,要求有哪科低于60分,该名学生成绩不计入计算.
--and 和 并且
select class,sum(chinese),sum(math)
from class as c left join score1 as s
on c.name=s.name
where chinese>60 and math>60
group by class;
---------------4--------------
创建班级表 my_student(sno int,sname string,sex string,sage int,sdept string)
创建课程表 my_course(cno int,cname string)
创建成绩表 my_score(sno int,cno int,grade int)
create table if not exists ji20105.my_student(
sno int,
sname string,
sex string,
sage int,
sdept string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.my_course(
cno int,
cname string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table if not exists ji20105.my_score(
sno int,
cno int,
grade int
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
--加载数据
load data local inpath '/home/demo/my_student' into table ji20105.my_student;
load data local inpath '/home/demo/my_course' into table ji20105.my_course;
load data local inpath '/home/demo/my_score' into table ji20105.my_score;
问题1:查询全体学生的学号与姓名
select sno,sname
from my_student;
问题2:查询选修了课程的学生姓名
select sname
from my_student as s left join my_score as c
on s.sno=c.cno;
-- 使用 distinct去重 在select后面使用,distict后面跟要去重的字段
select distinct s1.sname
from ji20105.my_student as s1 left join ji20105.my_score s2
on s1.sno = s2.sno;
SELECT distinct sname FROM my_student;
问题3:查询学生的总人数 count
SELECT count(sname) FROM my_student;
问题4:计算1号课程的学生平均成绩
分析:所有学生的平均分数 要把所有学生分数加起来
select AVG(grade)
from my_score
where cno=1;
问题5:查询选修1号课程的最高分数和学生姓名
--进行order by 的字段要出现在select里面
select s2.sname,s1.grade
from my_score as s1 left join my_student as s2
on s1.sno=s2.sno
where cno=1
order by s1.grade desc
limit 1;
select cno,max(grade),min(grade)
from ji20105.my_score where cno=1 group by cno;
问题6:求各个课程号及相应的选课人数
select c.cno,count(s.sno)
from my_course as c left join my_score as s
on c.cno=s.cno
group by c.cno;
问题7:查询选修了4门以上的课程的学生学号
select sno
from ji20105.my_score
group by sno
having count(1) <> 3;
问题8:查询选修了3门以上的课程的学生学号 优化
------开窗函数
--所有课程成绩总排名
--row_number() 开窗函数
--select
--cno,
--grade,
--row_number() over(order by grade desc) as rank
--from ji20105.my_score;
count() 返回记录条数
sum() 求和
avg() 求平均值
min() 返回最小值
max() 返回最大值
-----排序-----
order by 全局排序,把所有的数据发送到一个阶段上
sort by 保证reduce中的数据有序,但不一定是全局有序,asc 正序,desc 倒序
distributed by 从map端以指定hash算法到reduce端按照指定字段分发数据
cluster by (不能指定排序规则,只能正序)==distributed by+ sort by
group by 是什么? 分组排序
distributed by order by 不能放在一起使用
----------------------------------分桶------------------------------------
--分桶意义
--1.提高查询效率。使用链接查询的两个字段进行过分桶。
--2.提高抽样效率
--分桶关键字:bucket
-----提示----
--分区字段的源数据不可以是中文,但分桶字段的源数据可以
--hive 怎么知道对那个字段分桶??分几个桶??
clustered by(id) into 4 buckets
---创建分桶表
--分桶字段必须出现在表定义的字段中
create table hivetable171104(
id int,
name string,
gender string
)
clustered by (gender) into 2 buckets
row format delimited fields terminated by '\t';