
Hive Usage:
Preparation:
Drop the bnyw database if it already exists.
1. Create the employee table:
create table if not exists emp(eno int,ename string,eage int,birthday date,sal double,com double,gender string,dno int) row format delimited fields terminated by '\t';
2. Create the department table:
create table dept(dno int,dname string,loc string) row format delimited fields terminated by '\t';
a. Count employees by department id and gender:
hive> select dno,gender,count(1) from emp group by dno,gender;
b. Group by department id and gender, then sort by headcount in descending order:
hive> select dno,gender,count(1) cnt from emp group by dno,gender order by cnt desc;
c. Sort by multiple columns:
hive> select eno,ename,sal,com from emp order by sal desc,com desc;
d. Multi-table joins and subqueries:
hive> select e.*,d.* from emp e ,dept d where e.dno=d.dno;   (SQL-92 syntax)
hive> select e.*,d.* from emp e inner join dept d on e.dno=d.dno;   (SQL-99 syntax)
hive> select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal;
Subquery:
hive> select d.dname,avgsal from (select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal) mid,dept d where mid.dno=d.dno;
Pagination:
hive> select row_number() over(),e.* from emp e;
hive> select row_number() over(order by sal desc),e.* from emp e;
Filtering on the generated row number returns one page at a time; here rows 6 through 10, i.e. the second page with a page size of 5:
hive> select * from (select row_number() over(order by sal desc) seq,e.* from emp e) mid where mid.seq>5 and mid.seq<11;

Hive Partitioning
A. Static partition tables:
---------
Create the partitioned table bnyw.pemp in batch mode (-S runs silently, -e executes the quoted statement):
./hive -S -e "create table if not exists bnyw.pemp(eno int,ename string,eage int,gender string,sal int,com int) partitioned by (dno int) row format delimited fields terminated by '\t'"
Create a partitioned table from the Hive shell:
hive> create table if not exists t_user(uid int,uname string,uage int) partitioned by (dno int) row format delimited fields terminated by '\t';
1. Create the partitions (dno=1, dno=2, dno=3, dno=4):
hive> alter table t_user add partition(dno=1);   ----> add a partition
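The remaining partitions are added the same way; a single alter table statement can also carry several partition specs at once:
hive> alter table t_user add partition(dno=2) partition(dno=3) partition(dno=4);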

2. Upload data into partition dno=1 with HDFS put:
[hadoop@CloudDeskTop usergroup]$ hdfs dfs -put bak01 /user/hive/warehouse/bunfly.db/t_user/dno=1/
3. Load data into partition dno=2 with load:
hive> load data local inpath '/home/hadoop/test/usergroup/bak02' into table t_user partition(dno=2);
4. Write data into partition dno=3 with insert:
hive> insert into t_user partition(dno=3) values(3,'zhaoyun',34);
Insert data taken from another table into partition dno=3:
hive> insert into t_user partition(dno=3) select eno,ename,eage from emp where eno<4;
hive> insert overwrite table t_user partition(dno=4) select eno,ename,eage from emp where eno>4 and eno<9;

B. Dynamic partition tables (rows are routed to the matching partition automatically, based on the value of the partition field):
Enable dynamic partitioning:
----------
1. Session-level setting:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
2. Permanent setting:
Edit hive-site.xml and change the value of hive.exec.dynamic.partition.mode to nonstrict, as in the snippet below.
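The property block, in the same format as the metastore settings shown further below:
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>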

The partitioned table itself is created exactly as in the static case.
truncate table t_user;   ---> empty the table
alter table t_user drop partition(dno=1);   ---> drop a partition
Dynamic partitioning does not work with put or load; only the following three insert forms are supported:
1. hive> insert into t_user partition(dno) values(1,'zhaoyung',35,1);   ---> missing partitions are created automatically
2. hive> insert into t_user partition(dno) select eno,ename,eage,dno from emp;   ----> copies rows, partition column included, from another table
3. hive> insert overwrite table t_user partition(dno) select eno,ename,eage,dno from emp;
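To confirm which partitions were created:
hive> show partitions t_user;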

C. Multi-level partitioning:
-------------
1. Create a multi-level partitioned table:
hive> create table if not exists t_user(uid int,uname string,uage int) partitioned by (dno int,gender string) row format delimited fields terminated by '\t';

2. Import data into the multi-level partitions:
hive> insert into t_user partition(dno,gender) select eno,ename,eage,dno,gender from emp;
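Each row lands under a warehouse directory such as .../t_user/dno=1/gender=.../, so a query that filters on both partition columns only scans that one directory. A sketch, assuming emp holds a department 1 with a gender value 'male':
hive> select * from t_user where dno=1 and gender='male';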
------------

1. Create an external table (maps data files living at some other path into a Hive table):
hive> create external table t_user(uid int,uname string,uage int) row format delimited fields terminated by '\t' location '/hive/user/';
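Any tab-delimited file placed under that location becomes queryable; a sketch with a hypothetical data file user.txt:
[hadoop@CloudDeskTop ~]$ hdfs dfs -mkdir -p /hive/user/
[hadoop@CloudDeskTop ~]$ hdfs dfs -put user.txt /hive/user/
hive> select * from t_user;
Dropping an external table removes only the metadata; the files under /hive/user/ stay intact.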

2. Create a table containing an array type (collection items terminated by ',' defines the separator between array elements):
create table if not exists tuser(id bigint,name string,loves array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
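A hypothetical input line for this layout would be 1<TAB>zhangsan<TAB>music,reading; array elements are then read by index:
hive> select name,loves[0],size(loves) from tuser;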

3. Create a table containing a map type (map keys terminated by ':' defines the separator between each key and its value, needed when parsing the text):
create table if not exists tuser(id bigint,name string,info map<string,string>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';
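Map values are read with bracket syntax; a sketch assuming the data carries a key named addr:
hive> select name,info['addr'] from tuser;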

4. Create a table with composite types:
create table zuser(id bigint,name string,loves array<string>,info map<string,double>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';
hive> select * from zuser where info['age']=23;
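A sketch of loading such a file, assuming a hypothetical local file /home/hadoop/test/zuser.txt in the layout above (fields separated by tabs, e.g. 1<TAB>zhangsan<TAB>music,reading<TAB>age:23.0,height:1.75):
hive> load data local inpath '/home/hadoop/test/zuser.txt' into table zuser;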

Using MySQL to store the Hive metastore:
Upload the MySQL installation package, then unpack and install it --- sh install.sh
--------
[root@CloudDeskTop sbin]# ./mysqld start
[root@CloudDeskTop sbin]# lsof -i:3306   (verify that MySQL is listening on port 3306)

Note: switch to the hadoop user for the following steps.

Copy the MySQL JDBC driver jar into Hive's lib directory:
[hadoop@CloudDeskTop ~]$ cd /software/hive-1.2.2/lib/
【mysql-connector-java-3.0.17-ga-bin.jar】
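A sketch, assuming the driver jar sits in the hadoop user's home directory:
[hadoop@CloudDeskTop lib]$ cp ~/mysql-connector-java-3.0.17-ga-bin.jar .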

Configure the environment:
[hadoop@CloudDeskTop lib]$ vi /software/hive-1.2.2/conf/hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.29.134:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=utf8
</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>

Switch MySQL to row-based binary logging (statement-format binlog can reject some statements the metastore issues inside transactions), then restart:
[root@CloudDeskTop lib]# vi /etc/my.cnf
binlog_format=row
[root@CloudDeskTop sbin]# ./mysqld restart
Create the metastore database as a test (latin1 avoids the key-length limit the metastore schema can hit under utf8):
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -e "drop database if exists hive;create database hive character set latin1"
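Once Hive is started again it populates this database with its metastore tables (assuming schema auto-creation is left enabled); a quick check:
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -e "use hive;show tables"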

Data movement summary:
A. From the local filesystem to a Hive table:
Use the HDFS put command to upload data files
Use Hive's load data local inpath syntax to import data files
B. From Hive table to Hive table:
Use the HDFS cp command to copy data files
Use plain insert into syntax to insert single records
Use insert...select...from... syntax for bulk, conditional data copies
Use insert overwrite table...select... syntax to copy data, replacing what is there
C. From a Hive table to the local filesystem (see the sketches after this list):
Use the HDFS get command to download data files
Use insert overwrite local directory syntax for bulk, conditional exports from a Hive table
Use output redirection (> or >>) to write select results to a local file via standard output
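Sketches of the last two routes, assuming a target directory /home/hadoop/out and an output file /home/hadoop/emp.txt:
hive> insert overwrite local directory '/home/hadoop/out' row format delimited fields terminated by '\t' select * from emp;
[hadoop@CloudDeskTop bin]$ ./hive -S -e "select * from emp" > /home/hadoop/emp.txt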

 
