Hadoop-Hive
1. Setup
1) Unpack to /opt/moduels
2) Set HIVE_HOME
3) Set HADOOP_HOME and HIVE_CONF_DIR in hive-env.sh
4) Create the Hive warehouse directory on HDFS and grant write permission
5) bin/hive --> run SQL statements
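A quick smoke test once bin/hive starts (the database and table names here are just placeholders):
   -- verify the install end to end
   show databases;
   create database db_hive;
   use db_hive;
   create table t_check(id int);
   show tables;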
2. Install and configure MySQL (metastore backend)
1)unzip mysql
2) rpm -e --nodeps mysql --> remove the preinstalled MySQL first
3)rpm -ivh mysql-server
4) cat /root/.mysql_secret --> the generated initial root password
5)rpm -ivh mysql-client
6)mysql -uroot -p[password]
7)set password=password('123456');
8) update mysql.user set Host='%' where User='root' and Host='localhost'; --> allow root to log in from any host
9)flush privileges;
10)tar -zxvf mysql-connector
11) cp mysql-connector-java /opt/moduels/hive/lib
12) In hive-site.xml set javax.jdo.option.ConnectionURL, ConnectionDriverName, ConnectionUserName and ConnectionPassword --> port 3306, DataBase=metastore
-->https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
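Once Hive connects successfully, its schema shows up on the MySQL side; a quick check (run in the mysql client, not in hive; DBS and TBLS are standard metastore tables):
   use metastore;
   show tables;        -- expect DBS, TBLS, COLUMNS_V2, ...
   select * from DBS;  -- one row per Hive database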
3. Basic Hive operations
1) Column delimiter
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2) Load local data
load data local inpath '/opt/datas/student.txt' [overwrite] into table student;
3) desc formatted student; (or desc extended student;)
4) show functions; --> desc function [extended] substring;
5) Clearing data: truncate table table_name [partition partition_spec];
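Steps 1)-3) and 5) chained together (the two-column student schema is an assumption):
   create table student(id int, name string)
   row format delimited fields terminated by '\t';
   load data local inpath '/opt/datas/student.txt' overwrite into table student;
   desc formatted student;   -- shows location, SerDe, table type
   truncate table student;   -- clears the data, keeps the table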
4. Assorted configuration
1) Set hive.cli.print.header and hive.cli.print.current.db to print column headers and show the current database
2) Log file configuration
3) set; --> view current settings --> set hive.root.logger=INFO,console; prints log output to the console
4) Common CLI options: bin/hive -help (-i init file, -f run a script file, -e run a quoted statement)
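For example, inside the CLI (both properties are standard Hive settings):
   set hive.cli.print.header=true;      -- show column names in query output
   set hive.cli.print.current.db=true;  -- show the current database in the prompt
   set;                                 -- dump every current setting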
5. Three ways to create a table
1)create table test01(ip string comment '...',user string)
comment 'access log'
row format delimited fields terminated by ' '
stored as textfile
location '/user/hive/warehouse/logs'
2) create table test02
as select ip,user from test01; --> CTAS: copies both the schema and the query result
3) create table test03
like test01; --> copies only the table definition, no data
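A quick way to see the difference between 2) and 3) (assuming test01 holds data):
   select count(*) from test02;  -- the rows the CTAS query produced
   select count(*) from test03;  -- 0: like copies no data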
6. Hive in the ETL flow
1) create table, load --> E (extract)
2) select, python scripts --> T (transform)
3) sub tables --> L (load); see the sketch below
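A minimal sketch of that flow (the table names are assumptions; the transform step could equally be TRANSFORM ... USING a python script):
   -- E: land the raw file in a staging table (assumed created with matching columns)
   load data local inpath '/opt/datas/emp.txt' into table emp_staging;
   -- T + L: transform with a query and load the result into a sub table
   create table emp_clean as
   select empno, upper(ename) ename, deptno
   from emp_staging
   where empno is not null;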
7. Table types in Hive
1) Managed tables --> dropping the table deletes the data as well as the metadata
2) External tables (EXTERNAL) --> dropping the table deletes only the metadata, never the data
3) Partitioned tables --> queries can target a partition through the where clause
create table dept_partition(deptno int,dname string,loc string)
partitioned by(event_month string[,event_day string]) --> add a second column for two-level partitions
row format delimited fields terminated by '\t';
Load data:
load data local inpath '/opt/datas/emp.txt' into table emp_partition partition (month='201509');
Query:
where month = '201509';
Notes:
a. If you create the partition directory by hand and put data into it, the partition metadata is never written to the metastore, so the data cannot be read; repair it manually:
msck repair table dept_partition;
or alter table dept_part add partition(day='20150913');
b. List a table's partitions: show partitions dept_partition;
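A two-level variant matching the dept_partition definition above (the data file and partition values are assumptions):
   load data local inpath '/opt/datas/dept.txt' into table dept_partition
   partition (event_month='201509', event_day='13');
   select * from dept_partition where event_month='201509' and event_day='13';
   show partitions dept_partition;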
8. Ways to export a table
1) insert overwrite local directory '/opt/datas/hive_exp_emp' --> drop local to write to HDFS instead
row format delimited fields terminated by '\t'
collection items terminated by '\n'
select * from db_hive.emp;
2) bin/hive -e "select * from db_hive.emp" > /opt/datas/hive_exp_exp.txt --> a bare select * runs no MapReduce job
3) Sqoop: hdfs/hive -> rdbms, or rdbms -> hdfs/hive/hbase
9. Common Hive queries
[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ... --> ALL keeps duplicates, DISTINCT removes them
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
1) select t.empno,t.ename,t.deptno from emp t; --> t is a table alias
2) between ... and
select t.empno,t.ename,t.deptno from emp t where t.sal between 800 and 1500;
3)is null/is not null
select t.empno,t.ename,t.deptno from emp t where comm is null;
4)group by/having
Average salary per department:
select avg(sal) avg_sal from emp
group by deptno;
Highest salary for each job in each department:
select t.deptno,t.job,max(t.sal) max_sal from emp t group by t.deptno,t.job; --> group by two columns
Difference between having and where:
where filters individual rows, before grouping
having filters the grouped results, after aggregation
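The syntax block above also lists distribute by / sort by, which the examples skip; a small contrast with order by (the reducer count is an assumption):
   set mapreduce.job.reduces=3;
   select * from emp order by sal;                      -- one total order, single reducer
   select * from emp distribute by deptno sort by sal;  -- split by deptno, sorted within each reducer
   -- cluster by x is shorthand for distribute by x sort by x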
10. Export/Import
1) Export --> dump a Hive table's data
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ] --> the path is a path on HDFS
2) Import
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']
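A concrete round trip (the paths and the new table name are assumptions):
   export table db_hive.emp to '/user/hive/export/emp';
   import table emp_imported from '/user/hive/export/emp';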