Hive Environment Setup and Basic Operations

Pseudo-distributed mode

I. Installing and Configuring Hive

1. Set HADOOP_HOME and the Hive conf directory in conf/hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/softwares/hadoop-2.8.0

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/softwares/hive-1.2.2/conf
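A minimal sketch of step 1, assuming the Hive tarball ships conf/hive-env.sh.template (Hive 1.2.2 does). The bin/hive startup scripts source conf/hive-env.sh if it exists, so the two settings above only need to be appended there. The function name configure_hive_env is hypothetical.

```shell
# Sketch: create conf/hive-env.sh from the template shipped with Hive and
# append HADOOP_HOME and HIVE_CONF_DIR. configure_hive_env is a hypothetical name.
configure_hive_env() {
  local hive_home="$1" hadoop_home="$2"
  local conf_dir="$hive_home/conf"
  # Copy the template only if hive-env.sh does not exist yet
  if [ ! -f "$conf_dir/hive-env.sh" ]; then
    cp "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh"
  fi
  # Append the two settings; bin/hive sources this file at startup
  {
    echo "HADOOP_HOME=$hadoop_home"
    echo "export HIVE_CONF_DIR=$conf_dir"
  } >> "$conf_dir/hive-env.sh"
}
```

Usage with the paths from this post: configure_hive_env /opt/softwares/hive-1.2.2 /opt/softwares/hadoop-2.8.0. Running it twice appends the lines twice, which is harmless but untidy.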

2. Create two directories in HDFS and make them writable by the group (g+w)

In addition, you must use the HDFS commands below to create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before you can create a table in Hive.

$ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp

$ $HADOOP_HOME/bin/hadoop fs -mkdir -p    /user/hive/warehouse

$ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp

$ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse
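The four commands above can be wrapped in one function, sketched below. It uses mkdir -p because Hadoop 2.x refuses to create /user/hive/warehouse when /user/hive does not exist yet. HADOOP_BIN is a hypothetical override introduced here (not a Hadoop variable); setting it to echo prints the commands instead of running them.

```shell
# Sketch: create the directories Hive needs on HDFS and make them group-writable.
# HADOOP_BIN is a hypothetical override; by default $HADOOP_HOME/bin/hadoop is used.
init_hive_hdfs_dirs() {
  local hadoop="${HADOOP_BIN:-$HADOOP_HOME/bin/hadoop}"
  local dir
  for dir in /tmp /user/hive/warehouse; do
    # -p also creates missing parents such as /user/hive
    "$hadoop" fs -mkdir -p "$dir"
    "$hadoop" fs -chmod g+w "$dir"
  done
}
```

A dry run such as HADOOP_BIN=echo init_hive_hdfs_dirs is a cheap way to check the commands before touching HDFS.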

3. Start the Hive shell (CLI)

!!! HDFS must already be running before you start Hive.

$ $HIVE_HOME/bin/hive
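Because Hive needs HDFS, a small wrapper can check that HDFS answers before launching the CLI. This is a sketch: it treats a successful "hadoop fs -ls /" as "HDFS is up", and HADOOP_BIN and HIVE_BIN are hypothetical overrides (not Hadoop or Hive variables).

```shell
# Sketch: refuse to start the Hive CLI unless HDFS answers a simple listing.
# HADOOP_BIN / HIVE_BIN are hypothetical overrides for the two binaries.
start_hive_cli() {
  local hadoop="${HADOOP_BIN:-$HADOOP_HOME/bin/hadoop}"
  local hive="${HIVE_BIN:-$HIVE_HOME/bin/hive}"
  # 'fs -ls /' fails if the NameNode is not running
  if ! "$hadoop" fs -ls / > /dev/null 2>&1; then
    echo "HDFS is not reachable; start it first (sbin/start-dfs.sh)" >&2
    return 1
  fi
  # Forward any arguments (e.g. -S or --hiveconf ...) to the Hive CLI
  "$hive" "$@"
}
```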

4. HQL operations

show databases ;

use default;						# switch database (create a new one with: create database xxx;)

show tables;

# show the table's columns
desc student;

# show detailed table information
desc extended student;

# show detailed table information, formatted
desc formatted student;

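create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; 	# fields within a row are separated by a tab ('\t')

load data local inpath '/opt/datas/test' into table databaseName.student;					# load a local file into the table; the database prefix is optional; same effect as putting the file into the table's HDFS directory by hand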
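The load statement expects /opt/datas/test to be a plain text file with one row per line and the two columns separated by a real tab character, matching FIELDS TERMINATED BY '\t'. If spaces are used instead, the columns usually come back as NULL. A minimal sketch that writes such a file; the function name and the three rows are made-up sample data.

```shell
# Sketch: write a small tab-separated sample file for student(id int, name string).
# make_student_data and the rows below are hypothetical sample data.
make_student_data() {
  local path="${1:-/opt/datas/test}"
  mkdir -p "$(dirname "$path")"
  # printf reuses its format for each id/name pair and emits a real tab
  printf '%s\t%s\n' 1 zhangsan 2 lisi 3 wangwu > "$path"
}
```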

select * from student;

select id from student;

show functions;

desc function xxx;					# note: "function", not "functions"

desc function extended xxx;			# detailed description

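# show help
bin/hive -help

# set configuration properties at startup
bin/hive --hiveconf <property=value>

# !!! run HDFS (dfs) commands from inside the Hive CLI
dfs -cp xxx xxx;

# !!! run local shell commands from inside the Hive CLI
!ls /opt/datas;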

II. Installing MySQL

1. sudo apt-get install mysql-server

2. sudo apt-get install mysql-client

3. sudo apt-get install libmysqlclient-dev

4. sudo service mysql start

5. mysql -uroot -p

6. In the mysql.user table, change root's Host to '%' so that root can log in from any host (any IP)
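Step 6 can be done with a single statement, sketched here for the MySQL 5.x versions Ubuntu shipped at the time; newer MySQL versions manage accounts differently. This opens root to remote logins, so it is only sensible on a test VM. MYSQL_BIN is a hypothetical override that defaults to the mysql client.

```shell
# Sketch for step 6 (MySQL 5.x): let root log in from any host by setting Host='%'.
# MYSQL_BIN is a hypothetical override; by default the mysql client is used.
allow_root_from_any_host() {
  local mysql="${MYSQL_BIN:-mysql}"
  # -p prompts for the root password; FLUSH PRIVILEGES reloads the grant tables
  "$mysql" -uroot -p -e "UPDATE mysql.user SET Host='%' WHERE User='root' AND Host='localhost'; FLUSH PRIVILEGES;"
}
```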

III. Configuring MySQL as the Hive Metastore Database

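1. Copy the MySQL JDBC driver jar (the jar from the mysql-connector-java download) into $HIVE_HOME/lib

2. Create conf/hive-site.xml (a copy of conf/hive-default.xml.template)

3. Add or edit the four properties needed to connect to the database (every <property> goes inside the file's <configuration> element)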

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>Username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>ubuntu</value>
  <description>password to use against metastore database</description>
</property>

<!-- Show column headers and the current database name in the CLI -->
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
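Instead of editing a full copy of the template, a hive-site.xml holding only the overridden properties is usually enough, since Hive falls back to its built-in defaults for everything else. A minimal sketch that writes such a file; root/ubuntu are this post's example credentials and the function name is hypothetical.

```shell
# Sketch: write a minimal hive-site.xml with the metastore and CLI settings above.
# write_hive_site is hypothetical; root/ubuntu are the post's example credentials.
write_hive_site() {
  local conf_dir="$1" db_user="${2:-root}" db_pass="${3:-ubuntu}"
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${db_user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${db_pass}</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
</configuration>
EOF
}
```

Usage: write_hive_site /opt/softwares/hive-1.2.2/conf. The MySQL driver jar from step 1 still has to be in $HIVE_HOME/lib.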

IV. Where Each Setting Is Configured

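1. Data warehouse location
	*Default: /user/hive/warehouse (the directory created in section I, step 2)
	*Defined in hive-default.xml.template (override it in hive-site.xml)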
		  <property>
		    <name>hive.metastore.warehouse.dir</name>
		    <value>/user/hive/warehouse</value>
		    <description>location of default database for the warehouse</description>
		  </property>
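	*Each database (except default) gets its own directory under the warehouse directory		DirName = databaseName.db
	*Each table in the default database is stored as a directory directly under the warehouse directory		DirName = tableName
2. Log directory (optional)
	*Default path: /tmp/<username>/hive.log
	*Change hive.log.dir in conf/hive-log4j.properties
3. Log level
	*Edit conf/hive-log4j.properties
	*hive.root.logger=<level>,<appender>, e.g. INFO,DRFA (daily rolling log file) or DEBUG,console
4. Show the current database and column headers in the CLI
	*See section III (hive.cli.print.header and hive.cli.print.current.db)
5. Set configuration properties at startup
	*bin/hive --hiveconf <property=value>
6. Show all current configuration values
	*hive> set;
	*hive> set key=value;	# set a value for the current session only
7. Command history
	*Stored in ~/.hivehistory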
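Item 2 can be scripted as below. This is a sketch assuming conf/hive-log4j.properties is created from the hive-log4j.properties.template that ships with Hive 1.x and contains a hive.log.dir= line; it uses GNU sed -i as found on Ubuntu. The function name is hypothetical.

```shell
# Sketch: point hive.log.dir at a custom directory (GNU sed, as on Ubuntu).
# set_hive_log_dir is hypothetical; the log dir must not contain '|'.
set_hive_log_dir() {
  local conf_dir="$1" log_dir="$2"
  local props="$conf_dir/hive-log4j.properties"
  # Start from the shipped template if the properties file does not exist yet
  [ -f "$props" ] || cp "$conf_dir/hive-log4j.properties.template" "$props"
  # Replace the whole hive.log.dir line; '|' is the delimiter since paths contain '/'
  sed -i "s|^hive.log.dir=.*|hive.log.dir=$log_dir|" "$props"
}
```

For a one-off debugging session, the level and appender from item 3 can also be overridden at startup without editing any file: bin/hive --hiveconf hive.root.logger=DEBUG,console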

V. Hive Command-Line Options

cen@hostname-ubuntu:/opt/softwares/hive-1.2.2$ bin/hive -help
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive 		-d: define a variable
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line 	                    -e: run a quoted query string
 -f <filename>                    SQL from files 			                    -f: run the SQL in a file
 -H,--help                        Print help information						-H/--help: show help
    --hiveconf <property=value>   Use value for given property 					--hiveconf: set a config property for this session only
    --hivevar <key=value>         Variable subsitution to apply to hive 		--hivevar: define a variable
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file 						-i: initialization SQL file (e.g. for UDF setup)
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
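Several of these options are typically combined in scripts: -S hides the progress messages, --database picks the database, --hivevar defines a variable that the SQL file reads as ${hivevar:tbl}, and -f runs the file. A sketch under those assumptions; the function name and the HIVE_BIN override are hypothetical.

```shell
# Sketch: run a query file non-interactively with -S, --database, --hivevar and -f.
# run_student_query and HIVE_BIN are hypothetical; the query file is a temp file.
run_student_query() {
  local hive="${HIVE_BIN:-$HIVE_HOME/bin/hive}"
  local sql rc
  sql="$(mktemp)"
  # Quoted 'EOF' stops the shell from expanding ${hivevar:tbl}; Hive substitutes it
  cat > "$sql" <<'EOF'
select * from ${hivevar:tbl} limit 10;
EOF
  "$hive" -S --database default --hivevar tbl=student -f "$sql"
  rc=$?
  rm -f "$sql"
  return "$rc"
}
```

Its output can be redirected the same way as the -e example: run_student_query > /opt/datas/hive-res.txt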

# save query results to a file
bin/hive -e "select * from student;" > /opt/datas/hive-res.txt
posted @ 2017-07-13 17:53  岑忠满