Hive + Hadoop Environment Setup
Machine plan:
Host | IP | Process |
hadoop1 | 10.183.225.158 | hive server |
hadoop2 | 10.183.225.166 | hive client |
Prerequisites:
Kerberos deployment: http://www.cnblogs.com/kisf/p/7473193.html
Hadoop HA + Kerberos deployment: http://www.cnblogs.com/kisf/p/7477440.html
MySQL installation: omitted
Create the hive user and database in MySQL, then verify the login: mysql -uhive -h10.112.28.179 -phive123456
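The user/database creation step above can be sketched as follows. The database name, username, and password are taken from this guide; the `GRANT ... IDENTIFIED BY` form assumes MySQL 5.x. This is a dry run that only prints the SQL so you can review it first:

```shell
# Sketch: bootstrap SQL for the Hive metastore database (names/password from this guide).
# To apply, review the output and pipe it into: mysql -uroot -p
BOOTSTRAP_SQL="
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive123456';
FLUSH PRIVILEGES;
"
echo "$BOOTSTRAP_SQL"
```

Granting on `hive.*` to `'hive'@'%'` allows the metastore to connect from any host; tighten the host part in production.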
Hive version 2.3.0 is used:
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz
Add environment variables (in /etc/profile):
export HIVE_HOME=/letv/soft/apache-hive-2.3.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
Sync to master2, then run source /etc/profile.
Unpack:
tar zxvf apache-hive-2.3.0-bin.tar.gz
Generate keytabs in Kerberos (run inside kadmin on the KDC):
addprinc -randkey hive/hadoop1@JENKIN.COM
addprinc -randkey hive/hadoop2@JENKIN.COM
xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM
xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop2@JENKIN.COM
Copy the keytab to both Hive hosts:
scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop1:/var/kerberos/krb5kdc/keytab/
scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop2:/var/kerberos/krb5kdc/keytab/
(kinit is required before use.)
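The keytab steps above can be scripted. This sketch only prints the kadmin commands for each host; to apply them, pipe the output into kadmin.local on the KDC. The realm and keytab path are the ones used in this guide:

```shell
# Dry run: emit the kadmin commands for each Hive host (realm/path from this guide).
# Apply with: <this script> | kadmin.local
KEYTAB=/var/kerberos/krb5kdc/keytab/hive.keytab
for h in hadoop1 hadoop2; do
  echo "addprinc -randkey hive/${h}@JENKIN.COM"
  echo "xst -k ${KEYTAB} hive/${h}@JENKIN.COM"
done
```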
Hive server configuration:
Add to hive-env.sh on the hive server:
HADOOP_HOME=/xxx/soft/hadoop-2.7.3
export HIVE_CONF_DIR=/xxx/soft/apache-hive-2.3.0-bin/conf
export HIVE_AUX_JARS_PATH=/xxx/soft/apache-hive-2.3.0-bin/lib
Add hive-site.xml on the hive server:
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/xxx/soft/apache-hive-2.3.0-bin/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/xxx/soft/apache-hive-2.3.0-bin/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.112.28.179:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123456</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- kerberos config -->
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/_HOST@JENKIN.COM</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
    <!-- <value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value> -->
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@JENKIN.COM</value>
  </property>
</configuration>
Add to core-site.xml on the Hadoop namenodes:
<!-- hive config -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>*</value>
</property>
Sync it to the other machines:
scp etc/hadoop/core-site.xml master2:/xxx/soft/hadoop-2.7.3/etc/hadoop/
scp etc/hadoop/core-site.xml slave2:/xxx/soft/hadoop-2.7.3/etc/hadoop/
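With more nodes, the sync step is easier as a loop. This sketch only echoes the scp commands (the hostnames and install prefix are the ones used in this guide), so you can review them before running:

```shell
# Dry run: print the scp commands that would sync core-site.xml to the other nodes.
CONF=etc/hadoop/core-site.xml
DEST=/xxx/soft/hadoop-2.7.3/etc/hadoop/
for host in master2 slave2; do
  echo "scp ${CONF} ${host}:${DEST}"
done
```

Remove the echo (or pipe the output to sh) once the commands look right.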
Download the MySQL JDBC driver:
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz
tar zxvf mysql-connector-java-5.1.44.tar.gz
Copy it into Hive's lib directory:
cp mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar apache-hive-2.3.0-bin/lib/
Client configuration:
Copy the Hive distribution to hadoop2:
scp -r apache-hive-2.3.0-bin/ hadoop2:/xxx/soft/
On hadoop2 (the client), hive-site.xml:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <!-- kerberos config -->
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/_HOST@JENKIN.COM</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
    <!-- <value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value> -->
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@JENKIN.COM</value>
  </property>
</configuration>
Start Hive:
Initialize the metastore schema:
./bin/schematool -dbType mysql -initSchema
Obtain a Kerberos ticket:
kinit -k -t /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM
Start the metastore server:
hive --service metastore &
Verify:
[root@hadoop1 conf]# netstat -nl | grep 9083
tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN
ps -ef | grep metastore
hive
hive>
Start Thrift (HiveServer2):
hive --service hiveserver2 &
Verify that Thrift (HiveServer2) is up:
[root@hadoop1 conf]# netstat -nl | grep 10000
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN
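HiveServer2 can take a while after launch before the Thrift port actually accepts connections, so a small polling helper is handy before pointing clients at it. A minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection, so no nc is needed); the host and port are the ones used in this guide:

```shell
#!/bin/bash
# Sketch: poll until a TCP port accepts connections; returns nonzero on timeout.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  for _ in $(seq "$tries"); do
    # The subshell opens (and immediately closes) a TCP connection via /dev/tcp.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
# Example: wait_for_port hadoop1 10000 && echo "HiveServer2 is up"
```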
HQL operations from the Hive client:
DML reference: https://cwiki.apache.org//confluence/display/Hive/LanguageManual+DML
Databases and tables created through Hive are all visible on HDFS, under the warehouse location configured in hive-site.xml:
hadoop fs -ls /user/hive/warehouse
Connect to Hive with the beeline client:
beeline -u "jdbc:hive2://hadoop1:10000/;principal=hive/_HOST@JENKIN.COM"
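Beeline can also run statements non-interactively with -e, which is useful in scripts. This sketch just assembles the same Kerberized JDBC URL used above (host and principal from this guide) and shows the call, commented out, since it needs a live cluster and a valid ticket:

```shell
# Build the Kerberized JDBC URL used throughout this guide.
HS2_HOST=hadoop1
PRINCIPAL=hive/_HOST@JENKIN.COM
JDBC_URL="jdbc:hive2://${HS2_HOST}:10000/;principal=${PRINCIPAL}"
echo "$JDBC_URL"
# Non-interactive use (requires kinit first):
#   beeline -u "$JDBC_URL" -e 'show databases;'
```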
Run SQL:
0: jdbc:hive2://hadoop1:10000/> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| hivetest       |
+----------------+
2 rows selected (0.318 seconds)
hive> create database jenkintest;
OK
Time taken: 0.968 seconds
hive> show databases;
OK
default
hivetest
jenkintest
Time taken: 0.033 seconds, Fetched: 3 row(s)
hive> use jenkintest;
OK
Time taken: 0.108 seconds
hive> create table test1(columna int, columnb string);
OK
Time taken: 0.646 seconds
hive> show tables;
OK
test1
Time taken: 0.084 seconds, Fetched: 1 row(s)
Importing data into Hive (via a file: create the file locally, with columns separated by the Tab key):
[root@hadoop2 ~]# vim jenkindb.txt
1	jenkin
2	jenkin.k
3	anne
[root@hadoop2 ~]# hive
hive> create table jenkintb (id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> load data local inpath 'jenkindb.txt' into table jenkintb;
hive> select * from jenkintb;
OK
1	jenkin
2	jenkin.k
3	anne
View the generated table definition:
show create table jenkintb;