Hue 4.0.1 + HAProxy for Hive cluster high availability

Problem statement:
The Hive that ships with UCloud's managed Hadoop does not support high availability, and Hive broke frequently: business and other colleagues querying through Hue or Zeppelin often got no results back. Restarting the Hive service fixed it, but only until the next failure, so it needed restarting over and over. Quite a headache.

A web search turned up two HA approaches for Hive:
1. Manage Hive through ZooKeeper (later testing showed this fails: after stopping one HiveServer2, ZooKeeper does detect it, yet queries still fail with a NullPointerException).
2. Put HAProxy in front as a proxy with the Hive cluster behind it (recommended). HAProxy itself then becomes a single point of failure; that is left unsolved for now and covered by monitoring instead.

Hive is only a client here; the actual data comes from HDFS in Hadoop. The specific architecture is as follows.
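A text sketch of the topology, pieced together from the rest of this post:

Hue 4.0.1 / Zeppelin / beeline
            |
         HAProxy   (TCP proxy on port 10001; itself a single point, covered by monitoring)
          /     \
  HiveServer2   HiveServer2    (hive_01 / hive_02, identical thrift port)
          \     /
    Hadoop cluster (data in HDFS; Hive metastore in MySQL)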


To avoid messing up the UCloud environment, I deployed a fresh, physically isolated Hue + HAProxy + Hive environment (to prevent conflicts).

The details follow.
Option 1: Hive HA managed through ZooKeeper (failed attempt)
1. Configure the Hive environment variables

# Hive environment 
export HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH 
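After appending these lines to the shell profile (e.g. /etc/profile), reload it and check that the variables took effect:

source /etc/profile
which hive    # should resolve to $HIVE_HOME/bin/hive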


2. Edit the Hive configuration file hive-site.xml

[root@test6_vedio bin]# cat /data/yunva/apache-hive-1.2.2-bin/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.10.11.214:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>  
    </property>          

    <property> 
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.server2.support.dynamic.service.discovery</name>
        <value>true</value>
    </property>

    <!-- newly added settings -->
    <property>
        <name>hive.server2.zookeeper.namespace</name>
        <value>hiveserver2_zk</value>
    </property>

    <property>
        <name>hive.zookeeper.quorum</name>
        <value>master,slave1,slave2</value>
    </property>

    <property>
        <name>hive.zookeeper.client.port</name>
        <value>2181</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
    </property>

    <property>
        <name>hive.server2.thrift.port</name>
        <!-- the port must be identical on both HiveServer2 instances -->
        <value>10001</value>
    </property>
</configuration>
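Both HiveServer2 instances need this identical configuration (including the same thrift port, as noted above). A sketch of pushing it to the second node, where the hostname slave1 is an assumption:

scp /data/yunva/apache-hive-1.2.2-bin/conf/hive-site.xml root@slave1:/data/yunva/apache-hive-1.2.2-bin/conf/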

Start one HiveServer2 instance; connecting to ZooKeeper with zkCli.sh, the registration is visible:
[zk: localhost:2181(CONNECTED) 8] ls /hiveserver2_zk
[serverUri=0.0.0.0:10001;version=1.2.2;sequence=0000000000]

Start the second instance; once it is up, both registrations are visible in ZK:
[zk: localhost:2181(CONNECTED) 9] ls /hiveserver2_zk
[serverUri=0.0.0.0:10001;version=1.2.2;sequence=0000000000, serverUri=0.0.0.0:10001;version=1.2.2;sequence=0000000001]


JDBC connection

The JDBC connection URL has the format:

jdbc:hive2://<zookeeper quorum>/<dbName>;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

where:

<zookeeper quorum> is the ZooKeeper ensemble connect string, e.g. master:2181,slave1:2181,slave2:2181

<dbName> is the Hive database; the default is default

serviceDiscoveryMode=zooKeeper selects ZooKeeper-based service discovery

zooKeeperNamespace=hiveserver2 names the namespace in ZK, i.e. the value of hive.server2.zookeeper.namespace; I set it to hiveserver2_zk

Test the connection with beeline:

cd $HIVE_HOME/bin
[root@test3 bin]# ./beeline 
Beeline version 1.2.2 by Apache Hive
beeline> !connect jdbc:hive2://master:2181,slave1:2181,slave2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk yunva_hive_test01 ""
Connecting to jdbc:hive2://master:2181,slave1:2181,slave2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Connected to: Apache Hive (version 1.2.2)
Driver: Hive JDBC (version 1.2.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://master:2181,slave1:2181,slave> show databases;
OK
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| test           |
| yunvatest      |
+----------------+--+
3 rows selected (1.839 seconds)
0: jdbc:hive2://master:2181,slave1:2181,slave> use test;
OK
No rows affected (0.051 seconds)
0: jdbc:hive2://master:2181,slave1:2181,slave> show tables;
OK
+-----------+--+
| tab_name  |
+-----------+--+
| table2    |
| table3    |
+-----------+--+
2 rows selected (0.052 seconds)
0: jdbc:hive2://master:2181,slave1:2181,slave> select * from table2;
OK
+------------+--------------+--+
| table2.id  | table2.name  |
+------------+--------------+--+
| 1          | spring       |
| 2          | summer       |
| 3          | autumn       |
| 4          | winter       |
+------------+--------------+--+
4 rows selected (0.432 seconds)
0: jdbc:hive2://master:2181,slave1:2181,slave> 

The connection works.

Running an HQL query, the logs showed the connected HiveServer2 instance was Node2. After stopping the HiveServer2 instance on Node2, a fresh !connect is required. Reconnecting worked when the second Hive instance was shut down, but when the Hive instance on the host running beeline was stopped, ZooKeeper noticed the change yet queries still failed.
Shutting down one HiveServer2 made beeline fail with a NullPointerException even though ZooKeeper had already dropped the first instance, so this HA approach is not viable:
[zk: localhost:2181(CONNECTED) 61] get /hiveserver2_zk                                                              
10.10.125.156
cZxid = 0x3000003f6
ctime = Mon Feb 05 10:21:24 CST 2018
mZxid = 0x3000003f6
mtime = Mon Feb 05 10:21:24 CST 2018
pZxid = 0x300000485
cversion = 19
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 1
[zk: localhost:2181(CONNECTED) 62] ls /hiveserver2_zk
[serverUri=test6_vedio:10001;version=1.2.2;sequence=0000000008]
[zk: localhost:2181(CONNECTED) 63] ls /hiveserver2_zk/serverUri=test6_vedio:10001;version=1.2.2;sequence=0000000008
[]
[zk: localhost:2181(CONNECTED) 64] get /hiveserver2_zk/serverUri=test6_vedio:10001;version=1.2.2;sequence=0000000008
test6_vedio:10001
cZxid = 0x300000452
ctime = Mon Feb 05 14:23:54 CST 2018
mZxid = 0x300000452
mtime = Mon Feb 05 14:23:54 CST 2018
pZxid = 0x300000452
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x1608c981022000d
dataLength = 17
numChildren = 0

Option 2: Hue 4.0.1 + HAProxy + Hive cluster (recommended)

1. Set up the Hive client

Copy the Hadoop, Hive, and related programs to the client host (a sketch is shown below).
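One possible way to copy them, with client-host standing in for the actual client machine (paths match the environment file below):

rsync -a /root/hadoop-2.6.0 /root/hive /root/hbase /root/spark /root/pig root@client-host:/root/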

① Edit the environment variable file
export JAVA_HOME=/usr/java/jdk1.8.0_102
export PATH=$JAVA_HOME/bin:$PATH
# hadoop
export HADOOP_HOME_WARN_SUPPRESS=true
export HADOOP_HOME=/root/hadoop-2.6.0
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export YARN_CONF_DIR=$HADOOP_HOME/conf

# Hive
export HIVE_HOME=/root/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf

# HBase
export HBASE_HOME=/root/hbase
export HBASE_CONF_DIR=$HBASE_HOME/conf

# spark
export SPARK_HOME=/root/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf

# pig
export PIG_HOME=/root/pig
export PIG_CONF_DIR=$PIG_HOME/conf
export PIG_CLASSPATH=$HADOOP_HOME/conf

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$SPARK_HOME/bin:$PIG_HOME/bin:$PATH
umask 022

# Cuda
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:/usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

export HADOOP_ROOT_LOGGER=DEBUG,console


② Edit /etc/hosts
10.19.105.50 u04ck04.yaya.corp
10.19.128.248 uhadoop-bwgkeu-master1
10.19.196.141 uhadoop-bwgkeu-core4
10.19.80.123 uhadoop-bwgkeu-core2
10.19.185.160 uhadoop-bwgkeu-core1
10.19.91.236 uhadoop-bwgkeu-core3
10.19.72.208 uhadoop-bwgkeu-master2
10.19.101.5 uhadoop-bwgkeu-core6
10.19.139.82 uhadoop-bwgkeu-core8
10.19.104.184 uhadoop-bwgkeu-core7
10.19.62.73 uhadoop-bwgkeu-core5


③ Start the Hive server process (needed on all 3 nodes)
cd /root/hive/bin
nohup ./hiveserver2 &

Check that port 10002 came up:
[root@u04ck01 bin]# ss -tnlp|grep 10002
LISTEN     0      50                        *:10002                    *:*      users:(("java",9920,466))
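HAProxy will proxy and health-check the thrift port (10001 in the backend definitions of step 2 below), so confirm that one is listening as well:

ss -tnlp | grep 10001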

Verify that Hive works:
[root@u04ck02 bin]# hive

Logging initialized using configuration in file:/root/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> show databases;
OK
default
test_hive_ucloud10086
Time taken: 2.02 seconds, Fetched: 2 row(s)
hive> use default;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
dual
t_bin_common_country
t_bin_user_report
t_bin_user_req
t_bin_user_task
t_push_common_city
t_push_common_province
t_push_task_result
t_push_user_app
t_push_user_req
t_script_common_city
t_script_common_province
t_script_user_report
t_script_user_req
...
hive> select * from tbl_voice_live_user_msg limit 1;
OK
1001710    100000    10000    1    2017-09-29 15:36:03.847.+0800    voice:live:msg    /voice/live/msg    1506688563847    2017-09-29
Time taken: 0.962 seconds, Fetched: 1 row(s)

2. Configuring the HAProxy proxy server
[root@u04ck04 hue-4.0.0]# cat /etc/haproxy/haproxy.cfg
global
        daemon
        nbproc 1
        pidfile /var/run/haproxy.pid

defaults
        mode tcp                    # mode { tcp|http|health }: tcp = layer 4, http = layer 7 (not useful here), health = health checks only
        retries 2                   # remove a server from the pool after 2 failed attempts
        option redispatch           # on failure, redispatch the connection to another server
        option abortonclose         # drop queued connections when the client aborts
        maxconn 1024                # maximum number of connections
        timeout connect 1d          # connection timeout; important so long-running Hive queries can still return results
        timeout client 1d           # ditto
        timeout server 1d           # ditto
        timeout check 8000          # health check timeout
        log 127.0.0.1 local0 err    # [err|warning|info|debug]

listen admin_stats                  # stats/admin web page
        bind 0.0.0.0:6080           # stats page IP and port
        mode http                   # stats page speaks HTTP
        maxconn 10                  # maximum number of connections
        stats refresh 30s           # auto-refresh every 30 seconds
        stats uri /                 # stats URL
        stats realm Hive\ Haproxy   # prompt shown by the auth dialog
        stats auth admin:haproxyadmin2017    # HTTP basic-auth username:password

listen hive                         # hive backend pool
        bind 0.0.0.0:10001          # IP and port the proxy binds to
        mode tcp                    # proxy at layer 4; important
        balance leastconn           # scheduling: 'leastconn' = fewest connections; 'roundrobin' also works
        maxconn 1024                # maximum number of connections
        server hive_01 10.10.125.156:10001 check inter 10000 rise 1 fall 2  # server <name> <IP:port>, health-checked every 10000 ms
        server hive_02 10.10.50.133:10001 check inter 10000 rise 1 fall 2   # ditto, the second server
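With HAProxy running, the chain can be verified end to end; a sketch run from a host that has the Hive client installed, using values from the config above:

# stats page, using the basic-auth credentials from 'stats auth'
curl -u admin:haproxyadmin2017 http://127.0.0.1:6080/

# run a query through the proxy port with beeline
beeline -u "jdbc:hive2://127.0.0.1:10001/default" -e "show databases;"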

3. Installing and configuring Hue

Initialize the Hue database:
# create the database; the charset must be utf8, not utf8mb4, otherwise errors follow
create database hue4db charset=utf8;
Grant the hue4db user (password hue4db) all privileges on it.
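One possible form of that grant (MySQL 5.x syntax, matching the [[database]] credentials in hue.ini below):

grant all privileges on hue4db.* to 'hue4db'@'%' identified by 'hue4db';
flush privileges;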

cd build/env
bin/hue syncdb    # syncdb reads the [[database]] section of hue.ini to connect to MySQL and initialize it
bin/hue migrate
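Once the database is initialized and hue.ini adjusted (next step), Hue can be started from the Hue root directory; a sketch, assuming a source build:

build/env/bin/supervisor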

Copy the Hue configuration straight from the production 3.1 environment, then adjust hue.ini:

[beeswax]
  hive_server_host=uhadoop-bwgkeu-master2
  hive_conf_dir=/root/hive/conf

[hbase]
  hbase_conf_dir=/root/hbase/conf

 [[database]]
        host=10.19.128.248
        port=3306
        engine=mysql
        user=hue4db
        password=hue4db
        name=hue4db
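Note: for Hive queries to actually pass through HAProxy instead of hitting a single HiveServer2, [beeswax] has to point at the proxy; a sketch, assuming HAProxy runs on u04ck04.yaya.corp as in step 2:

[beeswax]
  hive_server_host=u04ck04.yaya.corp   # the HAProxy host
  hive_server_port=10001               # the haproxy 'listen hive' bind port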


Adding monitoring:
1. Hue web monitoring
2. HAProxy proxy port monitoring
3. Port 10002 monitoring on each Hive node (a minimal sketch follows)
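A minimal port-probe sketch for items 2 and 3, assuming the hosts/ports from this deployment and a hypothetical log path:

#!/bin/bash
# probe a TCP port; log when it stops listening (hook an alert into the log line as needed)
check() {
    local host=$1 port=$2 name=$3
    if ! timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "$(date '+%F %T') ${name} ${host}:${port} is DOWN" >> /var/log/ha_check.log
    fi
}
check 127.0.0.1     10001 haproxy_hive   # the haproxy proxy port
check 10.10.125.156 10002 hive_01        # hive node
check 10.10.50.133  10002 hive_02        # hive node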

 
