HBase/Hive Integration

Hive-HBase integration is built on the public APIs that each system exposes; the two sides communicate mainly through the hive-hbase-handler.jar utility class.

1. First, make sure the versions match

cd /home/hadoop/hive-1.1.0-cdh5.5.2/lib

Check that the versions are consistent: hbase-server-1.0.0-cdh5.5.2.jar, zookeeper-3.4.5-cdh5.5.2.jar, hive-hbase-handler-1.1.0-cdh5.5.2.jar

If they do not match, delete the old jars and cp the jars from the original HBase and ZooKeeper installations into Hive's lib directory.

 

2. Edit the hive-site.xml file under hive/conf
[hadoop@h91 hive-1.1.0-cdh5.5.2]$ mkdir tmp
[hadoop@h91 hive-1.1.0-cdh5.5.2]$ mkdir logs

[hadoop@h91 conf]$ vi hive-site.xml
Add the following at the bottom of the file, just above the closing </configuration> tag:

<!--        
<property>  
  <name>hive.exec.scratchdir</name>
  <value>/home/hadoop/hive-1.1.0-cdh5.5.2/tmp/</value>
</property>   
-->   

<property>
  <name>hive.querylog.location</name>
  <value>/home/hadoop/hive-1.1.0-cdh5.5.2/logs/</value>
</property>

<property>
  <name>hive.aux.jars.path</name>  
  <value>
    file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,
    file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hbase-server-1.0.0-cdh5.5.2.jar,
    file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/zookeeper-3.4.5-cdh5.5.2.jar
  </value>
</property>

 

3. Copy hbase-server-1.0.0-cdh5.5.2.jar into the lib directory of every Hadoop node (including the master node)
[hadoop@h91 hbase-1.0.0-cdh5.5.2]$ cp hbase-server-1.0.0-cdh5.5.2.jar /home/hadoop/hadoop-2.6.0-cdh5.5.2/lib
[hadoop@h91 hbase-1.0.0-cdh5.5.2]$ scp hbase-server-1.0.0-cdh5.5.2.jar hadoop@h202:/home/hadoop/hadoop-2.6.0-cdh5.5.2/lib
[hadoop@h91 hbase-1.0.0-cdh5.5.2]$ scp hbase-server-1.0.0-cdh5.5.2.jar hadoop@h203:/home/hadoop/hadoop-2.6.0-cdh5.5.2/lib

4. Copy the hbase-site.xml file from hbase/conf into hadoop/conf on every Hadoop node (including the master)
[hadoop@h91 conf]$ cp hbase-site.xml /home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/
[hadoop@h91 conf]$ scp hbase-site.xml hadoop@h202:/home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/
[hadoop@h91 conf]$ scp hbase-site.xml hadoop@h203:/home/hadoop/hadoop-2.6.0-cdh5.5.2/conf/


5. Start Hive
[hadoop@h91 hive-1.1.0-cdh5.5.2]$ bin/hive -hiveconf hbase.zookeeper.quorum=h201,h202,h203
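If passing the quorum on every launch is inconvenient, the same setting can instead be added permanently to hive-site.xml. A config sketch, using the h201-h203 hosts from the command above:

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>h201,h202,h203</value>
</property>
```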

----------------------------------------------------------------
Example
1. Create an HBase-backed table
hive>
CREATE TABLE hbase_table_1(key int, value string)  
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  
TBLPROPERTIES ("hbase.table.name" = "sq");

*** (hbase.table.name defines the name of the table in HBase;
     hbase.columns.mapping defines the column-family/column mapping in HBase) ***
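To make the mapping rule concrete, here is an illustrative Python sketch (not the handler's actual code) of how an hbase.columns.mapping string is interpreted: entries are comma-separated and correspond to the Hive columns in order, `:key` denotes the HBase row key, and `family:qualifier` names an HBase column:

```python
def parse_columns_mapping(mapping):
    """Split an hbase.columns.mapping string into (family, qualifier) pairs.

    ':key' maps a Hive column to the HBase row key and is returned as
    ('key', None); 'cf:qual' maps it to qualifier 'qual' in family 'cf'.
    """
    entries = []
    for spec in mapping.split(","):
        spec = spec.strip()
        if spec == ":key":
            entries.append(("key", None))        # Hive column -> row key
        else:
            family, _, qualifier = spec.partition(":")
            entries.append((family, qualifier))  # Hive column -> cf:qual
    return entries

# The mapping used for hbase_table_1 above:
print(parse_columns_mapping(":key,cf1:val"))
# [('key', None), ('cf1', 'val')]
```

An empty qualifier (as in `d:` later in this post) means the whole column family is mapped.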

2. Create a plain Hive table
hive> create table ha1(id int,name string)
     row format delimited
     fields terminated by '\t'
     stored as textfile;

[hadoop@h91 ~]$ vi ha1.txt
11      zs
22      ls
33      ww

hive> load data local inpath '/home/hadoop/ha1.txt' into table ha1;

hive> insert into table hbase_table_1 select * from ha1;

hive> select * from hbase_table_1;

3. [hadoop@h91 hbase-1.0.0-cdh5.5.2]$ bin/hbase shell

hbase(main):002:0> scan 'sq'
(If the rows appear, Hive has successfully stored the data in HBase.)

 

Second method: edit the hive-site.xml file under hive/conf directly, pointing hive.aux.jars.path at the jars in the HBase installation instead of copying them:

<!--        
<property>  
  <name>hive.exec.scratchdir</name>
  <value>/home/hadoop/hive-1.1.0-cdh5.5.2/tmp/</value>
</property>   
-->   

<property>
  <name>hive.querylog.location</name>
  <value>/home/hadoop/hive-1.1.0-cdh5.5.2/logs/</value>
</property>

<property>
  <name>hive.aux.jars.path</name>  
  <value>
    file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,
    file:///home/hadoop/hive-1.1.0-cdh5.5.2/lib/guava-14.0.1.jar,
    file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-common-1.0.0-cdh5.5.2.jar,
    file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-client-1.0.0-cdh5.5.2.jar,
    file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-server-1.0.0-cdh5.5.2.jar,
    file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/netty-all-4.0.23.Final.jar,
    file:///home/hadoop/hbase-1.0.0-cdh5.5.2/lib/hbase-hadoop2-compat-1.0.0-cdh5.5.2.jar,
    file:///home/hadoop/zookeeper-3.4.5-cdh5.5.2/zookeeper-3.4.5-cdh5.5.2.jar
  </value>
</property>

Then restart Hive.

 

Mapping an existing HBase table from Hive

CREATE external TABLE hive_hbase_offset(key string, value map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:feed.meta.kafka_offset");

Hive writes into this table with the OVERWRITE keyword. Dropping the Hive table has no effect on HBase, but if the HBase table is deleted first, queries against the Hive table throw TableNotFoundException; dropping the Hive table does not raise that error. In map<string,string>, the first string holds the HBase column (qualifier) name and the second holds the corresponding value.

WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,F:")

To explain this line: it tells Hive how its columns map onto HBase. The first Hive column is mapped to the HBase row key via :key (this appears to be a fixed notation); F is the column-family name, and since nothing follows the colon, no qualifier is set, so the whole family is mapped.

External tables do not support LOAD DATA, so data must be written with INSERT OVERWRITE.
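For example, to populate the hive_hbase_offset table defined above from a plain Hive table (staging_offset here is a hypothetical source table, not from the original post):

```sql
-- staging_offset is a hypothetical plain Hive table
-- with columns matching (key string, value map<string,string>)
INSERT OVERWRITE TABLE hive_hbase_offset
SELECT key, value FROM staging_offset;
```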

Table-creation examples

CREATE external TABLE CY_relationship(rowkey string, relationshipPermId string, relationshipTypeName string,subjectPermId string,objectPermId string,RPPT string,RPPV string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:rpid,d:type,d:sbjtpid,d:objt,d:propt[0],d:propv[0]")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:master.data.relationship");


CREATE external TABLE CY_master_instrument(rowkey string,isin string, instrumentPermid string,dataSource string,relationshipPermId map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,d:id[0]/val,d:pid,d:src,d:rel.*")
TBLPROPERTIES ("hbase.table.name" = "ecpcibb:master.data.instrument");

d:rel.* puts every column in family d whose qualifier starts with rel into the map.
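A single entry of such a map can also be read by indexing the column with a qualifier name; a sketch (the qualifier 'rel01' is hypothetical, standing in for any qualifier under the d:rel prefix):

```sql
-- 'rel01' is a hypothetical qualifier matched by d:rel.*
SELECT rowkey, relationshipPermId['rel01']
FROM CY_master_instrument;
```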

Querying a map-valued column:

select * from cy_aggr_quote where concat_ws(',',map_values(relationshippermid)) like '%200764004179%'

 

posted @ 2018-05-22 22:21 by 蜘蛛侠0