Troubleshooting errors when connecting Hive to HBase

Connecting Hive to HBase

My versions:

Hadoop 2.4.1
HBase 0.98.6.1
Hive 0.13.1

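About HBase 0.98.6.1
I still don't seem to have installed HBase entirely correctly: HBase 0.98.6.1 is built against Hadoop 2.2, whereas I am running Hadoop 2.4.1.
This causes various problems in use. For example, importing data into HBase with importtsv fails with an error. My temporary workaround is to replace the hadoop-* 2.2 jars bundled with HBase with the corresponding jars from Hadoop 2.4.1. After that, the import runs without errors.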
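
The jar replacement described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name swap_hadoop_jars, the backup directory name (the lib directory plus ".bak"), and the idea that the bundled jars follow the hadoop-NAME-VERSION.jar naming pattern are all mine, not something the HBase distribution prescribes. The old jars are moved aside rather than deleted so the change can be undone.

```shell
#!/bin/sh
# Sketch: swap the Hadoop jars bundled with HBase for those of the installed
# Hadoop. Directory layout and version strings are assumptions.

# swap_hadoop_jars HBASE_LIB HADOOP_DIR OLD_VER NEW_VER
swap_hadoop_jars() {
    hbase_lib=$1; hadoop_dir=$2; old_ver=$3; new_ver=$4
    backup_dir="${hbase_lib}.bak"            # hypothetical backup location
    mkdir -p "$backup_dir"
    for old_jar in "$hbase_lib"/hadoop-*-"$old_ver".jar; do
        [ -e "$old_jar" ] || continue        # glob matched nothing
        base=$(basename "$old_jar" "-$old_ver.jar")
        # Look anywhere under the Hadoop directory for the same jar, new version.
        new_jar=$(find "$hadoop_dir" -name "$base-$new_ver.jar" | head -n 1)
        if [ -n "$new_jar" ]; then
            mv "$old_jar" "$backup_dir/"
            cp "$new_jar" "$hbase_lib/"
            echo "replaced $base: $old_ver -> $new_ver"
        else
            echo "no $new_ver jar found for $base, left untouched" >&2
        fi
    done
}
```

It would be called with something like swap_hadoop_jars "$HBASE_HOME/lib" "$HADOOP_HOME/share/hadoop" 2.2.0 2.4.1, with HBase stopped while the jars are exchanged.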

The problem

First, create a table in HBase:

$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012
hbase(main):001:0>

hbase(main):001:0> create 'bar', 'cf'
0 row(s) in 0.1200 seconds
hbase(main):002:0>

Then connect Hive to this HBase table through Hive's HBaseStorageHandler, using the following DDL statement:

hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');

This produced the following error:

14/10/24 19:31:43 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead


FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: Attempt to start meta tracker failed.
	at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:201)
	at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:230)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:277)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:293)
	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy9.createTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:197)
	... 33 more

After a long search I finally found a solution.


Solution

HBaseIntegration relies on the hive-hbase-handler-x.y.z.jar module.
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.

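Using HBaseStorageHandler requires several extra jars, whose paths have to be passed with --auxpath. However, the procedure described on the cwiki is complicated and error-prone.

The page on HBaseBulkLoad also needs extra jars, and the approach it uses is much simpler.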
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

Add necessary JARs
You will need to add a couple jar files to your path. First, put them in DFS:

hadoop dfs -put /usr/lib/hive/lib/hbase-VERSION.jar /user/hive/hbase-VERSION.jar
hadoop dfs -put /usr/lib/hive/lib/hive-hbase-handler-VERSION.jar /user/hive/hive-hbase-handler-VERSION.jar

Then add them to your hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/hbase-VERSION.jar,/user/hive/hive-hbase-handler-VERSION.jar</value>
</property>

Setting the jar paths directly in hive-site.xml is much more convenient.
After uploading the files to HDFS, I added the following configuration:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/user/hive/lib/hive-hbase-handler-0.13.1.jar,/user/hive/lib/zookeeper-3.4.6.jar</value>
  <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
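
With three or more jars the comma-separated value is easy to mistype, so it can help to generate it. The helper names aux_jars_value and aux_jars_property below are made up for this sketch; the only thing taken from the configuration above is the property name hive.aux.jars.path and the comma-separated format of its value.

```shell
#!/bin/sh
# Sketch: build the hive.aux.jars.path value from a list of jar paths.

# aux_jars_value JAR...  -> prints the jars joined with commas
aux_jars_value() {
    value=""
    for jar in "$@"; do
        # Prepend a comma only when something is already in the list.
        value="${value:+$value,}$jar"
    done
    printf '%s\n' "$value"
}

# aux_jars_property JAR...  -> prints the <property> element for hive-site.xml
aux_jars_property() {
    printf '<property>\n  <name>hive.aux.jars.path</name>\n  <value>%s</value>\n</property>\n' \
        "$(aux_jars_value "$@")"
}
```

For example, aux_jars_property /user/hive/lib/*.jar prints an element that can be pasted into hive-site.xml, provided the glob expands to exactly the jars that are wanted.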

After making this change, restart Hive:

# nohup hive --service metastore > $HIVE_HOME/log/hive_metastore.log &

# nohup hive --service hiveserver > $HIVE_HOME/log/hiveserver.log &

# ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3

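The last step, ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3, must not be skipped; it is the key to a successful start. Without it, the HBase client inside Hive most likely falls back to its default ZooKeeper quorum (localhost), which would explain the ConnectionLoss error on /hbase/meta-region-server shown earlier.

As for what this last command does, here is the explanation in the words of the expert who answered the question: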

You need to tell Hive where to find the zookeepers quorum which would elect the HBase master
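
Before starting Hive with that quorum, it can be worth checking that each ZooKeeper server actually answers. The sketch below assumes the default client port 2181, that nc (netcat) is installed, and that ZooKeeper's four-letter "ruok" command is enabled on the servers; the function names quorum_hostports and check_quorum are invented for this example.

```shell
#!/bin/sh
# Sketch: check every server named in hbase.zookeeper.quorum.

# quorum_hostports QUORUM [DEFAULT_PORT]
# Turns "slave1,slave2" into one host:port per line.
quorum_hostports() {
    default_port=${2:-2181}
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        case $entry in
            *:*) printf '%s\n' "$entry" ;;                       # port already given
            *)   printf '%s:%s\n' "$entry" "$default_port" ;;
        esac
    done
}

# check_quorum QUORUM
# Sends "ruok" to each server; a healthy ZooKeeper replies "imok".
check_quorum() {
    quorum_hostports "$1" | while IFS=: read -r host port; do
        reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
        if [ "$reply" = "imok" ]; then
            echo "$host:$port ok"
        else
            echo "$host:$port unreachable"
        fi
    done
}
```

Running check_quorum slave1,slave2,slave3 on the machine where Hive runs shows whether the hosts passed to -hiveconf are reachable from there.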

Now run the statement again in the Hive shell:

hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');

No error this time: the external table is created successfully!


Table definitions in Hive

Related Hive concepts:

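[Managed table] A managed table is one for which the definition is primarily managed in Hive's metastore, and for whose data storage Hive is responsible.
[External table] An external table is one whose definition is managed in some external catalog, and whose data Hive does not own (i.e. it will not be deleted when the table is dropped).
[Native table] A native table is one that Hive knows how to manage and access without a storage handler.
[Non-native table] A non-native table is one that requires a storage handler.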

These two distinctions (managed vs. external and native vs. non-native) are orthogonal.
Hence, there are four possibilities for base tables:

  • managed native: what you get by default with CREATE TABLE
  • external native: what you get with CREATE EXTERNAL TABLE when no STORED BY clause is specified
  • managed non-native: what you get with CREATE TABLE when a STORED BY clause is specified; Hive stores the definition in its metastore, but does not create any files itself; instead, it calls the storage handler with a request to create a corresponding object structure
  • external non-native: what you get with CREATE EXTERNAL TABLE when a STORED BY clause is specified; Hive registers the definition in its metastore and calls the storage handler to check that it matches the primary definition in the other system
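
The four combinations above depend on only two things in the DDL: whether EXTERNAL is present and whether a STORED BY clause is present. The following sketch makes that decision explicit. The function classify_ddl is hypothetical and does plain text matching, not real HiveQL parsing, so a string literal that happens to contain "stored by" would fool it.

```shell
#!/bin/sh
# Sketch: classify a CREATE TABLE statement along the two orthogonal axes.

# classify_ddl 'CREATE ... statement'
classify_ddl() {
    # Upper-case the statement and collapse all whitespace to single spaces.
    ddl=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -s '[:space:]' ' ')
    case $ddl in
        *"CREATE EXTERNAL TABLE"*) ownership=external ;;
        *)                         ownership=managed ;;
    esac
    case $ddl in
        *" STORED BY "*) storage=non-native ;;
        *)               storage=native ;;
    esac
    echo "$ownership $storage"
}
```

Applied to the statement used in this post (CREATE EXTERNAL TABLE foo ... STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...), it reports external non-native, which is the case where Hive only registers the definition and asks the storage handler to check it against the existing HBase table.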

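One more thing: shutting down Hive

Hive does not seem to ship a dedicated shutdown script. My workaround for now is to find the PIDs of the two Hive processes (the metastore and hiveserver) and kill them directly. Crude, but simple.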

# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN      21415/java          
tcp        0      0 0.0.0.0:50070               0.0.0.0:*                   LISTEN      12601/java          
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      884/sshd            
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      960/master          
tcp        0      0 0.0.0.0:9083                0.0.0.0:*                   LISTEN      21100/java          
tcp        0      0 192.168.129.63:9000         0.0.0.0:*                   LISTEN      12601/java          
tcp        0      0 192.168.129.63:9001         0.0.0.0:*                   LISTEN      12783/java          
tcp        0      0 :::22                       :::*                        LISTEN      884/sshd            
tcp        0      0 ::ffff:192.168.129.63:8088  :::*                        LISTEN      12939/java          
tcp        0      0 ::1:25                      :::*                        LISTEN      960/master          
tcp        0      0 ::ffff:192.168.129.63:8030  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:8031  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:60000 :::*                        LISTEN      20610/java          
tcp        0      0 ::ffff:192.168.129.63:8032  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:8033  :::*                        LISTEN      12939/java          
tcp        0      0 :::60010                    :::*                        LISTEN      20610/java          
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     8318   1/init              @/com/ubuntu/upstart
unix  2      [ ACC ]     STREAM     LISTENING     10389  850/dbus-daemon     /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     10698  960/master          public/cleanup
unix  2      [ ACC ]     STREAM     LISTENING     10705  960/master          private/tlsmgr
unix  2      [ ACC ]     STREAM     LISTENING     10709  960/master          private/rewrite
unix  2      [ ACC ]     STREAM     LISTENING     10713  960/master          private/bounce
unix  2      [ ACC ]     STREAM     LISTENING     10717  960/master          private/defer
unix  2      [ ACC ]     STREAM     LISTENING     10721  960/master          private/trace
unix  2      [ ACC ]     STREAM     LISTENING     10725  960/master          private/verify
unix  2      [ ACC ]     STREAM     LISTENING     10729  960/master          public/flush
unix  2      [ ACC ]     STREAM     LISTENING     10733  960/master          private/proxymap
unix  2      [ ACC ]     STREAM     LISTENING     10737  960/master          private/proxywrite
unix  2      [ ACC ]     STREAM     LISTENING     10741  960/master          private/smtp
unix  2      [ ACC ]     STREAM     LISTENING     10745  960/master          private/relay
unix  2      [ ACC ]     STREAM     LISTENING     10749  960/master          public/showq
unix  2      [ ACC ]     STREAM     LISTENING     10753  960/master          private/error
unix  2      [ ACC ]     STREAM     LISTENING     10757  960/master          private/retry
unix  2      [ ACC ]     STREAM     LISTENING     10761  960/master          private/discard
unix  2      [ ACC ]     STREAM     LISTENING     10765  960/master          private/local
unix  2      [ ACC ]     STREAM     LISTENING     10769  960/master          private/virtual
unix  2      [ ACC ]     STREAM     LISTENING     10773  960/master          private/lmtp
unix  2      [ ACC ]     STREAM     LISTENING     10777  960/master          private/anvil
unix  2      [ ACC ]     STREAM     LISTENING     10781  960/master          private/scache
#kill -9 21100
#kill -9 21415
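
Instead of reading the PIDs off the netstat listing by eye, they can be extracted by port. The sketch below assumes the default ports seen in the listing above (9083 for the metastore, 10000 for hiveserver) and the column layout that netstat -lnp prints on this machine; the function name pids_on_ports is made up for this example.

```shell
#!/bin/sh
# Sketch: print the PIDs of processes listening on the given TCP ports.

# pids_on_ports PORT...   (reads `netstat -lnp` output on stdin)
pids_on_ports() {
    awk -v ports="$*" '
        BEGIN {
            n = split(ports, p, " ")
            for (i = 1; i <= n; i++) want[p[i]] = 1
        }
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            k = split($4, addr, ":")      # port is the last colon-separated part
            split($7, prog, "/")          # column 7 looks like PID/Program
            if (addr[k] in want) print prog[1]
        }
    ' | sort -u
}
```

Running netstat -lnp | pids_on_ports 9083 10000 as root prints one PID per line. Those can then be passed to kill, preferably a plain kill first and kill -9 only if the process does not exit.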

References

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
http://stackoverflow.com/questions/23658600/error-while-creating-an-hive-table-on-top-of-an-hbase-table

posted @ 2014-10-24 21:32  Damian Zhou