HBase Exceptions
1. org.apache.hadoop.hbase.ipc.CallTimeoutException
a. Symptom: when running scan operations against HBase through the Java API, some data can be scanned successfully, while other scans fail with a timeout exception.
b. Cause: the scan has to cover too much data, and the program cannot find the requested results within the configured timeout window. A minimal sketch of the kind of call that triggers this follows.
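For context, this is roughly what such an unbounded scan looks like (HBase 1.x client API; table is assumed to be an already-opened Table handle):

Scan scan = new Scan();                     // no start/stop row: covers the whole table
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result r : scanner) {              // the iterator's hasNext() is where the
        // process r ...                    // timeout in the trace below surfaces
    }
}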
c. A concrete example of the exception:
2017-04-21 17:53:23,253 INFO [Thread-15] zookeeper.ZooKeeper: Initiating client connection, connectString=slave1:2181,slave2:2181,slave3:2181 sessionTimeout=90000 watcher=hconnection-0x222a0f2e0x0, quorum=slave1:2181,slave2:2181,slave3:2181, baseZNode=/hbase
2017-04-21 17:53:23,255 INFO [Thread-15-SendThread(slave2:2181)] zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.240.167:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-21 17:53:23,255 INFO [Thread-15-SendThread(slave2:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.240.162:48001, server: slave2/192.168.240.167:2181
2017-04-21 17:53:23,256 INFO [Thread-15-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.240.167:2181, sessionid = 0x25b86981f993878, negotiated timeout = 40000
Exception in thread "Thread-15" java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Apr 21 17:54:23 CST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'noc_caller' at region=noc_caller,,1492616514434.09a155eeeba545376fa7f2d2f8e95a5a., hostname=slave3,60020,1492161485334, seqNum=1422155

    at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:97)
    at com.xwtech.noc.database.HbaseConnection.scan(HbaseConnection.java:57)
    at com.xwtech.noc.database.HbaseService.getRecords(HbaseService.java:35)
    at com.xwtech.noc.database.RedisConnection.getSearchPage(RedisConnection.java:61)
    at com.xwtech.noc.httpServer.LinkClientThread.run(LinkClientThread.java:40)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Apr 21 17:54:23 CST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'noc_caller' at region=noc_caller,,1492616514434.09a155eeeba545376fa7f2d2f8e95a5a., hostname=slave3,60020,1492161485334, seqNum=1422155

    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
    at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
    ... 5 more
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'noc_caller' at region=noc_caller,,1492616514434.09a155eeeba545376fa7f2d2f8e95a5a., hostname=slave3,60020,1492161485334, seqNum=1422155
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.io.IOException: Call to slave3/192.168.240.161:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1273)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    ... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1247)
    ... 13 more
d. Solutions
1. Increase the timeout settings (the default is 60 s).
Set the relevant parameters either in code or in the configuration file:
conf.setInt("hbase.client.operation.timeout", 60000); conf.setInt("hbase.rpc.timeout", 60000); conf.setInt("hbase.client.scanner.timeout.period", 60000); conf.setInt("mapreduce.task.timeout", 60000);
To set them in the configuration file instead, put the same parameters and values into hbase-site.xml.
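For reference, one such hbase-site.xml entry would look like the following (the other parameters follow the same pattern, with the same illustrative 120000 ms value used above):

<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>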
2. Set start and stop rows on the scan (this is the key point, and the one beginners overlook most easily; I have read through plenty of descriptions of this exception, and none of them mention it).
When the data volume is very large, always bound the range of each scan query:
scan.setStartRow(Bytes.toBytes(startRowKey)); // inclusive; startRowKey/stopRowKey are placeholders for your own bounds
scan.setStopRow(Bytes.toBytes(stopRowKey));   // exclusive
If no range is set, the scan will by default run from the very first row of the HBase table to the very last. An end-to-end sketch combining the range bound with the timeout settings is shown below.
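The following is a minimal, self-contained sketch of such a bounded scan (HBase 1.x client API, matching the classes in the stack trace; the table name noc_caller and the ZooKeeper quorum come from the log above, while the row keys and timeout values are illustrative):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BoundedScanExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3");
        // Raise the client-side timeouts (milliseconds).
        conf.setInt("hbase.rpc.timeout", 120000);
        conf.setInt("hbase.client.operation.timeout", 120000);
        conf.setInt("hbase.client.scanner.timeout.period", 120000);

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("noc_caller"))) {
            Scan scan = new Scan();
            // Bound the scan: the start row is inclusive, the stop row exclusive.
            // "row_00000" / "row_10000" are hypothetical row keys.
            scan.setStartRow(Bytes.toBytes("row_00000"));
            scan.setStopRow(Bytes.toBytes("row_10000"));
            scan.setCaching(500); // rows fetched per scanner RPC, keeps each call short

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}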
3. If the query range cannot be narrowed any further, the only options left are to raise the timeouts and tune the scan parameters for some basic optimization, as in the snippet below.
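The main tuning knobs on the Scan object itself are sketched here (illustrative values; "cf" is a hypothetical column family name):

scan.setCaching(1000);               // rows fetched per scanner RPC: fewer round-trips, more client memory
scan.setBatch(100);                  // cap on cells per Result, useful for very wide rows
scan.addFamily(Bytes.toBytes("cf")); // read only the column family you actually need
scan.setCacheBlocks(false);          // avoid polluting the region server block cache on large one-off scans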