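Week 3 Summary (The Correct Way to Connect to HBase from Java, and the Steps of Creating a Connection)
This week I worked on the software design assignment and also studied how clients connect to the HBase database.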
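What is a Connection?
Common ways of misusing Connection include:
Implementing your own resource pool of Connection objects and taking one out of the pool for every use.
Giving each thread its own Connection object.
Creating a temporary Connection object every time HBase is accessed and calling close() on it afterwards.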
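All of these approaches treat the Connection object like the connection object of a single-node database. HBase, however, is a distributed database: its client must connect to several different service roles running on multiple servers, so an HBase Connection does not correspond to a single socket. The HBase API documentation defines Connection as: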
A cluster connection encapsulating lower level individual connections to actual servers and a connection to zookeeper.
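The practical consequence is to create one Connection per application, share it across threads, and close it once at shutdown. In the real HBase API that object comes from ConnectionFactory.createConnection(conf). Lightweight objects such as Table are obtained with connection.getTable(...) per operation and closed after use. The sketch below models only the sharing pattern with the JDK. ExpensiveConnection, SharedConnection and SharedConnectionDemo are hypothetical stand-ins, not HBase classes. The constructor counter exists only to show that the heavyweight object is built once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a heavyweight, thread-safe cluster connection (not an HBase class). */
class ExpensiveConnection implements AutoCloseable {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean closed;

    ExpensiveConnection() {
        // In HBase this is where ZooKeeper lookups, RPC client setup, etc. would happen.
        CREATED.incrementAndGet();
    }

    String get(String row) {
        if (closed) throw new IllegalStateException("connection closed");
        return "value-of-" + row;
    }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

/** Initialization-on-demand holder: the connection is built once, lazily and thread-safely. */
final class SharedConnection {
    private SharedConnection() {}

    private static final class Holder {
        static final ExpensiveConnection INSTANCE = new ExpensiveConnection();
    }

    static ExpensiveConnection get() { return Holder.INSTANCE; }
}

class SharedConnectionDemo {
    /** Runs many tasks on a thread pool and reports how many connections were constructed. */
    static int runWorkers(int threads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            final String row = "row" + i;
            // Every task reuses the same shared connection instead of creating its own.
            pool.submit(() -> SharedConnection.get().get(row));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return ExpensiveConnection.CREATED.get();
    }
}
```

However many threads and tasks run, the counter stays at one. The single close() belongs at application shutdown, for example in a shutdown hook.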
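The HBase client connects to three different service roles:
ZooKeeper: mainly used to obtain the location of the meta region, the cluster ID, master information, and so on.
HBase Master: mainly used to execute HBaseAdmin operations, such as creating tables.
HBase RegionServer: used to read and write data.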
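This division of labour can be pictured as a lookup from operation type to server role. The enums below are purely illustrative. ServiceRole, Operation and the operation names are hypothetical labels, not HBase classes. In the real client this routing happens inside the Connection implementation.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical labels for the three server roles an HBase client talks to. */
enum ServiceRole { ZOOKEEPER, MASTER, REGION_SERVER }

/** Hypothetical operation kinds, each tagged with the role that serves it. */
enum Operation {
    LOCATE_META(ServiceRole.ZOOKEEPER),     // meta-region location comes from ZooKeeper
    GET_CLUSTER_ID(ServiceRole.ZOOKEEPER),
    CREATE_TABLE(ServiceRole.MASTER),       // admin-style DDL is executed by the master
    DELETE_TABLE(ServiceRole.MASTER),
    GET(ServiceRole.REGION_SERVER),         // data reads and writes go to region servers
    PUT(ServiceRole.REGION_SERVER),
    SCAN(ServiceRole.REGION_SERVER);

    final ServiceRole role;

    Operation(ServiceRole role) { this.role = role; }

    /** Counts how many of the given operations each role would handle. */
    static Map<ServiceRole, Integer> countByRole(Operation... ops) {
        Map<ServiceRole, Integer> counts = new EnumMap<>(ServiceRole.class);
        for (Operation op : ops) {
            counts.merge(op.role, 1, Integer::sum);
        }
        return counts;
    }
}
```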
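Connection creation steps and code walkthrough
By default, the HBase client's connection pool size is 1. That means one connection per RegionServer, or more precisely one per ConnectionId (user, service, server address). If an application needs a larger pool or a different pool type, it can change this through configuration: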
```java
config.set("hbase.client.ipc.pool.type", ...);
config.set("hbase.client.ipc.pool.size", ...);
connection = ConnectionFactory.createConnection(config);
```
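Here is what the pool size means for a round-robin pool of size N. The client opens up to N connections to the same server, then cycles through them. The class below is a minimal JDK-only model of that behaviour, not HBase's PoolMap. RoundRobinPool and its methods are hypothetical. Check the actual pool types and defaults against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of a round-robin connection pool for ONE server (not HBase's PoolMap). */
final class RoundRobinPool<T> {
    private final int maxSize;
    private final List<T> resources = new ArrayList<>();
    private int next;

    RoundRobinPool(int maxSize) {
        if (maxSize < 1) throw new IllegalArgumentException("maxSize must be >= 1");
        this.maxSize = maxSize;
    }

    /** Creates new resources until the pool is full, then hands out existing ones in turn. */
    synchronized T get(Supplier<T> factory) {
        if (resources.size() < maxSize) {
            T created = factory.get();
            resources.add(created);
            return created;
        }
        T chosen = resources.get(next);
        next = (next + 1) % resources.size();
        return chosen;
    }

    synchronized int size() { return resources.size(); }
}
```

With size 1, every caller targeting that server shares one connection. A larger size trades extra sockets for less contention on any single connection.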
The core entry point where the Connection (in the ConnectionImplementation constructor) creates its RpcClient:
```java
ConnectionImplementation(Configuration conf, ExecutorService pool, User user) throws IOException {
  ...
  try {
    ...
    this.rpcClient = RpcClientFactory.createClient(this.conf, this.clusterId, this.metrics);
    ...
  } catch (Throwable e) {
    // avoid leaks: registry, rpcClient, ...
    LOG.debug("connection construction failed", e);
    close();
    throw e;
  }
}
```
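The catch block above follows a general pattern. If any step of building a composite object fails, release whatever was already acquired and rethrow, so a half-built connection does not leak sockets or threads. Below is a JDK-only sketch of that pattern. TrackedResource and ClusterConnection are hypothetical names standing in for the registry and RPC client mentioned in the comment. The list parameter exists only so the caller can inspect what was opened.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

/** Hypothetical resource that only records whether it has been closed. */
class TrackedResource implements Closeable {
    final String name;
    boolean closed;

    TrackedResource(String name) { this.name = name; }

    @Override
    public void close() { closed = true; }
}

/** Hypothetical composite connection that opens several resources in its constructor. */
class ClusterConnection implements Closeable {
    private final List<TrackedResource> acquired;

    /** @param acquired receives every resource opened here, so callers can inspect them */
    ClusterConnection(List<TrackedResource> acquired, boolean failOnRpcClient) throws IOException {
        this.acquired = acquired;
        try {
            acquired.add(new TrackedResource("registry"));
            if (failOnRpcClient) {
                throw new IOException("simulated rpc client creation failure");
            }
            acquired.add(new TrackedResource("rpcClient"));
        } catch (Throwable e) {
            // Avoid leaks: release what was opened so far, then propagate the original failure.
            // Java's precise rethrow lets "throw e" compile with only "throws IOException".
            close();
            throw e;
        }
    }

    @Override
    public void close() {
        // Release in reverse order of acquisition.
        for (int i = acquired.size() - 1; i >= 0; i--) {
            acquired.get(i).close();
        }
    }
}
```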
RpcClient uses a PoolMap to store the mapping from the client to HBase servers. PoolMap wraps a ConcurrentHashMap whose key is a ConnectionId (new ConnectionId(ticket, md.getService().getName(), addr)) and whose value is a pool of RpcConnection objects.
```java
protected final PoolMap<ConnectionId, T> connections;

/**
 * Construct an IPC client for the cluster <code>clusterId</code>
 * @param conf configuration
 * @param clusterId the cluster id
 * @param localAddr client socket bind address.
 * @param metrics the connection metrics
 */
public AbstractRpcClient(Configuration conf, String clusterId, SocketAddress localAddr,
    MetricsConnection metrics) {
  ...
  this.connections = new PoolMap<>(getPoolType(conf), getPoolSize(conf));
  ...
}
```
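The map key is a ConnectionId built from the user ticket, the service name and the server address. Two requests share a pool only if all three match, so calls to two different RPC services on the same RegionServer go through different pools. The sketch models this with a hypothetical ConnectionKey value class; the real ConnectionId in org.apache.hadoop.hbase.ipc holds a User, not a string. It also uses a hypothetical SimplePoolMap over a ConcurrentHashMap. The service names and host below are illustrative values.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical key mirroring the three fields of HBase's ConnectionId (user, service, address). */
final class ConnectionKey {
    final String user;          // the real ConnectionId holds a User ticket, not a String
    final String serviceName;
    final InetSocketAddress address;

    ConnectionKey(String user, String serviceName, InetSocketAddress address) {
        this.user = Objects.requireNonNull(user);
        this.serviceName = Objects.requireNonNull(serviceName);
        this.address = Objects.requireNonNull(address);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConnectionKey)) return false;
        ConnectionKey other = (ConnectionKey) o;
        return user.equals(other.user)
            && serviceName.equals(other.serviceName)
            && address.equals(other.address);
    }

    @Override
    public int hashCode() {
        return Objects.hash(user, serviceName, address);
    }
}

/** Hypothetical, simplified PoolMap: a concurrent map from key to that key's pool of connections. */
final class SimplePoolMap<T> {
    private final ConcurrentMap<ConnectionKey, List<T>> pools = new ConcurrentHashMap<>();

    void put(ConnectionKey key, T connection) {
        pools.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(connection);
    }

    List<T> get(ConnectionKey key) {
        return pools.getOrDefault(key, Collections.emptyList());
    }

    int keyCount() {
        return pools.size();
    }
}
```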
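When HBase needs to connect to a server, it first finds the pool for the corresponding ConnectionId and then takes a connection from that pool. The core implementation for obtaining a connection: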
```java
/**
 * Get a connection from the pool, or create a new one and add it to the pool. Connections to a
 * given host/port are reused.
 */
private T getConnection(ConnectionId remoteId) throws IOException {
  if (failedServers.isFailedServer(remoteId.getAddress())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Not trying to connect to " + remoteId.address
          + " this server is in the failed servers list");
    }
    throw new FailedServerException(
        "This server is in the failed servers list: " + remoteId.address);
  }
  T conn;
  synchronized (connections) {
    if (!running) {
      throw new StoppedRpcClientException();
    }
    conn = connections.get(remoteId);
    if (conn == null) {
      conn = createConnection(remoteId);
      connections.put(remoteId, conn);
    }
    conn.setLastTouched(EnvironmentEdgeManager.currentTime());
  }
  return conn;
}
```
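The method does three things in order:

1. It refuses servers that failed recently.
2. It looks up or creates the connection under a lock.
3. It stamps the last-used time, presumably so idle connections can be cleaned up later.

Below is a JDK-only sketch of the same control flow. FailedServerList, its expiry window, FakeConnection, SimpleRpcClient and the exceptions thrown are hypothetical simplifications, not HBase's FailedServers or FailedServerException. The injectable clock exists only to make the behaviour testable.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical failed-server list: an address is skipped until its entry expires. */
final class FailedServerList {
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    FailedServerList(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void markFailed(String address) { expiresAt.put(address, clock.getAsLong() + ttlMillis); }

    boolean isFailed(String address) {
        Long until = expiresAt.get(address);
        if (until == null) return false;
        if (clock.getAsLong() >= until) {
            expiresAt.remove(address, until); // entry expired: allow retries again
            return false;
        }
        return true;
    }
}

/** Hypothetical connection that only remembers when it was last used. */
final class FakeConnection {
    final String address;
    long lastTouched;

    FakeConnection(String address) { this.address = address; }
}

/** Hypothetical client reproducing the get-or-create flow of getConnection. */
final class SimpleRpcClient {
    private final Map<String, FakeConnection> connections = new HashMap<>();
    private final FailedServerList failedServers;
    private final LongSupplier clock;
    private boolean running = true;
    int created;

    SimpleRpcClient(FailedServerList failedServers, LongSupplier clock) {
        this.failedServers = failedServers;
        this.clock = clock;
    }

    FakeConnection getConnection(String address) throws IOException {
        // 1. Fail fast for servers that failed recently.
        if (failedServers.isFailed(address)) {
            throw new IOException("This server is in the failed servers list: " + address);
        }
        synchronized (connections) {
            // 2. Refuse new work once the client has been stopped.
            if (!running) throw new IllegalStateException("rpc client stopped");
            // 3. Reuse the connection for this address, or create and register one.
            FakeConnection conn = connections.get(address);
            if (conn == null) {
                conn = new FakeConnection(address);
                created++;
                connections.put(address, conn);
            }
            // 4. Record when it was last used.
            conn.lastTouched = clock.getAsLong();
            return conn;
        }
    }

    void stop() {
        synchronized (connections) { running = false; }
    }
}
```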
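When the pool has no connection for a given ConnectionId, a new RpcConnection is created. The concrete implementation follows.
PS: Since HBase 2.0, RPC connections are built on the Netty framework; before 2.0 they used plain Java sockets. This part of the source may therefore differ between versions.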
NettyRpcServer
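Starting with HBase 2.0, NettyRpcServer is the default RPC server.
Replacing HBase's original native RPC server with Netty substantially improves HBase RPC throughput and lowers latency. On the client side, the counterpart is NettyRpcClient, whose createConnection method builds the NettyRpcConnection shown below.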
```java
protected NettyRpcConnection createConnection(ConnectionId remoteId) throws IOException {
  return new NettyRpcConnection(this, remoteId);
}

NettyRpcConnection(NettyRpcClient rpcClient, ConnectionId remoteId) throws IOException {
  super(rpcClient.conf, AbstractRpcClient.WHEEL_TIMER, remoteId, rpcClient.clusterId,
      rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
  this.rpcClient = rpcClient;
  byte[] connectionHeaderPreamble = getConnectionHeaderPreamble();
  this.connectionHeaderPreamble =
      Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
  ConnectionHeader header = getConnectionHeader();
  this.connectionHeaderWithLength = Unpooled.directBuffer(4 + header.getSerializedSize());
  this.connectionHeaderWithLength.writeInt(header.getSerializedSize());
  header.writeTo(new ByteBufOutputStream(this.connectionHeaderWithLength));
}

protected RpcConnection(Configuration conf, HashedWheelTimer timeoutTimer, ConnectionId remoteId,
    String clusterId, boolean isSecurityEnabled, Codec codec, CompressionCodec compressor)
    throws IOException {
  if (remoteId.getAddress().isUnresolved()) {
    throw new UnknownHostException("unknown host: " + remoteId.getAddress().getHostName());
  }
  this.timeoutTimer = timeoutTimer;
  this.codec = codec;
  this.compressor = compressor;
  this.conf = conf;

  UserGroupInformation ticket = remoteId.getTicket().getUGI();
  SecurityInfo securityInfo = SecurityInfo.getInfo(remoteId.getServiceName());
  this.useSasl = isSecurityEnabled;
  Token<? extends TokenIdentifier> token = null;
  String serverPrincipal = null;
  if (useSasl && securityInfo != null) {
    AuthenticationProtos.TokenIdentifier.Kind tokenKind = securityInfo.getTokenKind();
    if (tokenKind != null) {
      TokenSelector<? extends TokenIdentifier> tokenSelector =
          AbstractRpcClient.TOKEN_HANDLERS.get(tokenKind);
      if (tokenSelector != null) {
        token = tokenSelector.selectToken(new Text(clusterId), ticket.getTokens());
      } else if (LOG.isDebugEnabled()) {
        LOG.debug("No token selector found for type " + tokenKind);
      }
    }
    String serverKey = securityInfo.getServerPrincipal();
    if (serverKey == null) {
      throw new IOException("Can't obtain server Kerberos config key from SecurityInfo");
    }
    serverPrincipal = SecurityUtil.getServerPrincipal(conf.get(serverKey),
        remoteId.address.getAddress().getCanonicalHostName().toLowerCase());
    if (LOG.isDebugEnabled()) {
      LOG.debug("RPC Server Kerberos principal name for service=" + remoteId.getServiceName()
          + " is " + serverPrincipal);
    }
  }
  this.token = token;
  this.serverPrincipal = serverPrincipal;
  if (!useSasl) {
    authMethod = AuthMethod.SIMPLE;
  } else if (token != null) {
    authMethod = AuthMethod.DIGEST;
  } else {
    authMethod = AuthMethod.KERBEROS;
  }

  // Log if debug AND non-default auth, else if trace enabled.
  // No point logging obvious.
  if ((LOG.isDebugEnabled() && !authMethod.equals(AuthMethod.SIMPLE)) || LOG.isTraceEnabled()) {
    // Only log if not default auth.
    LOG.debug("Use " + authMethod + " authentication for service " + remoteId.serviceName
        + ", sasl=" + useSasl);
  }
  reloginMaxBackoff = conf.getInt("hbase.security.relogin.maxbackoff", 5000);
  this.remoteId = remoteId;
}
```
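Two steps of these constructors are easy to model in isolation:

- The connection header is sent with a 4-byte length prefix: writeInt(header.getSerializedSize()) followed by the serialized bytes.
- The authentication method depends on only two inputs: whether SASL is enabled and whether a token was found.

The JDK-only sketch below reproduces that logic with java.nio.ByteBuffer (big-endian by default, like Netty's writeInt) instead of Netty's ByteBuf. AuthChoice and RpcConnectionSketch are hypothetical names. Any byte array can stand in for the header; it is not a real protobuf ConnectionHeader.

```java
import java.nio.ByteBuffer;

/** Hypothetical mirror of the three authentication outcomes in RpcConnection's constructor. */
enum AuthChoice { SIMPLE, DIGEST, KERBEROS }

final class RpcConnectionSketch {
    private RpcConnectionSketch() {}

    /** No SASL -> SIMPLE; SASL with a token -> DIGEST; SASL without a token -> KERBEROS. */
    static AuthChoice chooseAuth(boolean useSasl, Object token) {
        if (!useSasl) return AuthChoice.SIMPLE;
        if (token != null) return AuthChoice.DIGEST;
        return AuthChoice.KERBEROS;
    }

    /** Frames a serialized header as [4-byte big-endian length][header bytes] in a direct buffer. */
    static ByteBuffer withLengthPrefix(byte[] serializedHeader) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 + serializedHeader.length);
        buf.putInt(serializedHeader.length);
        buf.put(serializedHeader);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    /** Reverses withLengthPrefix: reads the length, then exactly that many bytes. */
    static byte[] readFrame(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] out = new byte[length];
        buf.get(out);
        return out;
    }
}
```

The length prefix lets the receiver know how many header bytes to read before parsing, which a stream transport like TCP does not tell it on its own.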