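Distributed Structured Storage System: HBase Access Methods

                                     Author: 尹正杰

Copyright notice: this is an original work. Reproduction is not permitted; violations will be pursued legally.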

 

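  HBase offers several access methods, including the HBase shell, the HBase API, data collection components (such as Flume and Sqoop), upper-layer compute frameworks, and Apache Phoenix. This post walks through each of them.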

 

I. HBase Shell

    HBase provides a rich set of shell commands that make it easier to manage an HBase cluster. Run "$HBASE_HOME/bin/hbase shell" to enter the interactive command line, then type "help" to list all commands:
          [hdfs@node101.yinzhengjie.org.cn ~]$ hbase shell        #Enter the HBase shell
          OpenJDK 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
          19/05/23 15:23:36 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
          HBase Shell; enter 'help<RETURN>' for list of supported commands.
          Type "exit<RETURN>" to leave the HBase Shell
          Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018

          hbase(main):001:0> help                        #Show the HBase shell help
          HBase Shell, version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018
          Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
          Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
          ...... (remaining output omitted; help prints every command grouped by category)

  The HBase shell command set is very large, but the two most commonly used groups are DDL (Data Definition Language) and DML (Data Manipulation Language).

Full output of the help command:

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, is_in_maintenance_mode, major_compact, major_compact_mob, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: abort_procedure, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_servers_rsgroup, move_tables_rsgroup, remove_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):002:0> 

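1>. HBase DDL (Data Definition Language) commands

    DDL commands act on HBase tables (their metadata). The main ones are:
    (1)create
        Creates a new HBase table. Syntax: create '<table name>','<column family>'.
    (2)list
        Lists all tables in HBase.
    (3)disable
        Takes an HBase table offline (it can no longer serve reads or writes, but it is not deleted).
    (4)describe
        Shows the description of an HBase table.
    (5)drop
        Deletes an HBase table. The table must be disabled first.
    
    Note that when creating an HBase table you only specify the column families it contains; individual columns do not need to be declared.

  For example:
    An employee table has two column families: personal (basic employee information such as name, gender and home address) and office (work information such as phone number, work location and salary). It can be created with the create command (we name it employee).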
hbase(main):003:0> create 'employee','personal','office'      #Create the employee table with column families 'personal' and 'office'
0 row(s) in 2.8610 seconds

=> Hbase::Table - employee
hbase(main):004:0>
hbase(main):006:0> list                          #List the user tables that exist in HBase
TABLE
employee
1 row(s) in 0.0370 seconds

=> ["employee"]
hbase(main):007:0> 
hbase(main):002:0> describe 'employee'                  #Show the detailed description of the employee table
Table employee is ENABLED
employee
COLUMN FAMILIES DESCRIPTION
{NAME => 'office', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'personal', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.2510 seconds

hbase(main):003:0> 

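2>. HBase DML (Data Manipulation Language) commands

    DML commands act on the data itself. The main ones are:
    (1)put
        Writes one cell value into a given row of an HBase table. Syntax: put 'table name','rowkey','column family:column qualifier','value'
    (2)get
        Reads one cell or a whole row from an HBase table. Syntax: get 'table name','rowkey'[,{COLUMN => 'column family:column qualifier'}]
    (3)delete
        Deletes a single cell value in a row. Syntax: delete 'table name','rowkey','column family:column qualifier'
    (4)deleteall
        Deletes all cell values in a row. Syntax: deleteall 'table name','rowkey'
    (5)scan
        Scans a table and returns its rows, optionally bounded by a start rowkey and a stop rowkey. Syntax: scan 'table name'[,{OPTION1 => value1, OPTION2 => value2, ...}], where the options include STARTROW, STOPROW, COLUMNS and LIMIT. For example: scan 'hbase:meta', {COLUMNS => 'info:regioninfo',LIMIT=>10}
    (6)count
        Returns the total number of rows in an HBase table. Syntax: count 'table name'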
[hdfs@node101.yinzhengjie.org.cn ~]$ hbase shell
OpenJDK 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 18:23:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug  9 09:07:41 PDT 2018

hbase(main):001:0> scan 'hbase:meta', {COLUMNS => 'info:regioninfo',LIMIT=>10}
ROW                                                   COLUMN+CELL                                                                                                                                                  
 employee,,1558606190586.6d331651ab0bb8ccac5e602cdd2e column=info:regioninfo, timestamp=1558606192477, value={ENCODED => 6d331651ab0bb8ccac5e602cdd2ee197, NAME => 'employee,,1558606190586.6d331651ab0bb8ccac5e602
 e197.                                                cdd2ee197.', STARTKEY => '', ENDKEY => ''}                                                                                                                   
 hbase:namespace,,1558589850639.27b26f72bc8dabfb5f8da column=info:regioninfo, timestamp=1558590713877, value={ENCODED => 27b26f72bc8dabfb5f8dae587ad9cda7, NAME => 'hbase:namespace,,1558589850639.27b26f72bc8dabfb
 e587ad9cda7.                                         5f8dae587ad9cda7.', STARTKEY => '', ENDKEY => ''}                                                                                                            
2 row(s) in 0.4440 seconds

hbase(main):002:0> 
[root@node101.yinzhengjie.org.cn ~]# cat add_employee.txt              #The HBase shell commands are written to a file in advance
put 'employee','000001','personal:name','zhengjie.yin'
put 'employee','000001','personal:gender','man'
put 'employee','000001','personal:phone','13052001314'
put 'employee','000001','personal:salary','1000000000'
put 'employee','000002','personal:name','Jason Yin'
put 'employee','000002','personal:gender','man'
put 'employee','000002','personal:phone','7474741'
put 'employee','000002','personal:salary','8888888888888'
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hbase shell ./add_employee.txt        #Run the commands defined in the file; the shell then stays at the interactive prompt
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 18:35:29 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
0 row(s) in 0.5430 seconds

0 row(s) in 0.0090 seconds

0 row(s) in 0.0390 seconds

0 row(s) in 0.0040 seconds

0 row(s) in 0.0040 seconds

0 row(s) in 0.0040 seconds

0 row(s) in 0.0040 seconds

0 row(s) in 0.0040 seconds

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug  9 09:07:41 PDT 2018

hbase(main):001:0>
hbase(main):001:0> scan 'employee'
ROW                                                   COLUMN+CELL                                                                                                                                                 
 000001                                               column=personal:gender, timestamp=1558607733794, value=man                                                                                                  
 000001                                               column=personal:name, timestamp=1558607733715, value=zhengjie.yin                                                                                           
 000001                                               column=personal:phone, timestamp=1558607733839, value=13052001314                                                                                           
 000001                                               column=personal:salary, timestamp=1558607733845, value=1000000000                                                                                           
 000002                                               column=personal:gender, timestamp=1558607733854, value=man                                                                                                  
 000002                                               column=personal:name, timestamp=1558607733850, value=Jason Yin                                                                                              
 000002                                               column=personal:phone, timestamp=1558607733860, value=7474741                                                                                               
 000002                                               column=personal:salary, timestamp=1558607733865, value=8888888888888                                                                                        
2 row(s) in 0.1670 seconds

hbase(main):002:0> 
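
    The semantics of the DML commands are easier to remember when a table is pictured as a sorted map from rowkey to cells. The class below is a toy in-memory model of that idea, not the HBase client API: the name ToyTable and all of its methods are made up for illustration, and it ignores timestamps, versions and regions. It assumes, as in HBase scans, that the start row is inclusive and the stop row is exclusive.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase table: rowkey -> ("family:qualifier" -> value).
// Hypothetical illustration only; it is NOT the HBase client API.
class ToyTable {
    // TreeMap keeps rowkeys sorted, which is what makes range scans possible.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // put: write exactly one cell, creating the row if needed.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // get: return every cell of one row (an empty map if the row is absent).
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // delete: remove a single cell; drop the row once it has no cells left.
    void delete(String rowKey, String column) {
        NavigableMap<String, String> row = rows.get(rowKey);
        if (row != null) {
            row.remove(column);
            if (row.isEmpty()) {
                rows.remove(rowKey);
            }
        }
    }

    // deleteall: remove every cell of the row.
    void deleteAll(String rowKey) {
        rows.remove(rowKey);
    }

    // scan: rows in [startRow, stopRow), returned in rowkey order.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    // count: number of rows, not number of cells.
    int count() {
        return rows.size();
    }

    public static void main(String[] args) {
        ToyTable employee = new ToyTable();
        employee.put("000001", "personal:name", "zhengjie.yin");
        employee.put("000001", "personal:gender", "man");
        employee.put("000002", "personal:name", "Jason Yin");
        System.out.println("rows: " + employee.count());
        System.out.println("scan: " + employee.scan("000001", "000002"));
    }
}
```

    The point of the model is the difference between delete (one cell) and deleteall (the whole row), and the fact that count returns rows rather than cells.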

 

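II. HBase API

    Mirroring the HBase shell, HBase also provides two kinds of programming API. One is the table operation API, whose Java class is org.apache.hadoop.hbase.client.HBaseAdmin; the other is the data read/write API, whose Java class is org.apache.hadoop.hbase.client.HTable.
    
    Note that the constructors of both classes have been deprecated since version 0.99.x. The recommended way is to obtain the corresponding objects through the getAdmin() and getTable() methods of the Java class org.apache.hadoop.hbase.client.Connection.

1>. HBase table operation API

    All table operation APIs are encapsulated in the Java class org.apache.hadoop.hbase.client.HBaseAdmin. Commonly used methods include createTable, disableTable, deleteTable, tableExists and listTables.

    The employee table created earlier in the shell can also be created through this Java API.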
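
    A minimal sketch of creating that table, assuming the HBase 1.x client library (hbase-client) is on the classpath and that an hbase-site.xml pointing at the cluster can be found there. The class name CreateEmployeeTable is made up for this example. It goes through Connection.getAdmin() rather than the deprecated HBaseAdmin constructor; HTableDescriptor and HColumnDescriptor are the 1.x descriptor classes and are themselves deprecated in HBase 2.x.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Sketch for the HBase 1.x client API; requires hbase-client and a reachable cluster.
class CreateEmployeeTable {
    public static void main(String[] args) throws IOException {
        // Loads hbase-site.xml from the classpath (assumed to describe the cluster).
        Configuration conf = HBaseConfiguration.create();
        TableName name = TableName.valueOf("employee");

        // Connection and Admin are both Closeable, so try-with-resources releases them.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            if (admin.tableExists(name)) {
                System.out.println("Table already exists: " + name);
                return;
            }
            // Only column families are declared; column qualifiers appear on write.
            HTableDescriptor descriptor = new HTableDescriptor(name);
            descriptor.addFamily(new HColumnDescriptor("personal"));
            descriptor.addFamily(new HColumnDescriptor("office"));
            admin.createTable(descriptor);
            System.out.println("Created table: " + name);
        }
    }
}
```

    This is the API counterpart of the shell command create 'employee','personal','office'.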

 

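2>. Data read/write API 

    All data read/write APIs are encapsulated in the Java class org.apache.hadoop.hbase.client.HTable. Commonly used methods include put, get, delete and getScanner.

    With this API, data can be written into the employee table described above.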
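
    A minimal sketch of writing one row and reading it back, under the same assumptions as before: the HBase 1.x client library on the classpath, an hbase-site.xml that points at the cluster, and an existing employee table with a personal column family. The class name WriteEmployeeRow is made up for this example, and the Table object is obtained through Connection.getTable() instead of the deprecated HTable constructor.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch for the HBase 1.x client API; requires hbase-client and a reachable cluster.
class WriteEmployeeRow {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        byte[] family = Bytes.toBytes("personal");
        byte[] rowKey = Bytes.toBytes("000001");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("employee"))) {
            // One Put carries all cells written to the same row.
            Put put = new Put(rowKey);
            put.addColumn(family, Bytes.toBytes("name"), Bytes.toBytes("zhengjie.yin"));
            put.addColumn(family, Bytes.toBytes("gender"), Bytes.toBytes("man"));
            table.put(put);

            // Read the row back and decode a single cell.
            Result result = table.get(new Get(rowKey));
            String name = Bytes.toString(result.getValue(family, Bytes.toBytes("name")));
            System.out.println("personal:name = " + name);
        }
    }
}
```

    HBase stores everything as byte arrays, which is why every rowkey, qualifier and value passes through Bytes.toBytes on the way in and Bytes.toString on the way out.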

 

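III. Data Collection Components

    Data collection middleware such as Flume and Sqoop can also be used; both can import data into HBase in a predefined format.

1>. Flume

  Flume provides an HBase Sink that writes collected data directly into HBase. It comes with flexible configuration parameters, such as the timeout, the batch size and the serializer.
    
  For the full list of Flume HBase Sink parameters, see the official documentation: https://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#hbasesinks
  For example, a Flume agent named a1 can write data into the column family bahavior of the HBase table user.

2>. Sqoop 

  Sqoop lets users specify the HBase table name, the column family name and so on for the data being written.

  For example, Sqoop can import the user table of a MySQL database into the column family bahavior of the HBase table user.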

 

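IV. Compute Engines

    HBase provides two components, TableInputFormat and TableOutputFormat, that let compute engines read from or write to HBase in parallel. TableInputFormat partitions data by HBase Region: each Region is mapped to one InputSplit, which can be processed by one task. TableOutputFormat inserts data into HBase.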
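
    The "one Region, one InputSplit" rule can be pictured with a few lines of plain Java. This is a toy illustration rather than HBase or MapReduce code: the class RegionSplits, its Range type and the sample keys are all made up, and an empty string stands for an open-ended boundary, as in the STARTKEY and ENDKEY fields of hbase:meta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration (not HBase code): derive one split per region from region start keys.
class RegionSplits {
    // A key range [startKey, endKey); "" means the range is open on that side.
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    // Each region ends where the next one starts; the last region is open-ended.
    static List<Range> splitsFor(List<String> sortedRegionStartKeys) {
        List<Range> splits = new ArrayList<>();
        for (int i = 0; i < sortedRegionStartKeys.size(); i++) {
            String start = sortedRegionStartKeys.get(i);
            String end = (i + 1 < sortedRegionStartKeys.size())
                    ? sortedRegionStartKeys.get(i + 1)
                    : "";
            splits.add(new Range(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three hypothetical regions of one table.
        for (Range split : splitsFor(Arrays.asList("", "000100", "000200"))) {
            System.out.println("one task scans " + split);
        }
    }
}
```

    One practical consequence is that read parallelism follows the number of regions: a table with a single region, like the employee table above, yields a single split.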

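    Users can also access data in HBase directly with SQL; query engines such as Hive, Impala and Presto all support HBase well. Because HBase does not store standard relational data, an HBase table has to be mapped to a relational table before it can be queried with SQL.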

 

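V. Apache Phoenix

  Apache Phoenix is an implementation of SQL on HBase. It builds a distributed relational database on top of HBase, translates SQL into a series of HBase scan operations, and returns the results to the user as JDBC result sets.

  Apache Phoenix has the following features:
    (1)An embedded JDBC driver that implements most of the java.sql interfaces;
    (2)Comprehensive query support, with multiple predicates and optimized scan key formation;
    (3)DDL support: CREATE TABLE and DROP TABLE, plus ALTER TABLE for adding/removing columns;
    (4)DML support: SELECT for querying data (with GROUP BY, sorting, joins, etc.), UPSERT VALUES for row-by-row insertion, UPSERT SELECT for bulk data transfer within one table or between tables, and DELETE for removing rows;
    (5)Secondary indexes;
    (6)User-defined functions;
    (7)Limited transaction support through client-side batching;
    (8)Integration with open-source systems such as MapReduce, Spark and Flume;
    (9)Close adherence to the ANSI SQL standard.

  Apache Phoenix ships with its own query optimizer and pushes work to the server side through HBase Coprocessors. Compared with hand-written code on the raw HBase API, it can often achieve lower access latency and higher performance, and it is used by quite a few internet companies.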
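
  Because Phoenix is exposed as a JDBC driver, client code uses only the standard java.sql interfaces. The sketch below assumes the Phoenix client jar is on the classpath and that the ZooKeeper quorum is passed as the first argument; the class name PhoenixSketch and the table employee_demo are made up for illustration. It uses the thick-driver URL form jdbc:phoenix:<ZooKeeper quorum>:<port> and commits explicitly, since Phoenix connections do not auto-commit by default.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of Phoenix access through plain JDBC; needs the Phoenix client jar at run time.
class PhoenixSketch {
    // Build a Phoenix thick-driver URL from the ZooKeeper quorum and client port.
    static String jdbcUrl(String zkQuorum, int zkPort) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort;
    }

    // Create a small table, upsert one row, and read it back.
    static void run(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS employee_demo (id VARCHAR PRIMARY KEY, name VARCHAR)");
            stmt.executeUpdate("UPSERT INTO employee_demo VALUES ('000001', 'zhengjie.yin')");
            conn.commit(); // make the upsert visible; auto-commit is off by default

            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM employee_demo")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: PhoenixSketch <zookeeper-quorum>");
            return;
        }
        run(jdbcUrl(args[0], 2181)); // 2181 is the default ZooKeeper client port
    }
}
```

  Note that UPSERT covers both insert and update, so running the sketch twice leaves a single row for id 000001.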

Recommended reading:
  Apache Phoenix website       :  http://phoenix.apache.org/
  HBase Coprocessor reference  :  https://hbase.apache.org/book.html#cp 

 

 

 

 

 

 

 

posted @ 2019-05-16 23:05  尹正杰