Distributed Structured Storage System: HBase Access Methods
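Author: 尹正杰 (Yin Zhengjie)
Copyright notice: this is an original work and may not be reproduced; violations will be pursued legally.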
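HBase can be accessed in several ways: the HBase shell, the HBase API, data collection components (such as Flume and Sqoop), upper-layer compute frameworks, and Apache Phoenix. This post walks through each of them.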
I. HBase Shell
HBase provides a rich set of shell commands that make an HBase cluster easier to manage. Run "$HBASE_HOME/bin/hbase shell" to enter the interactive shell, then type "help" to list all commands:

[hdfs@node101.yinzhengjie.org.cn ~]$ hbase shell        # enter the HBase shell
OpenJDK 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 15:23:36 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018

hbase(main):001:0> help        # show the HBase help
HBase Shell, version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
...... (output truncated here; the commands are printed grouped by category, and the full output is reproduced further below)

The HBase shell command set is large, but the two most commonly used groups are DDL (Data Definition Language) and DML (Data Manipulation Language).
[hdfs@node101.yinzhengjie.org.cn ~]$ hbase shell
OpenJDK 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 15:23:36 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018

hbase(main):001:0> help
HBase Shell, version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, is_in_maintenance_mode, major_compact, major_compact_mob, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: abort_procedure, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_servers_rsgroup, move_tables_rsgroup, remove_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):002:0>
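1>. HBase DDL (Data Definition Language) commands
DDL commands operate on HBase tables (their metadata). The main ones are:
(1) create
Creates a new HBase table. Syntax: create '<table name>','<column family>'.
(2) list
Lists all tables in HBase.
(3) disable
Takes an HBase table offline (it stops serving reads and writes but is not deleted).
(4) describe
Shows the description of an HBase table.
(5) drop
Deletes an HBase table (the table must be disabled first).
Note that when creating an HBase table you only specify the column families it contains; individual columns do not need to be declared.
For example:
Suppose an employee table has two column families: personal (basic employee information such as name, gender, and home address) and office (work information such as phone number, work location, and salary). The table, which we name employee, can be created with the create command.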
hbase(main):003:0> create 'employee','personal','office'
0 row(s) in 2.8610 seconds

=> Hbase::Table - employee
hbase(main):004:0>

hbase(main):006:0> list
TABLE
employee
1 row(s) in 0.0370 seconds

=> ["employee"]
hbase(main):007:0>

hbase(main):002:0> describe 'employee'
Table employee is ENABLED
employee
COLUMN FAMILIES DESCRIPTION
{NAME => 'office', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'personal', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.2510 seconds

hbase(main):003:0>
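2>. HBase DML (Data Manipulation Language) commands
DML commands operate on the data itself. The main ones are:
(1) put
Writes one cell value into a specific row of an HBase table. Syntax: put 'table name','rowkey','column family:column qualifier','value'
(2) get
Reads one cell, or a whole row, from an HBase table. Syntax: get 'table name','rowkey'[,{COLUMN => 'column family:column qualifier'}]
(3) delete
Deletes a single cell value in a row. Syntax: delete 'table name','rowkey','column family:column qualifier'
(4) deleteall
Deletes all cell values in a row. Syntax: deleteall 'table name','rowkey'
(5) scan
Given a start rowkey and a stop rowkey (the STARTROW and STOPROW options), scans and returns the data in that range. Syntax: scan 'table name'[,{filter1,filter2,...}], for example: scan 'hbase:meta', {COLUMNS => 'info:regioninfo',LIMIT=>10}
(6) count
Returns the total number of rows in an HBase table. Syntax: count 'table name'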
[hdfs@node101.yinzhengjie.org.cn ~]$ hbase shell
OpenJDK 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 18:23:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018

hbase(main):001:0> scan 'hbase:meta', {COLUMNS => 'info:regioninfo',LIMIT=>10}
ROW                                                                COLUMN+CELL
 employee,,1558606190586.6d331651ab0bb8ccac5e602cdd2ee197.         column=info:regioninfo, timestamp=1558606192477, value={ENCODED => 6d331651ab0bb8ccac5e602cdd2ee197, NAME => 'employee,,1558606190586.6d331651ab0bb8ccac5e602cdd2ee197.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,1558589850639.27b26f72bc8dabfb5f8dae587ad9cda7.  column=info:regioninfo, timestamp=1558590713877, value={ENCODED => 27b26f72bc8dabfb5f8dae587ad9cda7, NAME => 'hbase:namespace,,1558589850639.27b26f72bc8dabfb5f8dae587ad9cda7.', STARTKEY => '', ENDKEY => ''}
2 row(s) in 0.4440 seconds

hbase(main):002:0>

[root@node101.yinzhengjie.org.cn ~]# cat add_employee.txt
put 'employee','000001','personal:name','zhengjie.yin'
put 'employee','000001','personal:gender','man'
put 'employee','000001','personal:phone','13052001314'
put 'employee','000001','personal:salary','1000000000'
put 'employee','000002','personal:name','Jason Yin'
put 'employee','000002','personal:gender','man'
put 'employee','000002','personal:phone','7474741'
put 'employee','000002','personal:salary','8888888888888'
[root@node101.yinzhengjie.org.cn ~]#

[root@node101.yinzhengjie.org.cn ~]# hbase shell ./add_employee.txt
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
19/05/23 18:35:29 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
0 row(s) in 0.5430 seconds
0 row(s) in 0.0090 seconds
0 row(s) in 0.0390 seconds
0 row(s) in 0.0040 seconds
0 row(s) in 0.0040 seconds
0 row(s) in 0.0040 seconds
0 row(s) in 0.0040 seconds
0 row(s) in 0.0040 seconds
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.15.1, rUnknown, Thu Aug 9 09:07:41 PDT 2018

hbase(main):001:0>

hbase(main):001:0> scan 'employee'
ROW       COLUMN+CELL
 000001   column=personal:gender, timestamp=1558607733794, value=man
 000001   column=personal:name, timestamp=1558607733715, value=zhengjie.yin
 000001   column=personal:phone, timestamp=1558607733839, value=13052001314
 000001   column=personal:salary, timestamp=1558607733845, value=1000000000
 000002   column=personal:gender, timestamp=1558607733854, value=man
 000002   column=personal:name, timestamp=1558607733850, value=Jason Yin
 000002   column=personal:phone, timestamp=1558607733860, value=7474741
 000002   column=personal:salary, timestamp=1558607733865, value=8888888888888
2 row(s) in 0.1670 seconds

hbase(main):002:0>
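II. HBase API
Mirroring the HBase shell, HBase also provides two kinds of programming API. One is the table administration API, whose Java class is org.apache.hadoop.hbase.client.HBaseAdmin; the other is the data read/write API, whose Java class is org.apache.hadoop.hbase.client.HTable.
Note that the constructors of both classes have been deprecated since the 0.99.x releases. The recommended approach is to obtain these objects through the getAdmin() and getTable() methods of the Java class org.apache.hadoop.hbase.client.Connection.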
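1>. HBase table administration API
All table administration APIs are encapsulated in the Java class org.apache.hadoop.hbase.client.HBaseAdmin; some of them are shown in the figure below:
Next, we use the Java API to create an employee table. The code is as follows: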
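A minimal sketch of such a table-creation program is given here. It assumes the HBase 1.x client library (matching the 1.2.0-cdh5.15.1 shell shown earlier) is on the classpath and that an hbase-site.xml pointing at the cluster can be found there. The class name CreateEmployeeTable is made up for this example. On HBase 2.x, HTableDescriptor and HColumnDescriptor are deprecated, so treat this as an illustration rather than the one correct way to do it.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical class name; requires the HBase 1.x client jars and a reachable cluster.
public class CreateEmployeeTable {
    public static void main(String[] args) throws IOException {
        // Loads hbase-site.xml from the classpath (assumed to describe the cluster).
        Configuration conf = HBaseConfiguration.create();

        // Obtain Admin from Connection instead of calling the deprecated HBaseAdmin constructor.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            TableName name = TableName.valueOf("employee");
            if (admin.tableExists(name)) {
                System.out.println("Table already exists: " + name);
                return;
            }

            // Only column families are declared; column qualifiers appear at write time.
            HTableDescriptor descriptor = new HTableDescriptor(name);
            descriptor.addFamily(new HColumnDescriptor("personal"));
            descriptor.addFamily(new HColumnDescriptor("office"));

            admin.createTable(descriptor);
            System.out.println("Created table: " + name);
        }
    }
}
```

This is the programmatic counterpart of the shell command create 'employee','personal','office'.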
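2>. Data read/write API
All data read/write APIs are encapsulated in the Java class org.apache.hadoop.hbase.client.HTable; some of them are shown in the figure below:
For example, we use the Java API to write data into the table created above through the API. The code is as follows: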
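A minimal sketch of such a write, followed by a read of the same cell, is given here. It makes the same assumptions as before: HBase 1.x client jars on the classpath, an hbase-site.xml that points at the cluster, and an existing employee table with a personal column family. The class name WriteEmployee is hypothetical, and the row values simply reuse the sample data from add_employee.txt.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical class name; requires the HBase 1.x client jars and a reachable cluster.
public class WriteEmployee {
    private static final byte[] PERSONAL = Bytes.toBytes("personal");

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // Obtain Table from Connection instead of calling the deprecated HTable constructor.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("employee"))) {

            // One Put carries every cell written to a single row key.
            Put put = new Put(Bytes.toBytes("000001"));
            put.addColumn(PERSONAL, Bytes.toBytes("name"), Bytes.toBytes("zhengjie.yin"));
            put.addColumn(PERSONAL, Bytes.toBytes("gender"), Bytes.toBytes("man"));
            table.put(put);

            // Read one cell back, the equivalent of the shell's get with a COLUMN option.
            Get get = new Get(Bytes.toBytes("000001"));
            get.addColumn(PERSONAL, Bytes.toBytes("name"));
            Result result = table.get(get);
            byte[] value = result.getValue(PERSONAL, Bytes.toBytes("name"));
            System.out.println("personal:name = " + Bytes.toString(value));
        }
    }
}
```

HBase stores everything as byte arrays, which is why row keys, families, qualifiers, and values all go through Bytes.toBytes().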
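III. Data collection components
Data collection middleware such as Flume and Sqoop can import data into HBase in a predefined format.
1>. Flume
Flume provides an HBase Sink that writes collected data directly into HBase. It comes with flexible configuration parameters for the timeout, batch size, serialization method, and so on.
For the full list of Flume HBase Sink configuration parameters, see the official documentation: https://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#hbasesinks
As shown in the figure below, a Flume Agent named a1 writes data into the column family named bahavior of the HBase table user.
2>. Sqoop
Sqoop lets the user specify the HBase table name, column family name, and so on when writing data into HBase.
As shown in the figure below, Sqoop imports the user table of a MySQL database into the column family named bahavior of the HBase table user.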
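IV. Compute engines
HBase provides two components, TableInputFormat and TableOutputFormat, that let compute engines read from and write to HBase in parallel. TableInputFormat partitions the data by HBase Region: each Region is mapped to one InputSplit, which is processed by one task. TableOutputFormat writes data into HBase.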
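To make the Region-to-InputSplit mapping concrete, here is a small self-contained sketch. It is a toy model and not HBase code: the class RegionSplitSketch, its Split type, and the boundary keys 000100 and 000200 are all invented for illustration. It only shows the idea that a table with n Region boundaries yields n + 1 splits, one per Region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not the HBase API): one split per Region, as TableInputFormat does. */
public class RegionSplitSketch {

    /** A half-open row-key range [startKey, endKey); "" means unbounded, as in hbase:meta. */
    static final class Split {
        final String startKey;
        final String endKey;

        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Turns the sorted Region boundary keys of a table into one split per Region. */
    static List<Split> splitsFor(List<String> sortedBoundaries) {
        List<Split> splits = new ArrayList<>();
        String start = ""; // the first Region starts at the empty key
        for (String boundary : sortedBoundaries) {
            splits.add(new Split(start, boundary));
            start = boundary;
        }
        splits.add(new Split(start, "")); // the last Region is open-ended
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical table pre-split at row keys 000100 and 000200.
        for (Split split : splitsFor(Arrays.asList("000100", "000200"))) {
            System.out.println(split);
        }
    }
}
```

For the single-Region employee table shown earlier (STARTKEY and ENDKEY are both empty in hbase:meta), the same logic yields exactly one split covering the whole key space, so a job over that table would run as a single read task.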
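Users can also access data in HBase directly with SQL; query engines such as Hive, Impala, and Presto all support HBase well. Because the data stored in HBase is not standard relational data, an HBase table must first be mapped to a relational table before it can be queried with SQL.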
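V. Apache Phoenix
Apache Phoenix is a SQL-on-HBase implementation. It builds a distributed relational database on top of HBase, translates SQL into a series of HBase scan operations, and returns the results to the user as a JDBC result set.
Apache Phoenix has the following features:
(1) An embedded JDBC driver that implements most of the java.sql interfaces;
(2) Full query support, with multiple predicates and optimized scans;
(3) DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns;
(4) DML support: SELECT for queries (with group by, sort, join, etc.), UPSERT VALUES for row-by-row inserts, UPSERT SELECT for bulk data transfer between the same or different tables, and DELETE for deleting rows;
(5) Secondary index support;
(6) User-defined function support;
(7) Limited transaction support through client-side batching;
(8) Integration with open-source systems such as MapReduce, Spark, and Flume;
(9) Close adherence to the ANSI SQL standard;
Apache Phoenix ships with its own query optimization engine. Combined with HBase Coprocessors, it can achieve lower access latency and higher performance than using the HBase API directly, and it is currently used by quite a few internet companies.
Recommended reading:
Apache Phoenix website: http://phoenix.apache.org/
HBase Coprocessor reference: https://hbase.apache.org/book.html#cp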
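The JDBC access path described in this section can be sketched roughly as follows. Only java.sql is used in the code, but running it assumes the Phoenix client jar is on the classpath and that a ZooKeeper quorum is reachable. The host node101.yinzhengjie.org.cn, port 2181, and znode /hbase are placeholders borrowed from the shell prompts in this post, not confirmed settings. The table employee_sql and the class PhoenixSketch are made-up names, chosen so that the example does not touch the employee table created earlier.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical class name; needs the Phoenix client jar at run time.
public class PhoenixSketch {

    /** Builds a Phoenix JDBC URL of the form jdbc:phoenix:<quorum>:<port>:<znode>. */
    static String jdbcUrl(String zkQuorum, int zkPort, String znodeParent) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znodeParent;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection settings (assumptions, adjust to the real cluster).
        String url = jdbcUrl("node101.yinzhengjie.org.cn", 2181, "/hbase");

        try (Connection conn = DriverManager.getConnection(url)) {
            // DDL: create a Phoenix table (hypothetical name and columns).
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS employee_sql ("
                        + "id VARCHAR PRIMARY KEY, name VARCHAR, salary BIGINT)");
            }

            // DML: UPSERT VALUES inserts or updates one row.
            try (PreparedStatement upsert = conn.prepareStatement(
                    "UPSERT INTO employee_sql (id, name, salary) VALUES (?, ?, ?)")) {
                upsert.setString(1, "000001");
                upsert.setString(2, "zhengjie.yin");
                upsert.setLong(3, 1000000000L);
                upsert.executeUpdate();
            }
            // Phoenix connections do not auto-commit by default.
            conn.commit();

            // SELECT results come back as an ordinary JDBC ResultSet.
            try (PreparedStatement query = conn.prepareStatement(
                    "SELECT name, salary FROM employee_sql WHERE id = ?")) {
                query.setString(1, "000001");
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name") + " " + rs.getLong("salary"));
                    }
                }
            }
        }
    }
}
```

Because the primary key is the HBase row key, a lookup by id like the one above can be served as a narrow scan rather than a full table scan.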
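This article comes from cnblogs (博客园). Author: 尹正杰 (Yin Zhengjie). When reposting, please cite the original link: https://www.cnblogs.com/yinzhengjie/p/10878681.html. Personal WeChat: "JasonYin2020" (when adding, please state where you found it and your purpose; paid service).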