Springboot Series (25) - Springboot + HBase Big Data Storage (3) | HBase Shell
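HBase Shell is the command-line tool that ships with HBase. With it you can query and modify HBase data from the machine where HBase is installed.
HBase Shell documentation: https://hbase.apache.org/book.html#shell
In "Springboot Series (24) - Springboot + HBase Big Data Storage (2) | Installing and Configuring Apache HBase and Apache Zookeeper" we installed and configured Apache HBase and Apache Zookeeper.
This article shows how to use HBase Shell.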
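1. System environment
Operating system: Ubuntu 20.04
Java version: openjdk 11.0.18
Hadoop version: 3.2.2
Zookeeper version: 3.6.3
HBase version: 2.4.4
HBase install path: ~/apps/hbase-2.4.4
The HBase instance used here is deployed on a pseudo-distributed Hadoop setup (hostname: hadoop-master-vm) and runs in HBase + standalone Zookeeper mode, with Zookeeper listening on port 2182.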
2. Running HBase Shell
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell

# help command
hbase:001:0> help
HBase Shell, version 2.4.4, r20e7ba45b0c3affdc0c06b1a0e5cbddd1b2d8d18, Mon Jun 7 15:31:55 PDT 2021
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
...

# Show the HBase version
hbase:002:0> version
2.4.4, r20e7ba45b0c3affdc0c06b1a0e5cbddd1b2d8d18, Mon Jun 7 15:31:55 PDT 2021
Took 0.0002 seconds

# Show the HBase cluster status
hbase:003:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 1.0000 average load
Took 0.4128 seconds

# Exit the shell
hbase:003:0> exit    # or quit
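The interactive session above can also be driven from a script. The following is a minimal Python sketch, assuming the install path from section 1 and the shell's non-interactive flag (-n); the helper names build_shell_argv, build_script and run_commands are made up for this illustration, and run_commands only works on a host where HBase is actually running.

```python
import subprocess
from pathlib import Path

# Assumed install location, taken from the "System environment" section.
HBASE_BIN = Path.home() / "apps" / "hbase-2.4.4" / "bin" / "hbase"


def build_shell_argv(hbase_bin=HBASE_BIN):
    """Return the argv for a non-interactive HBase Shell session."""
    return [str(hbase_bin), "shell", "-n"]


def build_script(commands):
    """Join shell commands into one script, one command per line."""
    return "\n".join(commands) + "\n"


def run_commands(commands, hbase_bin=HBASE_BIN):
    """Pipe the commands into the shell and return whatever it prints.

    Requires a running HBase; raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        build_shell_argv(hbase_bin),
        input=build_script(commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the script that would be sent; it does not contact HBase.
    print(build_script(["version", "status"]), end="")
```

Keeping the command construction separate from the subprocess call makes it possible to inspect the script before anything is sent to the cluster.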
3. Table operations
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell

# List all tables
hbase:001:0> list
TABLE
0 row(s)
Took 0.3782 seconds
=> []

# Create the test table with one column family, cf1
hbase:002:0> create 'test', 'cf1'
Created table test
Took 1.2217 seconds
=> Hbase::Table - test

# Show the table description
hbase:003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
Took 0.0453 seconds

# Add two column families
hbase:004:0> alter 'test', 'cf2', 'cf3'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.3449 seconds

# Delete the cf3 column family
hbase:005:0> alter 'test', {NAME => 'cf3', METHOD => 'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.1434 seconds

# Set VERSIONS of cf1 to 2; the default is 1 (only the latest version is kept)
hbase:006:0> alter 'test', {NAME => 'cf1', VERSIONS => 2}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.1434 seconds

# Show the table description again
hbase:007:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '2', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf2', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s)
Quota is disabled
Took 0.0266 seconds

# Check whether a table exists
hbase:008:0> exists 'demo'
Table demo does not exist
Took 0.0220 seconds
=> false

# Disable a table with disable
hbase:009:0> disable 'test'
Took 0.3985 seconds

# Check whether a table is enabled with is_enabled (is_disabled reports the opposite)
hbase:010:0> is_enabled 'test'
false
Took 0.0143 seconds
=> false

# Enable a table with enable
hbase:011:0> enable 'test'

# Drop a table; it must be disabled first
# ('demo' is only a placeholder here; as shown above, no such table exists in this session)
hbase:012:0> disable 'demo'
hbase:013:0> drop 'demo'

# Exit the shell
hbase:015:0> exit    # or quit
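To make the effect of the VERSIONS setting on cf1 more concrete, here is a toy Python model of a single cell that keeps at most a fixed number of timestamped values. It is a sketch for intuition only: the Cell class is hypothetical, and it discards old versions eagerly, whereas HBase itself removes surplus versions from disk later, during compaction.

```python
class Cell:
    """Toy model of one cell that retains at most `max_versions` values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self._versions = {}  # timestamp (int) -> value (str)

    def put(self, value, timestamp):
        """Store a value, then drop everything older than the newest N."""
        self._versions[timestamp] = value
        newest_first = sorted(self._versions, reverse=True)
        for stale in newest_first[self.max_versions:]:
            del self._versions[stale]

    def get(self, versions=1):
        """Return up to `versions` values, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]


# VERSIONS => 2: both writes stay readable.
age = Cell(max_versions=2)
age.put("9", timestamp=100)
age.put("10", timestamp=200)
print(age.get())            # newest value only
print(age.get(versions=2))  # newest two values

# VERSIONS => 1 (the default): the older value is gone.
default_age = Cell(max_versions=1)
default_age.put("9", timestamp=100)
default_age.put("10", timestamp=200)
print(default_age.get(versions=2))
```

Asking for more versions than the column family retains simply returns what is left, which is why VERSIONS has to be raised on the column family before older values can be read back.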
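4. Adding data
Data is inserted with the put command. We use the test table created above, which now has two column families, cf1 and cf2 (cf3 was deleted in section 3). Columns inside a column family do not need to be created in advance; they are named on demand with the "family:qualifier" syntax.
We will add the following data to the test table: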
id   | name  | age | job      |
row1 | Tom   | 12  | Student  |
row2 | Jerry | 9   | Engineer |
row3 | Jerry | 10  | Engineer |
Note: row3 has the same name and job values as row2. This is fine, because rows are identified by their row key and not by their values.
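The table above maps onto one put per cell. As a rough illustration, the Python sketch below generates those put commands, assuming name and age go into cf1 and job into cf2 (the mapping used by the shell session in this section); the helper names quote and put_commands are invented for this example.

```python
# Rows from the table above: (row key, name, age, job).
ROWS = [
    ("row1", "Tom", "12", "Student"),
    ("row2", "Jerry", "9", "Engineer"),
    ("row3", "Jerry", "10", "Engineer"),
]

# Assumed column layout: name and age in cf1, job in cf2.
COLUMNS = ("cf1:name", "cf1:age", "cf2:job")


def quote(text):
    """Single-quote a value for the Ruby-based shell, escaping \\ and '."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def put_commands(table, rows, columns=COLUMNS):
    """Return one `put` command per cell, in row order."""
    commands = []
    for row_key, *values in rows:
        for column, value in zip(columns, values):
            commands.append(
                f"put {quote(table)}, {quote(row_key)}, "
                f"{quote(column)}, {quote(value)}"
            )
    return commands


if __name__ == "__main__":
    for command in put_commands("test", ROWS):
        print(command)
```

Every value is written as a string, which matches how the shell stores it: '12' is the two characters "1" and "2", not a number.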
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell

# Add row1's name to the cf1 column family
hbase:001:0> put 'test', 'row1', 'cf1:name', 'Tom'
Took 0.3990 seconds

# Show row1
hbase:002:0> get 'test', 'row1'
COLUMN      CELL
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
1 row(s)
Took 0.0517 seconds

# Add row1's age to cf1 with an explicit timestamp of 1 (optional; if omitted, the system generates one)
hbase:003:0> put 'test', 'row1', 'cf1:age', '12', 1

# Show row1
hbase:004:0> get 'test', 'row1'
COLUMN      CELL
 cf1:age    timestamp=1970-01-01T08:00:00.001, value=12
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
1 row(s)
Took 0.0106 seconds

# Add the remaining data
hbase:005:0> put 'test', 'row1', 'cf2:job', 'Student'
hbase:006:0> put 'test', 'row2', 'cf1:name', 'Jerry'
hbase:007:0> put 'test', 'row2', 'cf1:age', '9'
hbase:008:0> put 'test', 'row2', 'cf2:job', 'Engineer'
hbase:009:0> put 'test', 'row3', 'cf1:name', 'Jerry'
hbase:010:0> put 'test', 'row3', 'cf1:age', '10'
hbase:011:0> put 'test', 'row3', 'cf2:job', 'Engineer'

# Write a second version of row2's cf1:age
hbase:012:0> put 'test', 'row2', 'cf1:age', '10'

# Show row2's cf1:age (latest version only)
hbase:013:0> get 'test', 'row2', 'cf1:age'
COLUMN      CELL
 cf1:age    timestamp=2023-04-01T21:44:02.976, value=10
1 row(s)
Took 0.0098 seconds

# Show the latest 2 versions of row2's cf1:age
hbase:014:0> get 'test', 'row2', {COLUMNS=>'cf1:age',VERSIONS=>2}
COLUMN      CELL
 cf1:age    timestamp=2023-04-01T20:28:48.823, value=10
 cf1:age    timestamp=2023-04-01T20:25:45.637, value=9
1 row(s)
Took 0.0130 seconds

# Exit the shell
hbase:015:0> exit    # or quit
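The explicit timestamp 1 in the put for cf1:age is a count of milliseconds since the Unix epoch, which is why the shell shows it as 1970-01-01T08:00:00.001. The short Python sketch below reproduces that conversion, assuming the shell host runs at UTC+8 (inferred from the 08:00 in the output, not stated in the article); format_hbase_timestamp is a name chosen for this example.

```python
from datetime import datetime, timedelta, timezone


def format_hbase_timestamp(millis, utc_offset_hours=8):
    """Render epoch milliseconds the way the shell output above shows them.

    utc_offset_hours=8 is an assumption about the host's time zone.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Integer division keeps the seconds exact; milliseconds are appended.
    moment = datetime.fromtimestamp(millis // 1000, tz=tz)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis % 1000:03d}"


if __name__ == "__main__":
    print(format_hbase_timestamp(1))  # the value used in the put above
```

Because versions are ordered by timestamp, a cell written with such a small explicit timestamp counts as older than any cell written with a system-generated one.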
5. Querying data
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell
1) GET operations

# Count the rows in the table
hbase:001:0> count 'test'
3 row(s)
Took 0.3769 seconds
=> 3

# Get all data of row1
hbase:002:0> get 'test', 'row1'
COLUMN      CELL
 cf1:age    timestamp=1970-01-01T08:00:00.001, value=12
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
 cf2:job    timestamp=2023-04-01T21:22:52.426, value=Student
1 row(s)
Took 0.0190 seconds

# Get all data in row1's cf1 column family
hbase:003:0> get 'test', 'row1', 'cf1'
COLUMN      CELL
 cf1:age    timestamp=1970-01-01T08:00:00.001, value=12
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
1 row(s)
Took 0.0105 seconds

# Get row1's name
hbase:004:0> get 'test', 'row1', 'cf1:name'
COLUMN      CELL
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
1 row(s)
Took 0.0671 seconds

Note: the command get 'test', 'row1', {COLUMN=>'cf1:name'} can be used as well.

# Get row1's name and job (multiple columns)
hbase:005:0> get 'test', 'row1', {COLUMNS=>['cf1:name','cf2:job']}
COLUMN      CELL
 cf1:name   timestamp=2023-04-01T20:54:44.931, value=Tom
 cf2:job    timestamp=2023-04-01T21:22:52.426, value=Student
1 row(s)
Took 0.0365 seconds

# Get the columns of row1 whose value equals 12
hbase:006:0> get 'test', 'row1', FILTER=>"ValueFilter(=,'binary:12')"
COLUMN      CELL
 cf1:age    timestamp=1970-01-01T08:00:00.001, value=12
1 row(s)
Took 0.0932 seconds
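The ValueFilter in the last get compares stored bytes, not numbers. The Python sketch below is a simplified model of the two comparators that appear in this article, binary (exact byte match with the = operator) and substring (case-insensitive containment); value_matches and value_filter are hypothetical helpers, and the real filter language supports more operators and comparators than shown here.

```python
def value_matches(cell_value, comparator):
    """Apply a 'kind:operand' comparator to one cell value (bytes).

    Only the '=' operator is modelled, for the two comparator kinds
    used in this article.
    """
    kind, _, operand = comparator.partition(":")
    if kind == "binary":
        # Exact byte-for-byte equality.
        return cell_value == operand.encode()
    if kind == "substring":
        # Substring matching ignores case.
        text = cell_value.decode(errors="replace")
        return operand.lower() in text.lower()
    raise ValueError(f"unsupported comparator kind: {kind!r}")


def value_filter(cells, comparator):
    """Keep only the columns of a row whose value matches."""
    return {
        column: value
        for column, value in cells.items()
        if value_matches(value, comparator)
    }


# row1 as stored above: every value is a byte string.
ROW1 = {"cf1:age": b"12", "cf1:name": b"Tom", "cf2:job": b"Student"}

if __name__ == "__main__":
    print(value_filter(ROW1, "binary:12"))
    print(value_filter(ROW1, "substring:tom"))
```

Since the comparison is on bytes, 'binary:12' matches the stored string "12" but would not match a value stored as " 12" or "012".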
2) SCAN operations

# Scan the whole table
hbase:007:0> scan 'test'
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
 row1   column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
 row2   column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
 row2   column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
 row2   column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
 row3   column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
 row3   column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
 row3   column=cf2:job, timestamp=2023-04-01T21:23:41.953, value=Engineer
3 row(s)
Took 0.0277 seconds

# Scan the cf1 column family
hbase:008:0> scan 'test', {COLUMN=>'cf1'}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
 row2   column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
 row2   column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
 row3   column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
 row3   column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
3 row(s)
Took 0.0115 seconds

# Scan the cf1 and cf2 column families
hbase:009:0> scan 'test', {COLUMNS=>['cf1','cf2']}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
 row1   column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
 row2   column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
 row2   column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
 row2   column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
 row3   column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
 row3   column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
 row3   column=cf2:job, timestamp=2023-04-01T21:23:41.953, value=Engineer
3 row(s)
Took 0.0275 seconds

# Scan cf1:age and show the latest two versions
hbase:010:0> scan 'test', {COLUMN=>'cf1:age',VERSIONS=>2}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row2   column=cf1:age, timestamp=2023-04-01T21:44:02.976, value=10
 row2   column=cf1:age, timestamp=2023-04-01T21:40:07.682, value=9
 row3   column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
3 row(s)
Took 0.0211 seconds

# Scan for columns whose value equals 12
hbase:011:0> scan 'test', {FILTER=>"ValueFilter(=,'binary:12')"}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
1 row(s)
Took 0.0365 seconds

# Scan for columns whose value contains 'Tom'
hbase:012:0> scan 'test', {FILTER=>"ValueFilter(=,'substring:Tom')"}
ROW     COLUMN+CELL
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
1 row(s)
Took 0.0092 seconds

# Scan the table, limiting the number of rows returned
hbase:013:0> scan 'test', {LIMIT=>2}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
 row1   column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
 row2   column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
 row2   column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
 row2   column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
2 row(s)
Took 0.0122 seconds

# Scan the table starting from a given row (inclusive)
hbase:014:0> scan 'test', {STARTROW=>'row2'}
ROW     COLUMN+CELL
 row2   column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
 row2   column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
 row2   column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
 row3   column=cf1:age, timestamp=2023-04-02T12:59:58.985, value=10
 row3   column=cf1:name, timestamp=2023-04-02T12:59:51.474, value=Jerry
 row3   column=cf2:job, timestamp=2023-04-02T13:00:06.009, value=Engineer
2 row(s)
Took 0.0121 seconds

# Scan the table up to a given row (exclusive)
hbase:015:0> scan 'test', {STOPROW=>'row2'}
ROW     COLUMN+CELL
 row1   column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
 row1   column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
 row1   column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
1 row(s)
Took 0.0157 seconds

# Scan the table with combined conditions
hbase:016:0> scan 'test', {COLUMN=>'cf1:age',STARTROW=>'row2',LIMIT=>1}
ROW     COLUMN+CELL
 row2   column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
1 row(s)
Took 0.0089 seconds

# Exit the shell
hbase:017:0> exit    # or quit
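The STARTROW, STOPROW and LIMIT options follow a half-open range over row keys sorted byte by byte. The Python sketch below models just that selection logic on a set of row keys; scan_keys is a hypothetical helper, and nothing here talks to HBase.

```python
def scan_keys(row_keys, startrow=None, stoprow=None, limit=None):
    """Select row keys the way the scan options above do.

    Keys are compared as bytes; startrow is inclusive, stoprow is
    exclusive, and limit caps the number of rows returned.
    """
    selected = []
    for key in sorted(row_keys):
        if limit is not None and len(selected) >= limit:
            break
        if startrow is not None and key < startrow:
            continue
        if stoprow is not None and key >= stoprow:
            break
        selected.append(key)
    return selected


KEYS = {b"row1", b"row2", b"row3"}

if __name__ == "__main__":
    print(scan_keys(KEYS, startrow=b"row2"))
    print(scan_keys(KEYS, stoprow=b"row2"))
    print(scan_keys(KEYS, startrow=b"row2", limit=1))
    # Byte-wise ordering: "row10" sorts before "row2".
    print(scan_keys(KEYS | {b"row10"}))
```

The last call shows a common surprise: because keys are ordered as bytes, numeric suffixes need zero padding (row02, row10) if they should scan in numeric order.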
6. Deleting data
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell

# Delete row3's cf2:job
hbase:001:0> delete 'test','row3','cf2:job'
hbase:002:0> get 'test','row3'
COLUMN      CELL
 cf1:age    timestamp=2023-04-01T21:23:35.183, value=10
 cf1:name   timestamp=2023-04-01T21:23:22.431, value=Jerry
1 row(s)
Took 0.0210 seconds

# Delete the whole row3
hbase:003:0> deleteall 'test','row3'
hbase:004:0> get 'test','row3'
COLUMN      CELL
0 row(s)
Took 0.0058 seconds

# Exit the shell
hbase:031:0> exit    # or quit
7. Namespace operations
A namespace is similar to a database in a relational database system; each namespace can contain multiple tables.
$ cd ~/apps
$ ./hbase-2.4.4/bin/hbase shell

# Create a namespace
hbase:001:0> create_namespace 'springboot'

# Describe the namespace
hbase:002:0> describe_namespace 'springboot'
DESCRIPTION
{NAME => 'springboot'}
Quota is disabled
Took 0.0186 seconds

# List all namespaces
hbase:003:0> list_namespace
NAMESPACE
default
hbase
springboot
3 row(s)
Took 0.0151 seconds

# Create table tbl_01 in the springboot namespace
hbase:005:0> create 'springboot:tbl_01', 'column_family1'
Created table springboot:tbl_01
Took 1.2189 seconds
=> Hbase::Table - springboot:tbl_01

# List all tables in the springboot namespace
hbase:006:0> list_namespace_tables 'springboot'
TABLE
tbl_01
1 row(s)
Took 0.0197 seconds
=> ["tbl_01"]

# Drop the namespace
hbase:007:0> drop_namespace 'springboot'
ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace springboot has 1 tables
...

Note: the springboot namespace still contains the table tbl_01, and a non-empty namespace cannot be dropped. Its tables have to be disabled and dropped first.
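Since only an empty namespace can be removed, the cleanup has a fixed order: disable each table, drop it, then drop the namespace. The Python sketch below generates that command sequence and also shows how a 'namespace:table' name splits, with unqualified names falling into the default namespace; both helper names are invented for this illustration, and the commands are only printed, not executed.

```python
def split_table_name(name):
    """Split 'namespace:table'; unqualified names live in 'default'."""
    namespace, separator, qualifier = name.partition(":")
    if not separator:
        return "default", name
    return namespace, qualifier


def drop_namespace_commands(namespace, tables):
    """Return shell commands that empty a namespace and then drop it.

    `tables` is the list printed by list_namespace_tables.
    """
    commands = []
    for table in tables:
        full_name = f"{namespace}:{table}"
        commands.append(f"disable '{full_name}'")  # required before drop
        commands.append(f"drop '{full_name}'")
    commands.append(f"drop_namespace '{namespace}'")
    return commands


if __name__ == "__main__":
    print(split_table_name("springboot:tbl_01"))
    print(split_table_name("test"))
    for command in drop_namespace_commands("springboot", ["tbl_01"]):
        print(command)
```

Generating the commands rather than running them keeps a destructive sequence reviewable before it is pasted into the shell.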