Springboot Series (25) - Springboot+HBase Big Data Storage (3) | HBase Shell

 

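HBase Shell is the command-line tool that ships with HBase. It lets you query and modify HBase data with shell commands from the machine where HBase is installed.

HBase Shell documentation: https://hbase.apache.org/book.html#shell

In "Springboot Series (24) - Springboot+HBase Big Data Storage (2) | Installing and Configuring Apache HBase and Apache Zookeeper", we installed and configured Apache HBase and Apache Zookeeper.

This article covers how to use HBase Shell.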

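1. System environment

    Operating system: Ubuntu 20.04
    Java version: openjdk 11.0.18
    Hadoop version: 3.2.2
    Zookeeper version: 3.6.3

    HBase version: 2.4.4
    HBase path: ~/apps/hbase-2.4.4

    The HBase instance used in this article is deployed on a pseudo-distributed Hadoop setup (hostname: hadoop-master-vm) and runs in HBase + Zookeeper (standalone) mode, with Zookeeper listening on port 2182.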

 

2. Running HBase Shell

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

        # help command
        hbase:001:0> help

            HBase Shell, version 2.4.4, r20e7ba45b0c3affdc0c06b1a0e5cbddd1b2d8d18, Mon Jun  7 15:31:55 PDT 2021
            Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
            Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

            ...

        # Show the HBase version
        hbase:002:0> version

            2.4.4, r20e7ba45b0c3affdc0c06b1a0e5cbddd1b2d8d18, Mon Jun  7 15:31:55 PDT 2021
            Took 0.0002 seconds

        # Show the HBase cluster status
        hbase:003:0> status

            1 active master, 0 backup masters, 1 servers, 0 dead, 1.0000 average load
            Took 0.4128 seconds

        # Exit the shell
        hbase:004:0> exit       # or quit


3. Table operations

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

        # List all tables
        hbase:001:0> list       

            TABLE
            0 row(s)
            Took 0.3782 seconds
            => []

        # Create table test with one column family, cf1
        hbase:002:0> create 'test', 'cf1'

            Created table test
            Took 1.2217 seconds
            => Hbase::Table - test

        # Show the table description
        hbase:003:0> describe 'test'

            Table test is ENABLED
            test
            COLUMN FAMILIES DESCRIPTION
            {NAME => 'cf1', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOC
            K_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =
            > '65536', REPLICATION_SCOPE => '0'}

            1 row(s)
            Quota is disabled
            Took 0.0453 seconds

        # Add two column families
        hbase:004:0> alter 'test', 'cf2', 'cf3'

            Updating all regions with the new schema...
            1/1 regions updated.
            Done.
            Took 2.3449 seconds

        # Delete column family cf3
        hbase:005:0> alter 'test', {NAME => 'cf3', METHOD => 'delete'}

            Updating all regions with the new schema...
            1/1 regions updated.
            Done.
            Took 2.1434 seconds

        # Set VERSIONS of column family cf1 to 2; the default is 1 (only the latest version is kept)
        hbase:006:0> alter 'test', {NAME => 'cf1', VERSIONS => 2}

            Updating all regions with the new schema...
            1/1 regions updated.
            Done.
            Took 2.1434 seconds

        # Show the table description again
        hbase:007:0> describe 'test'

            Table test is ENABLED
            test
            COLUMN FAMILIES DESCRIPTION
            {NAME => 'cf1', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '2', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOC
            K_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =
            > '65536', REPLICATION_SCOPE => '0'}

            {NAME => 'cf2', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOC
            K_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =
            > '65536', REPLICATION_SCOPE => '0'}

            2 row(s)
            Quota is disabled
            Took 0.0266 seconds

        # Check whether a table exists
        hbase:008:0> exists 'demo'

            Table demo does not exist
            Took 0.0220 seconds
            => false

        # Disable a table with disable
        hbase:009:0> disable 'test'

            Took 0.3985 seconds

        # Use is_enabled to check whether a table is enabled; is_disabled performs the opposite check
        hbase:010:0> is_enabled 'test'

            false
            Took 0.0143 seconds
            => false

        # Enable a table with enable
        hbase:011:0> enable 'test'

        # Drop a table; it must be disabled with disable first ('demo' here stands for any existing table)
        hbase:012:0> disable 'demo'
        hbase:013:0> drop 'demo'

        # Exit the shell
        hbase:014:0> exit       # or quit


4. Adding data

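    Data is inserted with the put command. Take the test table created above as an example: it now has two column families, cf1 and cf2 (cf3 was deleted in section 3). Columns within a column family do not need to be created in advance; specify them as family:qualifier when writing.

    We will add the following data to the test table:

        id      name    age     job
        row1    Tom     12      Student
        row2    Jerry   9       Engineer
        row3    Jerry   10      Engineer

        Note: row3 has the same name and job values as row2; both rows are stored without any problem.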
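
    As a rough illustration of the family:qualifier addressing described above, here is a minimal Python sketch of a toy in-memory table. It is not the HBase client API; the class name ToyTable and its methods are hypothetical and only model the rule that column families are fixed in the schema while qualifiers are created on write.

```python
class ToyTable:
    """Toy model of an HBase table: row -> family -> qualifier -> value."""

    def __init__(self, *families):
        self.families = set(families)  # fixed at "schema" time
        self.rows = {}

    def put(self, row, column, value):
        # Split 'cf1:name' into family 'cf1' and qualifier 'name'.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # The qualifier is created on demand; only the family must already exist.
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row):
        """Return {'family:qualifier': value} for one row, sorted by column name."""
        cells = {}
        for family, qualifiers in self.rows.get(row, {}).items():
            for qualifier, value in qualifiers.items():
                cells[f"{family}:{qualifier}"] = value
        return dict(sorted(cells.items()))


table = ToyTable("cf1", "cf2")
table.put("row1", "cf1:name", "Tom")
table.put("row1", "cf2:job", "Student")
print(table.get("row1"))
```

    Writing to a family that is not part of the schema fails in this sketch, which mirrors why a column family has to be added with alter before it can be used.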

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

        # Add row1's name to column family cf1
        hbase:001:0> put 'test', 'row1', 'cf1:name', 'Tom'

            Took 0.3990 seconds

        # View row1
        hbase:002:0> get 'test', 'row1'

            COLUMN                         CELL
            cf1:name                      timestamp=2023-04-01T20:54:44.931, value=Tom
            1 row(s)
            Took 0.0517 seconds

        # Add row1's age to column family cf1 with an explicit timestamp (optional; the system generates one if omitted)
        hbase:003:0> put 'test', 'row1', 'cf1:age', '12', 1

        # View row1
        hbase:004:0> get 'test', 'row1'

            COLUMN                         CELL
            cf1:age                       timestamp=1970-01-01T08:00:00.001, value=12
            cf1:name                      timestamp=2023-04-01T20:54:44.931, value=Tom
            1 row(s)
            Took 0.0106 seconds
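
    The explicit timestamp 1 shows up as 1970-01-01T08:00:00.001 because cell timestamps are milliseconds since the Unix epoch, rendered in the local time zone of the shell. The Python sketch below reproduces that rendering. It assumes the shell host runs in UTC+8, which is inferred from the output above and not stated in the article; the helper name format_cell_timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the shell host uses UTC+8 (inferred from the 08:00 offset in the output).
SHELL_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_cell_timestamp(millis: int) -> str:
    """Render an epoch-millisecond timestamp as local time with millisecond precision."""
    local = (EPOCH + timedelta(milliseconds=millis)).astimezone(SHELL_TZ)
    return local.replace(tzinfo=None).isoformat(timespec="milliseconds")


print(format_cell_timestamp(1))  # 1970-01-01T08:00:00.001
```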

        # Add more data
        hbase:005:0> put 'test', 'row1', 'cf2:job', 'Student'

        hbase:006:0> put 'test', 'row2', 'cf1:name', 'Jerry'
        hbase:007:0> put 'test', 'row2', 'cf1:age', '9'
        hbase:008:0> put 'test', 'row2', 'cf2:job', 'Engineer'

        hbase:009:0> put 'test', 'row3', 'cf1:name', 'Jerry'
        hbase:010:0> put 'test', 'row3', 'cf1:age', '10'
        hbase:011:0> put 'test', 'row3', 'cf2:job', 'Engineer'

        # Add a second version of row2's cf1:age
        hbase:012:0> put 'test', 'row2', 'cf1:age', '10'

        # View row2's cf1:age (latest version only)
        hbase:013:0> get 'test', 'row2', 'cf1:age'

            COLUMN                         CELL
             cf1:age                       timestamp=2023-04-01T21:44:02.976, value=10
            1 row(s)
            Took 0.0098 seconds

        # View the latest 2 versions of row2's cf1:age
        hbase:014:0> get 'test', 'row2', {COLUMNS=>'cf1:age',VERSIONS=>2}

            COLUMN                         CELL
             cf1:age                       timestamp=2023-04-01T20:28:48.823, value=10
             cf1:age                       timestamp=2023-04-01T20:25:45.637, value=9
            1 row(s)
            Took 0.0130 seconds

        # Exit the shell
        hbase:015:0> exit       # or quit
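
    The version behavior seen in this section can be sketched in a few lines of Python. This is a simplified model, not HBase code: the class VersionedCell and the timestamps 100 and 200 are hypothetical, and the model trims old versions immediately on write, whereas real HBase removes them physically during compaction. It only illustrates what a read returns.

```python
class VersionedCell:
    """Toy model of one cell that keeps at most `max_versions` timestamped values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        self.versions[timestamp] = value
        # Drop everything beyond the newest `max_versions` timestamps.
        for ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[ts]

    def get(self, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:versions]


age = VersionedCell(max_versions=2)  # models a column family with VERSIONS => 2
age.put("9", timestamp=100)          # hypothetical timestamps
age.put("10", timestamp=200)
print(age.get())            # [(200, '10')]
print(age.get(versions=2))  # [(200, '10'), (100, '9')]
```

    A plain get corresponds to versions=1, which is why only the value 10 appears unless VERSIONS is passed explicitly.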

 

5. Querying data

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

    1) GET operations

        # Count the rows in the table
        hbase:001:0> count 'test'

            3 row(s)
            Took 0.3769 seconds
            => 3
        
        # Get all data of row1
        hbase:002:0> get 'test', 'row1'

            COLUMN                         CELL
             cf1:age                       timestamp=1970-01-01T08:00:00.001, value=12
             cf1:name                      timestamp=2023-04-01T20:54:44.931, value=Tom
             cf2:job                       timestamp=2023-04-01T21:22:52.426, value=Student
            1 row(s)
            Took 0.0190 seconds

        # Get all data in column family cf1 of row1
        hbase:003:0> get 'test', 'row1', 'cf1'

            COLUMN                         CELL
             cf1:age                       timestamp=1970-01-01T08:00:00.001, value=12
             cf1:name                      timestamp=2023-04-01T20:54:44.931, value=Tom
            1 row(s)
            Took 0.0105 seconds

        # Get row1's name
        hbase:004:0> get 'test', 'row1', 'cf1:name'

            COLUMN                         CELL
             cf1:name                      timestamp=2023-04-01T20:54:44.931, value=Tom
            1 row(s)
            Took 0.0671 seconds

            Note: the command get 'test', 'row1', {COLUMN=>'cf1:name'} works as well.

        # Get row1's name and job (multiple columns)
        hbase:005:0> get 'test', 'row1',  {COLUMNS=>['cf1:name','cf2:job']}

            COLUMN                          CELL
             cf1:name                       timestamp=2023-04-01T20:54:44.931, value=Tom
             cf2:job                        timestamp=2023-04-01T21:22:52.426, value=Student
            1 row(s)
            Took 0.0365 seconds

        # Get the columns of row1 whose value equals 12
        hbase:006:0> get 'test', 'row1', FILTER=>"ValueFilter(=,'binary:12')"

            COLUMN                          CELL
             cf1:age                        timestamp=1970-01-01T08:00:00.001, value=12
            1 row(s)
            Took 0.0932 seconds

 

    2) SCAN operations

        # Scan the whole table
        hbase:007:0> scan 'test'

            ROW                            COLUMN+CELL
             row1                          column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row1                          column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
             row1                          column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
             row2                          column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
             row2                          column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
             row2                          column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
             row3                          column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
             row3                          column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
             row3                          column=cf2:job, timestamp=2023-04-01T21:23:41.953, value=Engineer
            3 row(s)
            Took 0.0277 seconds

        # Scan column family cf1
        hbase:008:0> scan 'test', {COLUMN=>'cf1'}

            ROW                            COLUMN+CELL
             row1                          column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row1                          column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
             row2                          column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
             row2                          column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
             row3                          column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
             row3                          column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
            3 row(s)
            Took 0.0115 seconds

        # Scan column families cf1 and cf2
        hbase:009:0> scan 'test', {COLUMNS=>['cf1','cf2']}

            ROW                            COLUMN+CELL
             row1                          column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row1                          column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
             row1                          column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
             row2                          column=cf1:age, timestamp=2023-04-01T21:23:06.869, value=9
             row2                          column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
             row2                          column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
             row3                          column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
             row3                          column=cf1:name, timestamp=2023-04-01T21:23:22.431, value=Jerry
             row3                          column=cf2:job, timestamp=2023-04-01T21:23:41.953, value=Engineer
            3 row(s)
            Took 0.0275 seconds

        # Scan cf1:age, showing the latest two versions
        hbase:010:0> scan 'test', {COLUMN=>'cf1:age',VERSIONS=>2}

            ROW                            COLUMN+CELL
             row1                          column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row2                          column=cf1:age, timestamp=2023-04-01T21:44:02.976, value=10
             row2                          column=cf1:age, timestamp=2023-04-01T21:40:07.682, value=9
             row3                          column=cf1:age, timestamp=2023-04-01T21:23:35.183, value=10
            3 row(s)
            Took 0.0211 seconds

        # Scan for cells whose value equals 12
        hbase:011:0> scan 'test', {FILTER=>"ValueFilter(=,'binary:12')"}
        
            ROW                            COLUMN+CELL
             row1                          column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
            1 row(s)
            Took 0.0365 seconds

        # Scan for cells whose value contains 'Tom'
        hbase:012:0> scan 'test', {FILTER=>"ValueFilter(=,'substring:Tom')"}

            ROW                            COLUMN+CELL
             row1                          column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
            1 row(s)
            Took 0.0092 seconds
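
    The two comparators used with ValueFilter behave differently: binary compares the whole value, while substring looks for the operand anywhere inside the value and ignores case. The following Python sketch models only the "=" operator over a hard-coded copy of the sample cells. The function name value_filter_equal is hypothetical and this is not the HBase filter implementation.

```python
# Sample cells as (row, column, value), copied from the scan output of the test table.
CELLS = [
    ("row1", "cf1:age", "12"),
    ("row1", "cf1:name", "Tom"),
    ("row1", "cf2:job", "Student"),
    ("row2", "cf1:age", "9"),
    ("row2", "cf1:name", "Jerry"),
    ("row2", "cf2:job", "Engineer"),
]


def value_filter_equal(cells, comparator):
    """Model ValueFilter(=, '<kind>:<operand>') for the binary and substring comparators."""
    kind, _, operand = comparator.partition(":")

    def matches(value):
        if kind == "binary":
            # Whole-value byte comparison.
            return value.encode() == operand.encode()
        if kind == "substring":
            # Case-insensitive containment.
            return operand.lower() in value.lower()
        raise ValueError(f"unsupported comparator: {kind}")

    return [cell for cell in cells if matches(cell[2])]


print(value_filter_equal(CELLS, "binary:12"))
print(value_filter_equal(CELLS, "substring:Tom"))
```

    With this model, binary:1 matches nothing because no value is exactly "1", while substring:er would match both Jerry and Engineer.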

        # Scan the whole table, limiting the number of rows returned
        hbase:013:0> scan 'test', {LIMIT=>2}

            ROW                             COLUMN+CELL
             row1                           column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row1                           column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
             row1                           column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
             row2                           column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
             row2                           column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
             row2                           column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
            2 row(s)
            Took 0.0122 seconds

        # Scan from the given row (inclusive) to the end of the table
        hbase:014:0> scan 'test', {STARTROW=>'row2'}

            ROW                             COLUMN+CELL
             row2                           column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
             row2                           column=cf1:name, timestamp=2023-04-01T21:22:59.690, value=Jerry
             row2                           column=cf2:job, timestamp=2023-04-01T21:23:15.220, value=Engineer
             row3                           column=cf1:age, timestamp=2023-04-02T12:59:58.985, value=10
             row3                           column=cf1:name, timestamp=2023-04-02T12:59:51.474, value=Jerry
             row3                           column=cf2:job, timestamp=2023-04-02T13:00:06.009, value=Engineer
            2 row(s)
            Took 0.0121 seconds

        # Scan up to the given row (exclusive)
        hbase:015:0> scan 'test', {STOPROW=>'row2'}

            ROW                             COLUMN+CELL
             row1                           column=cf1:age, timestamp=1970-01-01T08:00:00.001, value=12
             row1                           column=cf1:name, timestamp=2023-04-01T20:54:44.931, value=Tom
             row1                           column=cf2:job, timestamp=2023-04-01T21:22:52.426, value=Student
            1 row(s)
            Took 0.0157 seconds

        # Scan with combined conditions
        hbase:016:0> scan 'test', {COLUMN=>'cf1:age',STARTROW=>'row2',LIMIT=>1}

            ROW                             COLUMN+CELL
             row2                           column=cf1:age, timestamp=2023-04-01T22:28:48.823, value=10
            1 row(s)
            Took 0.0089 seconds
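
    The row-range options combine in a predictable way: rows are visited in sorted row-key order, STARTROW is inclusive, STOPROW is exclusive, and LIMIT caps the number of rows returned. A minimal Python sketch of that selection logic follows. The function scan_rows is hypothetical and works on plain strings, whose ordering matches HBase's byte ordering for the ASCII keys used here.

```python
def scan_rows(row_keys, startrow=None, stoprow=None, limit=None):
    """Return the row keys a scan would visit, in sorted order."""
    selected = []
    for key in sorted(row_keys):
        if limit is not None and len(selected) >= limit:
            break  # LIMIT counts rows, not cells
        if startrow is not None and key < startrow:
            continue  # before STARTROW (STARTROW itself is included)
        if stoprow is not None and key >= stoprow:
            break  # STOPROW itself is excluded
        selected.append(key)
    return selected


rows = ["row1", "row2", "row3"]
print(scan_rows(rows, startrow="row2"))           # ['row2', 'row3']
print(scan_rows(rows, stoprow="row2"))            # ['row1']
print(scan_rows(rows, startrow="row2", limit=1))  # ['row2']
```

    Because the ordering is lexicographic, a key such as row10 sorts before row2, which is worth keeping in mind when designing row keys for range scans.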

        # Exit the shell
        hbase:017:0> exit       # or quit

 

6. Deleting data

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

        # Delete row3's cf2:job
        hbase:001:0> delete 'test','row3','cf2:job'

        hbase:002:0> get 'test','row3'

            COLUMN                         CELL
             cf1:age                       timestamp=2023-04-01T21:23:35.183, value=10
             cf1:name                      timestamp=2023-04-01T21:23:22.431, value=Jerry
            1 row(s)
            Took 0.0210 seconds

        # Delete the entire row3
        hbase:003:0> deleteall 'test','row3'

        hbase:004:0> get 'test','row3'

            COLUMN                         CELL
            0 row(s)
            Took 0.0058 seconds

        # Exit the shell
        hbase:005:0> exit    # or quit


7. Namespace operations

    A namespace is similar to a database in a relational database system; each namespace can contain multiple tables.

    $ cd ~/apps
    $ ./hbase-2.4.4/bin/hbase shell

        # Create a namespace
        hbase:001:0> create_namespace 'springboot'

        # Describe a namespace
        hbase:002:0> describe_namespace 'springboot'

            DESCRIPTION
            {NAME => 'springboot'}
            Quota is disabled
            Took 0.0186 seconds

        # List all namespaces
        hbase:003:0> list_namespace

            NAMESPACE
            default
            hbase
            springboot
            3 row(s)
            Took 0.0151 seconds

        # Create table tbl_01 in the springboot namespace
        hbase:005:0> create 'springboot:tbl_01', 'column_family1'

            Created table springboot:tbl_01
            Took 1.2189 seconds
            => Hbase::Table - springboot:tbl_01

        # List all tables in the springboot namespace
        hbase:006:0> list_namespace_tables 'springboot'

            TABLE
            tbl_01
            1 row(s)
            Took 0.0197 seconds
            => ["tbl_01"]

        # Drop the namespace
        hbase:007:0> drop_namespace 'springboot'

            ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace springboot has 1 tables

            ...

        Note: a non-empty namespace cannot be dropped, and springboot still contains the table tbl_01.
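
    To make the naming rule concrete, the Python sketch below splits a qualified table name into namespace and table, and checks whether a namespace is empty. Both function names are hypothetical and this is not HBase code. It relies on the fact that a table created without a namespace prefix, such as test, lives in the default namespace.

```python
def parse_table_name(name):
    """Split 'namespace:table' into its parts; unqualified names belong to 'default'."""
    namespace, separator, table = name.partition(":")
    if not separator:
        return ("default", name)
    return (namespace, table)


def can_drop_namespace(namespace, table_names):
    """A namespace can only be dropped once it contains no tables."""
    return not any(parse_table_name(t)[0] == namespace for t in table_names)


tables = ["test", "springboot:tbl_01"]
print(parse_table_name("springboot:tbl_01"))     # ('springboot', 'tbl_01')
print(parse_table_name("test"))                  # ('default', 'test')
print(can_drop_namespace("springboot", tables))  # False
```

    In the shell, this means tbl_01 would have to be disabled and dropped before drop_namespace 'springboot' can succeed.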

 

posted @ 2023-03-26 19:06  垄山小站