HBase Data Import and Export Tools
HBase ships with several built-in tools for importing and exporting data.
1. Direct import with ImportTsv
1.1 Create a table in HBase
create 'testtable4','cf1','cf2'
1.2 Prepare the data file data.txt and upload it to HDFS
1,tom,m
2,jack,m
3,lili,f

hadoop fs -put data.txt /user/dw_hbkal/przhang
1.3 Run the import command
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf1,cf2 testtable4 /user/dw_hbkal/przhang/data.txt
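The column mapping that -Dimporttsv.columns declares can be sketched in Python (an illustration only, not HBase code): the field mapped to HBASE_ROW_KEY becomes the row key, and each remaining field is written to the listed column — here the bare families cf1 and cf2, which is why the scan output shows empty qualifiers such as cf1:.

```python
# Sketch of the mapping behind -Dimporttsv.columns=HBASE_ROW_KEY,cf1,cf2
# (illustration only; the real ImportTsv runs as a MapReduce job).

def map_line(line, columns, sep=","):
    fields = line.strip().split(sep)
    # the field aligned with HBASE_ROW_KEY becomes the row key
    rowkey = fields[columns.index("HBASE_ROW_KEY")]
    # every other field is paired with its declared column
    cells = [(col, val) for col, val in zip(columns, fields)
             if col != "HBASE_ROW_KEY"]
    return rowkey, cells

columns = ["HBASE_ROW_KEY", "cf1", "cf2"]
print(map_line("1,tom,m", columns))  # → ('1', [('cf1', 'tom'), ('cf2', 'm')])
```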
1.4 Check the data in HBase
hbase(main):069:0> scan 'testtable4'
ROW        COLUMN+CELL
 1         column=cf1:, timestamp=1533708793917, value=tom
 1         column=cf2:, timestamp=1533708793917, value=m
 2         column=cf1:, timestamp=1533708793917, value=jack
 2         column=cf2:, timestamp=1533708793917, value=m
 3         column=cf1:, timestamp=1533708793917, value=lili
 3         column=cf2:, timestamp=1533708793917, value=f
3 row(s) in 0.0300 seconds
2. Generate HFiles with ImportTsv, then load them incrementally
2.1 Create the data file data2.txt and upload it to HDFS
1,tom,f
5,jack2,m
6,lili2,m

hadoop fs -put data2.txt /user/dw_hbkal/przhang
2.2 Generate the HFiles
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf1,cf2 -Dimporttsv.bulk.output=/user/dw_hbkal/przhang/hfile_tmp testtable4 /user/dw_hbkal/przhang/data2.txt
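With -Dimporttsv.bulk.output, ImportTsv writes the cells into HFiles instead of issuing live Puts, and cells inside an HFile must be in sorted key order. A minimal Python sketch of that ordering (illustration only; real HFiles are a binary key-value format):

```python
# Sketch: cells destined for an HFile are kept sorted by row key and
# column (illustration only, not the actual HFile format).
lines = ["1,tom,f", "6,lili2,m", "5,jack2,m"]

def to_cells(line):
    rowkey, v1, v2 = line.split(",")
    # bare families cf1/cf2 with empty qualifiers, as in the import
    return [(rowkey, "cf1:", v1), (rowkey, "cf2:", v2)]

# the MapReduce job sorts all cells before writing the HFile
cells = sorted(c for line in lines for c in to_cells(line))
print(cells[0])   # → ('1', 'cf1:', 'tom')
print(cells[-1])  # → ('6', 'cf2:', 'm')
```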
2.3 Load the HFiles into HBase; under the hood this is essentially an HDFS move of the files into the table's directory
bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/dw_hbkal/przhang/hfile_tmp testtable4
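The "HDFS move" can be illustrated with ordinary file operations: LoadIncrementalHFiles renames each generated HFile into the region's column-family directory rather than copying its bytes, which is why bulk loading is cheap. A sketch using hypothetical local paths standing in for HDFS:

```python
# Sketch: a bulk load moves (renames) the HFile into the table's
# directory instead of rewriting it. Local paths stand in for HDFS.
import os
import tempfile

tmp = tempfile.mkdtemp()
src_dir = os.path.join(tmp, "hfile_tmp", "cf1")  # ImportTsv bulk output
dst_dir = os.path.join(tmp, "data", "default", "testtable4", "region", "cf1")
os.makedirs(src_dir)
os.makedirs(dst_dir)

with open(os.path.join(src_dir, "hfile_0"), "w") as f:
    f.write("kv-data")

# the load is a rename: the source file disappears, no bytes are copied
os.rename(os.path.join(src_dir, "hfile_0"),
          os.path.join(dst_dir, "hfile_0_SeqId_6_"))
print(os.listdir(src_dir))  # → []
print(os.listdir(dst_dir))  # → ['hfile_0_SeqId_6_']
```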
2.4 List cf1's HFiles on HDFS; the file renamed with the _SeqId_ suffix is the HFile brought in by the bulk load
hadoop fs -ls /hbase/data/default/testtable4/ebaa89a06f73a0ecdc15b53bd88bc3a4/cf1
Found 2 items
-rwxrwxrwx 3 hdfs  bdos  1170 2018-08-08 14:23 /hbase/data/default/testtable4/ebaa89a06f73a0ecdc15b53bd88bc3a4/cf1/0e80f632a7214755a8e84e9fafea36eb_SeqId_6_
-rw-r--r-- 3 hbase hbase 1065 2018-08-08 14:45 /hbase/data/default/testtable4/ebaa89a06f73a0ecdc15b53bd88bc3a4/cf1/347598bdf4e34b51909b6965fed11a99
2.5 Check HBase again
hbase(main):070:0> scan 'testtable4'
ROW        COLUMN+CELL
 1         column=cf1:, timestamp=1533709383463, value=tom
 1         column=cf2:, timestamp=1533709383463, value=f
 2         column=cf1:, timestamp=1533708793917, value=jack
 2         column=cf2:, timestamp=1533708793917, value=m
 3         column=cf1:, timestamp=1533708793917, value=lili
 3         column=cf2:, timestamp=1533708793917, value=f
 5         column=cf1:, timestamp=1533709383463, value=jack2
 5         column=cf2:, timestamp=1533709383463, value=m
 6         column=cf1:, timestamp=1533709383463, value=lili2
 6         column=cf2:, timestamp=1533709383463, value=m
5 row(s) in 0.0260 seconds
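Note that row 1 now shows value f for cf2: even though it was first imported as m: both versions exist in the table, and a plain scan returns the cell with the newest timestamp. A small Python sketch of that resolution rule (an in-memory stand-in, not the HBase API):

```python
# Sketch: HBase keeps multiple timestamped versions per cell and a
# plain scan returns the newest one (in-memory stand-in for a table).
versions = {
    ("1", "cf2:"): [(1533708793917, "m"),   # from the first import
                    (1533709383463, "f")],  # from the bulk load
}

def latest(cell_versions):
    ts, value = max(cell_versions)  # the newest timestamp wins
    return value

print(latest(versions[("1", "cf2:")]))  # → f
```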
3. Export table data to HDFS with Export
bin/hbase org.apache.hadoop.hbase.mapreduce.Export testtable /user/dw_hbkal/przhang/hbaseexport/testdata // exports the testtable data to an HDFS path; the number of versions and the time range of the export can also be specified
4. Import data from HDFS with Import
hbase org.apache.hadoop.hbase.mapreduce.Import testtable /user/dw_hbkal/przhang/hbaseexport/testdata // imports the HDFS data into testtable; testtable must be created before the import
5. Copy a table with CopyTable
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=test3 test // copies the data from test into test3; only the latest version of each cell is copied