Integrating Impala with HBase
• Step 1: Create an HBase table and add data to it
create 'test_info', 'info'

put 'test_info','0001','info:birthday','2020-01-01'
put 'test_info','0001','info:gender','male'
put 'test_info','0001','info:user_type','1'

put 'test_info','0002','info:birthday','2019-01-01'
put 'test_info','0002','info:gender','male'
put 'test_info','0002','info:user_type','2'

put 'test_info','0003','info:birthday','2018-01-01'
put 'test_info','0003','info:gender','male'
put 'test_info','0003','info:user_type','2'
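The commands in Step 1 follow a single pattern: one put per (RowKey, column family:qualifier) cell, with every value written as a string. As a minimal sketch, assuming you would rather keep the sample data in Python and paste the generated commands into hbase shell, the hypothetical helper hbase_shell_script below produces the same commands from a nested dictionary.

```python
from typing import Dict

# Sample rows from Step 1: RowKey -> {qualifier: value}, all in column family "info".
ROWS: Dict[str, Dict[str, str]] = {
    "0001": {"birthday": "2020-01-01", "gender": "male", "user_type": "1"},
    "0002": {"birthday": "2019-01-01", "gender": "male", "user_type": "2"},
    "0003": {"birthday": "2018-01-01", "gender": "male", "user_type": "2"},
}


def hbase_shell_script(table: str, family: str, rows: Dict[str, Dict[str, str]]) -> str:
    """Return HBase shell commands that create `table` and put every cell of `rows`.

    Hypothetical helper for illustration only. Values are written as strings,
    like the hand-written commands; single quotes inside values are NOT escaped.
    """
    lines = [f"create '{table}', '{family}'"]
    for row_key, cells in rows.items():  # dicts keep insertion order (Python 3.7+)
        for qualifier, value in cells.items():
            lines.append(f"put '{table}','{row_key}','{family}:{qualifier}','{value}'")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hbase_shell_script("test_info", "info", ROWS))
```

The printed output matches the commands above line for line (without the blank separator lines) and can be pasted into hbase shell.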
• Step 2: Create a Hive table
CREATE EXTERNAL TABLE test_info(
  user_id string,
  user_type tinyint,
  gender string,
  birthday string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:user_type,info:gender,info:birthday")
TBLPROPERTIES ("hbase.table.name" = "test_info");
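The WITH SERDEPROPERTIES clause maps the columns of the Hive external table to HBase columns. The mapping is positional: the n-th entry of hbase.columns.mapping feeds the n-th column of the Hive table. Here ":key" refers to the HBase RowKey and is mapped to user_id, and the remaining entries are qualifiers in the column family info. Finally, TBLPROPERTIES specifies the name of the HBase table being mapped.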
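To make the positional mapping concrete, the Python sketch below models how one HBase row of test_info turns into one Hive row. It is a simplified illustration, not the actual HBaseSerDe code: the names parse_mapping, convert, and project_row are hypothetical, only string-encoded cells are handled, and only the two Hive types used here (string and tinyint) are modeled, without range or format validation.

```python
from typing import Dict, List, Optional, Tuple

# Hive columns of test_info in declaration order, with their declared types.
HIVE_COLUMNS: List[Tuple[str, str]] = [
    ("user_id", "string"),
    ("user_type", "tinyint"),
    ("gender", "string"),
    ("birthday", "string"),
]
# Same mapping as in the DDL, written without spaces between entries.
MAPPING = ":key,info:user_type,info:gender,info:birthday"


def parse_mapping(mapping: str, columns: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Pair each mapping entry with the Hive column at the same position."""
    entries = mapping.split(",")
    if len(entries) != len(columns):
        raise ValueError("need exactly one mapping entry per Hive column")
    return [(name, hive_type, entry) for (name, hive_type), entry in zip(columns, entries)]


def convert(raw: Optional[str], hive_type: str) -> object:
    """Convert a string-encoded HBase value to a Python value (simplified, no validation)."""
    if raw is None:
        return None  # no cell for this row -> NULL
    if hive_type == "string":
        return raw
    if hive_type == "tinyint":
        return int(raw)
    raise ValueError(f"type not modeled in this sketch: {hive_type}")


def project_row(row_key: str, cells: Dict[str, str],
                mapping: List[Tuple[str, str, str]]) -> Dict[str, object]:
    """Build one Hive row from an HBase row given as RowKey + {'family:qualifier': value}."""
    row: Dict[str, object] = {}
    for name, hive_type, entry in mapping:
        # ":key" takes the RowKey itself; any other entry looks up a cell.
        raw = row_key if entry == ":key" else cells.get(entry)
        row[name] = convert(raw, hive_type)
    return row


if __name__ == "__main__":
    mapping = parse_mapping(MAPPING, HIVE_COLUMNS)
    cells = {"info:birthday": "2020-01-01", "info:gender": "male", "info:user_type": "1"}
    print(project_row("0001", cells, mapping))
```

Running it prints {'user_id': '0001', 'user_type': 1, 'gender': 'male', 'birthday': '2020-01-01'}. Note that the order of the cells in HBase does not matter; only the order of the mapping entries relative to the Hive columns does.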
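• Step 3: Refresh Impala's metadata
Impala shares the Hive Metastore but caches its metadata, so a table created in Hive may not be visible to Impala until the metadata is reloaded. Run the following command in impala-shell (to reload only the new table, use invalidate metadata test_info; instead):
invalidate metadata;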
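• Step 4: Query the table from Impala
SELECT * FROM test_info;
The query returns the three rows written in Step 1 (RowKeys 0001, 0002, and 0003), with the columns user_id, user_type, gender, and birthday.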
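Steps 3 and 4 can also be run from a Python program. The sketch below assumes the impyla package (pip install impyla), whose impala.dbapi.connect returns a DB-API 2.0 connection; the host name impalad-host is a placeholder, and port 21050 is the usual default for impalad's HiveServer2-protocol clients, so adjust both for your cluster. The helper refresh_and_query is hypothetical and accepts any DB-API cursor, which keeps it testable without a running cluster.

```python
from typing import Any, List, Sequence


def refresh_and_query(cursor: Any, table: str) -> List[Sequence[Any]]:
    """Make a table created through Hive visible to Impala, then read all its rows.

    `cursor` is any DB-API 2.0 cursor connected to impalad.
    `table` is interpolated into the SQL text, so pass only trusted identifiers.
    """
    # Step 3: reload metadata for this one table from the shared Hive Metastore.
    cursor.execute(f"INVALIDATE METADATA {table}")
    # Step 4: query the HBase-backed table through Impala.
    cursor.execute(f"SELECT * FROM {table}")
    return cursor.fetchall()


if __name__ == "__main__":
    from impala.dbapi import connect  # provided by the impyla package

    conn = connect(host="impalad-host", port=21050)  # placeholder host
    try:
        for row in refresh_and_query(conn.cursor(), "test_info"):
            print(row)
    finally:
        conn.close()
```

Passing the cursor in rather than opening the connection inside the helper is a design choice for this sketch: the same function works with any DB-API driver and with a stub cursor in tests.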