kudu学习笔记——建表语句
kudu支持的数据类型
boolean 8-bit signed integer 16-bit signed integer 32-bit signed integer 64-bit signed integer date (32-bit days since the Unix epoch) unixtime_micros (64-bit microseconds since the Unix epoch) single-precision (32-bit) IEEE-754 floating-point number double-precision (64-bit) IEEE-754 floating-point number decimal (see Decimal Type for details) varchar (see Varchar Type for details) UTF-8 encoded string (up to 64KB uncompressed) binary (up to 64KB uncompressed)
kudu的建表语句
CREATE TABLE sales_by_year( # 列存储 year INT, # 有限固定列,强类型 sale_id INT COLPROPERTIES (encoding=“bitshuffle”), # 每一列均可以设置encoding及压缩方式 amount INT, PRIMARY KEY (year,sale_id) # 主键索引 ) PARTITION BY HASH (sale_id) PARTITIONS 4, # 哈希分区 RANGE (year) ( PARTITION 2014 <= VALUES <= 2016, # 范围分区 PARTITION VALUE = 2017 ) STORED AS KUDU TBLPROPERTIES (replication=3); # 多副本
或者
CREATE TABLE xxxx.xxxx( uuid STRING, ds string, `date` string, `ts` string, `col1` int, `col2` bigint, `col3` int, PRIMARY KEY (uuid, ds)) PARTITION BY HASH (uuid, ds) PARTITIONS 9, RANGE(ds) ( PARTITION VALUE="2020-09-05", PARTITION VALUE="2020-09-06", PARTITION VALUE="2020-09-07", PARTITION VALUE="2020-09-08", PARTITION VALUE="2020-09-09" ) STORED AS KUDU;
支持若干种分区方式:
参考:kudu 表设计使用及限制 / Kudu table schema design and limited
1. hash 分区
create table kudu_test.real_time_sales_temporary_kuduc ( sdt String, shopId String, updateTime String, serialId String, sheetId String, goodsId String, key_by String, timeFrame String, regionId String, regionName String, shopName String, serviceRegionId String, serviceRegionName String, shopBelongId String, shopBelongName String, primary key(sdt,shopId,updatetime,serialId,sheetid) ) PARTITION by hash(sdt,shopId) partitions 32 COMMENT '销售流水表C' STORED AS kudu;
如果不指定hash分区的字段,比如
PARTITION BY HASH PARTITIONS 3
那么默认将会使用主键来进行hash,即
PARTITION BY HASH (sdt,shopId,updatetime,serialId,sheetid) PARTITIONS 3
2. 范围分区
CREATE TABLE cust_behavior_table ( id BIGINT, sku STRING, salary STRING, edu_level INT, usergender STRING, group STRING, city STRING, postcode STRING, last_purchase_price FLOAT, last_purchase_date BIGINT, category STRING, rating INT, fulfilled_date BIGINT, PRIMARY KEY (id, sku) ) PARTITION BY RANGE (sku) ( PARTITION VALUES < ‘g’, PARTITION ‘g’ <= VALUES < ‘o’, PARTITION ‘o’ <= VALUES < ‘u’, PARTITION ‘u’ <= VALUES ) STORED AS KUDU TBLPROPERTIES( ‘kudu.table_name’ = ‘cust_behavior_1 ‘,’kudu.master_addresses’ = ‘hadoop5:7051’);
3. 混合分区(hash + range)
CREATE TABLE cust_behavior_1 ( id BIGINT, sku STRING, salary STRING, edu_level INT, usergender STRING, group STRING, city STRING, postcode STRING, last_purchase_price FLOAT, last_purchase_date BIGINT, category STRING, rating INT, fulfilled_date BIGINT, PRIMARY KEY (id, sku) ) PARTITION BY HASH (id) PARTITIONS 4, RANGE (sku) ( PARTITION VALUES < ‘g’, PARTITION ‘g’ <= VALUES < ‘o’, PARTITION ‘o’ <= VALUES < ‘u’, PARTITION ‘u’ <= VALUES ) STORED AS KUDU TBLPROPERTIES( ‘kudu.table_name’ = ‘cust_behavior_1 ‘,’kudu.master_addresses’ = ‘hadoop5:7051’);
本文只发表于博客园和tonglin0325的博客,作者:tonglin0325,转载请注明原文链接:https://www.cnblogs.com/tonglin0325/p/5282704.html