kudu学习笔记——建表语句

kudu支持的数据类型

boolean

8-bit signed integer

16-bit signed integer

32-bit signed integer

64-bit signed integer

date (32-bit days since the Unix epoch)

unixtime_micros (64-bit microseconds since the Unix epoch)

single-precision (32-bit) IEEE-754 floating-point number

double-precision (64-bit) IEEE-754 floating-point number

decimal (see Decimal Type for details)

varchar (see Varchar Type for details)

UTF-8 encoded string (up to 64KB uncompressed)

binary (up to 64KB uncompressed)

参考:Apache Kudu Schema Design

kudu的建表语句

CREATE TABLE sales_by_year(					# 列存储
	year INT, 						# 有限固定列,强类型
	sale_id INT COLPROPERTIES (encoding=“bitshuffle”),	# 每一列均可以设置encoding及压缩方式
	amount INT,
	PRIMARY	KEY (year,sale_id)				# 主键索引
)
PARTITION BY HASH (sale_id) PARTITIONS 4,              # 哈希分区
RANGE (year)
(
PARTITION 2014 <= VALUES <= 2016,				# 范围分区
PARTITION VALUE	= 2017
)
STORED AS KUDU
TBLPROPERTIES (replication=3);					# 多副本

或者

CREATE TABLE xxxx.xxxx(
  uuid STRING,
  ds string,
  `date` string,
  `ts` string,
  `col1` int,
  `col2` bigint,
  `col3` int,
PRIMARY KEY (uuid, ds))
PARTITION BY HASH (uuid, ds) PARTITIONS 9,
RANGE(ds) (
PARTITION VALUE="2020-09-05",
PARTITION VALUE="2020-09-06",
PARTITION VALUE="2020-09-07",
PARTITION VALUE="2020-09-08",
PARTITION VALUE="2020-09-09"
)
STORED AS KUDU;

参考:DTCC2017-Kudu介绍-小米张震-final

 

支持若干种分区方式:

参考:kudu 表设计使用及限制 / Kudu table schema design and limited

1. hash 分区

create table kudu_test.real_time_sales_temporary_kuduc (
    sdt String,
    shopId String,
    updateTime String,
    serialId String,
    sheetId String,
    goodsId String,

    key_by String,
    timeFrame String,
    regionId String,
    regionName String,
    shopName String,
    serviceRegionId String,
    serviceRegionName String,
    shopBelongId String,
    shopBelongName String,
    primary key(sdt,shopId,updatetime,serialId,sheetid)
    )
    PARTITION by 
    hash(sdt,shopId) partitions 32
    COMMENT '销售流水表C'
    STORED AS kudu;

如果不指定hash分区的字段,比如

PARTITION BY HASH  PARTITIONS 3

那么默认将会使用主键来进行hash,即

PARTITION BY HASH (sdt,shopId,updatetime,serialId,sheetid) PARTITIONS 3

2. 范围分区

CREATE TABLE cust_behavior_table (
id BIGINT,
sku STRING,
salary STRING,
edu_level INT,
usergender STRING,
group STRING,
city STRING,
postcode STRING,
last_purchase_price FLOAT,
last_purchase_date BIGINT,
category STRING,
rating INT,
fulfilled_date BIGINT,
PRIMARY KEY (id, sku)
)
PARTITION BY RANGE (sku)
(
PARTITION VALUES < ‘g’,
PARTITION ‘g’ <= VALUES < ‘o’,
PARTITION ‘o’ <= VALUES < ‘u’,
PARTITION ‘u’ <= VALUES
) STORED AS KUDU
TBLPROPERTIES(
‘kudu.table_name’ = ‘cust_behavior_1 ‘,’kudu.master_addresses’ = ‘hadoop5:7051’);

3. 混合分区(hash + range)

CREATE TABLE cust_behavior_1 (
id BIGINT,
sku STRING,
salary STRING,
edu_level INT,
usergender STRING,
group STRING,
city STRING,
postcode STRING,
last_purchase_price FLOAT,
last_purchase_date BIGINT,
category STRING,
rating INT,
fulfilled_date BIGINT,
PRIMARY KEY (id, sku)
)
PARTITION BY 
HASH (id) PARTITIONS 4,
RANGE (sku)
(
PARTITION VALUES < ‘g’,
PARTITION ‘g’ <= VALUES < ‘o’,
PARTITION ‘o’ <= VALUES < ‘u’,
PARTITION ‘u’ <= VALUES
) STORED AS KUDU
TBLPROPERTIES(
‘kudu.table_name’ = ‘cust_behavior_1 ‘,’kudu.master_addresses’ = ‘hadoop5:7051’);

 

posted @ 2016-03-16 11:16  tonglin0325  阅读(1217)  评论(0编辑  收藏  举报