HiveQL查询语句总结
本文写作目的是整理出一份带注释的cheatsheet。内容组织形式为查询语句,实例,原理浅析。
基本查询语句
show databases;
create database retail;
use retail;
create table id (val type)
row format delimited
fields terminated by ','
stored as textfile;用textfile的格式存储表格文件
describe id;
alter table id rename to id_new;
alter table add columns (col_name type);
drop table id;
show tables;
load data inpath "/path" overwrite into table id;
select COLUMN_NAME from id;
设置属性语句:
set hive.cli.print.current.db=true;
高级查询语句
Group by:
select column1,sum(column2) from table_1 group by column1;
Order By:
略
Case&Creating Indexes:
create index rec_index on table records_big(category) as 'compact' with deferred rebuild;
alter index rec_index on records_big rebuild;
show(formatted) index on records_big;
using ARRAYS and MAPS:
create table cust_info (language array<string>,country_year map<string,int>)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by '|';
load data ...;
select language[0] from cust_info;
External Tables:
默认internal table
create external table XXX();
注意external table和internal table在删除时的不同表现。
Partitioning&Bucketing:
Join: Convert join MapJoin
select COLUMNS from table_a join table_b on CONDITION;
文件:
ORCFILE
Create table xxx () row format delimited fields terminated by ',' stored as ORC;
(需要相应设置:set hive.exec.compress.output=true;set mapred.output.compression.type=BLOCK;)
通过对比压缩和非压缩的方式得出的查询运行时间以及存储的文件大小的差异,得出这种文件格式的益处在哪里。
实用函数:
concat