Hive: distribute by and sort by
Data

    B 10 store_B_4
    A 12 store_A_1
    A 14 store_A_2
    B 15 store_B_1
    B 19 store_B_2
    B 30 store_B_3
Create the table and load the data

    create table if not exists store(
      sid string,
      amount string,
      name string
    )
    row format delimited
    fields terminated by ' '
    lines terminated by '\n'
    stored as textfile;

    load data local inpath '/opt/wangyuqi/store.txt' into table store;
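In Hive, distribute by followed by a column controls how map output is distributed: rows with the same value in that column are sent to the same reducer. sort by then sorts the rows within each reducer.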
    select * from store distribute by sid sort by amount desc;

result:

    A 14 store_A_2
    A 12 store_A_1
    B 30 store_B_3
    B 19 store_B_2
    B 15 store_B_1
    B 10 store_B_4
    Time taken: 224.482 seconds
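To make the per-reducer behaviour easier to observe, here is a minimal sketch rather than a definitive recipe. It assumes the MapReduce execution engine, where mapreduce.job.reduces sets the number of reducers (the value 2 is chosen only for illustration). It also casts amount to int under the hypothetical alias amount_int, because the column is declared as string and would otherwise be compared as text; with the two-digit values above, text order and numeric order happen to coincide.

```sql
-- Assumption: MapReduce engine; 2 reducers chosen only for illustration.
set mapreduce.job.reduces=2;

-- Rows with the same sid are sent to the same reducer.
-- Each reducer sorts its own rows by the numeric amount, descending.
-- amount_int is a hypothetical alias introduced for this example.
select sid, cast(amount as int) as amount_int, name
from store
distribute by sid
sort by amount_int desc;
```

With more than one reducer, the output is ordered only within each reducer's portion, not across the whole result; a total order requires order by.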
cluster by: shorthand for distribute by and sort by on the same column. The sort is always ascending; a descending order cannot be specified.
    select * from store cluster by sid;

result:

    A 14 store_A_2
    A 12 store_A_1
    B 30 store_B_3
    B 19 store_B_2
    B 15 store_B_1
    B 10 store_B_4
    Time taken: 126.178 seconds, Fetched: 6 row(s)
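As a sketch of that equivalence against the same store table: the first two queries below should behave the same way, and the third shows one way to get a descending sort, which cluster by does not accept, by writing the two clauses out explicitly.

```sql
-- Shorthand form.
select * from store cluster by sid;

-- Explicit form: distribute and sort on the same column, ascending.
select * from store distribute by sid sort by sid;

-- cluster by takes no asc/desc; for descending order, write both clauses.
select * from store distribute by sid sort by sid desc;
```

Note that cluster by sid sorts on sid only, so the relative order of rows sharing the same sid is not guaranteed.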