concat_ws

1. Introduction

In Hive, to group rows by one column and merge the values of another column, use collect_list or collect_set.

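Both are aggregate functions that return the values of a column within each group as an array. The difference is:

collect_list keeps every value, including duplicates.
collect_set removes duplicates.

This is similar to list and set in Python.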
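
The difference can be tried on a small inline array before creating any table. This is a minimal sketch; the aliases x, t, as_list and as_set are made up for illustration and are not part of the original example.

```sql
-- Aggregate the same three values both ways.
-- collect_list keeps every value, duplicates included;
-- collect_set keeps each distinct value only once.
select collect_list(x) as as_list,
       collect_set(x)  as as_set
from (
    -- explode turns the inline array into three rows: a, b, a
    select explode(array('a', 'b', 'a')) as x
) t;
```

Here as_list should hold three elements and as_set two. The order of elements inside either array is not something to rely on.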

create table table_tmp(
    id string,
    classes string
) partitioned by (month string)
row format delimited fields terminated by ',';

Sample data, stored in /root/data/id.data:

1,a
1,b
2,a
2,b
2,a
2,c
3,a
3,c

load data local inpath '/root/data/id.data' into table table_tmp partition (month='202201');

select id,
       collect_list(classes) as col_001
from table_tmp
group by id;

select id,
       concat_ws('-', collect_list(cast(classes as string))) as col_concat
from table_tmp
group by id;

select id,
       concat_ws('-', collect_set(cast(classes as string))) as col_concat
from table_tmp
group by id;
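
Neither collect function promises a particular element order, so the joined string may differ between runs. One possible way to make it stable, assuming ascending order is acceptable, is to sort the array with sort_array before joining. This is a sketch, not part of the original post:

```sql
-- Deduplicate, sort ascending, then join with '-'.
select id,
       concat_ws('-', sort_array(collect_set(classes))) as col_concat
from table_tmp
group by id;
```

With the sample data above this should give a-b for id 1, a-b-c for id 2 and a-c for id 3.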

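collect can also be used to work around a restriction of group by: in a grouped query, every column in the select list must either be a grouping column or be wrapped in an aggregate function.

Sometimes, though, after grouping by one column we only want to pick one arbitrary value from another column. This can be done as follows: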

select id,
       collect_list(classes)[0] as col_001
from table_tmp
group by id;
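
Because the array order is not guaranteed, [0] returns an arbitrary value rather than a truly random or reproducible one. If a reproducible pick is needed, one option is to sort before indexing, or simply to use min. This is a sketch; the aliases first_class and min_class are made up for illustration.

```sql
-- Two ways to pick the same, reproducible value per group.
select id,
       sort_array(collect_set(classes))[0] as first_class,  -- smallest element after sorting
       min(classes)                        as min_class     -- same value, without building an array
from table_tmp
group by id;
```

With the sample data, both columns should be a for every id.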

This feels a bit like indexing into a list in Python.

concat_ws(separator, str1, str2, ...)
concat_ws(separator, [str1, str2, ...])
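
Both call forms can be tried with literal values. A minimal sketch:

```sql
-- Form 1: separator followed by individual strings.
select concat_ws('-', 'a', 'b', 'c');         -- a-b-c

-- Form 2: separator followed by an array of strings.
select concat_ws('-', array('a', 'b', 'c'));  -- a-b-c
```

The second form expects an array of strings, which is why the grouped queries above cast the column to string before collecting it.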

Reference: Merging multiple rows in Hive: the collect_set & collect_list functions

Reference: Hive notes: collect_list/collect_set (columns to rows)

posted @ 2022-01-11 22:49 Hider1214