Spark2 Dataset: collect_set and collect_list
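Both functions aggregate a column's values within each group into an array. collect_set removes duplicate elements; collect_list keeps them.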
select gender,
concat_ws(',', collect_set(children)),
concat_ws(',', collect_list(children))
from Affairs
group by gender
// Create the view
data.createOrReplaceTempView("Affairs")
val df3 = spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field]

df3.show // collect_set removes duplicates; collect_list keeps them
+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female|                             no,yes|                    no,yes,no,no,yes|
|  male|                             no,yes|                    no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+
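The same aggregation can also be written with the DataFrame API instead of a SQL string. The following is a minimal sketch, not the original author's code: the object name CollectSetVsList, the helper summarize, the column aliases, and the sample rows are all hypothetical, since the original Affairs data is not shown. Spark does not guarantee the order of elements returned by collect_set or collect_list after a shuffle, so the sketch wraps both in sort_array to keep the result stable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, collect_set, concat_ws, sort_array}

object CollectSetVsList {

  // Hypothetical helper: for each gender, join the distinct and the full
  // list of `children` values into comma-separated strings.
  def summarize(data: DataFrame): DataFrame =
    data.groupBy("gender").agg(
      // collect_set drops duplicates within the group
      concat_ws(",", sort_array(collect_set(col("children")))).as("children_set"),
      // collect_list keeps every value, duplicates included
      concat_ws(",", sort_array(collect_list(col("children")))).as("children_list")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-set-vs-list")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the Affairs data set
    val data = Seq(
      ("female", "no"), ("female", "yes"), ("female", "no"),
      ("male", "yes"), ("male", "yes")
    ).toDF("gender", "children")

    // female: children_set = "no,yes", children_list = "no,no,yes"
    // male:   children_set = "yes",    children_list = "yes,yes"
    summarize(data).orderBy("gender").show(false)

    spark.stop()
  }
}
```

Giving the aggregated columns explicit aliases with as(...) also avoids the unwieldy generated names such as concat_ws(,, collect_set(children)) seen in the output above.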