Spark: the DataFrame pivot function
Using the pivot function:
// assumes an existing SparkSession named `spark`
val list = List(
  (2017, 1, 100),
  (2017, 1, 50),
  (2017, 2, 100),
  (2017, 3, 50),
  (2018, 2, 200),
  (2018, 2, 100))

import spark.implicits._
val ds = spark.createDataset(list)
val df = ds.toDF("year", "month", "num")

// group by year, turn each distinct month into a column,
// and sum `num` within each (year, month) cell
val res: org.apache.spark.sql.DataFrame =
  df.groupBy("year")
    .pivot("month")
    .sum("num")
df.show
+----+-----+---+
|year|month|num|
+----+-----+---+
|2017|    1|100|
|2017|    1| 50|
|2017|    2|100|
|2017|    3| 50|
|2018|    2|200|
|2018|    2|100|
+----+-----+---+
res.show
+----+----+---+----+
|year|   1|  2|   3|
+----+----+---+----+
|2018|null|300|null|
|2017| 150|100|  50|
+----+----+---+----+
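
Spark also provides a pivot overload that takes the pivot values explicitly, pivot(pivotColumn, values). Below is a minimal sketch reusing the df built above; the value list Seq(1, 2, 3) and the name resExplicit are just illustrative choices for this example.

// Sketch: pivot with explicit pivot values, reusing `df` from above.
// Listing the months up front fixes the output columns and their order,
// and lets Spark skip the extra job that computes the distinct values.
val resExplicit = df.groupBy("year")
  .pivot("month", Seq(1, 2, 3))
  .sum("num")
resExplicit.show

Passing the values explicitly is mainly useful when the pivot column has many distinct values or when a stable column order in the result matters.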