Spark 2 Dataset: Creating New Rows with flatMap
val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", "Scala,SQL,DataSet,MLlib,GraphX"))
dfList: List[(String, String)] = List((Hadoop,Java,SQL,Hive,HBase,MySQL), (Spark,Scala,SQL,DataSet,MLlib,GraphX))

case class Book(title: String, words: String)

val df = dfList.map { p => Book(p._1, p._2) }.toDS()
df: org.apache.spark.sql.Dataset[Book] = [title: string, words: string]

df.show
+------+--------------------+
| title|               words|
+------+--------------------+
|Hadoop|Java,SQL,Hive,HBa...|
| Spark|Scala,SQL,DataSet...|
+------+--------------------+

df.flatMap(_.words.split(",")).show
+-------+
|  value|
+-------+
|   Java|
|    SQL|
|   Hive|
|  HBase|
|  MySQL|
|  Scala|
|    SQL|
|DataSet|
|  MLlib|
| GraphX|
+-------+
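The flatMap call above keeps only the split words, so the title each word came from is lost. If the title is needed on every new row, one possible approach is to emit a (title, word) tuple per word. The sketch below is an illustration under assumptions, not part of the original session: the names `ds`, `pairs`, and `exploded` are hypothetical, and the session is built with `local[*]` only so the snippet can run outside spark-shell (inside spark-shell, `getOrCreate()` should simply return the existing session). The second variant uses the built-in `explode` and `split` column functions as an untyped alternative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

// Same shape as the Book case class used above.
case class Book(title: String, words: String)

// Assumption: a local session for standalone runs; spark-shell already provides one.
val spark = SparkSession.builder().master("local[*]").appName("flatMap-sketch").getOrCreate()
import spark.implicits._

val ds = Seq(
  Book("Hadoop", "Java,SQL,Hive,HBase,MySQL"),
  Book("Spark", "Scala,SQL,DataSet,MLlib,GraphX")
).toDS()

// Typed variant: one (title, word) tuple per comma-separated word.
val pairs = ds.flatMap(b => b.words.split(",").map(w => (b.title, w)))
pairs.toDF("title", "word").show()

// Untyped variant: split the column into an array, then explode it into rows.
val exploded = ds.select($"title", explode(split($"words", ",")).as("word"))
exploded.show()
```

Both variants should produce ten rows, one per word, each paired with its title. The typed flatMap runs an arbitrary Scala function per row, while the column-function version stays within Spark SQL expressions, which the optimizer can usually handle more efficiently.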