Several ways to create DataFrames/Datasets in Spark SQL

1. The difference between DataFrame, Dataset, and Spark tables

 
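The distinction can be sketched in a few lines (assuming the spark-shell session used throughout this post, where `spark`, `sc`, and the implicits are already in scope): a Dataset[T] is typed against a case class at compile time, a DataFrame is just an alias for Dataset[Row] with columns resolved at runtime, and a table is a catalog name you query with SQL. The `Person` class and `person` view name here are illustrative, not from the original post.

```scala
// Hypothetical case class for illustration
case class Person(name: String, age: Int)

// Dataset[T]: typed; field names and types are checked at compile time
val ds = Seq(Person("tom", 20), Person("jerry", 25)).toDS

// DataFrame: Dataset[Row]; columns are looked up by name at runtime
val df = ds.toDF

// Table/view: an entry in the session catalog, queried with SQL text
df.createOrReplaceTempView("person")
spark.sql("SELECT name FROM person WHERE age > 21").show()
```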

2. Creating a DataFrame

1. Plain creation (from an RDD)

// Case class defining the schema of one call-log record
case class Calllog(fromtel: String, totel: String, time: String, duration: Int)

// Read the CSV file and split each line into fields
val ds = sc.textFile("/user/data/calllog.csv").map(x => x.split(","))

// Map each row of fields onto a Calllog instance
val log = ds.map(x => Calllog(x(0), x(1), x(2), x(3).toInt))

// In spark-shell the implicits are already imported, so toDF works directly
val df = log.toDF
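Once the DataFrame above exists, it can be inspected and queried like any other; a short sketch continuing from the same `df`:

```scala
// Print the inferred schema: fromtel, totel, time (strings) and duration (int)
df.printSchema()

// Preview the first few rows
df.show(5)

// Example aggregation: number of calls per originating number
df.groupBy("fromtel").count().show()
```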

2. Creating one via SparkSession

// The code is fairly long, so it is not reproduced here; just know that this way exists
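The original post skips the code, but a minimal sketch of this route looks like the following, assuming the same calllog.csv layout as above: build an explicit StructType schema and pair it with an RDD[Row] via spark.createDataFrame.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Explicit schema instead of a case class
val schema = StructType(Seq(
  StructField("fromtel", StringType, nullable = true),
  StructField("totel", StringType, nullable = true),
  StructField("time", StringType, nullable = true),
  StructField("duration", IntegerType, nullable = true)
))

// Parse each CSV line into a Row matching the schema
val rowRDD = sc.textFile("/user/data/calllog.csv")
  .map(_.split(","))
  .map(x => Row(x(0), x(1), x(2), x(3).toInt))

// SparkSession builds the DataFrame from the RDD[Row] plus the schema
val df = spark.createDataFrame(rowRDD, schema)
```

This route is more verbose than `toDF`, but it is the one to reach for when the schema is only known at runtime.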

3. Creating one from a self-describing file, e.g. JSON

val df1 = spark.read.json("/user/data/people.json")

val df2 = spark.read.format("json").load("/user/data/people.json")

3. Creating Datasets

1. From a local collection

// The case class must be defined before toDS can build a typed Dataset from it
case class mydata(id: Int, name: String)

val ds = Seq(mydata(1, "tom"), mydata(2, "jerry")).toDS

2. By converting a DataFrame with as[T]

// JSON numbers are inferred as BigInt-compatible types, hence age: BigInt
case class People(name: String, age: BigInt)

val data = spark.read.json("/user/data/people.json")

// Convert the untyped DataFrame into a typed Dataset[People]
val peopleDS = data.as[People]
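With a typed Dataset in hand, filters and maps can use plain Scala lambdas instead of column expressions; a small sketch assuming people.json has the name and age fields used above (age may be null for records that omit it):

```scala
// Typed filter: the lambda receives a People instance, not a Row
val adults = data.as[People].filter(p => p.age != null && p.age > 20)
adults.show()
```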

4. Saving a DF or DS as a table:

df.write.saveAsTable("XXXX")
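saveAsTable persists the data through the session catalog (the local spark-warehouse directory, or the Hive metastore if one is configured), so the table survives the session and can be read back by name. "XXXX" is the placeholder table name from the line above:

```scala
// Read the persisted table back as a DataFrame
val back = spark.table("XXXX")

// Or query it directly with SQL
spark.sql("SELECT * FROM XXXX LIMIT 10").show()
```

For a session-only name that is not persisted, use `df.createOrReplaceTempView` instead.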

posted @ 2019-12-24 22:14  悔不该放开那华佗哟