Spark SQL: Specifying the Schema Programmatically
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@350a3df3

scala> val Demo = sc.textFile("Demo.txt")
Demo: org.apache.spark.rdd.RDD[String] = Demo.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val schemastring = "id name age"
schemastring: String = id name age

scala> import org.apache.spark.sql.Row;
import org.apache.spark.sql.Row

scala> import org.apache.spark.sql.types.{StructType,StructField,StringType};
import org.apache.spark.sql.types.{StructType, StructField, StringType}
scala> val schema = StructType(schemastring.split(" ").map(fieldName => StructField(fieldName,StringType,true)))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,StringType,true), StructField(name,StringType,true), StructField(age,StringType,true))
scala> val rowRDD = Demo.map(_.split(",")).map(e => Row(e(0),e(1),e(2)))
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[10] at map at <console>:27

-- Note: I originally converted id and age at this step with e(0).trim.toInt, which raised an error. I have not yet determined the cause and will look into it later.

scala> val DemoDF = sqlcontext.createDataFrame(rowRDD,schema)
DemoDF: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]

scala> DemoDF.registerTempTable("Demo")
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'

scala> val allrow = sqlcontext.sql("select * from Demo")
allrow: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]

scala> allrow.show()
+----+--------+---+
|  id|    name|age|
+----+--------+---+
|1201|  satish| 25|
|1202| krishna| 28|
|1203|   amith| 39|
|1204|   javed| 23|
|1205|  prudvi| 23|
+----+--------+---+
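
On the e(0).trim.toInt error mentioned in the note: one plausible cause (an assumption, since the original error message is not shown) is that the schema declares every field as StringType while the Row then holds Int values. Spark checks the values in each Row against the declared types when the DataFrame is evaluated. The sketch below declares id and age as IntegerType so the two agree. It uses SparkSession and createOrReplaceTempView, the non-deprecated counterparts of SQLContext and registerTempTable since 2.0.0. The object name TypedSchemaSketch, the helper parseLine, the inline sample lines standing in for Demo.txt, and the age filter are illustrative and not from the original.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object TypedSchemaSketch {
  // Declared column types; they must match the values placed in each Row.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  // Hypothetical helper: turns one "id, name, age" line into a typed Row.
  def parseLine(line: String): Row = {
    val e = line.split(",")
    Row(e(0).trim.toInt, e(1).trim, e(2).trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; inside spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("typed-schema-sketch")
      .master("local[*]")
      .getOrCreate()

    // Inline stand-in for sc.textFile("Demo.txt") (assumed line format).
    val lines = spark.sparkContext.parallelize(Seq(
      "1201, satish, 25",
      "1202, krishna, 28",
      "1203, amith, 39"
    ))

    val demoDF = spark.createDataFrame(lines.map(parseLine), schema)
    demoDF.createOrReplaceTempView("Demo")

    // With integer columns, numeric comparisons need no cast.
    spark.sql("SELECT * FROM Demo WHERE age > 26").show()

    spark.stop()
  }
}
```

Unlike the transcript above, this sketch also trims the name field, so the leading space seen in the original output would not appear.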
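Author: M_Fight๑҉
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be given in a prominent position on the page; otherwise the right to pursue legal liability is reserved.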