Spark SQL Data Sources: Hive Tables
Spark SQL data sources (JSON files, Hive tables, Parquet files)
-- For JSON, see 524
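Before the Hive walkthrough, a quick orientation: JSON and Parquet files can be read directly through the DataFrameReader of the spark-shell's built-in `spark` session. This is only a minimal sketch; the HDFS paths below are placeholders, not files from the original post.

// Read a JSON file into a DataFrame (path is a placeholder)
val jsonDF = spark.read.json("hdfs://cdh1:9013/user/data/people.json")
jsonDF.printSchema()

// Read a Parquet file the same way (path is a placeholder)
val parquetDF = spark.read.parquet("hdfs://cdh1:9013/user/data/people.parquet")
parquetDF.show()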
Hive tables
scala> val hivecontext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: one deprecation (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
22/06/24 14:29:08 WARN sql.SparkSession$Builder: Using an existing SparkSession; the static sql configurations will not take effect.
hivecontext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7c089fbc

scala> hivecontext.sql("CREATE TABLE IF NOT EXISTS Demo(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
22/06/24 14:31:36 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
res1: org.apache.spark.sql.DataFrame = []
Create the table
scala> hivecontext.sql("CREATE TABLE IF NOT EXISTS mycdh.Demo(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ") res5: org.apache.spark.sql.DataFrame = []
The first CREATE TABLE put Demo in the default database; here it is recreated in our own Hive database (mycdh). It is best to drop that first table to avoid mixing the two up, as sketched below.
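A minimal cleanup sketch, run from the same HiveContext (the statement itself is standard HiveQL; it does not appear in the original session):

// Drop the Demo table that was created in the default database
hivecontext.sql("DROP TABLE IF EXISTS Demo")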
scala> hivecontext.sql("LOAD DATA INPATH 'hdfs://cdh1:9013/user/hive/employee.txt' INTO TABLE mycdh.Demo") res12: org.apache.spark.sql.DataFrame = []
scala> val result = hivecontext.sql("FROM mycdh.Demo SELECT id,name") result: org.apache.spark.sql.DataFrame = [id: int, name: string] scala> result.show() +----+--------+ | id| name| +----+--------+ |1201| satish| |1202| krishna| |1203| amith| |1204| javed| |1205| prudvi| +----+--------+