Spark + Hive

1. How do I make spark-sql able to access Hive?

Simply copy hive-site.xml into spark/conf; for the contents of hive-site.xml, refer to the Hive cluster setup guide.
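For example (both paths are assumptions that vary by installation): cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/.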

2. To run SQL against Hive from Spark code, enable Hive support when initializing the SparkSession by adding enableHiveSupport():
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("df")
  .master("local[*]")
  .enableHiveSupport()  // exposes Hive tables and HiveQL through spark.sql
  .getOrCreate()
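With Hive support enabled, spark.sql can query Hive tables directly. A minimal sketch, where the database mydb and table events are hypothetical names:

spark.sql("USE mydb")
spark.sql("SELECT COUNT(*) AS cnt FROM events").show()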

3. Enabling Hive dynamic partitioning in Spark

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict ")

4. Checking from Spark whether a Hive table exists

val exists = spark.catalog.tableExists(db, tb)  // db: database name, tb: table name
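A common use is create-if-missing logic. A minimal sketch, with mydb and events again hypothetical names and the schema purely illustrative:

val db = "mydb"
val tb = "events"
if (!spark.catalog.tableExists(db, tb)) {
  spark.sql(s"CREATE TABLE $db.$tb (id BIGINT, payload STRING) STORED AS PARQUET")
}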

5. Deleting an HDFS path from Spark (useful when rebuilding a Hive table at a specified location)

import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)
val path = new Path(location)
if (hdfs.exists(path)) {
  // To guard against accidental data loss, recursive deletion is disabled
  hdfs.delete(path, false)
}
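Note that with the recursive flag set to false, delete removes only a file or an empty directory; on a non-empty directory the call fails instead of silently wiping its contents, which is exactly the safety net the comment describes. Pass true only when you really intend to remove the whole directory tree.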

 
