Spark + Hive
1. How do I make spark-sql able to access Hive?
Simply place hive-site.xml into spark/conf. For the contents of hive-site.xml, refer to the Hive cluster setup notes.
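As a minimal sketch, assuming Hive's client configuration lives under /etc/hive/conf and SPARK_HOME points at the Spark installation (both paths are assumptions; adjust them for your cluster):

```shell
# Copy the Hive client config into Spark's conf directory
# (paths below are assumptions; substitute your own)
cp /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/"
```

After this, spark-sql and SparkSession (with Hive support enabled) will pick up the Hive metastore settings from that file.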
2. To run SQL against Hive from Spark code, call enableHiveSupport() when building the SparkSession:

val spark = SparkSession
  .builder()
  .appName("df")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
3. Enable Hive dynamic partitioning in Spark:

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
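With dynamic partitioning enabled, the partition column value is taken from the data itself rather than hard-coded in the INSERT statement. A sketch of what this looks like (the database, table, and column names here are assumptions for illustration):

```scala
// Hypothetical example: mydb.events is partitioned by dt,
// and mydb.staging_events holds the unpartitioned source rows.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql(
  """INSERT OVERWRITE TABLE mydb.events PARTITION (dt)
    |SELECT user_id, action, dt FROM mydb.staging_events
    |""".stripMargin)
```

In nonstrict mode every partition may be dynamic; in the default strict mode at least one partition column must be given a static value, which is why both settings are usually changed together.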
4. Check from Spark whether a Hive table exists:
val exists = spark.catalog.tableExists(db, tb)
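A common use is to create the table only when it is missing. A minimal sketch, assuming hypothetical names db = "mydb" and tb = "events" and an illustrative schema:

```scala
// Names and schema below are assumptions for illustration.
val db = "mydb"
val tb = "events"
if (!spark.catalog.tableExists(db, tb)) {
  spark.sql(s"CREATE TABLE $db.$tb (id BIGINT, name STRING) STORED AS PARQUET")
}
```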
5. Delete an HDFS path from Spark (useful before recreating a Hive table at a specified location):

import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)
val path = new Path(location)
if (hdfs.exists(path)) {
  // To guard against accidental data loss, recursive deletion is disabled
  hdfs.delete(path, false)
}
Feel free to repost; no attribution required.