Integrating Spark with Hive

Integrating Hive from the Command Line

Copy Hive's hive-site.xml configuration file into Spark's conf directory; only the following entries are needed:

<configuration>
  <property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/user/hive/warehouse</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://ip:port/hive?serverTimezone=Asia/Shanghai</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>root</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>xxx</value>
  </property>
</configuration>

Copy the MySQL driver jar from Hive's lib directory into Spark's jars directory, then start the CLI:

bin/spark-sql

Now spark-sql can be used just like Hive:

insert into tb_spark(name,age) values('lisi',23); -- Hive syntax
insert into tb_spark values('lisi',23);           -- Spark SQL syntax

The first form, with an explicit column list, fails in spark-sql. I found no definitive cause; it is probably a version issue, as older Spark SQL releases do not accept a column list in an INSERT statement.

Integrating Hive in Code

Add the Maven dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.29</version>
</dependency>

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
  * Read a Hive table from Spark SQL.
  */
object SparkSQLReadHive {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local")

    val sparkSession = SparkSession.builder()
      .appName("SparkSQLReadHive")
      .config(conf)
      // Hive warehouse location on HDFS
      .config("spark.sql.warehouse.dir", "hdfs://bigdata01:9000/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()
    
    // Query an existing Hive table and print the result
    sparkSession.sql("select * from student").show()

    sparkSession.stop()
  }
}
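
As a side note before the Windows errors below (which came from running the read example above): the same session can also write back to Hive. A minimal sketch, assuming the tb_spark table from the CLI section; the rows are made up for illustration:

import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkSQLWriteHive {

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("SparkSQLWriteHive")
      .master("local")
      .config("spark.sql.warehouse.dir", "hdfs://bigdata01:9000/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()
    import sparkSession.implicits._

    // Append rows; saveAsTable creates the table if it does not exist
    Seq(("lisi", 23), ("wangwu", 30))
      .toDF("name", "age")
      .write
      .mode(SaveMode.Append)
      .saveAsTable("tb_spark")

    sparkSession.stop()
  }
}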

Error

Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:762)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:859)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:842)
	at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
	at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:587)
	at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:562)
	at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
	at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)

Solution

  1. Download Hadoop locally and extract it.
  2. Download winutils.exe and place it in Hadoop's bin directory.
  3. Set the HADOOP_HOME environment variable, or set it in code (see the sketch after this list):
    System.setProperty("hadoop.home.dir","C:\\D-myfiles\\software\\hadoop-3.2.0\\hadoop-3.2.0")
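
Putting it together, a minimal sketch of the in-code variant (the Hadoop path is from my machine; adjust it to yours). The property has to be set before the SparkSession is created, because Hadoop's Shell class resolves winutils.exe when it is first used:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SparkSQLReadHiveOnWindows {

  def main(args: Array[String]): Unit = {
    // Set before creating the SparkSession so Hadoop can find winutils.exe
    System.setProperty("hadoop.home.dir",
      "C:\\D-myfiles\\software\\hadoop-3.2.0\\hadoop-3.2.0")

    val conf = new SparkConf().setMaster("local")

    val sparkSession = SparkSession.builder()
      .appName("SparkSQLReadHiveOnWindows")
      .config(conf)
      .config("spark.sql.warehouse.dir", "hdfs://bigdata01:9000/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    sparkSession.sql("select * from student").show()
    sparkSession.stop()
  }
}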
    

Another error

Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
	at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:141)
	at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:136)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:91)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:91)

Following posts online (see References), the fix is to run:

winutils.exe chmod 777 C:\tmp\hive

but that itself fails:

The code execution cannot proceed because MSVCR100.dll was not found

Too much hassle; I'm setting it aside for now. (MSVCR100.dll ships with the Microsoft Visual C++ 2010 Redistributable, so installing that runtime would presumably let winutils.exe run.)

References

Fixing "The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ------" on Windows
Summary of issues when connecting to Hive from local Spark
