Spark Integration with Hive
Command-Line Integration with Hive
Copy Hive's hive-site.xml configuration file into Spark's conf directory; only the following content is needed:
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://ip:port/hive?serverTimezone=Asia/Shanghai</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>xxx</value>
    </property>
</configuration>
Copy the MySQL driver jar from Hive's lib directory into Spark's jars directory.
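For example (the jar filename and the HIVE_HOME/SPARK_HOME variables are assumptions; adjust them to your installation):
cp $HIVE_HOME/lib/mysql-connector-java-8.0.29.jar $SPARK_HOME/jars/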
Then start the Spark SQL CLI:
bin/spark-sql
Now you can work with spark-sql just as you would with Hive:
insert into tb_spark(name,age) values('lisi',23); # Hive syntax
insert into tb_spark values('lisi',23); # Spark SQL syntax
You cannot specify a column list when inserting: the Spark SQL parser in this version (2.4.x) does not accept column lists in INSERT statements; support only arrived in later Spark releases.
Integrating Hive in Code
Add the following Maven dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.29</version>
</dependency>
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Query Hive with Spark SQL.
 */
object SparkSQLReadHive {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local")
    val sparkSession = SparkSession.builder()
      .appName("SparkSQLReadHive")
      .config(conf)
      .config("spark.sql.warehouse.dir", "hdfs://bigdata01:9000/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()
    sparkSession.sql("select * from student").show()
    sparkSession.stop()
  }
}
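Writing goes through the same session once Hive support is enabled. The sketch below is mine, not part of the original setup: the table tb_spark and its (name, age) columns come from the CLI example above, while the sample rows and the object name SparkSQLWriteHive are assumptions.

import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkSQLWriteHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSQLWriteHive")
      .master("local")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Sample rows whose columns line up with tb_spark(name, age).
    val df = Seq(("lisi", 23), ("wangwu", 30)).toDF("name", "age")

    // insertInto appends by column position, matching the table's schema order.
    df.write.mode(SaveMode.Append).insertInto("tb_spark")

    spark.stop()
  }
}

On Windows, both the read and write paths run into the environment problems described next.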
Running the read example above on Windows reports an error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:762)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:859)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:842)
    at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
    at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:587)
    at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:562)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
Solution
- Download Hadoop locally and extract it.
- Download winutils.exe and place it in Hadoop's bin directory.
- Set the HADOOP_HOME environment variable, or configure it in code:
System.setProperty("hadoop.home.dir","C:\\D-myfiles\\software\\hadoop-3.2.0\\hadoop-3.2.0")
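If you take the in-code route, the call must run before the SparkSession is created: Hadoop's Shell class resolves hadoop.home.dir when it is first loaded, so setting the property later has no effect. A sketch of the placement (the object name is an assumption; the path is the one from above):

import org.apache.spark.sql.SparkSession

object SparkSQLReadHiveOnWindows {
  def main(args: Array[String]): Unit = {
    // Must come before any Spark/Hadoop class is touched, since Hadoop's
    // Shell class caches hadoop.home.dir when it is first initialized.
    System.setProperty("hadoop.home.dir", "C:\\D-myfiles\\software\\hadoop-3.2.0\\hadoop-3.2.0")

    val spark = SparkSession.builder()
      .appName("SparkSQLReadHive")
      .master("local")
      .enableHiveSupport()
      .getOrCreate()
    spark.sql("select * from student").show()
    spark.stop()
  }
}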
That gets past the first error, but another one appears:
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
    at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
    at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
    at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:141)
    at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:136)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:91)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:91)
According to solutions found online, you need to run:
winutils.exe chmod 777 C:\tmp\hive
But that command itself fails:
The code cannot continue because MSVCR100.dll was not found
MSVCR100.dll ships with the Microsoft Visual C++ 2010 Redistributable, so installing that package should let winutils.exe run. Too much hassle for now, though; setting it aside.
References
Fixing "The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ------" on Windows
Summary of issues when connecting to Hive from local Spark