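Spark SQL: Working with Hive
Spark ships with a built-in Hive whose metastore is an embedded Derby database. It is generally meant only for testing. In production, Spark should connect to an external Hive installation.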
1. Copy Hive's configuration file hive-site.xml into Spark's configuration directory
cp /usr/hive/apache-hive-3.1.3-bin/conf/hive-site.xml /usr/spark/spark-3.5.0-bin-hadoop3/conf
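Step 1 can be wrapped in a small sanity check, sketched below under a few assumptions. The function name copy_hive_site is hypothetical. The variables HIVE_CONF_DIR and SPARK_CONF_DIR default to the paths used above. The check looks for the two standard Hive metastore properties: javax.jdo.option.ConnectionURL (a direct JDBC connection to the metastore database) and hive.metastore.uris (a remote metastore service). A file without either property would likely leave Spark on its built-in metastore.

```shell
# Hypothetical helper: copy hive-site.xml into Spark's conf directory and
# warn if it does not seem to point at an external metastore.
# Defaults are the paths used in this article; override them if yours differ.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-/usr/hive/apache-hive-3.1.3-bin/conf}"
SPARK_CONF_DIR="${SPARK_CONF_DIR:-/usr/spark/spark-3.5.0-bin-hadoop3/conf}"

copy_hive_site() {
  local src="$HIVE_CONF_DIR/hive-site.xml"
  if [ ! -f "$src" ]; then
    echo "missing: $src" >&2
    return 1
  fi
  cp "$src" "$SPARK_CONF_DIR/" || return 1
  # Either property indicates an external metastore is configured.
  if ! grep -qE 'javax\.jdo\.option\.ConnectionURL|hive\.metastore\.uris' \
       "$SPARK_CONF_DIR/hive-site.xml"; then
    echo "warning: no metastore connection property in hive-site.xml" >&2
    return 2
  fi
  echo "copied $src"
}
```

After pasting or sourcing the function, run copy_hive_site. Exit status 1 means the source file is missing or the copy failed. Status 2 means the file was copied but names no metastore connection.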
2. Put the MySQL JDBC driver into Spark's jars directory (Spark needs it because the Hive metastore here is stored in MySQL)
1. Download the MySQL JDBC driver (Connector/J)
Source 1 (hosted by MySQL): https://downloads.mysql.com/archives/c-j/
Source 2 (Maven Central): https://mvnrepository.com/artifact/com.mysql/mysql-connector-j
2. Copy the driver
cp /home/mysql-connector-j-8.0.33.jar /usr/spark/spark-3.5.0-bin-hadoop3/jars
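To confirm that step 2 worked, the sketch below lists the MySQL driver jars found in Spark's jars directory. The function name find_mysql_driver is hypothetical, and SPARK_JARS_DIR defaults to the path used above. The function matches both artifact names: mysql-connector-j for 8.0.31 and later, and mysql-connector-java for older releases. It also warns when more than one jar is present, since mixed driver versions on the classpath are a common source of confusion.

```shell
# Hypothetical helper: report MySQL JDBC driver jars in Spark's jars directory.
SPARK_JARS_DIR="${SPARK_JARS_DIR:-/usr/spark/spark-3.5.0-bin-hadoop3/jars}"

find_mysql_driver() {
  local jar count=0
  for jar in "$SPARK_JARS_DIR"/mysql-connector-j-*.jar \
             "$SPARK_JARS_DIR"/mysql-connector-java-*.jar; do
    [ -f "$jar" ] || continue   # an unmatched glob stays literal; skip it
    echo "$jar"
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no MySQL JDBC driver in $SPARK_JARS_DIR" >&2
    return 1
  fi
  if [ "$count" -gt 1 ]; then
    echo "warning: $count driver jars found; keep only one version" >&2
  fi
  return 0
}
```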
3. Copy Hadoop's core-site.xml and hdfs-site.xml into Spark's configuration directory
cp /usr/hadoop/hadoop-3.3.6/etc/hadoop/{hdfs-site.xml,core-site.xml} /usr/spark/spark-3.5.0-bin-hadoop3/conf
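The brace-expansion copy above depends on bash. The sketch below is a more explicit version that reports which file is missing instead of failing silently partway through. The function name copy_hadoop_conf is hypothetical. HADOOP_CONF_DIR and SPARK_CONF_DIR default to the paths used above.

```shell
# Hypothetical helper: copy core-site.xml and hdfs-site.xml into Spark's conf
# directory, reporting any file that is missing from the Hadoop conf directory.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/hadoop/hadoop-3.3.6/etc/hadoop}"
SPARK_CONF_DIR="${SPARK_CONF_DIR:-/usr/spark/spark-3.5.0-bin-hadoop3/conf}"

copy_hadoop_conf() {
  local f missing=0
  for f in core-site.xml hdfs-site.xml; do
    if [ -f "$HADOOP_CONF_DIR/$f" ]; then
      cp "$HADOOP_CONF_DIR/$f" "$SPARK_CONF_DIR/" || return 1
    else
      echo "missing: $HADOOP_CONF_DIR/$f" >&2
      missing=1
    fi
  done
  # Succeed only if both files were found and copied.
  [ "$missing" -eq 0 ]
}
```

Copying the files is one option. Spark can also read them from the directory named by HADOOP_CONF_DIR when that variable is set in conf/spark-env.sh. That option avoids stale copies when the Hadoop configuration changes.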
4. Restart spark-shell (exit any running session first so the new configuration is loaded)
/usr/spark/spark-3.5.0-bin-hadoop3/bin/spark-shell
5. Test: the output should list the same tables that show tables returns in Hive itself
spark.sql("show tables").show
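If the test lists none of the Hive tables, Spark may have fallen back to its built-in metastore. When that happens, the embedded Derby database typically leaves a metastore_db directory and a derby.log file in the directory where spark-shell was started. The sketch below is a heuristic check for those leftovers, not a definitive diagnosis. The function name check_embedded_metastore is hypothetical. Its argument is the launch directory and defaults to the current one.

```shell
# Hypothetical heuristic: detect files left by Spark's embedded Derby metastore.
check_embedded_metastore() {
  local dir="${1:-.}"
  if [ -d "$dir/metastore_db" ] || [ -f "$dir/derby.log" ]; then
    echo "embedded Derby metastore detected in $dir; hive-site.xml was probably not picked up" >&2
    return 1
  fi
  echo "no embedded metastore files in $dir"
  return 0
}
```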