PySpark error: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver

Solution:

mv mysql-connector-java-8.0.20.jar $SPARK_HOME/jars/

The driver jar mysql-connector-java-8.0.20.jar was downloaded from the Maven repository:

https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20
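
To confirm the jar is actually visible to the driver JVM after the move, you can ask the JVM for the class through the py4j gateway (a quick sanity check of my own; note that spark._jvm is internal API):

# Run in any PySpark shell. If the MySQL driver jar is on the driver's
# classpath, Class.forName returns the class object; otherwise it raises
# the same ClassNotFoundException this post is about.
spark._jvm.java.lang.Class.forName("com.mysql.cj.jdbc.Driver")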

Note: the right settings for this error depend on which mode your Spark job runs in. If you blindly copy solutions from Stack Overflow or Baidu without checking the mode, you will find they have no effect!

Set the following in spark-defaults.conf:

spark.driver.extraClassPath   = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar
spark.executor.extraClassPath = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar

spark.jars = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar
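
The same properties can also be set in code when the session is built; here is a minimal sketch, assuming the same jar path as above. One caveat: in client mode the driver JVM is already running by the time this code executes, so spark.driver.extraClassPath set here has no effect and must come from spark-defaults.conf or the spark-submit command line.

from pyspark.sql import SparkSession

jar = "/home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar"

# spark.jars ships the jar to the cluster; the extraClassPath entries put it
# on the JVM classpaths. Driver-side extraClassPath only takes effect if set
# before the driver JVM starts, which is why spark-defaults.conf is safer.
spark = (SparkSession.builder
         .appName("PythonTest")
         .config("spark.jars", jar)
         .config("spark.executor.extraClassPath", jar)
         .config("spark.driver.extraClassPath", jar)
         .getOrCreate())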

Two ways to test:

① pyspark --master yarn (then enter the code interactively in the shell; a minimal smoke test is sketched after this list)

② spark-submit --master yarn --deploy-mode cluster your_script.py
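
For the interactive check, a single JDBC read is enough to trigger (or rule out) the ClassNotFoundException. This sketch reuses the connection parameters from the write example below; point dbtable at any table that already exists in your database:

# Run inside `pyspark --master yarn`, where `spark` already exists.
# If load() succeeds, the MySQL driver class was found.
test_df = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://Desktop:3306/leaf")
           .option("driver", "com.mysql.cj.jdbc.Driver")
           .option("dbtable", "spark")
           .option("user", "appleyuchi")
           .option("password", "appleyuchi")
           .load())
test_df.show()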

from pyspark.sql import SparkSession
 
def map_extract(element):
    # element is a (file_path, content) pair from wholeTextFiles; the year is
    # taken from the file name (characters -8:-4, the four digits before ".txt")
    file_path, content = element
    year = file_path[-8:-4]
    return [(year, line) for line in content.split("\n") if line]
 
 
spark = SparkSession\
    .builder\
    .appName("PythonTest")\
    .getOrCreate()
 
    
# Each wholeTextFiles element is (path, file_content); map_extract turns it into
# a list of (year, line) pairs, flatMap flattens the lists, and the final
# map/reduceByKey sums the third CSV column per year.
res = spark.sparkContext.wholeTextFiles('hdfs://Desktop:9000/user/mercury/names',
                                        minPartitions=40) \
        .map(map_extract) \
        .flatMap(lambda x: x) \
        .map(lambda x: (x[0], int(x[1].split(',')[2]))) \
        .reduceByKey(lambda x, y: x + y)
 
 
 
df = res.toDF(["key", "num"])  # rename the columns to match the target MySQL table's columns
df.printSchema()
df.show()  # show() prints the rows itself and returns None, so don't wrap it in print()
 
# Write the aggregated rows into the MySQL table leaf.spark through the JDBC driver
df.write.format("jdbc").options(
    url="jdbc:mysql://Desktop:3306/leaf",
    driver="com.mysql.cj.jdbc.Driver",
    dbtable="spark",
    user="appleyuchi",
    password="appleyuchi").mode('append').save()