Getting Started with Python: Reading and Writing MySQL Data with PySpark

Code

#!/usr/bin/env python
# author:Simple-Sir
# create_time: 2022/6/2 14:20
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("sparkSql").getOrCreate()
sc = spark.sparkContext
rdd = sc.textFile('user.txt').map(lambda x:x.split(',')).map(lambda x:(x[0],x[1]))
df = rdd.toDF(['name','age'])
df.show() # show the DataFrame built from the RDD

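# Save the data to MySQL
# jdbc: connection type
# url: jdbc:mysql://host:port/database
# driver: JDBC driver class, fixed (com.mysql.jdbc.Driver for Connector/J 5.x; Connector/J 8.x names it com.mysql.cj.jdbc.Driver)
# user: user name
# password: password
# dbtable: target table; it is created if it does not exist
# SaveMode.append: save mode, appends to existing data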
'''
* `append`: Append contents of this :class:`DataFrame` to existing data.
* `overwrite`: Overwrite existing data.
* `error` or `errorifexists`: Throw an exception if data already exists.
* `ignore`: Silently ignore this operation if data already exists.
'''
df.write.format("jdbc")\
    .option("url", "jdbc:mysql://bigdata01:3336/hive")\
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("user", "root") \
    .option("password", "123") \
    .option("dbtable", "tmp_20220531_2") \
    .mode(saveMode='append')\
    .save()

# Read the data back from MySQL
tmp_20220531_2 = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:mysql://bigdata01:3336/hive")\
    .option("driver", "com.mysql.jdbc.Driver")\
    .option("user", "root")\
    .option("password", "123") \
    .option("dbtable", "tmp_20220531_2") \
    .load()
tmp_20220531_2.show() # show() returns None, so keep the DataFrame and call show() separately
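
The write and the read above repeat the same five connection options. One way to avoid the duplication is sketched below; it is only an illustration, and the helper names (`mysql_jdbc_options`, `write_table`, `read_table`) and the `VALID_MODES` check are hypothetical, not part of the original script. It assumes the same `spark` session and `df` as above and relies on the `options(**kwargs)` method that both `spark.read` and `df.write` provide.

```python
# Hypothetical helpers (not part of the original script): build the JDBC
# options once and reuse them for both the write and the read.

# Save modes listed in the DataFrameWriter docstring quoted in the script.
VALID_MODES = {"append", "overwrite", "error", "errorifexists", "ignore"}


def mysql_jdbc_options(host, port, database, table, user, password,
                       driver="com.mysql.jdbc.Driver"):
    """Return the option dict shared by spark.read and df.write."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": driver,
        "user": user,
        "password": password,
        "dbtable": table,
    }


def write_table(df, options, mode="append"):
    """Write a DataFrame to MySQL over JDBC using the given save mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported save mode: {mode!r}")
    df.write.format("jdbc").options(**options).mode(mode).save()


def read_table(spark, options):
    """Load a MySQL table over JDBC and return it as a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()
```

With the values from the script, usage would look like `opts = mysql_jdbc_options("bigdata01", 3336, "hive", "tmp_20220531_2", "root", "123")`, followed by `write_table(df, opts)` and `read_table(spark, opts).show()`. Rejecting an unknown mode before touching the database is a design choice of this sketch; Spark itself also rejects invalid modes.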

Output

posted on 2022-06-06 16:26 by Simple-Sir