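Spark Study Notes
Some basic problems I ran into while learning Spark, and how I approached them. (Thanks to everyone whose write-ups I learned from.)
- Read a CSV file, using the first row as the header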
rating = spark.read.option('header','true').csv('file:///home/twain/sparkTest/ml-latest-small/ratings.csv')
- A simple Spark setup-and-run flow: a streaming word count
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

# Create (or reuse) a SparkSession and reduce log noise
spark = SparkSession.builder.appName('StructString').getOrCreate()
spark.sparkContext.setLogLevel('WARN')

# Read lines of text from a socket on localhost:9999
lines = spark.readStream.format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load()

# Split each line on spaces, producing one word per row
words = lines.select(explode(split(lines.value, " ")).alias("word"))

# Keep a running count per word
wordCounts = words.groupBy("word").count()

# Print the complete counts table to the console every 8 seconds
query = wordCounts \
    .writeStream \
    .outputMode("complete") \
    .format("console") \
    .trigger(processingTime="8 seconds") \
    .start()

query.awaitTermination()
- Assign column names when reading a CSV file
rating = spark.read.csv('file:///home/twain/sparkTest/computerdata/data/ratings_Computers.csv').toDF('userId','itemId','score','create_time')
- Change the type of a column in a Spark DataFrame
http://mini.eastday.com/mobile/191108004955918.html
- How the ALS algorithm works in Spark machine learning (Part 1)
https://www.cnblogs.com/xiguage119/p/10813393.html
https://blog.csdn.net/qq_37181642/article/details/102739855
- Spark ML: examples of saving and loading models
https://blog.csdn.net/wangwei_5201314/article/details/89641800
- Summary of evaluation metrics in the Spark machine learning library
https://blog.csdn.net/u011707542/article/details/77838588
- Spark SQL data type conversion
https://blog.csdn.net/an1090239782/article/details/102541024
- A basic movie recommendation flow with PySpark and the ALS algorithm
https://blog.csdn.net/pysense/article/details/103880967
- Configuring an environment for remote debugging with PyCharm + PySpark
http://www.manongjc.com/article/20160.html
- Getting started with the PySpark machine learning library (ML)
https://www.jianshu.com/p/20456b512fa7
- Configuring a static IP for Ubuntu in a VMware virtual machine (NAT mode)
https://www.cnblogs.com/liermao12/p/6079471.html
- Using explode in Spark SQL to flatten nested JSON data
https://blog.csdn.net/strongyoung88/article/details/52227568
- PySpark raises "ImportError: No module named ..." when calling third-party Python libraries
https://blog.csdn.net/lhx_xhl/article/details/85225968?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1