A record of one session with the Spark UI
Out-of-memory problem
The driver memory allocated by default (4 GB in this setup; Spark itself ships with spark.driver.memory=1g) was not enough for this job. Allocate a larger memory budget when launching the shell:
spark-shell --driver-memory 12g
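Once the shell is up, you can confirm the setting actually took effect. A quick check, assuming a spark-shell session where `sc` is the active SparkContext:

```scala
// Read the effective driver memory back from the SparkConf
sc.getConf.get("spark.driver.memory")  // should report 12g
```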
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, lit, max}

// Load the application and winner records (2011-2019 license-plate lottery data)
val hdfs_path_apply: String = s"/mnt/g/BaiduNetdiskDownload/2011-2019小汽车摇号数据/apply"
val applyNumbersDF: DataFrame = spark.read.parquet(hdfs_path_apply)
val hdfs_path_lucky: String = s"/mnt/g/BaiduNetdiskDownload/2011-2019小汽车摇号数据/lucky"
val luckyDogsDF: DataFrame = spark.read.parquet(hdfs_path_lucky)
// Keep only winning records from batch 201601 onward
val filteredLuckyDogs: DataFrame = luckyDogsDF.filter(col("batchNum") >= "201601").select("carNum")
// Inner-join applications with winners on carNum
val jointDF: DataFrame = applyNumbersDF.join(filteredLuckyDogs, Seq("carNum"), "inner")
// Count how many times each carNum appears in each batch -- that count is its multiplier
val multipliers: DataFrame = jointDF.groupBy(col("batchNum"), col("carNum")).agg(count(lit(1)).alias("multiplier"))
// Take each carNum's highest multiplier across all batches
val uniqueMultipliers: DataFrame = multipliers.groupBy("carNum").agg(max("multiplier").alias("multiplier"))
// Distribution: how many winners ended up with each multiplier value
val result: DataFrame = uniqueMultipliers.groupBy("multiplier").agg(count(lit(1)).alias("cnt")).orderBy("multiplier")
result.collect
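The chain of groupBy steps above can be mirrored on plain Scala collections, which makes the logic easier to follow. A minimal sketch with toy (batchNum, carNum) pairs, not values from the real dataset:

```scala
// (batchNum, carNum) pairs as they would look after the inner join (toy data)
val joined = Seq(("201601", "A"), ("201601", "A"), ("201602", "A"), ("201601", "B"))
// multiplier per (batchNum, carNum): how many times the pair occurs
val multipliers = joined.groupBy(identity).map { case (k, v) => (k, v.size) }
// highest multiplier per carNum across batches
val unique = multipliers.groupBy(_._1._2).map { case (car, m) => (car, m.values.max) }
// distribution: number of cars per multiplier value
val dist = unique.values.groupBy(identity).map { case (m, cars) => (m, cars.size) }
// dist == Map(2 -> 1, 1 -> 1): car A peaks at multiplier 2, car B at 1
```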
Result:
Open http://192.168.128.5:4040 in a browser. The page is divided into the Jobs, Stages, Storage, Environment, Executors, and SQL tabs.