Winter Break Study 15: Spark Machine Learning Library (MLlib) Programming Practice

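1. Before loading the data, import the necessary packages in advance. The code in step 2 needs at least:

import org.apache.spark.ml.linalg.{Vector, Vectors}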

2. Convert the dataset to a DataFrame

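import spark.implicits._

// One record: 6 continuous features plus the income class ("<=50K" or ">50K") as the label
case class Adult(features: org.apache.spark.ml.linalg.Vector, label: String)

// Skip blank lines (the file may end with one, which would make toDouble throw),
// keep columns 0, 2, 4, 10, 11, 12 as features and column 14 as the label (trimmed of its leading space)
val df = sc.textFile("/export/server/spark-3.0.0-bin-hadoop3.2/adult.data.txt").filter(_.trim.nonEmpty).map(_.split(",")).map(p => Adult(
  Vectors.dense(p(0).trim.toDouble, p(2).trim.toDouble, p(4).trim.toDouble, p(10).trim.toDouble, p(11).trim.toDouble, p(12).trim.toDouble),
  p(14).trim)).toDF()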
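To make the column indices in the map step above easier to check, here is a minimal plain-Scala sketch (no Spark required) of the same per-line parsing. The object `AdultParse` and its `parse` method are hypothetical names for illustration only; the column meanings assume the attribute order of the UCI Adult dataset (0 age, 2 fnlwgt, 4 education-num, 10 capital-gain, 11 capital-loss, 12 hours-per-week, 14 income), and the sample line is assumed to use that file's comma-plus-space format.

```scala
// Hypothetical helper mirroring the Spark map step, without Spark.
object AdultParse {
  // 0-based columns used as features: age, fnlwgt, education-num,
  // capital-gain, capital-loss, hours-per-week
  val FeatureCols: Seq[Int] = Seq(0, 2, 4, 10, 11, 12)
  // Column holding the income class label
  val LabelCol: Int = 14

  /** Some((features, label)) for a well-formed record, None for blank or short lines. */
  def parse(line: String): Option[(Array[Double], String)] = {
    val p = line.split(",").map(_.trim) // strip the space after each comma
    if (line.trim.isEmpty || p.length <= LabelCol) None
    else Some((FeatureCols.map(i => p(i).toDouble).toArray, p(LabelCol)))
  }

  def main(args: Array[String]): Unit = {
    val sample = "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, " +
      "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K"
    parse(sample).foreach { case (features, label) =>
      println(features.mkString("[", ",", "]") + " " + label) // [39.0,77516.0,13.0,2174.0,0.0,40.0] <=50K
    }
    println(parse("")) // None: a blank line is skipped instead of throwing
  }
}
```

Unlike the one-liner, this version returns `None` for blank or truncated lines rather than throwing a `NumberFormatException` or an index error, which is the same effect the `filter(_.trim.nonEmpty)` step aims for in the Spark code.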

 

posted @ 2024-02-25 15:44  搜一码赛