Winter Break Study 15: Spark Machine Learning Library (MLlib) Programming Practice
1. Before loading the data, import the required packages.
2. Convert the dataset to a DataFrame.
import org.apache.spark.ml.linalg.{Vector, Vectors}
import spark.implicits._
case class Adult(features: Vector, label: String)
// Columns 0, 2, 4, 10, 11 and 12 become the feature vector; column 14 is the label.
val rows = sc.textFile("/export/server/spark-3.0.0-bin-hadoop3.2/adult.data.txt").map(_.split(","))
val df = rows.map(p => Adult(Vectors.dense(p(0).toDouble, p(2).toDouble, p(4).toDouble, p(10).toDouble, p(11).toDouble, p(12).toDouble), p(14))).toDF()
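The conversion above assumes every line has at least 15 comma-separated fields and that the selected columns are numeric; a blank or malformed line would make the job fail with an exception. The following is a minimal sketch of a more defensive parsing step, written in plain Scala so it can be tried without a Spark session. The names `AdultParser`, `featureColumns`, `labelColumn` and `parseAdult` are hypothetical and not part of the original post, and trimming whitespace around each field is an assumption about how the file is formatted.

```scala
import scala.util.Try

// Hypothetical helper, not from the original post.
object AdultParser {
  // Column indices taken from the conversion code: six numeric fields plus the label.
  val featureColumns: Seq[Int] = Seq(0, 2, 4, 10, 11, 12)
  val labelColumn: Int = 14

  // Returns Some((features, label)) for a well-formed line and None otherwise.
  def parseAdult(line: String): Option[(Array[Double], String)] = {
    // Assumption: fields may carry surrounding spaces, so trim each one.
    val fields = line.split(",").map(_.trim)
    if (fields.length <= labelColumn) {
      None // too few columns, e.g. a blank line
    } else {
      // toDouble throws on non-numeric text; Try turns that failure into None.
      val parsed = Try(featureColumns.map(i => fields(i).toDouble).toArray)
      parsed.toOption.map(values => (values, fields(labelColumn)))
    }
  }
}
```

In the Spark code, a helper like this could take the place of the inline lambda: parse each line, drop the `None` results, and build `Adult(Vectors.dense(values), label)` from each remaining pair before calling `toDF()`. Note that, unlike the original code, this sketch trims the label as well.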