Summary:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

objec… Read full text
Summary:
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object K… Read full text
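The summary breaks off after the imports. Purely as a sketch of what a complete example built on these imports could look like (the input file data/kmeans_sample.csv, the feature columns x and y, and k = 3 are illustrative assumptions, not taken from the post):

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object KMeansExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KMeansExample").master("local[*]").getOrCreate()

    // Hypothetical input: a headered CSV with two numeric feature columns
    val raw = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("data/kmeans_sample.csv")

    // Assemble the numeric columns into the "features" vector expected by KMeans
    val assembler = new VectorAssembler()
      .setInputCols(Array("x", "y"))
      .setOutputCol("features")
    val data = assembler.transform(raw)

    // Train a KMeans model with an assumed k of 3
    val kmeans = new KMeans().setK(3).setSeed(1L)
    val model = kmeans.fit(data)

    // Evaluate clustering quality with the silhouette score
    val predictions = model.transform(data)
    val evaluator = new ClusteringEvaluator()
    println(s"Silhouette = ${evaluator.evaluate(predictions)}")

    model.clusterCenters.foreach(println)
    spark.stop()
  }
}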
Summary:
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.SparkSessio… Read full text
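Again the snippet is truncated; a minimal self-contained sketch along the same lines (the dataset path, the predictor columns x1/x2, and the regularisation value are assumptions) might be:

import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object LinearRegressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LinearRegressionExample").master("local[*]").getOrCreate()

    // Hypothetical dataset with numeric predictors and a numeric "label" column
    val raw = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("data/regression_sample.csv")
    val data = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
      .transform(raw)

    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Fit a linear regression model with mild regularisation
    val lr = new LinearRegression().setLabelCol("label").setFeaturesCol("features").setRegParam(0.01)
    val model = lr.fit(train)

    // Report RMSE on the held-out split
    val rmse = new RegressionEvaluator()
      .setLabelCol("label").setPredictionCol("prediction").setMetricName("rmse")
      .evaluate(model.transform(test))
    println(s"RMSE = $rmse")
    spark.stop()
  }
}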
Summary:
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apac… Read full text
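For the random-forest post, a plausible end-to-end sketch built on these imports (the CSV path, the feature column names f1/f2/f3, the string label column, and numTrees = 20 are assumptions) could look like:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object RandomForestExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RandomForestExample").master("local[*]").getOrCreate()

    // Hypothetical headered CSV with numeric features and a string "label" column
    val raw = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("data/classification_sample.csv")

    // Index the string label, assemble the features, then train the forest
    val indexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel")
    val assembler = new VectorAssembler().setInputCols(Array("f1", "f2", "f3")).setOutputCol("features")
    val rf = new RandomForestClassifier()
      .setLabelCol("indexedLabel").setFeaturesCol("features").setNumTrees(20)

    val pipeline = new Pipeline().setStages(Array(indexer, assembler, rf))
    val Array(train, test) = raw.randomSplit(Array(0.7, 0.3), seed = 1234L)
    val model = pipeline.fit(train)

    // Multiclass accuracy on the held-out split
    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("indexedLabel").setPredictionCol("prediction").setMetricName("accuracy")
      .evaluate(model.transform(test))
    println(s"Accuracy = $accuracy")
    spark.stop()
  }
}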
Summary:
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apac… Read full text
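The decision-tree post is cut off in the same way; as a sketch (feature names, the numeric 0/1 label, and maxDepth = 5 are assumptions, not from the post):

import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object DecisionTreeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DecisionTreeExample").master("local[*]").getOrCreate()

    // Hypothetical dataset with numeric features and a numeric 0/1 "label" column
    val raw = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("data/classification_sample.csv")
    val data = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3")).setOutputCol("features")
      .transform(raw)
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Train a single decision tree with a bounded depth
    val dt = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("features").setMaxDepth(5)
    val model = dt.fit(train)

    // Evaluate accuracy on the test split and print the learned tree structure
    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("label").setPredictionCol("prediction").setMetricName("accuracy")
      .evaluate(model.transform(test))
    println(s"Accuracy = $accuracy")
    println(model.toDebugString)
    spark.stop()
  }
}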
Summary:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark… Read full text
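A self-contained sketch of a logistic-regression example matching these imports (the binary dataset path, feature names, maxIter and regParam values are assumptions) might be:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object LogisticRegressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LogisticRegressionExample").master("local[*]").getOrCreate()

    // Hypothetical binary-labelled dataset (label is 0.0 or 1.0)
    val raw = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("data/binary_sample.csv")
    val data = new VectorAssembler()
      .setInputCols(Array("f1", "f2")).setOutputCol("features")
      .transform(raw)
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Fit a regularised logistic regression model
    val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.01)
    val model = lr.fit(train)

    // Area under the ROC curve on the held-out split
    val auc = new BinaryClassificationEvaluator()
      .setLabelCol("label").setRawPredictionCol("rawPrediction").setMetricName("areaUnderROC")
      .evaluate(model.transform(test))
    println(s"AUC = $auc")
    spark.stop()
  }
}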
Summary:
// Load the data
val data = spark.read.option("header", "true").csv("data/adult.csv")

// Preprocess the data
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "… Read full text
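The snippet is cut off mid-call. A self-contained sketch of this preprocessing step might look as follows; note that VectorAssembler needs numeric inputs, so inferSchema is added here, and the full column list is an assumption based on the usual Adult census dataset (adjust the names to the actual file):

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object AdultPreprocessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AdultPreprocessing").master("local[*]").getOrCreate()

    // Load the data; inferSchema keeps numeric columns from being read as strings
    val data = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/adult.csv")

    // Assemble the numeric census columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "fnlwgt", "education-num", "capital-gain", "capital-loss", "hours-per-week"))
      .setOutputCol("features")

    val assembled = assembler.transform(data)
    assembled.select("features").show(5, truncate = false)
    spark.stop()
  }
}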
Summary:
Splitting into training and test sets: to evaluate model performance, we need to split the dataset into a training set and a test set.

// Randomly split the dataset into a training set and a test set
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

5. Create the logistic… Read full text
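A small usage note on randomSplit: the 0.7/0.3 weights only produce approximately sized splits, and the fixed seed makes the split reproducible across runs. A short continuation sketch, assuming the same data DataFrame as above (the caching and count calls are additions, not from the post):

// Randomly split into training and test sets; proportions are approximate,
// and the seed makes the split reproducible
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

// Cache both splits, since each will be scanned repeatedly during training and evaluation
trainingData.cache()
testData.cache()
println(s"train = ${trainingData.count()}, test = ${testData.count()}")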
Summary:
// Create the StreamingContext
val ssc = new StreamingContext(sc, Seconds(5))

// Receive data from Flume
val flumeStream = FlumeUtils.createStream(ssc, "localhost", 44444)

// … Read full text
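The summary ends mid-comment. A minimal end-to-end sketch assembled from the same calls could look like this; the word-count logic, the local[2] master, and the application name are assumptions, and the spark-streaming-flume connector this relies on is only available in older Spark releases (it was removed in Spark 3.x):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeWordCount {
  def main(args: Array[String]): Unit = {
    // Create the StreamingContext with a 5-second batch interval
    val conf = new SparkConf().setAppName("FlumeWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receive Avro events pushed by a Flume agent to localhost:44444
    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 44444)

    // Decode each event body as UTF-8 text and count the words in every batch
    val counts = flumeStream
      .map(event => new String(event.event.getBody.array(), "UTF-8"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}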