January 2025 Archive
Summary:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object …
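The summary cuts off at the object declaration. Below is a minimal sketch of the kind of program these imports suggest, assuming the spark-streaming-flume connector (shipped with Spark 1.x/2.x) and a Flume agent whose Avro sink points at localhost:44444; the object name and the word-count logic are illustrative, not the original article's code.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeWordCount {  // hypothetical name; the original is truncated
  def main(args: Array[String]): Unit = {
    // local[2] leaves one core free for the receiver
    val conf = new SparkConf().setAppName("FlumeWordCount").setMaster("local[2]")
    // 5-second micro-batches
    val ssc = new StreamingContext(conf, Seconds(5))

    // Push-based receiver: Flume's Avro sink sends events to this host:port
    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 44444)

    // Decode each event body and count words per batch
    flumeStream
      .map(event => new String(event.event.getBody.array()))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}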
Summary:
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object K…
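A minimal sketch of a complete k-means example consistent with these imports, assuming a libsvm-format input file; the path, k = 2, and the seed are placeholder choices.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object KMeansExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KMeansExample").master("local").getOrCreate()

    // Load a dataset that already has a "features" vector column (path is a placeholder)
    val dataset = spark.read.format("libsvm").load("data/sample_kmeans_data.txt")

    // Train a k-means model with k = 2
    val kmeans = new KMeans().setK(2).setSeed(1L)
    val model = kmeans.fit(dataset)

    // Evaluate clustering quality with the silhouette score (the evaluator's default metric)
    val predictions = model.transform(dataset)
    val evaluator = new ClusteringEvaluator()
    println(s"Silhouette = ${evaluator.evaluate(predictions)}")

    // Inspect the learned cluster centers
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}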
Summary:
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.SparkSession …
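A hedged sketch of a full linear-regression example matching these imports; the input path and the hyperparameters are illustrative, not the article's.

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.SparkSession

object LinearRegressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LinearRegressionExample").master("local").getOrCreate()

    // Expects "label" and "features" columns (path is a placeholder)
    val data = spark.read.format("libsvm").load("data/sample_linear_regression_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Fit an elastic-net-regularized linear model
    val lr = new LinearRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
    val model = lr.fit(train)

    // Evaluate on the held-out set with RMSE
    val predictions = model.transform(test)
    val evaluator = new RegressionEvaluator()
      .setLabelCol("label").setPredictionCol("prediction").setMetricName("rmse")
    println(s"RMSE = ${evaluator.evaluate(predictions)}")

    spark.stop()
  }
}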
Summary:
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apac…
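A sketch of a random-forest classification example consistent with these imports; the dataset path and the tree count are assumptions.

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object RandomForestExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RandomForestExample").master("local").getOrCreate()

    val data = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Train a forest of 10 trees
    val rf = new RandomForestClassifier()
      .setLabelCol("label").setFeaturesCol("features").setNumTrees(10)
    val model = rf.fit(train)

    // Accuracy on the held-out set
    val predictions = model.transform(test)
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label").setPredictionCol("prediction").setMetricName("accuracy")
    println(s"Accuracy = ${evaluator.evaluate(predictions)}")

    spark.stop()
  }
}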
Summary:
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apac…
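The decision-tree article presumably follows the same pattern; a hedged fragment that reuses the load/split/evaluate scaffolding (including the train DataFrame) from the random-forest sketch above, changing only the estimator. The maxDepth value is an illustrative choice.

import org.apache.spark.ml.classification.DecisionTreeClassifier

// Swap in a single decision tree for the random forest
val dt = new DecisionTreeClassifier()
  .setLabelCol("label").setFeaturesCol("features").setMaxDepth(5)
val dtModel = dt.fit(train)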
Summary:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark…
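A minimal sketch of a binary logistic-regression example consistent with these imports; the path and hyperparameters are placeholders.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.SparkSession

object LogisticRegressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LogisticRegressionExample").master("local").getOrCreate()

    val data = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val model = lr.fit(train)

    // BinaryClassificationEvaluator scores the rawPrediction column;
    // its default metric is areaUnderROC
    val predictions = model.transform(test)
    val evaluator = new BinaryClassificationEvaluator().setLabelCol("label")
    println(s"AUC = ${evaluator.evaluate(predictions)}")

    spark.stop()
  }
}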
Summary:
// Load the data
val data = spark.read.option("header", "true").csv("data/adult.csv")

// Preprocess the data
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "…
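The summary stops partway through the input-column list. A hedged sketch of the preprocessing step it begins, where inferSchema and the column names beyond "age" are assumptions based on the UCI Adult schema, not the article's exact choices.

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object AdultPreprocessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AdultPreprocessing").master("local").getOrCreate()

    // Load the data; inferSchema casts numeric columns so VectorAssembler accepts them
    // (a plain CSV read would leave every column as a string)
    val data = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/adult.csv")

    // Assemble numeric columns into a single features vector
    // (column names after "age" are assumed from the UCI Adult schema)
    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "education-num", "hours-per-week"))
      .setOutputCol("features")

    val assembled = assembler.transform(data)
    assembled.select("features").show(5, truncate = false)

    spark.stop()
  }
}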
Summary:
Splitting into training and test sets. To evaluate model performance, we need to split the dataset into a training set and a test set.

// Randomly split the dataset into a training set and a test set
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

5. Create a logistic…
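A hedged continuation of the numbered walkthrough, assuming the data DataFrame has label and features columns and that LogisticRegression was imported in the article's earlier steps.

// Randomly split the dataset: 70% training, 30% test (fixed seed for reproducibility)
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

// 5. Create the logistic regression and fit it on the training split only,
//    leaving the test split untouched for the evaluation step
val lr = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")
val lrModel = lr.fit(trainingData)
val testPredictions = lrModel.transform(testData)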
Summary:
// Create the StreamingContext
val ssc = new StreamingContext(sc, Seconds(5))

// Receive data from Flume
val flumeStream = FlumeUtils.createStream(ssc, "localhost", 44444)

// …
Summary:
// Create the SparkSession
val spark = SparkSession.builder()
  .appName("SparkSQLExample")
  .config("spark.master", "local")
  .getOrCreate()

// Create a DataFrame
val data…
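A sketch completing the Spark SQL example; the sample rows and the queries are illustrative.

import org.apache.spark.sql.SparkSession

object SparkSQLExample {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession
    val spark = SparkSession.builder()
      .appName("SparkSQLExample")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    // Create a DataFrame from a local sequence (sample rows are placeholders)
    val data = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")

    // Query it with the DataFrame API and with SQL
    data.filter($"age" > 30).show()
    data.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}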
Summary:
// Create the SparkContext
val sc = new SparkContext("local", "RDDExample")

// Create an RDD
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data, 2)

// Transformations…
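A sketch completing the RDD example. The summary is cut off at the transformations comment, so the map and reduce shown here are plausible continuations rather than the article's exact code.

import org.apache.spark.SparkContext

object RDDExample {
  def main(args: Array[String]): Unit = {
    // Create the SparkContext
    val sc = new SparkContext("local", "RDDExample")

    // Create an RDD with 2 partitions
    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data, 2)

    // Transformations are lazy; the reduce action triggers the computation
    val doubled = distData.map(_ * 2)
    println(s"Sum of doubled values: ${doubled.reduce(_ + _)}")  // 30

    sc.stop()
  }
}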
Summary:
# Download Spark
wget https://downloads.apache.org/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz

# Extract into the target directory
sudo tar -xzf spark-2.1.0-bin-hadoop2.7.tgz -C /us…
Summary:
# Install Java
sudo apt update
sudo apt install openjdk-8-jdk -y

# Configure the Java environment variables
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> ~/.bashrc
echo "expor…
Summary:
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SparkGra…
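A hedged sketch of a small GraphX program consistent with these imports; the vertices, edges, and the out-degree query are illustrative.

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SparkGraphXExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkGraphXExample").setMaster("local")
    val sc = new SparkContext(conf)

    // A tiny property graph: vertices carry names, edges carry a relation label
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
    val edges: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph = Graph(vertices, edges)

    // Out-degree of each vertex that has outgoing edges
    graph.outDegrees.collect().foreach { case (id, deg) => println(s"$id -> $deg") }

    sc.stop()
  }
}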
Summary:
import org.apache.spark.{SparkConf, SparkContext}

object SparkBasic {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S…
Summary:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SparkSession
…
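These three imports suggest combining a DStream with Spark SQL. A hedged sketch of that pattern, assuming a socket source on localhost:9999; converting each micro-batch to a DataFrame inside foreachRDD is one common way to do it, not necessarily the article's.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SparkSession

object StreamingToSQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingToSQL").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Reuses the SparkContext the StreamingContext just created
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    // Turn each micro-batch of lines into a DataFrame and query it with SQL
    ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
      val words = rdd.flatMap(_.split(" ")).toDF("word")
      words.createOrReplaceTempView("words")
      spark.sql("SELECT word, COUNT(*) AS n FROM words GROUP BY word").show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}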
Summary:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.ml.classification.LogisticRegression …
Summary:
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName…
Summary:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("MLlib E…
Summary:
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Initialize the StreamingContext
val conf = new SparkConf().setAppName("Spark Streaming").setMaster…
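A sketch completing the initialization with a socket source (the article may use a different source); feed it lines with nc -lk 9999.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Initialize the StreamingContext with 1-second batches;
    // local[2] leaves a core free for the receiver
    val conf = new SparkConf().setAppName("Spark Streaming").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Read lines from a TCP socket and count words per batch
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}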
Summary:
import org.apache.spark.{SparkConf, SparkContext}

// Initialize the SparkContext
val conf = new SparkConf().setAppName("Spark Basics").setMaster("local")
val sc = n…
Summary:
// Define a class
class Person(val name: String, val age: Int)

// Collection operations
val numbers = List(1, 2, 3, 4, 5)
val doubled = numbers.map(_ * 2)
val filtered = numbers.…
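A self-contained sketch rounding out the class and collection snippets; the filter predicate and the sum line are assumed continuations of the truncated code.

// Define a simple class and exercise common collection operations
object CollectionsDemo {
  class Person(val name: String, val age: Int)

  def main(args: Array[String]): Unit = {
    val alice = new Person("Alice", 29)
    println(s"${alice.name} is ${alice.age}")

    val numbers = List(1, 2, 3, 4, 5)
    val doubled = numbers.map(_ * 2)          // List(2, 4, 6, 8, 10)
    val filtered = numbers.filter(_ % 2 == 0) // List(2, 4)
    val total = numbers.sum                   // 15
    println(s"doubled=$doubled filtered=$filtered total=$total")
  }
}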
Summary:
// Print output
println("Hello, Scala!")

// Basic arithmetic
val a = 10
val b = 5
println(s"Sum: ${a + b}")
println(s"Product: ${a * b}")

// Check whether a number is odd or even
val number = scala.io.StdIn.r…
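A runnable sketch combining the printing, arithmetic, and parity snippets; readInt is an assumption for how the truncated StdIn call ends (it could equally be readLine plus a conversion).

// Print output, basic arithmetic, and an odd/even check on console input
object ScalaBasics {
  def main(args: Array[String]): Unit = {
    println("Hello, Scala!")

    val a = 10
    val b = 5
    println(s"Sum: ${a + b}")      // Sum: 15
    println(s"Product: ${a * b}")  // Product: 50

    // Read an integer from stdin and report its parity
    val number = scala.io.StdIn.readInt()
    if (number % 2 == 0) println(s"$number is even")
    else println(s"$number is odd")
  }
}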