Join operations in Spark Streaming
```scala
/**
 * Return a new DStream by applying 'join' between RDDs of `this` DStream and `other` DStream.
 * The supplied org.apache.spark.Partitioner is used to control the partitioning of each RDD.
 */
def join[W: ClassTag](
    other: DStream[(K, W)],
    partitioner: Partitioner
  ): DStream[(K, (V, W))] = ssc.withScope {
  self.transformWith(
    other,
    (rdd1: RDD[(K, V)], rdd2: RDD[(K, W)]) => rdd1.join(rdd2, partitioner)
  )
}
```

Under the hood this simply delegates to the RDD `join` operation, applied to the corresponding batch RDDs of the two streams via `transformWith`.
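Since `join` is just `transformWith` plus `rdd1.join(rdd2)`, the delegation pattern can be sketched with plain Scala collections standing in for RDDs. The object name `TransformWithSketch` and the sample batches below are illustrative, not part of the Spark API:

```scala
object TransformWithSketch {
  // Stand-in for DStream.transformWith: pair up the corresponding batches
  // of two streams and apply a user-supplied binary function to each pair.
  def transformWith[A, B, C](left: Seq[Seq[A]], right: Seq[Seq[B]])
                            (f: (Seq[A], Seq[B]) => Seq[C]): Seq[Seq[C]] =
    left.zip(right).map { case (b1, b2) => f(b1, b2) }

  def main(args: Array[String]): Unit = {
    val batches1 = Seq(Seq(1, 2), Seq(3))
    val batches2 = Seq(Seq(10), Seq(20, 30))
    // Combine each pair of batches by cross-summing their elements
    println(transformWith(batches1, batches2)((a, b) => for (x <- a; y <- b) yield x + y))
  }
}
```

The point of the pattern is that the user function sees one batch from each stream at a time; Spark's real `join` plugs `rdd1.join(rdd2, partitioner)` in as that function.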
When applied to two DStreams, one of (K, V) pairs and one of (K, W) pairs, `join` returns a new DStream of (K, (V, W)) pairs.
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordcount")
val ssc = new StreamingContext(conf, Duration(10000)) // 10-second batch interval

// Two socket sources; pair each word with a count of 1
val lineStream1: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 9999)
val ds1: DStream[(String, Int)] = lineStream1.flatMap(_.split(" ")).map((_, 1))

val lineStream2: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 8888)
val ds2: DStream[(String, Int)] = lineStream2.flatMap(_.split(" ")).map((_, 1))

// Inner join on the word key: only keys present in both batch RDDs are emitted
val joined: DStream[(String, (Int, Int))] = ds1.join(ds2)
joined.print()

ssc.start()
ssc.awaitTermination()
```
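Within each batch interval, the join pairs values key by key, and because `RDD.join` is an inner join, keys appearing on only one side are dropped. A minimal sketch of that per-batch behavior, using plain Scala collections in place of RDDs (`joinBatch` and the sample data are hypothetical names, not Spark API):

```scala
object JoinSemantics {
  // Stand-in for RDD.join on one batch: inner join on the key,
  // yielding (K, (V, W)) for every key present on both sides.
  def joinBatch[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    // One batch from each socket stream, after map((_, 1))
    val batch1 = Seq(("hello", 1), ("spark", 1))
    val batch2 = Seq(("hello", 1), ("flink", 1))
    println(joinBatch(batch1, batch2)) // only "hello" appears on both sides
  }
}
```

If you need to keep unmatched keys, DStream also offers `leftOuterJoin`, `rightOuterJoin`, and `fullOuterJoin`, which delegate to the corresponding RDD operations in the same way.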