SparkCore - Post Category (Page 2) - 学而不思则罔！

Chapter 5_Spark Core Programming_RDD_Action Operators_count

Summary: 1. Definition: def count(): Long. Purpose: returns the number of elements in the RDD. 2. Example: object countTest extends App { val sparkconf: SparkConf = new SparkConf().setMa…
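A minimal sketch of count, assuming spark-core is on the classpath and Spark runs in local mode. The object name CountExample and the sample data are illustrative, not taken from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object CountExample {
  def run(): Long = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CountExample"))
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4, 5), 2) // 5 elements spread over 2 partitions
      rdd.count()                                  // action: runs a job and returns a Long to the driver
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run()) // prints 5
}
```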

posted @ 2022-03-27 17:16 学而不思则罔！ Reads (35) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Action Operators_collect

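Summary: 1. Definition: def collect(): Array[T]. Purpose: pulls all elements of the RDD back to the Driver and stores them in an array. Note: when the RDD holds a very large amount of data, this can make the Driver run out of memory. 2. Example: object collectTe…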
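A small sketch of collect, assuming spark-core on the classpath and a local master; CollectExample and its data are hypothetical. It also calls take(n), a standard RDD action that returns only the first n elements, as a bounded alternative when the full RDD would not fit in Driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object CollectExample {
  def run(): Array[Int] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CollectExample"))
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4), 2).map(_ * 2)
      // collect() ships every element to the driver, in partition order
      val all = rdd.collect()
      // take(n) only brings back the first n elements
      println(rdd.take(2).mkString(","))
      all
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(",")) // prints 2,4,6,8
}
```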

posted @ 2022-03-27 16:16 学而不思则罔！ Reads (41) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Action Operators_reduce

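Summary: 1. Definition: def reduce(f: (T, T) => T): T. Purpose: aggregates all elements of the RDD, first combining the data within each partition and then combining the results across partitions. Notes: 1. Each partition is reduced on the map side first, and the partial results are then pulled to the Driver for the final reduce. 2. When the computation does not satisfy…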
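A sketch of the two-stage behaviour described above, assuming spark-core on the classpath and a local master; ReduceExample and the data are hypothetical. The mapPartitions step is only there to make each partition's contribution visible; reduce itself does not expose it. Because partial results are merged in no fixed order, the function passed to reduce should be associative and commutative, as addition is here.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object ReduceExample {
  // Returns (per-partition partial sums, final result of reduce)
  def run(): (List[Int], Int) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ReduceExample"))
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4), 2) // partitions: [1, 2] and [3, 4]
      // Illustration: what each partition would contribute before the driver-side merge
      val partials = rdd.mapPartitions(it => Iterator(it.sum)).collect().toList
      // reduce: sum within each partition, then merge those sums on the driver
      val total = rdd.reduce(_ + _)
      (partials, total)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run()) // prints (List(3, 7),10)
}
```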

posted @ 2022-03-27 16:11 学而不思则罔！ Reads (39) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_(Requirement) Top 3 ads by click count in each province

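Summary: 1. Requirement: the data file agent.log (user click-behaviour data) contains timestamp, province, city, user and ad, with fields separated by spaces. Requirement 1: for each province, find the Top 3 ads by number of clicks, grouping by province and ad and using click count as the metric. 2. Code example: object…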
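A sketch of one way to compute the per-province Top 3, under stated assumptions: spark-core on the classpath, a local master, and hypothetical in-memory lines in the same space-separated layout (timestamp province city user ad) instead of reading agent.log. The object name, sample values and the tie-break on ad id are illustrative additions, not the author's code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath, a local master, and made-up lines
// that mimic the agent.log layout "timestamp province city user ad".
object ProvinceAdTop3 {
  def run(): Map[String, List[(String, Int)]] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ProvinceAdTop3"))
    try {
      // Hypothetical click counts as (province, ad, clicks), expanded into one line per click
      val lines = Seq(("1", "16", 4), ("1", "26", 3), ("1", "3", 2), ("1", "7", 1), ("2", "5", 2), ("2", "9", 1))
        .flatMap { case (prov, ad, n) => Seq.fill(n)(s"1516609143867 $prov 7 64 $ad") }

      sc.makeRDD(lines, 2)
        .map { line =>
          val fields = line.split(" ") // fields(1) = province, fields(4) = ad
          ((fields(1), fields(4)), 1)
        }
        .reduceByKey(_ + _)                                      // ((province, ad), clicks)
        .map { case ((prov, ad), clicks) => (prov, (ad, clicks)) }
        .groupByKey()                                            // province -> all (ad, clicks)
        .mapValues(_.toList.sortBy { case (ad, clicks) => (-clicks, ad) }.take(3)) // desc by clicks, ties by ad
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().toSeq.sortBy(_._1).foreach(println)
}
```

Counting with reduceByKey before grouping keeps the shuffled data small; groupByKey is then only used on per-province ad lists, which are assumed to be short.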

posted @ 2022-03-27 15:42 学而不思则罔！ Reads (137) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_cogroup

Summary: 1. Definition: def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))]; def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K,…
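A minimal cogroup sketch, assuming spark-core on the classpath and a local master; CogroupExample and its data are hypothetical. Values are sorted only so the result is deterministic, since the order inside each Iterable is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object CogroupExample {
  def run(): Map[String, (List[Int], List[Int])] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CogroupExample"))
    try {
      val left  = sc.makeRDD(List(("a", 1), ("b", 2), ("a", 3)), 2)
      val right = sc.makeRDD(List(("a", 4), ("c", 5)), 2)
      left.cogroup(right) // key -> (all values from left, all values from right); a side may be empty
        .mapValues { case (ls, rs) => (ls.toList.sorted, rs.toList.sorted) }
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().toSeq.sortBy(_._1).foreach(println)
}
```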

posted @ 2022-03-27 08:41 学而不思则罔！ Reads (36) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_join&leftOuterJoin&rightOuterJoin&fullOuterJoin

Summary: 1. join. Definition: def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]; def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]. Purpose:…
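A side-by-side sketch of the four joins on the same pair of RDDs, assuming spark-core on the classpath and a local master; JoinExample and the data are hypothetical. Results are sorted by key because collect() after a shuffle does not promise any particular order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object JoinExample {
  def run(): (List[(Int, (String, String))],
              List[(Int, (String, Option[String]))],
              List[(Int, (Option[String], String))],
              List[(Int, (Option[String], Option[String]))]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("JoinExample"))
    try {
      val left  = sc.makeRDD(List((1, "a"), (2, "b"), (3, "c")), 2)
      val right = sc.makeRDD(List((1, "x"), (3, "y"), (4, "z")), 2)
      (
        left.join(right).collect().sortBy(_._1).toList,           // keys present on both sides only
        left.leftOuterJoin(right).collect().sortBy(_._1).toList,  // every left key; missing right -> None
        left.rightOuterJoin(right).collect().sortBy(_._1).toList, // every right key; missing left -> None
        left.fullOuterJoin(right).collect().sortBy(_._1).toList   // union of keys; either side may be None
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().productIterator.foreach(println)
}
```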

posted @ 2022-03-27 08:22 学而不思则罔！ Reads (47) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_sortByKey

Summary: 1. Definition: def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)] = self.withScope. ascending:…
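A sketch of sortByKey in both directions, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. sortByKey produces a total order across partitions, so collect() returns the keys in sorted order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object SortByKeyExample {
  def run(): (List[Int], List[Int]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SortByKeyExample"))
    try {
      val rdd = sc.makeRDD(List((3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")), 2)
      val asc  = rdd.sortByKey().keys.collect().toList                  // default ascending = true
      val desc = rdd.sortByKey(ascending = false).keys.collect().toList // descending
      (asc, desc)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```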

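posted @ 2022-03-26 09:18 学而不思则罔！ Reads (37) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_[Discussion] What is the difference between reduceByKey, foldByKey, aggregateByKey and combineByKey?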

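Summary: 1. Explanation. Question: what is the difference between reduceByKey, foldByKey, aggregateByKey and combineByKey? The essential difference is whether the map-side (within-partition) aggregation rule and the reduce-side (across-partition) aggregation rule are the same, and whether a combiner has to be implemented for the map side. 1. reduceByKey. 1. Definition: de…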
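A sketch that computes the same per-key sum with all four operators, so their signatures can be compared directly. It assumes spark-core on the classpath and a local master; ByKeyComparison and the data are hypothetical. For a plain sum the four agree; they diverge once the within-partition and across-partition rules differ or a non-neutral initial value is used.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object ByKeyComparison {
  def run(): List[Map[String, Int]] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ByKeyComparison"))
    try {
      val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("b", 3), ("a", 4)), 2)

      // One rule, used both within and across partitions; no initial value
      val viaReduce = rdd.reduceByKey(_ + _)
      // One rule for both stages, plus an initial value per key per partition
      val viaFold = rdd.foldByKey(0)(_ + _)
      // Separate rules: seqOp within a partition, combOp across partitions
      val viaAggregate = rdd.aggregateByKey(0)((acc, v) => acc + v, (x, y) => x + y)
      // Separate rules; the first value of a key in a partition goes through createCombiner
      val viaCombine = rdd.combineByKey(
        (v: Int) => v,                 // createCombiner
        (acc: Int, v: Int) => acc + v, // mergeValue (within a partition)
        (x: Int, y: Int) => x + y      // mergeCombiners (across partitions)
      )
      List(viaReduce, viaFold, viaAggregate, viaCombine).map(_.collect().toMap)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```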

posted @ 2022-03-26 08:59 学而不思则罔！ Reads (83) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_combineByKey

Summary: 1. Definition: def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K…
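A common combineByKey use case is a per-key average, where the combiner type C, here (sum, count), differs from the value type V. This is a sketch assuming spark-core on the classpath and a local master; the object name and scores are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object CombineByKeyAverage {
  def run(): Map[String, Double] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CombineByKeyAverage"))
    try {
      val scores = sc.makeRDD(List(("a", 90), ("b", 80), ("a", 92), ("b", 85), ("a", 94), ("b", 90)), 2)
      scores
        .combineByKey(
          (score: Int) => (score, 1),                                    // createCombiner: first value of a key in a partition
          (acc: (Int, Int), score: Int) => (acc._1 + score, acc._2 + 1), // mergeValue: fold in more values from the same partition
          (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2)   // mergeCombiners: merge per-partition (sum, count)
        )
        .mapValues { case (sum, count) => sum.toDouble / count }
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```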

posted @ 2022-03-25 19:55 学而不思则罔！ Reads (24) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_foldByKey

Summary: 1. Definition: def foldByKey(zeroValue: V)(func: (V, V) => V): RDD[(K, V)]; def foldByKey(zeroValue: V, partitioner: Partitioner)(func: (V, V) =>…
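A sketch showing where zeroValue enters foldByKey, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. The zero value is applied once per key in each partition where that key appears, so a non-neutral zero such as 10 is counted more than once for a key spread over several partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object FoldByKeyExample {
  def run(): Map[String, Int] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("FoldByKeyExample"))
    try {
      // Partitions: [(a,1), (a,2)] and [(a,3), (b,4)]
      val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("a", 3), ("b", 4)), 2)
      // a: (10 + 1 + 2) + (10 + 3) = 26 ; b: 10 + 4 = 14
      rdd.foldByKey(10)(_ + _).collect().toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```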

posted @ 2022-03-25 12:37 学而不思则罔！ Reads (27) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_aggregateByKey

Summary: 1. Definition: def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]…
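A sketch where seqOp and combOp differ: take the maximum per key within each partition, then add those maxima across partitions. It assumes spark-core on the classpath and a local master; the object name and data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object AggregateByKeyExample {
  def run(): Map[String, Int] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("AggregateByKeyExample"))
    try {
      // Partitions: [(a,1), (a,2), (c,3)] and [(b,4), (c,5), (c,6)]
      val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("c", 3), ("b", 4), ("c", 5), ("c", 6)), 2)
      rdd
        .aggregateByKey(0)(
          (acc, v) => math.max(acc, v), // seqOp: max within a partition, starting from zeroValue 0
          (x, y) => x + y               // combOp: sum of the per-partition maxima
        )
        .collect()
        .toMap // a -> 2, b -> 4, c -> 3 + 6 = 9
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```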

posted @ 2022-03-25 12:19 学而不思则罔！ Reads (25) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_groupByKey

Summary: 1. Definition: def groupByKey(): RDD[(K, Iterable[V])]; def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]; def groupByKey(numParti…
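A minimal groupByKey sketch, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. groupByKey shuffles every value without map-side combining, so when the goal is an aggregate (sum, count) reduceByKey is usually the cheaper choice.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object GroupByKeyExample {
  def run(): Map[String, List[Int]] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("GroupByKeyExample"))
    try {
      val rdd = sc.makeRDD(List(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)), 2)
      rdd.groupByKey()              // key -> Iterable of all its values
        .mapValues(_.toList.sorted) // sorted only to make the output deterministic
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```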

posted @ 2022-03-24 21:42 学而不思则罔！ Reads (31) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_reduceByKey

Summary: 1. Definition: def reduceByKey(func: (V, V) => V): RDD[(K, V)]; def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]; def reduceB…
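The classic word count as a reduceByKey sketch, assuming spark-core on the classpath and a local master; the object name and words are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object WordCountExample {
  def run(): Map[String, Int] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("WordCountExample"))
    try {
      val words = sc.makeRDD(List("spark", "rdd", "spark", "scala", "spark"), 2)
      words
        .map(word => (word, 1)) // pair each word with a count of 1
        .reduceByKey(_ + _)     // sum per word, combining within partitions before the shuffle
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```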

posted @ 2022-03-24 20:38 学而不思则罔！ Reads (30) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Key-Value Type_partitionBy

Summary: 1. Definition: def partitionBy(partitioner: Partitioner): RDD[(K, V)]. 2. Purpose: repartitions a key-value RDD according to the specified Partitioner. The default partitioner is HashPartiti…
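A sketch of partitionBy with a HashPartitioner, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. HashPartitioner places a key in the partition given by a non-negative key.hashCode modulo numPartitions, and an Int's hashCode is its own value, so even keys land in partition 0 and odd keys in partition 1 here.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object PartitionByExample {
  // Returns partition index -> sorted keys stored in that partition
  def run(): Map[Int, List[Int]] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PartitionByExample"))
    try {
      val rdd = sc.makeRDD(List((1, "a"), (2, "b"), (3, "c"), (4, "d")), 3)
      val repartitioned = rdd.partitionBy(new HashPartitioner(2))
      repartitioned
        .mapPartitionsWithIndex((idx, it) => it.map { case (k, _) => (idx, k) }) // tag each key with its partition
        .collect()
        .groupBy(_._1)
        .map { case (idx, pairs) => idx -> pairs.map(_._2).toList.sorted }
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```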

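posted @ 2022-03-23 19:52 学而不思则罔！ Reads (66) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD Operators_Transformation Operators_Double-Value Type_Intersection & Union & Difference & Zip_intersection&union&subtract&zip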

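Summary: 1. Intersection (intersection). 1. Definition: def intersection(other: RDD[T]): RDD[T]. 2. Purpose: computes the intersection of the source RDD and the argument RDD and returns it as a new RDD. Both RDDs must have the same element type, and the returned result is deduplicated…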
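A sketch of the four double-value operators on the same two RDDs, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. union keeps duplicates and simply concatenates partitions; intersection and subtract shuffle, so their output is sorted here for determinism; zip requires both RDDs to have the same number of partitions and the same number of elements in each partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object SetOpsExample {
  def run(): (List[Int], List[Int], List[Int], List[(Int, Int)]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SetOpsExample"))
    try {
      val rdd1 = sc.makeRDD(List(1, 2, 3, 4), 2)
      val rdd2 = sc.makeRDD(List(3, 4, 5, 6), 2)
      (
        rdd1.intersection(rdd2).collect().sorted.toList, // common elements, deduplicated
        rdd1.union(rdd2).collect().toList,               // all elements, duplicates kept
        rdd1.subtract(rdd2).collect().sorted.toList,     // elements of rdd1 not in rdd2
        rdd1.zip(rdd2).collect().toList                  // pairs elements position by position
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```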

posted @ 2022-03-23 17:38 学而不思则罔！ Reads (155) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Value Type_sortBy

Summary: 1. Definition: def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering…
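A sketch of sortBy sorting tuples by their second field, in both directions. It assumes spark-core on the classpath and a local master; the object name and data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object SortByExample {
  def run(): (List[String], List[String]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SortByExample"))
    try {
      val rdd = sc.makeRDD(List(("a", 3), ("b", 1), ("c", 2)), 2)
      val asc  = rdd.sortBy(_._2).map(_._1).collect().toList                      // key function: second field
      val desc = rdd.sortBy(_._2, ascending = false).map(_._1).collect().toList // same key, reversed
      (asc, desc)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```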

posted @ 2022-03-23 16:11 学而不思则罔！ Reads (49) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Value Type_repartition

Summary: 1. Definition: def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope { coalesce(numPartitions, shuffle = true) }…
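Since the definition above shows repartition delegating to coalesce with shuffle = true, this sketch calls both and checks the resulting partition counts. It assumes spark-core on the classpath and a local master; the object name and data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object RepartitionExample {
  // Returns (original partitions, after repartition, after coalesce with shuffle, element count)
  def run(): (Int, Int, Int, Long) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("RepartitionExample"))
    try {
      val rdd = sc.makeRDD(1 to 10, 2)
      val viaRepartition = rdd.repartition(4)            // always shuffles
      val viaCoalesce = rdd.coalesce(4, shuffle = true)  // the call repartition delegates to
      (rdd.getNumPartitions, viaRepartition.getNumPartitions, viaCoalesce.getNumPartitions, viaRepartition.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```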

posted @ 2022-03-23 15:52 学而不思则罔！ Reads (32) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Value Type_coalesce operator

Summary: 1. Explanation. 1. Definition: def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(…
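A sketch of the main coalesce behaviour, assuming spark-core on the classpath and a local master; the object name and data are hypothetical. Without a shuffle, coalesce can only merge partitions, so asking for more partitions than the parent has leaves the count unchanged; with shuffle = true it can increase the count.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object CoalesceExample {
  def run(): (Int, Int, Int) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CoalesceExample"))
    try {
      val rdd = sc.makeRDD(1 to 100, 4)
      val fewer  = rdd.coalesce(2)                 // merges partitions, no shuffle
      val noGrow = rdd.coalesce(8)                 // no shuffle: stays at 4 partitions
      val grown  = rdd.coalesce(8, shuffle = true) // shuffle allows growing to 8
      (fewer.getNumPartitions, noGrow.getNumPartitions, grown.getNumPartitions)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run()) // prints (2,4,8)
}
```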

posted @ 2022-03-23 15:27 学而不思则罔！ Reads (48) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Value Type_distinct operator

Summary: 1. Definition: def distinct(): RDD[T]. 2. Purpose: removes duplicate elements from the RDD and returns the deduplicated RDD. object distinctTest extends App { val sparkconf: SparkConf = new SparkC…
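A sketch comparing distinct with a hand-rolled deduplication through reduceByKey, which illustrates why distinct generally involves a shuffle. It assumes spark-core on the classpath and a local master; the object name, data and the manual variant are illustrative, not the author's code or a claim about Spark's exact internals in every version.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object DistinctExample {
  def run(): (List[Int], List[Int]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("DistinctExample"))
    try {
      val rdd = sc.makeRDD(List(1, 2, 1, 3, 2, 4), 2)
      val builtIn = rdd.distinct().collect().sorted.toList
      // Illustrative equivalent: key by the element itself, keep one per key, drop the dummy value
      val manual = rdd.map(x => (x, null)).reduceByKey((x, _) => x).map(_._1).collect().sorted.toList
      (builtIn, manual)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```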

posted @ 2022-03-23 15:04 学而不思则罔！ Reads (33) Comments (0) Recommended (0)

Chapter 5_Spark Core Programming_RDD_Transformation Operators_Value Type_sample operator

Summary: 1. Definition: def sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.random.nextLong): RDD[T]. withReplacement: after an element is drawn…
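A sketch of sample with and without replacement, assuming spark-core on the classpath and a local master; the object name, data and seed are hypothetical. fraction is an expected size, not an exact count: without replacement it is the probability that each element is chosen, and with replacement it is the expected number of times each element is chosen. A fixed seed on the same RDD with the same partitioning yields the same sample.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath and a local master.
object SampleExample {
  def run(): (List[Int], List[Int], List[Int], List[Int]) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SampleExample"))
    try {
      val data = (1 to 100).toList
      val rdd = sc.makeRDD(data, 4)
      // Bernoulli sampling: each element kept with probability 0.3, never repeated
      val noRepl = rdd.sample(withReplacement = false, fraction = 0.3, seed = 42L).collect().toList
      // Same seed, same RDD: same sample
      val again = rdd.sample(withReplacement = false, fraction = 0.3, seed = 42L).collect().toList
      // Poisson sampling: an element may appear several times
      val withRepl = rdd.sample(withReplacement = true, fraction = 0.3, seed = 42L).collect().toList
      (data, noRepl, again, withRepl)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (_, noRepl, _, withRepl) = run()
    println(s"without replacement: ${noRepl.size} elements, with replacement: ${withRepl.size} elements")
  }
}
```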

posted @ 2022-03-23 14:33 学而不思则罔！ Reads (38) Comments (0) Recommended (0)
