|NO.Z.00030|——————————|BigDataEnd|——|Hadoop&Spark.V04|——|Spark.v04|sparkcore|Advanced RDD Programming & RDD Fault Tolerance: Checkpoint|

I. RDD Fault Tolerance: Checkpoint
### --- Operator involved: checkpoint; lazy like a Transformation (it only marks the RDD and returns Unit)

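~~~     Besides persistence, Spark also provides a checkpoint mechanism for saving data.
### --- A checkpoint writes an RDD to highly reliable storage, primarily for fault tolerance.

~~~     In practice the data is written to HDFS (or another reliable filesystem), which is what implements the RDD checkpoint.
~~~     If the Lineage grows too long, recovering by recomputation becomes too expensive, so it is better to checkpoint at an intermediate stage;
~~~     if a node later fails and partitions are lost, the Lineage is replayed starting from the checkpointed RDD, which reduces the cost.
~~~     cache and checkpoint differ significantly. cache computes the RDD and keeps it in memory,
~~~     but the RDD's dependency chain cannot be discarded: when an executor goes down,
~~~     the RDD partitions cached on it are lost and have to be recomputed by replaying the dependency chain.
~~~     checkpoint, in contrast, saves the RDD to HDFS, which is replicated, reliable storage,
~~~     so the dependency chain is no longer needed and is cut off.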
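~~~     The difference can be observed with toDebugString. The following is a minimal sketch, not part of the original session.
~~~     Assumptions: local mode, a hypothetical directory /tmp/checkpoint-demo, and illustrative names (base, cached, checkpointed).
~~~     On a cluster the checkpoint directory should live on HDFS or another reliable filesystem.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode and a local checkpoint directory, for illustration only.
val conf = new SparkConf().setMaster("local[*]").setAppName("cache-vs-checkpoint")
val sc = SparkContext.getOrCreate(conf)   // reuses the existing context inside spark-shell
sc.setCheckpointDir("/tmp/checkpoint-demo")

val base = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0)

// cache: partitions are kept in executor memory, the lineage is retained.
val cached = base.map(_ + 1).cache()
cached.count()
println(cached.toDebugString)        // still lists the ancestor RDDs back to ParallelCollectionRDD

// checkpoint: must be requested before the first action on the RDD.
val checkpointed = base.map(_ + 1)
checkpointed.cache()                 // avoids recomputing the RDD when the checkpoint is written
checkpointed.checkpoint()
checkpointed.count()                 // the action triggers the checkpoint write
println(checkpointed.toDebugString)  // lineage now starts at a ReliableCheckpointRDD
```

~~~     Caching before checkpointing is what the Spark API documentation recommends: the checkpoint is written after the action completes,
~~~     so without the cache the RDD would be computed a second time just to save it.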
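### --- The checkpoint mechanism suits the following scenarios:

~~~     The Lineage in the DAG is too long, so recomputation would be too expensive
~~~     Checkpointing after a wide dependency (shuffle) yields a larger benefit
~~~     Like cache, checkpoint is lazy.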
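~~~     The long-Lineage case typically shows up in iterative jobs. The sketch below checkpoints periodically inside a loop.
~~~     Assumptions: local mode, a hypothetical directory /tmp/checkpoint-demo, and an arbitrary interval (checkpointEvery);
~~~     none of these come from the original text, and the right interval depends on the workload.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Assumption: local mode and a local checkpoint directory, for illustration only.
val conf = new SparkConf().setMaster("local[*]").setAppName("periodic-checkpoint")
val sc = SparkContext.getOrCreate(conf)
sc.setCheckpointDir("/tmp/checkpoint-demo")

val checkpointEvery = 10                       // hypothetical interval
var current: RDD[Int] = sc.parallelize(1 to 1000)

for (i <- 1 to 30) {
  current = current.map(_ + 1)                 // each iteration adds one step to the lineage
  if (i % checkpointEvery == 0) {
    current.cache()                            // so the checkpoint write does not recompute the RDD
    current.checkpoint()                       // lazy: only marks the RDD
    current.count()                            // action that actually writes the checkpoint
  }
}

println(current.toDebugString)                 // lineage is cut at the most recent checkpoint
```

~~~     In a long-running job the cached intermediate RDDs would normally be unpersisted once a newer checkpoint exists; that step is omitted here for brevity.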
II. RDD Fault Tolerance: Checkpoint Example
### --- Checkpoint walkthrough in spark-shell

scala> val rdd1 = sc.parallelize(1 to 100000)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[23] at parallelize at <console>:24
~~~     # Set the checkpoint directory

scala> sc.setCheckpointDir("/tmp/checkpoint")
21/10/19 21:35:08 WARN SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory '/tmp/checkpoint' appears to be on the local filesystem.

scala> val rdd2 = rdd1.map(_*2)
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[24] at map at <console>:25

scala> rdd2.checkpoint
~~~     # checkpoint is lazy: nothing has been written yet

scala> rdd2.isCheckpointed
res20: Boolean = false
~~~     # RDD dependencies before the checkpoint is materialized

scala> rdd2.dependencies(0).rdd
res21: org.apache.spark.rdd.RDD[_] = ParallelCollectionRDD[23] at parallelize at <console>:24

scala> rdd2.dependencies(0).rdd.collect
res22: Array[_] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, ...
~~~     # Run an action to trigger the checkpoint

scala> rdd2.count
res23: Long = 100000                                                            

scala> rdd2.isCheckpointed
res24: Boolean = true
~~~     # Inspect the dependencies again: after the checkpoint the lineage is truncated and now starts from the checkpoint RDD

scala> rdd2.dependencies(0).rdd
res25: org.apache.spark.rdd.RDD[_] = ReliableCheckpointRDD[25] at count at <console>:26

scala> rdd2.dependencies(0).rdd.collect
res26: Array[_] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, ...
~~~     # Show the checkpoint file the RDD depends on
~~~     Note: by default, checkpoint files are not deleted after the job finishes

scala> rdd2.getCheckpointFile
res27: Option[String] = Some(hdfs://hadoop01:9000/tmp/checkpoint/5b7e7925-86bb-435c-882e-9d2c31caf7c7/rdd-24)
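~~~     Because the files stay behind, they may need to be cleaned up by hand. The sketch below deletes an RDD's checkpoint
~~~     directory through the Hadoop FileSystem API once the RDD is no longer needed.
~~~     Assumptions: local mode, a hypothetical directory /tmp/checkpoint-demo, and the illustrative name removed.
~~~     Spark also has a setting, spark.cleaner.referenceTracking.cleanCheckpoints, that asks the context cleaner to remove
~~~     checkpoint files when the RDD goes out of scope; whether that fits depends on the job.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode and a local checkpoint directory, for illustration only.
val conf = new SparkConf().setMaster("local[*]").setAppName("checkpoint-cleanup")
val sc = SparkContext.getOrCreate(conf)
sc.setCheckpointDir("/tmp/checkpoint-demo")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint()
rdd.count()                          // triggers the checkpoint write

// ... work with rdd here ...

// Once the RDD is no longer needed, delete its checkpoint directory.
// After this point the RDD must not be used again: its lineage was replaced by the files being removed.
val removed: Option[Boolean] = rdd.getCheckpointFile.map { file =>
  val path = new Path(file)
  val fs = path.getFileSystem(sc.hadoopConfiguration)
  fs.delete(path, true)              // recursive delete of the rdd-N directory
}
println(s"checkpoint removed: $removed")
```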

Posted by yanqi_vip
