Spark: Cache and Checkpoint
1. Cache operations
Without caching, every action recomputes the RDD from its lineage, so each count below re-reads the log files from HDFS:

scala> val rdd1 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[38] at textFile at <console>:24

scala> rdd1.count
res13: Long = 40155

scala> rdd1.count
res14: Long = 40155
Calling cache marks the RDD for persistence; it returns the RDD itself (note the rdd2.type in the output), so rdd2Cache and rdd2 are the same object. The first action materializes the partitions into executor memory, and subsequent actions read the cached blocks instead of going back to HDFS:

scala> val rdd2 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd2: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> val rdd2Cache = rdd2.cache
rdd2Cache: rdd2.type = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> rdd2Cache.count
res15: Long = 40155

scala> rdd2Cache.count
res16: Long = 40155

scala> rdd2Cache.count
res17: Long = 40155
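The same idea as a minimal standalone sketch: cache is shorthand for persist(StorageLevel.MEMORY_ONLY), and persist lets you pick a different storage level explicitly. The object name CacheDemo, the local[*] master, and the MEMORY_AND_DISK level are illustrative assumptions, not from the original session:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("CacheDemo").setMaster("local[*]"))  // assumed local setup

    // persist with an explicit storage level; cache() would be MEMORY_ONLY
    val logs = sc.textFile("hdfs://192.168.146.111:9000/logs")
      .persist(StorageLevel.MEMORY_AND_DISK)

    println(logs.count())  // first action: reads HDFS and populates the cache
    println(logs.count())  // subsequent actions are served from the cached blocks

    logs.unpersist()       // release the cached blocks when no longer needed
    sc.stop()
  }
}

Calling unpersist when the cached data is no longer needed frees executor memory for other jobs.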
2. The checkpoint mechanism
Checkpointing writes an RDD's data to reliable storage (here HDFS) and truncates its lineage. The checkpoint directory must be set first; checkpoint itself only marks the RDD, and the data is actually written when the next action runs:

scala> sc.setCheckpointDir("hdfs://192.168.146.111:9000/chechdir")

scala> val rddc = rdd1.filter(_.contains("bigdata"))
rddc: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[41] at filter at <console>:26

scala> rddc.checkpoint

scala> rddc.count
res21: Long = 7155
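Because the checkpoint write is a separate computation from the job that triggers it, the RDD's lineage is evaluated twice unless the RDD is also cached, so a common pattern is cache-then-checkpoint. A minimal sketch under the same assumptions as above (CheckpointDemo and the local master are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("CheckpointDemo").setMaster("local[*]"))  // assumed local setup
    sc.setCheckpointDir("hdfs://192.168.146.111:9000/chechdir")  // must be set before checkpoint()

    val rddc = sc.textFile("hdfs://192.168.146.111:9000/logs")
      .filter(_.contains("bigdata"))

    rddc.cache()       // avoids recomputing the filter when the checkpoint is written
    rddc.checkpoint()  // only marks the RDD; the write happens on the next action

    println(rddc.count())        // triggers the job and the checkpoint write
    println(rddc.toDebugString)  // lineage is now anchored at the checkpointed data
    sc.stop()
  }
}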