缓存Cache
1.spark的缓存级别参照【org.apache.spark.storage.StorageLevel.scala】
new StorageLevel(_useDisk,_useMemory, _useOffHeap,_deserialized,_replication: Int = 1)
val NONE = new StorageLevel(false, false, false, false)
val DISK_ONLY = new StorageLevel(true, false, false, false)
val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
val MEMORY_ONLY = new StorageLevel(false, true, false, true)
val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
默认缓存级别:def persist(): this.type = persist(StorageLevel.MEMORY_ONLY),
默认情况下persist() 会把数据以反序列化的形式缓存在JVM的堆空间中。
取消缓存,执行RDD.unpersist()
设置RDD的缓存级别,执行RDD.persist(StorageLevel.MEMORY_AND_DISK)