

* Return a new RDD by applying a function to all elements of this RDD.
* A one-to-one transformation: f is applied to each element and the results are stored in a new RDD.
def map[U: ClassTag](f: T => U): RDD[U]
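
A minimal sketch of `map`, written script-style for spark-shell or a Scala script with Spark on the classpath. The local master, the app name, and the sample data are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (getOrCreate reuses an existing one, e.g. in spark-shell).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("map-example"))

val numbers = sc.parallelize(Seq(1, 2, 3, 4))

// One output element per input element.
val doubled = numbers.map(_ * 2).collect()   // Array(2, 4, 6, 8)

sc.stop()
```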



   * Return a new RDD containing only the elements that satisfy a predicate.
   * A filtering transformation: keeps the elements for which f returns true.
  def filter(f: T => Boolean): RDD[T] 
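
A minimal sketch of `filter` under the same assumptions (local master, hypothetical sample data), meant for spark-shell or a Scala script with Spark on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("filter-example"))

val numbers = sc.parallelize(Seq(1, 2, 3, 4))

// Keep only the even numbers; the element type stays Int.
val evens = numbers.filter(_ % 2 == 0).collect()   // Array(2, 4)

sc.stop()
```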


I have been burned by the flatMap operator more than once, so two examples are given here.



   * Maps each element to zero or more elements, then flattens the results into a single RDD.
  def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
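
Two illustrative cases for `flatMap`, sketched with hypothetical data (they are not necessarily the pitfalls the author ran into). The first contrasts it with `map`, which keeps the nesting; the second uses an `Option` result to drop elements that fail to parse. The local master is an assumption for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("flatMap-example"))

// Case 1: map yields one array per line, flatMap yields the individual words.
val lines  = sc.parallelize(Seq("hello spark", "hello scala"))
val nested = lines.map(_.split(" ")).collect()       // Array[Array[String]] with 2 entries
val words  = lines.flatMap(_.split(" ")).collect()   // Array(hello, spark, hello, scala)

// Case 2: returning an Option lets a function emit zero or one element per input.
val raw    = sc.parallelize(Seq("1", "x", "3"))
val parsed = raw.flatMap(s => Try(s.toInt).toOption).collect()   // Array(1, 3)

sc.stop()
```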














   * Processes the RDD partition by partition: f is called once per partition with an iterator over that partition's elements.
  def mapPartitions[U: ClassTag](
      f: Iterator[T] => Iterator[U],
      preservesPartitioning: Boolean = false): RDD[U]
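
A minimal sketch of `mapPartitions`, assuming a local master and a range split into two partitions. The function runs once per partition, so here it emits one sum per partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapPartitions-example"))

// Two partitions: [1, 2, 3] and [4, 5, 6].
val numbers = sc.parallelize(1 to 6, numSlices = 2)

// f receives the whole partition as an iterator and returns a new iterator.
val partitionSums = numbers.mapPartitions(iter => Iterator(iter.sum)).collect()   // Array(6, 15)

sc.stop()
```

This shape is useful when per-partition setup (for example opening a connection) is expensive and should not be repeated for every element.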



  def mapPartitionsWithIndex[U: ClassTag](
      f: (Int, Iterator[T]) => Iterator[U],
      preservesPartitioning: Boolean = false): RDD[U]
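
A minimal sketch of `mapPartitionsWithIndex`, assuming a local master and hypothetical data in two partitions. It works like `mapPartitions`, except that f also receives the partition index, which is used here to tag each element.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapPartitionsWithIndex-example"))

val letters = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

// Pair every element with the index of the partition it lives in.
val tagged = letters
  .mapPartitionsWithIndex((idx, iter) => iter.map(x => (idx, x)))
  .collect()   // Array((0,a), (0,b), (1,c), (1,d))

sc.stop()
```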



   * Returns the union of this RDD and another one; both RDDs must have the same element type. Duplicates are kept.
  def union(other: RDD[T]): RDD[T]
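
A minimal sketch of `union`, assuming a local master and hypothetical data. Note that the shared element 3 appears twice in the result; `distinct()` would be needed to remove it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("union-example"))

val a = sc.parallelize(Seq(1, 2, 3))
val b = sc.parallelize(Seq(3, 4))

// Partitions of a come first, then those of b; nothing is deduplicated.
val combined = a.union(b).collect()   // Array(1, 2, 3, 3, 4)

sc.stop()
```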



   * Merges the values of each key using the given reduce function.
  def reduceByKey(func: (V, V) => V): RDD[(K, V)]
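
A minimal sketch of `reduceByKey`, assuming a local master and hypothetical key-value data. The values of each key are summed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("reduceByKey-example"))

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// The function must be associative and commutative, because partial results
// are merged in no particular order.
val sums = pairs.reduceByKey(_ + _).collectAsMap()   // a -> 4, b -> 2

sc.stop()
```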




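   * Groups the values that share a key into one collection. Like every transformation it is evaluated lazily; in practice reduceByKey is used more often, because it combines values on the map side before the shuffle.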
  def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])] 
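
A minimal sketch of `groupByKey` with an explicit partitioner, assuming a local master and hypothetical data. The choice of `HashPartitioner(2)` is arbitrary, and the values are sorted only to make the printed result deterministic.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("groupByKey-example"))

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Every value is shuffled to the partition that owns its key.
val grouped = pairs
  .groupByKey(new HashPartitioner(2))
  .mapValues(_.toList.sorted)
  .collectAsMap()   // a -> List(1, 3), b -> List(2)

sc.stop()
```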



def combineByKey[C](
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C,
      numPartitions: Int): RDD[(K, C)] 
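
A minimal sketch of `combineByKey`, assuming a local master and hypothetical scores. It computes a per-key average by using a `(sum, count)` pair as the combiner type C, which is one common use of this operator rather than the only one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("combineByKey-example"))

val scores = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 4.0)))

val sumCount = scores.combineByKey(
  (v: Double) => (v, 1),                                               // createCombiner: first value of a key
  (acc: (Double, Int), v: Double) => (acc._1 + v, acc._2 + 1),         // mergeValue: within a partition
  (x: (Double, Int), y: (Double, Int)) => (x._1 + y._1, x._2 + y._2),  // mergeCombiners: across partitions
  2)                                                                   // numPartitions

val averages = sumCount
  .mapValues { case (sum, count) => sum / count }
  .collectAsMap()   // a -> 2.0, b -> 4.0

sc.stop()
```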




  def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U,
      combOp: (U, U) => U): RDD[(K, U)] 
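
A minimal sketch of `aggregateByKey`, assuming a local master, hypothetical data, and an arbitrary `HashPartitioner(2)`. The result type U (a `Set[Int]`) differs from the value type V (`Int`), which is what separates this operator from `reduceByKey`.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("aggregateByKey-example"))

val pairs = sc.parallelize(Seq(("a", 1), ("a", 1), ("a", 3), ("b", 2)))

// seqOp adds a value to the per-partition set; combOp merges sets across partitions.
val distinctValues = pairs
  .aggregateByKey(Set.empty[Int], new HashPartitioner(2))(_ + _, _ ++ _)
  .collectAsMap()   // a -> Set(1, 3), b -> Set(2)

sc.stop()
```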




  def foldByKey(
      zeroValue: V,
      partitioner: Partitioner)(func: (V, V) => V): RDD[(K, V)] 
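
A minimal sketch of `foldByKey`, assuming a local master, hypothetical data, and an arbitrary `HashPartitioner(2)`. The zero value may be applied more than once per key, so it should be neutral for the function (0 for addition here).

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("foldByKey-example"))

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Same result as reduceByKey(_ + _), but starting from an explicit zero value.
val sums = pairs.foldByKey(0, new HashPartitioner(2))(_ + _).collectAsMap()   // a -> 4, b -> 2

sc.stop()
```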


  def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
    : RDD[(K, V)] 
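
A minimal sketch of `sortByKey` in both directions, assuming a local master and hypothetical data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("sortByKey-example"))

val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))

val ascending  = pairs.sortByKey().collect()                    // Array((1,a), (2,b), (3,c))
val descending = pairs.sortByKey(ascending = false).collect()   // Array((3,c), (2,b), (1,a))

sc.stop()
```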



  def sortBy[K](
      f: (T) => K,
      ascending: Boolean = true,
      numPartitions: Int = this.partitions.length)
      (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] 
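
A minimal sketch of `sortBy`, assuming a local master and hypothetical data. Unlike `sortByKey` it works on any RDD: the sort key is derived from each element by f, here the string length.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("sortBy-example"))

val fruits = sc.parallelize(Seq("banana", "kiwi", "apple"))

val shortestFirst = fruits.sortBy(_.length).collect()                     // Array(kiwi, apple, banana)
val longestFirst  = fruits.sortBy(_.length, ascending = false).collect()  // Array(banana, apple, kiwi)

sc.stop()
```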



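// join: keeps only the keys present in both RDDs
// left outer join (leftOuterJoin): keeps every key of the left RDD
// right outer join (rightOuterJoin): keeps every key of the right RDD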
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
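
A minimal sketch comparing the three join variants, assuming a local master and hypothetical data. In the outer joins, the side that may be missing is wrapped in an `Option`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("join-example"))

val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
val right = sc.parallelize(Seq((2, "x"), (3, "y")))

// Only key 2 exists on both sides.
val inner = left.join(right).collect().toMap                 // 2 -> (b,x)

// All left keys survive; a missing right value becomes None.
val leftOuter = left.leftOuterJoin(right).collect().toMap    // 1 -> (a,None), 2 -> (b,Some(x))

// All right keys survive; a missing left value becomes None.
val rightOuter = left.rightOuterJoin(right).collect().toMap  // 2 -> (Some(b),x), 3 -> (None,y)

sc.stop()
```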



  def coalesce(numPartitions: Int, shuffle: Boolean = false,
               partitionCoalescer: Option[PartitionCoalescer] = Option.empty)
              (implicit ord: Ordering[T] = null)
      : RDD[T] 


def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
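
A minimal sketch contrasting `coalesce` and `repartition`, assuming a local master and a range split into 8 partitions. `repartition(n)` behaves like `coalesce(n, shuffle = true)`; without a shuffle, `coalesce` can only reduce the partition count.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("coalesce-example"))

val rdd = sc.parallelize(1 to 100, numSlices = 8)

// Shrinking without a shuffle: existing partitions are merged locally.
val fewer = rdd.coalesce(2).getNumPartitions        // 2

// Growing requires a shuffle, which repartition always performs.
val more = rdd.repartition(16).getNumPartitions     // 16

// Asking coalesce for more partitions without a shuffle leaves the count unchanged.
val unchanged = rdd.coalesce(16).getNumPartitions   // 8

sc.stop()
```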


