Spark Programming--Fundamental operations
max
max(key=None)
Find the maximum item in this RDD.
Parameters: key – a function used to generate the key for comparing
Example:
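A minimal sketch of max() with and without a key function. It assumes an existing SparkContext named sc (the pyspark shell defines one); the function name max_examples is only illustrative.

```python
def max_examples(sc):
    """Return (plain maximum, maximum under a string key) for a small RDD."""
    rdd = sc.parallelize([1.0, 5.0, 43.0, 10.0])
    largest = rdd.max()                  # numeric comparison -> 43.0
    largest_as_text = rdd.max(key=str)   # compares "1.0", "5.0", ... as strings -> 5.0
    return largest, largest_as_text
```

With key=str the elements are compared by their string form, so 5.0 wins because "5.0" sorts after "43.0". In a standalone script, sc can be obtained with SparkContext.getOrCreate().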
mean
mean()
Compute the mean of this RDD’s elements.
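For instance, assuming an existing SparkContext named sc (the helper name mean_example is hypothetical):

```python
def mean_example(sc):
    """Return the arithmetic mean of a three-element RDD."""
    rdd = sc.parallelize([1, 2, 3])
    return rdd.mean()   # (1 + 2 + 3) / 3 -> 2.0
```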
min
min(key=None)
Find the minimum item in this RDD.
Parameters: key – a function used to generate the key for comparing
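A short sketch of min(), again assuming an existing SparkContext named sc; min_examples is an illustrative name, not part of the API.

```python
def min_examples(sc):
    """Return (plain minimum, minimum under a string key) for a small RDD."""
    rdd = sc.parallelize([2.0, 5.0, 43.0, 10.0])
    smallest = rdd.min()                  # numeric comparison -> 2.0
    smallest_as_text = rdd.min(key=str)   # "10.0" sorts before "2.0" -> 10.0
    return smallest, smallest_as_text
```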
name/setName
name()
setName(name)
setName(name) assigns a name to this RDD; name() returns the RDD's current name.
Example:
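A minimal sketch, assuming an existing SparkContext named sc; the name "RDD1" and the helper name_example are arbitrary choices for illustration.

```python
def name_example(sc):
    """Assign a name to an RDD and read it back."""
    rdd = sc.parallelize([1, 2])
    rdd.setName("RDD1")   # setName returns the RDD itself, so calls can be chained
    return rdd.name()     # -> "RDD1"
```

The assigned name is mainly useful for identifying the RDD in the Spark web UI and in debug output.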
others
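sc.parallelize(): creates an RDD from a local collection; for numeric ranges, passing xrange (range in Python 3) is recommended
getNumPartitions(): returns the number of partitions
sc.emptyRDD(): returns an empty RDD
glom(): returns the elements grouped into one list per partition
collect(): returns all elements as a list (normally to the driver program)
Example: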
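The helpers above can be sketched together as follows, assuming an existing SparkContext named sc and Python 3 (hence range rather than xrange); partition_examples is a hypothetical helper name.

```python
def partition_examples(sc):
    """Show how parallelize, getNumPartitions, glom, collect and emptyRDD relate."""
    rdd = sc.parallelize(range(1, 5), 2)   # elements 1..4 split into two partitions
    return {
        "num_partitions": rdd.getNumPartitions(),   # 2
        "by_partition": rdd.glom().collect(),       # [[1, 2], [3, 4]]
        "all_elements": rdd.collect(),              # [1, 2, 3, 4]
        "empty": sc.emptyRDD().collect(),           # []
    }
```

collect() pulls every element back to the driver, so it is only appropriate for small results.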
sc.textFile(path): reads a file and returns an RDD (see Actions II for details)
Official signature: textFile(name, minPartitions=None, use_unicode=True)
Supported sources: reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and returns it as an RDD of Strings.
Example (reading a local file):
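A minimal sketch of reading a local file, assuming an existing SparkContext named sc running in local mode. The file name sample.txt and the helper read_local_file are made up for the illustration; the file is written to a temporary directory first so that the example has something to read.

```python
import os
import tempfile


def read_local_file(sc):
    """Write a two-line text file, then load it back as an RDD of strings."""
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f:
        f.write("first line\nsecond line\n")
    # The file:// scheme selects the local file system explicitly. On a cluster
    # the same path would have to exist on every worker node.
    lines = sc.textFile("file://" + path)
    return lines.collect()   # one string per line, without the trailing newline
```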