Python Spark: computing the max, min, and mean

JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

Now we’ll calculate the mean of our dataset.

LOGGER.info("Mean: " + rdd.mean());

Similar methods exist for other statistics, such as max and standard deviation.

Each time one of these methods is invoked, Spark performs the operation over the entire RDD. If several statistics are needed, the data is scanned again for each one, which is inefficient. To avoid this, Spark provides the “StatCounter” class, which is computed in a single pass and provides the results of all the basic statistics at once.

StatCounter statCounter = rdd.stats();

The results can then be accessed as follows:

LOGGER.info("Count: " + statCounter.count());

LOGGER.info("Min: " + statCounter.min());

LOGGER.info("Max: " + statCounter.max());

LOGGER.info("Sum: " + statCounter.sum());

LOGGER.info("Mean: " + statCounter.mean());

LOGGER.info("Variance: " + statCounter.variance());

LOGGER.info("Stdev: " + statCounter.stdev());
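To see why a single pass is enough for all of these values, here is a minimal plain-Java sketch of a one-pass accumulator. It is an illustration only, not Spark's StatCounter source: the class name OnePassStats and its add and merge methods are hypothetical, and the variance is assumed to be the population variance (divide by n). Each value updates the running count, mean, sum of squared deviations, min, and max, and two partial results (for example, one per partition) can be combined without revisiting the data.

```java
// Hypothetical illustration class; this is not Spark's StatCounter implementation.
final class OnePassStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Fold one value into the running statistics (Welford's update).
    OnePassStats add(double value) {
        double delta = value - mean;
        n++;
        mean += delta / n;
        m2 += delta * (value - mean);
        min = Math.min(min, value);
        max = Math.max(max, value);
        return this;
    }

    // Combine with another partial result, e.g. one computed on another partition.
    OnePassStats merge(OnePassStats other) {
        if (other.n == 0) {
            return this;
        }
        double delta = other.mean - mean;
        long total = n + other.n;
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        return this;
    }

    long count() { return n; }
    double min() { return min; }
    double max() { return max; }
    double sum() { return mean * n; }
    double mean() { return mean; }
    // Population variance (divides by n); assumed here for illustration.
    double variance() { return n == 0 ? Double.NaN : m2 / n; }
    double stdev() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        // Two "partitions" are summarized separately, then merged.
        OnePassStats left = new OnePassStats().add(1.0).add(2.0);
        OnePassStats right = new OnePassStats().add(3.0).add(4.0);
        OnePassStats all = left.merge(right);
        System.out.println("Count: " + all.count());
        System.out.println("Min: " + all.min());
        System.out.println("Max: " + all.max());
        System.out.println("Mean: " + all.mean());
        System.out.println("Stdev: " + all.stdev());
    }
}
```

Because every statistic is derived from the same five running fields, reading count, min, max, sum, mean, variance, and stdev afterwards costs nothing extra, which is the benefit the StatCounter approach above relies on.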

Source: http://www.sparkexpert.com/tag/rdd/

posted @ 2017-07-12 10:15 bonelee

Hi, I'm Li Zhihua (李智华), a security AI algorithm expert at Huawei. Welcome to the interesting world of security attack and defense.
