PySpark implementation (iterating with Python's functools.reduce)

Row-wise operations

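from functools import reduce
# Fold over every column index of mean_, adding each column to the running 'mean' column.
# Note: the index range also covers 'mean' itself, so the last step adds 'mean' to itself
# and the values below are twice the row sum, e.g. 2 * (16.99 + 1.01 + 2.0) = 40.0.
mean_res = reduce(lambda data, idx: data.withColumn('mean', data['mean'] + data[idx]), range(len(mean_.columns)), mean_)
mean_res.show()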
+----------+----+----+---------+
|total_bill| tip|size|     mean|
+----------+----+----+---------+
|     16.99|1.01| 2.0|     40.0|
|     10.34|1.66| 3.0|     30.0|
|     21.01| 3.5| 3.0|    55.02|
|     23.68|3.31| 2.0|    57.98|
|     24.59|3.61| 4.0|     64.4|
|     25.29|4.71| 4.0|     68.0|
|      8.77| 2.0| 2.0|    25.54|
|     26.88|3.12| 4.0|     68.0|
|     15.04|1.96| 2.0|     38.0|
|     14.78|3.23| 2.0|    40.02|
|     10.27|1.71| 2.0|27.960001|
|     35.26| 5.0| 4.0|    88.52|
|     15.42|1.57| 2.0|    37.98|
|     18.43| 3.0| 4.0|    50.86|
|     14.83|3.02| 2.0|     39.7|
|     21.58|3.92| 2.0|     55.0|
|     10.33|1.67| 3.0|     30.0|
|     16.29|3.71| 3.0|     46.0|
|     16.97| 3.5| 3.0|    46.94|
|     20.65|3.35| 3.0|     54.0|
+----------+----+----+---------+
only showing top 20 rows

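# Divide by the 3 value columns; since the fold above counted 'mean' twice, these values are twice the true row mean.
mean_res.withColumn('mean', mean_res['mean'] / 3).show(5)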
+----------+----+----+------------------+
|total_bill| tip|size|              mean|
+----------+----+----+------------------+
|     16.99|1.01| 2.0|13.333333333333334|
|     10.34|1.66| 3.0|              10.0|
|     21.01| 3.5| 3.0| 18.34000015258789|
|     23.68|3.31| 2.0|19.326666514078777|
|     24.59|3.61| 4.0| 21.46666717529297|
+----------+----+----+------------------+
only showing top 5 rows
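
The fold above runs over every column index, so the accumulator column `mean` is added to itself on the last step. A minimal sketch of one way to avoid that, assuming the value columns are everything except `mean`: build the sum as a single expression and attach it once. The helper name `row_mean_expr` is hypothetical and not from the original post; it relies only on `df[name]` returning something that supports `+` and `/`, which PySpark `Column` objects do.

```python
from functools import reduce
from operator import add

def row_mean_expr(df, cols):
    """Return the mean of the given columns as one expression (hypothetical helper)."""
    # Fold the columns with +, starting from the first one, then divide by the count.
    total = reduce(add, (df[c] for c in cols))
    return total / len(cols)

# Usage with the DataFrame from the post (assumes `mean_` exists and has a 'mean' column):
# value_cols = [c for c in mean_.columns if c != 'mean']
# mean_.withColumn('mean', row_mean_expr(mean_, value_cols)).show(5)
```

Because the accumulator column is left out of `value_cols`, no initial zero column is needed and the divisor always matches the number of columns summed.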

Renaming columns efficiently

# Fold over the column names, appending '_' to one column per step.
reduce(lambda data, name: data.withColumnRenamed(name, name + '_'), mean_res.columns, mean_res).show()
+-----------+----+-----+---------+
|total_bill_|tip_|size_|    mean_|
+-----------+----+-----+---------+
|      16.99|1.01|  2.0|     40.0|
|      10.34|1.66|  3.0|     30.0|
|      21.01| 3.5|  3.0|    55.02|
|      23.68|3.31|  2.0|    57.98|
|      24.59|3.61|  4.0|     64.4|
|      25.29|4.71|  4.0|     68.0|
|       8.77| 2.0|  2.0|    25.54|
|      26.88|3.12|  4.0|     68.0|
|      15.04|1.96|  2.0|     38.0|
|      14.78|3.23|  2.0|    40.02|
|      10.27|1.71|  2.0|27.960001|
|      35.26| 5.0|  4.0|    88.52|
|      15.42|1.57|  2.0|    37.98|
|      18.43| 3.0|  4.0|    50.86|
|      14.83|3.02|  2.0|     39.7|
|      21.58|3.92|  2.0|     55.0|
|      10.33|1.67|  3.0|     30.0|
|      16.29|3.71|  3.0|     46.0|
|      16.97| 3.5|  3.0|    46.94|
|      20.65|3.35|  3.0|     54.0|
+-----------+----+-----+---------+
only showing top 20 rows
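
The fold calls `withColumnRenamed` once per column, producing one intermediate DataFrame per step. As an alternative sketch, `DataFrame.toDF(*names)` replaces all column names in a single call. The helper `add_suffix` below is a hypothetical name, not part of the original post; it only assumes the object exposes `.columns` and `.toDF`, as a PySpark DataFrame does.

```python
def add_suffix(df, suffix='_'):
    """Rename every column to name + suffix in a single call (hypothetical helper)."""
    # Build the full list of new names first, then apply them all at once.
    new_names = [name + suffix for name in df.columns]
    return df.toDF(*new_names)

# Usage with the DataFrame from the post (assumes `mean_res` exists):
# add_suffix(mean_res).show()
```

The `reduce` version remains handy when only some columns should be renamed, since `toDF` expects a name for every column.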
posted @ 2020-11-21 16:47  seekerJunYu