Computing quantiles

 

Taking four percentiles per group (the 25th, 50th, 75th and 90th):

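from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10),(2,1),(2,10),(2,100)],['id','cnt'])

cnt_med_1 = F.expr('percentile_approx(cnt, 0.25)')  # 25th percentile
cnt_med_2 = F.expr('percentile_approx(cnt, 0.5)')   # 50th percentile (median)
cnt_med_3 = F.expr('percentile_approx(cnt, 0.75)')  # 75th percentile
cnt_med_4 = F.expr('percentile_approx(cnt, 0.90)')  # 90th percentile

# One row per id: the maximum plus the four approximate percentiles
df1.groupBy('id').agg(F.max('cnt').alias('max_cnt'),
                      cnt_med_1.alias('cnt_med_1'), cnt_med_2.alias('cnt_med_2'),
                      cnt_med_3.alias('cnt_med_3'), cnt_med_4.alias('cnt_med_4')).show()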
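To sanity-check what the query above should return, here is a minimal pure-Python sketch of the nearest-rank percentile: sort each group and take the value at 1-based rank ceil(p * n). This is an illustration under that assumption, not Spark's code: percentile_approx works from an approximate quantile summary rather than a full sort, and the helper name nearest_rank_percentile and the p25/p50/p75/p90 labels are hypothetical. For only a few rows per group and the default accuracy, we would expect Spark to report the same values.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, p):
    """Hypothetical helper: value at 1-based rank ceil(p * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # clamp so p = 0 still picks the minimum
    return ordered[rank - 1]


# Same rows as df1 above: (id, cnt)
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8),
        (1, 9), (1, 10), (2, 1), (2, 10), (2, 100)]

# Group cnt values by id, like groupBy('id')
groups = defaultdict(list)
for id_, cnt in rows:
    groups[id_].append(cnt)

# Explicit labels avoid building names from floating-point arithmetic
percentiles = (('p25', 0.25), ('p50', 0.5), ('p75', 0.75), ('p90', 0.90))

result = {}
for id_, cnts in sorted(groups.items()):
    stats = {'max_cnt': max(cnts)}
    for label, p in percentiles:
        stats[label] = nearest_rank_percentile(cnts, p)
    result[id_] = stats

for id_, stats in result.items():
    print(id_, stats)
# 1 {'max_cnt': 10, 'p25': 3, 'p50': 5, 'p75': 8, 'p90': 9}
# 2 {'max_cnt': 100, 'p25': 1, 'p50': 10, 'p75': 100, 'p90': 100}
```

Note how the small group id=2 shows the method always returns an actual value from the data (10, 100) rather than interpolating between neighbours.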

 

posted @ 2021-09-24 13:44  muyue123