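The Python data analysis library pandas: GroupBy aggregation, hierarchical grouping, group iteration, chained transformations, and extracting values after aggregation
Data aggregation (GroupBy)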
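import pandas as pd

# Sample data: five items, each with a color, an object type and two prices
frame10 = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15]
})
# Group the price1 column by the values of the color column
group = frame10['price1'].groupby(frame10['color'])
print(group, "\n-----*group*-----\n")                # a SeriesGroupBy object; nothing is computed yet
print(group.groups, "\n-----*group.groups*-----\n")  # maps each color to the row labels it contains
print(group.sum(), "\n-----*group.sum*-----\n")      # total price1 per color
print(group.mean(), "\n-----*group.mean*-----")      # average price1 per color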
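Let x = group.sum(). The aggregated values can then be read out as a NumPy array through x.values. The array recorded below has five entries, one per (color, object) pair, so it corresponds to the sum of the two-key grouping ggroup defined in the next section rather than to the three colors of group:
x.values
Out[20]: array([2.75, 1.3 , 0.56, 4.2 , 5.56])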
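Besides .values, an aggregated Series offers a few other ways to get at its contents. The following is a minimal sketch rather than part of the original post; it rebuilds frame10 so that it runs on its own, and the names labels, values, red_total and flat are chosen here for illustration only.

```python
import pandas as pd

frame10 = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15]
})

# Sum of price1 per color; the result is a Series indexed by color
x = frame10['price1'].groupby(frame10['color']).sum()

labels = x.index.tolist()   # the group keys, sorted alphabetically by default
values = x.to_numpy()       # the aggregated numbers as a NumPy array (same data as x.values)
red_total = x.loc['red']    # the total for a single group, looked up by its key
flat = x.reset_index()      # an ordinary DataFrame with the columns color and price1

print(labels)
print(values)
print(red_total)
print(flat)
```

reset_index is convenient when the grouped result has to be merged with other tables or written to a file, because the group key becomes a regular column again.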
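Hierarchical grouping
# Passing a list of keys groups by color first, then by object within each color
ggroup = frame10['price1'].groupby([frame10['color'], frame10['object']])
print(ggroup.groups, "\n-----*ggroup.groups*-----\n")
print(ggroup.sum(), "\n-----*ggroup.sum*-----\n")  # Series with a two-level (color, object) index
# Several columns can be aggregated in one call
print(frame10[['price1', 'price2']].groupby(frame10['color']).mean())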
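Grouping by two keys produces a result with a two-level index, which can be queried by tuple or reshaped into a table. The sketch below is an illustration added here, not code from the original post; it rebuilds frame10 so that it is self-contained, and the variable names s, red_pencil, green and table are assumptions.

```python
import pandas as pd

frame10 = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15]
})

# One sum per (color, object) pair, indexed by a two-level MultiIndex
s = frame10['price1'].groupby([frame10['color'], frame10['object']]).sum()

red_pencil = s.loc[('red', 'pencil')]  # a single value, addressed by the full key tuple
green = s.loc['green']                 # all objects of one color, as a Series indexed by object
table = s.unstack()                    # colors as rows, objects as columns; NaN where a pair does not occur

print(red_pencil)
print(green)
print(table)
```

unstack is often the easier form to read when most key combinations exist; with sparse combinations like these, the stacked Series avoids the NaN cells.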
Iterating over groups
# Iterating over a GroupBy object yields (group name, sub-DataFrame) pairs
for name, group in frame10.groupby('color'):
    print(name)
    print(group)
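The loop above visits every group in turn. As a small sketch added for illustration (the names grouped, sizes and red_rows are not from the original post), the same iteration can build a dictionary, and get_group fetches one group directly without looping:

```python
import pandas as pd

frame10 = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15]
})

grouped = frame10.groupby('color')

# Each iteration step yields the group key and the rows that belong to it
sizes = {name: len(rows) for name, rows in grouped}

# get_group returns the sub-DataFrame for one key, keeping the original row labels
red_rows = grouped.get_group('red')

print(sizes)
print(red_rows)
```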
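Chained transformations
# pandas 2.0 and later no longer drop the string columns silently when averaging a whole DataFrame,
# so numeric_only=True is passed to avoid a TypeError
result1 = frame10['price1'].groupby(frame10['color']).mean()
result2 = frame10.groupby(frame10['color']).mean(numeric_only=True)
print(result1, "\n-----*result1*-----\n")
print(result2, "\n-----*result2*-----\n")
# Select the column first and then aggregate ...
print(frame10.groupby(frame10['color'])['price1'].mean(), "\n-----*frame10.groupby(frame10['color'])['price1'].mean()*-----\n")
# ... or aggregate everything and select the column afterwards; both give the same Series
print((frame10.groupby(frame10['color']).mean(numeric_only=True))['price1'], "\n-----*-----\n")
# add_prefix renames the result columns to mean_price1 and mean_price2
print(frame10.groupby('color').mean(numeric_only=True).add_prefix('mean_'))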
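When different columns need different statistics, renaming with add_prefix is no longer enough. One possible alternative, sketched here as an addition to the original post, is the named aggregation form of agg, where each keyword becomes an output column; the column names mean_price1 and max_price2 are chosen for illustration. The sketch also checks that selecting a column before or after aggregating gives the same numbers.

```python
import pandas as pd

frame10 = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15]
})

# Named aggregation: output_column=(input_column, function)
summary = frame10.groupby('color').agg(
    mean_price1=('price1', 'mean'),
    max_price2=('price2', 'max'),
)

# Column selection before and after the aggregation step
before = frame10.groupby('color')['price1'].mean()
after = frame10.groupby('color').mean(numeric_only=True)['price1']
max_difference = (before - after).abs().max()

print(summary)
print(max_difference)
```

Selecting the column before aggregating does less work, since only that column is averaged, which matters on wide tables.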