pandas: grouping with groupby and aggregating with agg
import pandas as pd
# Sample data used throughout this post
df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})
print(df)
Age Country Income
0 5000 China 10000
1 4321 China 10000
2 1234 India 5000
3 4010 India 5002
4 250 America 40000
5 250 Japan 50000
6 4500 China 8000
7 4321 India 5000
Grouping
Grouping by a single column
df_gb = df.groupby('Country')
# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
for index, data in df_gb:
    print(index)
    print(data)
Output
America
Age Country Income
4 250 America 40000
China
Age Country Income
0 5000 China 10000
1 4321 China 10000
6 4500 China 8000
India
Age Country Income
2 1234 India 5000
3 4010 India 5002
7 4321 India 5000
Japan
Age Country Income
5 250 Japan 50000
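When only one group is needed, looping over every group is not necessary. The following is a minimal sketch, assuming the same sample data as above, that uses `get_group` to fetch a single group and `size` to count the rows per group. The variable names `china` and `group_sizes` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_gb = df.groupby('Country')

# The distinct group keys, sorted explicitly so the order is predictable
group_keys = sorted(df_gb.groups)
print(group_keys)

# Fetch the rows of one group directly (illustrative name: china)
china = df_gb.get_group('China')
print(china)

# Number of rows in each group (illustrative name: group_sizes)
group_sizes = df_gb.size()
print(group_sizes)
```

`get_group` returns an ordinary DataFrame that keeps the original row labels, so the result can be used like any other slice of `df`.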
Grouping by multiple columns
df_gb = df.groupby(['Country', 'Income'])
# With several grouping columns, each group key is a tuple
for (index1, index2), data in df_gb:
    print((index1, index2))
    print(data)
Output
('America', 40000)
Age Country Income
4 250 America 40000
('China', 8000)
Age Country Income
6 4500 China 8000
('China', 10000)
Age Country Income
0 5000 China 10000
1 4321 China 10000
('India', 5000)
Age Country Income
2 1234 India 5000
7 4321 India 5000
('India', 5002)
Age Country Income
3 4010 India 5002
('Japan', 50000)
Age Country Income
5 250 Japan 50000
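With several grouping columns, each group is identified by a tuple of values. The sketch below, again assuming the sample data from the start of the post, counts the rows per (Country, Income) pair and looks up one group by its tuple key. The names `pair_counts`, `counts_table` and `china_10000` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_gb = df.groupby(['Country', 'Income'])

# Rows per (Country, Income) pair; the result is a Series with a two-level index
pair_counts = df_gb.size()
print(pair_counts)

# Turn the two index levels back into ordinary columns
counts_table = pair_counts.reset_index(name='count')
print(counts_table)

# Look up a single group by passing the key as a tuple
china_10000 = df_gb.get_group(('China', 10000))
print(china_10000)
```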
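Aggregation
Aggregating the grouped data
By default, the aggregation functions are applied to every remaining (non-key) column after grouping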
df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])
print(df_agg)
Output
          Age                     Income
          min         mean   max    min          mean    max
Country
America   250   250.000000   250  40000  40000.000000  40000
China    4321  4607.000000  5000   8000   9333.333333  10000
India    1234  3188.333333  4321   5000   5000.666667   5002
Japan     250   250.000000   250  50000  50000.000000  50000
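Passing a list of functions to `agg` produces two-level column labels: the original column name on top and the function name underneath. The following sketch, which assumes the same sample data, shows one way to read a single value from such a result and one possible way to flatten the labels into plain strings. The `column_function` naming scheme and the names `china_mean_age` and `flat` are choices made for this illustration, not something pandas prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])

# Address one cell with a (column, function) tuple
china_mean_age = df_agg.loc['China', ('Age', 'mean')]
print(china_mean_age)

# Flatten the two-level labels into single strings such as 'Age_mean'
flat = df_agg.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```

Flat labels are often easier to work with when the result is written to a file or merged with other tables.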
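Aggregating only some of the grouped columns
Sometimes only certain columns need to be aggregated, possibly with different functions for each column. This can be expressed with a dictionary that maps each column name to a list of aggregation functions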
num_agg = {'Age':['min', 'mean', 'max']}
print(df.groupby('Country').agg(num_agg))
Output
          Age
          min         mean   max
Country
America   250   250.000000   250
China    4321  4607.000000  5000
India    1234  3188.333333  4321
Japan     250   250.000000   250
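When only a single column is of interest, an alternative to the dictionary is to select that column from the grouped object before aggregating. This is a small sketch under the same sample-data assumption; `age_stats` is an illustrative name. The result has a single level of column labels because only one source column is involved.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
    'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

# Select the 'Age' column of each group, then apply several functions to it
age_stats = df.groupby('Country')['Age'].agg(['min', 'mean', 'max'])
print(age_stats)
```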
# Different columns can be given different lists of aggregation functions
num_agg = {'Age': ['min', 'mean', 'max'], 'Income': ['min', 'max']}
print(df.groupby('Country').agg(num_agg))
Output
          Age                     Income
          min         mean   max    min    max
Country
America   250   250.000000   250  40000  40000
China    4321  4607.000000  5000   8000  10000
India    1234  3188.333333  4321   5000   5002
Japan     250   250.000000   250  50000  50000