pd.pivot_table: pivot tables

pd.pivot_table builds a spreadsheet-style pivot table and returns it as a DataFrame.

Signature:

pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False)

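Parameter descriptions:

data: the DataFrame to pivot.

values: the column(s) to aggregate; optional.

index: a column name, or an array with the same length as df. These keys become the row labels of the pivot table; pass a list for multiple levels.

columns: a column name. These keys become the column headers of the pivot table; pass a list for multiple levels.

aggfunc: function, list of functions, or dict; default 'mean' (equivalent to numpy.mean). Pass the function itself, e.g. np.sum (not the call np.sum()), or a string name such as 'sum'.

fill_value: the value used to replace missing values in the resulting pivot table (after aggregation).

margins: bool, default False. If True, adds an extra row and column holding the subtotals / grand totals.

dropna: bool, default True. Do not include columns whose entries are all NaN.

margins_name: str, default 'All'. Name of the row/column that holds the totals when margins is True.

observed: bool, default False. Applies only if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.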
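The margins, margins_name and observed parameters are easier to see on a tiny table. The following is a minimal sketch; the sales and shirts DataFrames and their column names are invented for illustration and are not part of the original example. Passing observed explicitly also avoids relying on the default, which, as far as I know, newer pandas releases change from False to True.

```python
import pandas as pd

# Hypothetical sales data, invented for this sketch.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["pen", "ink", "pen", "pen"],
    "amount": [10, 20, 30, 40],
})

# margins=True appends a totals row and a totals column,
# both labelled with margins_name.
with_totals = pd.pivot_table(
    sales, values="amount", index="region", columns="product",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)
print(with_totals)

# Hypothetical categorical grouper: category "L" never occurs in the data.
sizes = pd.Categorical(["S", "M", "S"], categories=["S", "M", "L"])
shirts = pd.DataFrame({"size": sizes, "qty": [1, 2, 3]})

# observed=False keeps a row for every category, including the unused "L".
all_cats = pd.pivot_table(shirts, values="qty", index="size",
                          aggfunc="sum", observed=False)

# observed=True keeps only the categories that actually appear.
seen_only = pd.pivot_table(shirts, values="qty", index="size",
                           aggfunc="sum", observed=True)
print(all_cats)
print(seen_only)
```

With a sum, the unused category shows up with a total of 0; with other aggregations such as the mean it would be all-NaN and could be dropped by dropna=True.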

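Some additional notes:

1. Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

2. A well-written blog post on the topic: https://www.cnblogs.com/Yanjy-OnlyOne/p/11195621.html

index sets the row hierarchy: list the fields in the order that matches the information you want the pivot table to show.

values selects which columns are actually aggregated.

aggfunc sets the function applied when the data is aggregated.

columns works like index but sets the column hierarchy. It is not a required parameter; it is an optional way to split the data further.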
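To make the roles of index and columns concrete, here is a small sketch on a reduced, made-up version of the example data (only columns A, C and D, four rows). It also shows that a pivot table with both index and columns gives the same numbers as a groupby followed by unstack, which is one way to think about what columns does.

```python
import pandas as pd

# Reduced, made-up data for this sketch.
small_df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "C": ["small", "large", "small", "large"],
    "D": [1, 2, 3, 4],
})

# index only: one row per value of A, D aggregated with the default mean.
by_a = pd.pivot_table(small_df, values="D", index="A")

# index + columns: the values of C are spread across the column axis.
by_a_c = pd.pivot_table(small_df, values="D", index="A",
                        columns="C", aggfunc="sum")

# The same numbers via groupby, then moving level C to the columns.
via_groupby = small_df.groupby(["A", "C"])["D"].sum().unstack("C")

print(by_a)
print(by_a_c)
print(via_groupby)
```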

Example

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df

     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

Aggregate the values by summing

table = pd.pivot_table(df, values='D', index=['A', 'B'],
                    columns=['C'], aggfunc=np.sum)
table

C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

Fill the missing values with the fill_value parameter

table = pd.pivot_table(df, values='D', index=['A', 'B'],
                    columns=['C'], aggfunc=np.sum, fill_value=0)
table

C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6
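The NaN in the first table appears because the data has no row with A == 'foo', B == 'two' and C == 'large', so there is nothing to sum for that cell. The sketch below checks this on the same data (columns A to D only) and compares fill_value=0 with calling fillna(0) on the result afterwards; the variable names are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# No source rows exist for this combination, hence the empty cell.
missing = df[(df["A"] == "foo") & (df["B"] == "two") & (df["C"] == "large")]
print(len(missing))

raw = pd.pivot_table(df, values="D", index=["A", "B"],
                     columns="C", aggfunc="sum")
filled = pd.pivot_table(df, values="D", index=["A", "B"],
                        columns="C", aggfunc="sum", fill_value=0)

# Filling after the fact yields the same values as fill_value=0.
same_values = (raw.fillna(0).values == filled.values).all()
print(same_values)
```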

Aggregate by taking the mean of several columns

table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                    aggfunc={'D': np.mean,
                             'E': np.mean})
table

                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333


Compute several kinds of aggregation for any given value column

table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                    aggfunc={'D': np.mean,
                             'E': [min, max, np.mean]})
table

                  D    E
               mean  max      mean  min
A   C
bar large  5.500000  9.0  7.500000  6.0
    small  5.500000  9.0  8.500000  8.0
foo large  2.000000  5.0  4.500000  4.0
    small  2.333333  6.0  4.333333  2.0
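When a value column gets a list of functions, the result has two-level (MultiIndex) columns such as ('E', 'max'). If a flat table is easier to work with, the levels can be joined into single names. This is a sketch of one common way to do it, not something from the original post; the underscore naming scheme is my own choice, and the functions are passed as strings here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

table = pd.pivot_table(df, values=["D", "E"], index=["A", "C"],
                       aggfunc={"D": "mean", "E": ["min", "max", "mean"]})

# Each column label is a tuple like ("E", "max"); join it into "E_max".
flat = table.copy()
flat.columns = ["_".join(col) for col in table.columns]
print(flat)

# A flattened column is then addressed by a single name.
print(flat.loc[("foo", "small"), "E_min"])
```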

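One more note: to aggregate a non-numeric column, i.e. to count it, you can do the following (here df is a different DataFrame, with Name, Result and Speed columns)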

# Filter on 'Win'; make pivot table 
df[df.Result == 'Win'].pivot_table(index = 'Name', 
            values = ['Result', 'Speed'], 
            aggfunc = {'Result' : 'count', 
               'Speed' : 'mean'}, 
            fill_value = 0).rename(columns = {'Result' : 'Win'})

Of course, you can also use groupby with agg

# groupby.agg(); note that rename needs columns=, otherwise it renames index labels
df[df.Result == 'Win'].groupby('Name').agg({'Result' : 'count', 
              'Speed' : 'mean'}).rename(columns = {'Result' : 'Win'})
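The two snippets above refer to a DataFrame that is not shown in this post. To try them out, here is a self-contained sketch with made-up race data (the names and numbers are invented; only the Name, Result and Speed column names come from the snippets). It checks that the pivot_table route and the groupby route produce the same figures.

```python
import pandas as pd

# Made-up data; only the column names follow the snippets above.
races = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Bob", "Bob"],
    "Result": ["Win", "Lose", "Win", "Win", "Lose"],
    "Speed": [10.0, 8.0, 9.0, 11.0, 7.0],
})

# Keep only the winning rows before aggregating.
wins = races[races["Result"] == "Win"]

# Route 1: pivot_table with a per-column aggregation dict.
pivoted = wins.pivot_table(
    index="Name",
    values=["Result", "Speed"],
    aggfunc={"Result": "count", "Speed": "mean"},
    fill_value=0,
).rename(columns={"Result": "Win"})

# Route 2: groupby + agg with the same dict.
grouped = wins.groupby("Name").agg(
    {"Result": "count", "Speed": "mean"}
).rename(columns={"Result": "Win"})

print(pivoted)
print(grouped)
```

Counting works on the non-numeric Result column because 'count' only tallies non-null entries per group and never needs the values to be numbers.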

Posted on 2020-11-03 20:47 by 小小喽啰