数据聚合与分组运算学习笔记3

4 透视表和交叉表

4.1 透视表

透视表(pivot table): 根据一个或多个键对数据进行聚合,并根据行和列上的分组键将数据分配到各个
矩形区域中。

  • DataFrame表有一个pivot_table方法,还有一个顶级的pandas.pivot_table
  • pivot_table还有一个分项小计选项,margins
In [32]: tips.pivot_table(index=['sex','smoker'])
Out[32]:
                   size       tip   tip_pct  total_bill
sex    smoker                                          
Female No      2.592593  2.773519  0.156921   18.105185
       Yes     2.242424  2.931515  0.182150   17.977879
Male   No      2.711340  3.113402  0.160669   19.791237
       Yes     2.500000  3.051167  0.152771   22.284500

In [33]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
    ...: ')
Out[33]:
              tip_pct                size          
smoker             No       Yes        No       Yes
sex    day                                         
Female Fri   0.165296  0.209129  2.500000  2.000000
       Sat   0.147993  0.163817  2.307692  2.200000
       Sun   0.165710  0.237075  3.071429  2.500000
       Thur  0.155971  0.163073  2.480000  2.428571
Male   Fri   0.138005  0.144730  2.000000  2.125000
       Sat   0.162132  0.139067  2.656250  2.629630
       Sun   0.158291  0.173964  2.883721  2.600000
       Thur  0.165706  0.164417  2.500000  2.300000

# 增加分类汇总,margins=True
In [34]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
   ...: ',margins=True)
Out[34]:
             tip_pct                          size                    
smoker             No       Yes       All        No       Yes       All
sex    day                                                             
Female Fri   0.165296  0.209129  0.199388  2.500000  2.000000  2.111111
      Sat   0.147993  0.163817  0.156470  2.307692  2.200000  2.250000
      Sun   0.165710  0.237075  0.181569  3.071429  2.500000  2.944444
      Thur  0.155971  0.163073  0.157525  2.480000  2.428571  2.468750
Male   Fri   0.138005  0.144730  0.143385  2.000000  2.125000  2.100000
      Sat   0.162132  0.139067  0.151577  2.656250  2.629630  2.644068
      Sun   0.158291  0.173964  0.162344  2.883721  2.600000  2.810345
      Thur  0.165706  0.164417  0.165276  2.500000  2.300000  2.433333
All          0.159328  0.163196  0.160803  2.668874  2.408602  2.569672

# 可以使用aggfunc传入其他聚合函数
In [37]: tips.pivot_table(['tip_pct','size'],index=['sex','smoker'],columns='day',aggfunc=len, margins=True)
Out[37]:
              tip_pct                           size                         
day               Fri   Sat   Sun  Thur    All   Fri   Sat   Sun  Thur    All
sex    smoker                                                                
Female No         2.0  13.0  14.0  25.0   54.0   2.0  13.0  14.0  25.0   54.0
       Yes        7.0  15.0   4.0   7.0   33.0   7.0  15.0   4.0   7.0   33.0
Male   No         2.0  32.0  43.0  20.0   97.0   2.0  32.0  43.0  20.0   97.0
       Yes        8.0  27.0  15.0  10.0   60.0   8.0  27.0  15.0  10.0   60.0
All              19.0  87.0  76.0  62.0  244.0  19.0  87.0  76.0  62.0  244.0

pivot_table参数:

4.2 交叉表

交叉表示一种计算分组频率的特殊透视表,

In [39]: tips[:5]
Out[39]:
   total_bill   tip     sex smoker  day    time  size   tip_pct
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808

In [40]:

In [40]: pd.crosstab([tips.time, tips.day],tips.smoker, margins=True)
Out[40]:
smoker        No  Yes  All
time   day                
Dinner Fri     3    9   12
       Sat    45   42   87
       Sun    57   19   76
       Thur    1    0    1
Lunch  Fri     1    6    7
       Thur   44   17   61
All          151   93  244
posted @ 2017-02-17 20:58  felo  阅读(293)  评论(0编辑  收藏  举报