pandas函数之pivot、pivot_table和crosstab

1、pivot方法

  pandas.pivot(data, index=None, columns=None, values=None)

  (1)参数

    data:DataFrame

    index:str or object or a list of str, optional。用于创建新DataFrame索引名称。 如果没有,则使用现有的索引。

    columns:str or object or a list of str。用于创建新DataFrame列名称。

    values:str, object or a list of the previous, optional。

  (2)举例    

 

     df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two','two'], 
               'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
               'baz': [1, 2, 3, 4, 5, 6],
               'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
         foo   bar  baz  zoo
      0   one   A    1    x
      1   one   B    2    y
      2   one   C    3    z
      3   two   A    4    q
      4   two   B    5    w
      5   two   C    6    t
     pivot_df = pd.pivot(df,columns='bar')
           foo            baz            zoo          
      bar    A    B    C    A    B    C    A    B    C
      0    one  NaN  NaN  1.0  NaN  NaN    x  NaN  NaN
      1    NaN  one  NaN  NaN  2.0  NaN  NaN    y  NaN
      2    NaN  NaN  one  NaN  NaN  3.0  NaN  NaN    z
      3    two  NaN  NaN  4.0  NaN  NaN    q  NaN  NaN
      4    NaN  two  NaN  NaN  5.0  NaN  NaN    w  NaN
      5    NaN  NaN  two  NaN  NaN  6.0  NaN  NaN    t


    df.pivot(index='foo', columns='bar', values='baz')       bar A B C       foo       one 1 2 3       two 4 5
    df.pivot(index='foo', columns='bar')['baz']
      bar  A   B   C
      foo
      one  1   2   3
      two  4   5   6

    df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])       baz zoo       bar A B C A B C       foo       one 1 2 3 x y z       two 4 5 6 q w t

    索引和列都是列表形式
    df = pd.DataFrame({"lev1": [1, 1, 1, 2, 2, 2],
                       "lev2": [1, 1, 2, 1, 1, 2],
                       "lev3": [1, 2, 1, 2, 1, 2],
                   "lev4": [1, 2, 3, 4, 5, 6],
                "values": [0, 1, 2, 3, 4, 5]})

           lev1 lev2 lev3 lev4 values
         0   1    1    1    1    0
         1   1    1    2    2    1
         2   1    2    1    3    2
         3   2    1    2    4    3
         4   2    1    1    5    4
         5   2    2    2    6    5


    df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
      lev2    1         2
      lev3    1    2    1    2
      lev1
      1     0.0  1.0  2.0  NaN
      2     4.0  3.0  NaN  5.0
    df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
            lev3    1    2
      lev1  lev2
         1     1  0.0  1.0
               2  2.0  NaN
         2     1  4.0  3.0
               2  NaN  5.0
    

 

2、pivot_table方法(透视表)

    pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',fill_value=None, margins=False, 

              dropna=True, margins_name='All', observed=False,sort=True)
     (1)参数

      data:DataFrame

      values:列聚合值。

      index:column, Grouper, array, or list of the previous。

      columns:column, Grouper, array, or list of the previous。

      aggfunc:function, list of functions, dict, default numpy.mean。聚合方式

      fill_value:scalar, default None。Value to replace missing values with (in the resulting pivot table, after aggregation).(缺失值填充)

      margins:bool, default False。Add all row / columns

      dropna:bool, default True。Do not include columns whose entries are all NaN.

      margins_name:str, default ‘All’。Name of the row / column that will contain the totals when margins is True.

      observed:bool, default False

   (2)举例

      df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",

                 "bar", "bar", "bar", "bar"],

                 "B": ["one", "one", "one", "two", "two","one", "one", "two", "two"],

                 "C": ["small", "large", "large", "small","small", "large", "small", "small","large"],

                 "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],

                 "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

 

             A    B      C  D  E
        0  foo  one  small  1  2
        1  foo  one  large  2  4
        2  foo  one  large  2  5
        3  foo  two  small  3  5
        4  foo  two  small  3  6
        5  bar  one  large  4  6
        6  bar  one  small  5  8
        7  bar  two  small  6  9
        8  bar  two  large  7  9
    
      table = pd.pivot_table(df, values='D', index=['A', 'B'],columns=['C'], aggfunc=np.sum)

        C large small         A B         bar one 4.0 5.0         two 7.0 6.0         foo one 4.0 1.0         two NaN 6.0 

     table = pd.pivot_table(df, values='D', index=['A', 'B'],columns=['C'], aggfunc=np.sum, fill_value=0)
          C large small           A B           bar one 4 5           two 7 6           foo one 4 1           two 0 6
      
      table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],aggfunc={'D': np.mean,'E': np.mean})
       D E       A C       bar large 5.500000 7.500000       small 5.500000 8.500000       foo large 2.000000 4.500000       small 2.333333 4.333333

     table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],aggfunc={'D': np.mean,'E': [min, max, np.mean]})
       D E        mean max mean min       A C       bar large 5.500000 9 7.500000 6       small 5.500000 9 8.500000 8       foo large 2.000000 5 4.500000 4       small 2.333333 6 4.333333 2
3、crosstab方法(交叉表):默认计算分组结果的出现次数,如果需要计算其他分组信息(如分组内的均值、方差等),需要制定具体的values和aggfunc,aggfunc作用于values制定的属性上
    pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None,
              margins=False, margins_name='All', dropna=True, normalize=False)
    (1)参数
    index:
array-like, Series, or list of arrays/Series。分组的行信息
     column:
array-like, Series, or list of arrays/Series。分组的列信息
     values:
array-like, optional。要聚合的值。
     rownames:
sequence, default None。行名称。
     colnames:
sequence, default None。列名称。
     aggfunc:
function, optional。聚合函数,必须有values参数。
     margins:
bool, default False。Add row/column margins (subtotals).
    
margins_name:str, default ‘All’。Name of the row/column that will contain the totals when margins is True.
     dropna:
bool, default True。Do not include columns whose entries are all NaN.
    
normalize:bool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False。标准化统计各行列的百分比。如果传all或者True,将输出当前值对所有值得百分比(分母是所有行列的值)。如          果传“index”则对行进行百分比统计。如果传“columns”则对列进行百分比统计。如果margins为True,将会把All的值也计算在内进行百分比统计。
    (2)举例
      
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar","bar", "bar", "foo", "foo", "foo"], dtype=object)
        b = np.array(["one", "one", "one", "two", "one", "one","one", "two", "two", "two", "one"], dtype=object)
        c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny","shiny", "dull", "shiny", "shiny", "shiny"],dtype=object)         pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])           b one two           c dull shiny dull shiny           a             bar 1 2 1 0           foo 2 2 1 2

        foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])         bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])         pd.crosstab(foo, bar)           col_0 d e           row_0           a 1 0           b 0 1
        pd.crosstab(foo, bar, dropna=False)           col_0 d e f           row_0           a 1 0 0           b 0 1 0           c 0 0 0
posted @ 2022-03-16 17:27  狗尾巴草oo  阅读(603)  评论(0编辑  收藏  举报