pandas crosstab

 
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)

index : array-like, Series, or list of arrays/Series

Values to group by in the rows

columns : array-like, Series, or list of arrays/Series

Values to group by in the columns

values : array-like, optional

Array of values to aggregate according to the factors. Requires aggfunc be specified.

aggfunc : function, optional

If specified, requires values be specified as well

rownames : sequence, default None

If passed, must match number of row arrays passed

colnames : sequence, default None

If passed, must match number of column arrays passed

margins : boolean, default False

Add row/column margins (subtotals)

dropna : boolean, default True

Do not include columns whose entries are all NaN

normalize : boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False

Normalize by dividing all values by the sum of values.

  • If passed ‘all’ or True, will normalize over all values.
  • If passed ‘index’ will normalize over each row.
  • If passed ‘columns’ will normalize over each column.
  • If margins is True, will also normalize margin values.

New in version 0.18.1.
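The normalize options listed above can be compared side by side. This is a minimal sketch that reuses the foo/bar and one/two arrays from the examples in this post; the variable names (`by_all`, `by_row`, `by_col`, `with_margins`) are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# Raw counts: bar -> one 3, two 1; foo -> one 4, two 3 (11 observations)
counts = pd.crosstab(a, b)

# 'all' (or True): every cell is divided by the grand total
by_all = pd.crosstab(a, b, normalize="all")

# 'index': every row sums to 1
by_row = pd.crosstab(a, b, normalize="index")

# 'columns': every column sums to 1
by_col = pd.crosstab(a, b, normalize="columns")

# With margins=True the 'All' row and column are normalized as well
with_margins = pd.crosstab(a, b, normalize="all", margins=True)

print(by_row)
```

In the row-normalized table, the bar row should read 0.75 / 0.25, since bar occurs three times with "one" and once with "two".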

In [1]:
import numpy as np
import pandas as pd
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"], dtype=object)
a
In [2]:
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
b
In [3]:
pd.crosstab(a, b)
Out[3]:
col_0  one  two
row_0
bar      3    1
foo      4    3
In [4]:
pd.crosstab(a, b, rownames=['a'], colnames=['b'])
Out[4]:
b    one  two
a
bar    3    1
foo    4    3
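The margins parameter is not exercised by the cells in this post. As a small sketch on the same data, passing margins=True should append an "All" row and column holding the subtotals:

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# margins=True adds row and column subtotals labelled 'All'
table = pd.crosstab(a, b, rownames=["a"], colnames=["b"], margins=True)
print(table)
# b    one  two  All
# a
# bar    3    1    4
# foo    4    3    7
# All    7    4   11
```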
In [5]:
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],
             dtype=object)
c
In [6]:
import pandas as pd 
pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
Out[6]:
b    one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
In [7]:
foo1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar1 = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
pd.crosstab(foo1, bar1, dropna=False)
# 'c' and 'f' are not represented in the data.
# With the default dropna=True they would be left
# out of the output; dropna=False is passed here
# to preserve categories with no data.
 
Out[7]:
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0
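None of the cells above use values and aggfunc, which the parameter list says must be passed together. The following is a minimal sketch with made-up data (the array `v` and its numbers are hypothetical, chosen only for illustration) that sums `v` within each row/column combination instead of counting occurrences:

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "bar", "bar", "foo", "bar"], dtype=object)
b = np.array(["one", "two", "one", "two", "two", "one"], dtype=object)
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical values to aggregate

# Each cell is the sum of v over the observations that fall in that cell
summed = pd.crosstab(a, b, values=v, aggfunc="sum",
                     rownames=["a"], colnames=["b"])
print(summed)
# bar/one = 3 + 6 = 9, bar/two = 4, foo/one = 1, foo/two = 2 + 5 = 7
```

Other aggregations such as "mean" can be passed to aggfunc in the same way.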
 
posted @ 2019-07-07 16:15  wqbin