pandas crosstab
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)
index : array-like, Series, or list of arrays/Series
    Values to group by in the rows.
columns : array-like, Series, or list of arrays/Series
    Values to group by in the columns.
values : array-like, optional
    Array of values to aggregate according to the factors. Requires aggfunc to be specified.
aggfunc : function, optional
    If specified, requires values to be specified as well.
rownames : sequence, default None
    If passed, must match the number of row arrays passed.
colnames : sequence, default None
    If passed, must match the number of column arrays passed.
margins : boolean, default False
    Add row/column margins (subtotals).
dropna : boolean, default True
    Do not include columns whose entries are all NaN.
normalize : boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False
    Normalize by dividing all values by the sum of values.
    - If passed ‘all’ or True, will normalize over all values.
    - If passed ‘index’, will normalize over each row.
    - If passed ‘columns’, will normalize over each column.
    - If margins is True, will also normalize margin values.
    New in version 0.18.1.
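The margins and normalize options are easier to read off a small table. The sketch below is only an illustration: it reuses the a and b arrays from the examples in these notes, and the variable names (counts, with_totals, by_row, by_col, overall) are my own.

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# Plain frequency table: rows are values of a, columns are values of b
counts = pd.crosstab(a, b, rownames=["a"], colnames=["b"])

# margins=True appends an "All" row and an "All" column with subtotals
with_totals = pd.crosstab(a, b, rownames=["a"], colnames=["b"], margins=True)

# normalize="index": every row is divided by its own row total
by_row = pd.crosstab(a, b, rownames=["a"], colnames=["b"], normalize="index")

# normalize="columns": every column is divided by its own column total
by_col = pd.crosstab(a, b, rownames=["a"], colnames=["b"], normalize="columns")

# normalize="all" (or True): every cell is divided by the grand total
overall = pd.crosstab(a, b, rownames=["a"], colnames=["b"], normalize="all")

print(counts)
print(with_totals)
print(by_row)
print(by_col)
print(overall)
```

With normalize="index" each row sums to 1, with normalize="columns" each column does, and with normalize="all" the whole table does.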
import numpy as np
import pandas as pd

# Row factor
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"], dtype=object)
# Column factor
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
# Count how often each value of a occurs together with each value of b
pd.crosstab(a, b, rownames=['a'], colnames=['b'])

# Second column factor
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"], dtype=object)
# Passing a list of arrays as columns gives a two-level column index (b, c)
pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
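The calls above only count occurrences. To aggregate some other quantity per cell, values and aggfunc have to be passed together. Below is a minimal sketch, assuming a made-up array v of numbers (hypothetical, not part of the original example) aligned with a and b.

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# Hypothetical measurements, one per observation: 1, 2, ..., 11
v = np.arange(1, 12)

# Sum of v inside each (a, b) cell instead of a plain count
totals = pd.crosstab(a, b, values=v, aggfunc="sum",
                     rownames=["a"], colnames=["b"])

# Mean of v inside each (a, b) cell
means = pd.crosstab(a, b, values=v, aggfunc="mean",
                    rownames=["a"], colnames=["b"])

print(totals)
print(means)
```

The aggregation is given here as a string name; a callable should also be accepted, but the string form is what this sketch relies on.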
# 'c' and 'f' are not represented in the data and will not be shown
# in the output, because dropna is True by default.
# Set dropna=False to preserve categories with no data.
foo1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar1 = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
pd.crosstab(foo1, bar1, dropna=True)
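For comparison, here is the same pair of categoricals with dropna=False. This is only a sketch of the behaviour described in the comment above; the names default_table and full_table are my own.

```python
import pandas as pd

foo1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar1 = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])

# Default (dropna=True): the unobserved categories 'c' and 'f' are left out
default_table = pd.crosstab(foo1, bar1)

# dropna=False: unobserved categories are kept and filled with zero counts
full_table = pd.crosstab(foo1, bar1, dropna=False)

print(default_table)
print(full_table)
```

The second table has one row and one column more than the first, both containing only zeros.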