Some commonly used Pandas methods
The Series (column) method describe() returns different summary statistics depending on the dtype of the column (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html)
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object
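Since the linked page documents DataFrame.describe, here is a minimal sketch of how the same dtype dependence shows up on a whole DataFrame. The frame and its column names (num, cat) are made up for illustration and are not from the original notes.

```python
import pandas as pd

# Hypothetical two-column frame: one numeric column, one string column.
df = pd.DataFrame({'num': [1, 2, 3], 'cat': ['a', 'a', 'b']})

numeric_summary = df['num'].describe()   # count, mean, std, min, quartiles, max
object_summary = df['cat'].describe()    # count, unique, top, freq
frame_summary = df.describe()            # by default only numeric columns are summarized

print(numeric_summary.index.tolist())
print(object_summary.index.tolist())
print(frame_summary.columns.tolist())
```

Passing include='all' to df.describe() should summarize both columns in one table, with NaN where a statistic does not apply to a column.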
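The Series (column) method Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Returns the number of occurrences of each distinct value. With normalize=True it returns the relative frequency (proportion) of each value instead.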
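The difference between counts and relative frequencies can be sketched as follows. The sample Series is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the two modes.
s = pd.Series(['a', 'a', 'b', 'c'])

counts = s.value_counts()                 # absolute counts, most frequent value first
shares = s.value_counts(normalize=True)   # proportions that sum to 1.0

print(counts)
print(shares)
```

Because sort=True and ascending=False are the defaults, the most frequent value comes first in both results.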
The crosstab function pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)
Purpose: Compute a simple cross-tabulation of two (or more) factors. By default it computes a frequency table of the factors unless an array of values and an aggregation function are passed.
Examples
>>> a
array([foo, foo, foo, foo, bar, bar, bar, bar, foo, foo, foo], dtype=object)
>>> b
array([one, one, one, two, one, one, one, two, two, two, one], dtype=object)
>>> c
array([dull, dull, shiny, dull, dull, shiny, shiny, dull, shiny, shiny, shiny], dtype=object)
>>> crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b    one        two
c    dull shiny dull shiny
a
bar  1    2     1    0
foo  2    2     1    2

>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> crosstab(foo, bar)  # 'c' and 'f' are not represented in the data,
...                     # but they still will be counted in the output
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0
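The examples above only show the default frequency table. As a minimal sketch of the other case mentioned in the description (passing values together with aggfunc) and of the normalize parameter, consider the following. The DataFrame and its column names (shop, item, sales) are hypothetical.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    'shop':  ['x', 'x', 'y', 'y', 'y'],
    'item':  ['pen', 'ink', 'pen', 'pen', 'ink'],
    'sales': [10, 20, 30, 40, 50],
})

# Default behaviour: frequency table of shop vs. item.
freq = pd.crosstab(df['shop'], df['item'])

# With values and aggfunc: total sales for each (shop, item) pair.
total = pd.crosstab(df['shop'], df['item'], values=df['sales'], aggfunc='sum')

# normalize='index' converts each row of the frequency table into proportions.
row_share = pd.crosstab(df['shop'], df['item'], normalize='index')

print(freq)
print(total)
print(row_share)
```

values and aggfunc have to be given together. Setting margins=True would additionally append row and column totals.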