pandas.factorize()
pandas official documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html
pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)
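Encode the object as an enumerated type or categorical variable. In other words, it converts object-type values into integer labels, one per element, together with the distinct values those labels refer to.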
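Parameters:
values : sequence
    A 1-D sequence. Sequences that are not pandas objects are coerced to ndarrays before factorization.
sort : bool, default False
    Sort uniques and shuffle labels to maintain the relationship.
order
    Deprecated since version 0.23.0; has no effect.
na_sentinel : int, default -1
    Value used in labels to mark missing ("not found") values.
size_hint : int, optional
    Hint to the hashtable sizer.
Returns:
labels : ndarray
    An integer ndarray that is an indexer into uniques. For inputs without missing values, uniques.take(labels) has the same values as values.
uniques : ndarray, Index, or Categorical
    The unique valid values. A Categorical input yields a Categorical, any other pandas object yields an Index, and anything else yields a 1-D ndarray.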
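Because labels is an integer indexer into uniques, taking uniques at each label should rebuild the input. A minimal check, assuming a recent pandas and NumPy. The input is wrapped in an object ndarray, which works across pandas versions (recent releases discourage passing plain lists to pd.factorize).

```python
import numpy as np
import pandas as pd

values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)
labels, uniques = pd.factorize(values)

# labels holds one integer per input element, each an index into uniques,
# so taking uniques at those positions rebuilds the original sequence.
rebuilt = uniques.take(labels)
print(rebuilt.tolist())    # ['b', 'b', 'a', 'c', 'b']

# The labels are a plain integer ndarray.
print(labels.dtype.kind)   # i
```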
Examples
1. pd.factorize(values)
>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> labels
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
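With the default sort=False, each distinct value gets the next integer in order of first appearance. The sketch below imitates that behaviour in plain Python. The function name factorize_sketch is hypothetical; this is not how pandas implements factorize (it uses an internal hashtable), and it ignores missing values.

```python
def factorize_sketch(values):
    """Hypothetical pure-Python imitation of factorize(sort=False).

    Returns (labels, uniques) as lists; missing values get no special care.
    """
    index_of = {}   # value -> label, assigned in order of first appearance
    uniques = []
    labels = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(uniques)
            uniques.append(v)
        labels.append(index_of[v])
    return labels, uniques


labels, uniques = factorize_sketch(['b', 'b', 'a', 'c', 'b'])
print(labels)   # [0, 0, 1, 2, 0]
print(uniques)  # ['b', 'a', 'c']
```

The output matches the pd.factorize result above: 'b' is seen first, so it becomes 0.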
2. pd.factorize(values, sort=True)
>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> labels
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)
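sort=True sorts uniques and renumbers labels so that they still point at the same values. The sketch below re-derives the sorted result from the unsorted one with NumPy to make that relationship explicit; it is an illustration under the assumption of a recent pandas/NumPy, not pandas' internal algorithm.

```python
import numpy as np
import pandas as pd

values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)

labels, uniques = pd.factorize(values)                  # first-appearance order
s_labels, s_uniques = pd.factorize(values, sort=True)   # sorted order

# Sort the uniques, then translate every old label into the position
# its value occupies in the sorted uniques.
order = np.argsort(uniques)            # old positions listed in sorted order
remap = np.empty_like(order)
remap[order] = np.arange(len(order))   # remap[old_label] -> new_label
manual_labels = remap[labels]
manual_uniques = uniques[order]

print(manual_labels.tolist())   # [1, 1, 0, 2, 1]
print(manual_uniques.tolist())  # ['a', 'b', 'c']
```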
3. Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values are never included in uniques.
>>> labels, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> labels
array([ 0, -1, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
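One practical consequence: the sentinel -1 is also a valid NumPy index (it means "last element"), so mapping labels back to values needs a mask for missing positions. A minimal sketch, assuming a recent pandas/NumPy and the default sentinel of -1:

```python
import numpy as np
import pandas as pd

values = np.array(['b', None, 'a', 'c', 'b'], dtype=object)
labels, uniques = pd.factorize(values)

# -1 marks missing values under the default settings.
missing = labels == -1
print(missing.tolist())   # [False, True, False, False, False]

# uniques.take(labels) alone would turn -1 into uniques[-1] ('c'),
# so put None back at the masked positions.
rebuilt = np.where(missing, None, uniques.take(labels))
print(rebuilt.tolist())   # ['b', None, 'a', 'c', 'b']
```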
4. Thus far, we've only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
[a, c]
Categories (3, object): [a, b, c]
Notice that 'b' is in uniques.categories, despite not being present in cat.values.
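The labels from pd.factorize are not the same thing as the Categorical's own codes: the codes index into all declared categories, while the labels index into the observed uniques. A short comparison, assuming a recent pandas:

```python
import pandas as pd

cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
labels, uniques = pd.factorize(cat)

# The Categorical's codes index into every declared category ...
print(cat.codes.tolist())           # [0, 0, 2]
# ... whereas factorize's labels index into the observed uniques only.
print(labels.tolist())              # [0, 0, 1]
# The unused category 'b' is still carried along on uniques.
print(uniques.categories.tolist())  # ['a', 'b', 'c']
```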
5. For all other pandas objects, an Index of the appropriate type is returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='object')
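For a Series, the same result is available through the Series.factorize method. A common use is label-encoding a DataFrame column while keeping uniques to decode it later. A sketch assuming a recent pandas; the column names color and color_code are made up for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'a', 'c'])
labels, uniques = pd.factorize(s)
print(type(uniques).__name__)   # Index

# Series.factorize is the method form of the same operation.
m_labels, m_uniques = s.factorize()
print(m_labels.tolist(), m_uniques.tolist())  # [0, 0, 1] ['a', 'c']

# Hypothetical column names: store the integer labels as a new column
# and keep color_uniques to map the labels back to the original strings.
df = pd.DataFrame({'color': ['red', 'blue', 'red']})
df['color_code'], color_uniques = pd.factorize(df['color'])
print(df['color_code'].tolist())  # [0, 1, 0]
print(color_uniques.take(df['color_code'].to_numpy()).tolist())  # ['red', 'blue', 'red']
```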