Usage of np.where, pd.Series.where and pd.DataFrame.where, and how they differ

np.where is used differently from pd.Series.where and pd.DataFrame.where. Below we study each one in turn and then summarize the differences:

import numpy as np
import pandas as pd
help(np.where)
Help on built-in function where in module numpy.core.multiarray:

where(...)
    where(condition, [x, y])
    
    Return elements, either from `x` or `y`, depending on `condition`.
    
    If only `condition` is given, return ``condition.nonzero()``.
    
    Parameters
    ----------
    condition : array_like, bool
        When True, yield `x`, otherwise yield `y`.
    x, y : array_like, optional
        Values from which to choose. `x`, `y` and `condition` need to be
        broadcastable to some shape.
    
    Returns
    -------
    out : ndarray or tuple of ndarrays
        If both `x` and `y` are specified, the output array contains
        elements of `x` where `condition` is True, and elements from
        `y` elsewhere.
    
        If only `condition` is given, return the tuple
        ``condition.nonzero()``, the indices where `condition` is True.
    
    See Also
    --------
    nonzero, choose
    
    Notes
    -----
    If `x` and `y` are given and input arrays are 1-D, `where` is
    equivalent to::
    
        [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
    
    Examples
    --------
    >>> np.where([[True, False], [True, True]],
    ...          [[1, 2], [3, 4]],
    ...          [[9, 8], [7, 6]])
    array([[1, 8],
           [3, 4]])
    
    >>> np.where([[0, 1], [1, 0]])
    (array([0, 1]), array([1, 0]))
    
    >>> x = np.arange(9.).reshape(3, 3)
    >>> np.where( x > 5 )
    (array([2, 2, 2]), array([0, 1, 2]))
    >>> x[np.where( x > 3.0 )]               # Note: result is 1D.
    array([ 4.,  5.,  6.,  7.,  8.])
    >>> np.where(x < 5, x, -1)               # Note: broadcasting.
    array([[ 0.,  1.,  2.],
           [ 3.,  4., -1.],
           [-1., -1., -1.]])
    
    Find the indices of elements of `x` that are in `goodvalues`.
    
    >>> goodvalues = [3, 4, 7]
    >>> ix = np.isin(x, goodvalues)
    >>> ix
    array([[False, False, False],
           [ True,  True, False],
           [False,  True, False]])
    >>> np.where(ix)
    (array([1, 1, 2]), array([0, 1, 1]))

  • Using np.where

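From the help text above: np.where takes a required argument condition and optional arguments x and y.
Whether x and y are supplied, and their shapes, directly determine what np.where returns. Without x and y it is equivalent to np.nonzero(condition): it returns a tuple of index arrays, one per dimension, that together give the positions where condition is True (or non-zero). With x and y, condition, x and y are first broadcast to a common shape (if their shapes differ), which becomes the shape of the output; each output element is then taken from x where condition is True and from y elsewhere.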
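To make the 1-D equivalence from the Notes section of the help text concrete, the short sketch below compares np.where with the list comprehension quoted there. The arrays condition, x and y are made-up sample data, not taken from the original post.

```python
import numpy as np

# Made-up 1-D sample data
condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Vectorized selection: x where condition is True, otherwise y
vectorized = np.where(condition, x, y)

# The list-comprehension form quoted in the Notes section of help(np.where)
looped = [xv if c else yv for (c, xv, yv) in zip(condition, x, y)]

print(vectorized.tolist())                  # [1, 20, 3, 40]
print(np.array_equal(vectorized, looped))   # True
```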

(1) Without the optional arguments x and y

a=np.random.randint(0,high=2,size=(3,3));a
array([[0, 1, 1],
       [1, 1, 0],
       [1, 1, 0]])
np.where(a)
(array([0, 0, 1, 1, 2, 2], dtype=int64),
 array([1, 2, 0, 1, 0, 1], dtype=int64))
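Because np.random.randint gives different data on every run, here is a reproducible variant of case (1) that uses a fixed array with the same values as the output above. It also pairs the per-axis index arrays into (row, col) coordinates and uses the returned tuple directly as an index; the names rows, cols and coords are illustrative only.

```python
import numpy as np

# Fixed 0/1 array (same values as the random example above) so the output is reproducible
a = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 1, 0]])

# Without x and y, np.where returns one index array per axis, like np.nonzero
rows, cols = np.where(a)
print(rows.tolist())   # [0, 0, 1, 1, 2, 2]
print(cols.tolist())   # [1, 2, 0, 1, 0, 1]

# Pair the per-axis index arrays into (row, col) coordinates
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)          # [(0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1)]

# The returned tuple can be used as an index; the selected values come back as a 1-D array
print(a[np.where(a)].tolist())   # [1, 1, 1, 1, 1, 1]
```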

(2) With x and y: the output has the broadcast shape of condition, x and y, and each element is chosen from x or y according to condition.

cond=np.array([True,False])
x=np.arange(6).reshape(3,2);x
array([[0, 1],
       [2, 3],
       [4, 5]])
y=np.array([[100,200]])
cond.shape
(2,)
x.shape
(3, 2)
y.shape
(1, 2)

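So the broadcast shape should be (3, 2): cond, of shape (2,), is treated as (1, 2) and repeated across 3 rows, and the single row of y is likewise repeated 3 times.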

result=np.where(cond,x,y);result
array([[  0, 200],
       [  2, 200],
       [  4, 200]])
result.shape
(3, 2)
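The two steps described in (2), broadcasting first and then selecting element by element, can be spelled out by hand with np.broadcast_arrays and an explicit loop. This is a sketch for illustration only (it is much slower than np.where); it reuses the cond, x and y values from above, and the scalar case on the last line is an extra example not in the original post.

```python
import numpy as np

cond = np.array([True, False])      # shape (2,)
x = np.arange(6).reshape(3, 2)      # shape (3, 2)
y = np.array([[100, 200]])          # shape (1, 2)

# Step 1: broadcast all three inputs to a common shape
c_b, x_b, y_b = np.broadcast_arrays(cond, x, y)
print(c_b.shape)                    # (3, 2)

# Step 2: pick from x where the condition is True, otherwise from y
manual = np.empty(c_b.shape, dtype=np.result_type(x, y))
for idx in np.ndindex(c_b.shape):
    manual[idx] = x_b[idx] if c_b[idx] else y_b[idx]

result = np.where(cond, x, y)
print(np.array_equal(manual, result))   # True
print(result.tolist())                  # [[0, 200], [2, 200], [4, 200]]

# Scalars broadcast too: the output takes the shape of the condition x > 2
print(np.where(x > 2, 1, -1).tolist())  # [[-1, -1], [-1, 1], [1, 1]]
```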
  • where in pandas
help(pd.DataFrame.where)
Help on function where in module pandas.core.generic:

where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
    Return an object of same shape as self and whose corresponding
    entries are from self where `cond` is True and otherwise are from
    `other`.
    
    Parameters
    ----------
    cond : boolean NDFrame, array-like, or callable
        Where `cond` is True, keep the original value. Where
        False, replace with corresponding value from `other`.
        If `cond` is callable, it is computed on the NDFrame and
        should return boolean NDFrame or array. The callable must
        not change input NDFrame (though pandas doesn't check it).
    
        .. versionadded:: 0.18.1
            A callable can be used as cond.
    
    other : scalar, NDFrame, or callable
        Entries where `cond` is False are replaced with
        corresponding value from `other`.
        If other is callable, it is computed on the NDFrame and
        should return scalar or NDFrame. The callable must not
        change input NDFrame (though pandas doesn't check it).
    
        .. versionadded:: 0.18.1
            A callable can be used as other.
    
    inplace : boolean, default False
        Whether to perform the operation in place on the data
    axis : alignment axis if needed, default None
    level : alignment level if needed, default None
    errors : str, {'raise', 'ignore'}, default 'raise'
        - ``raise`` : allow exceptions to be raised
        - ``ignore`` : suppress exceptions. On error return original object
    
        Note that currently this parameter won't affect
        the results and will always coerce to a suitable dtype.
    
    try_cast : boolean, default False
        try to cast the result back to the input type (if possible),
    raise_on_error : boolean, default True
        Whether to raise on invalid data types (e.g. trying to where on
        strings)
    
        .. deprecated:: 0.21.0
    
    Returns
    -------
    wh : same type as caller
    
    Notes
    -----
    The where method is an application of the if-then idiom. For each
    element in the calling DataFrame, if ``cond`` is ``True`` the
    element is used; otherwise the corresponding element from the DataFrame
    ``other`` is used.
    
    The signature for :func:`DataFrame.where` differs from
    :func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
    ``np.where(m, df1, df2)``.
    
    For further details and examples see the ``where`` documentation in
    :ref:`indexing <indexing.where_mask>`.
    
    Examples
    --------
    >>> s = pd.Series(range(5))
    >>> s.where(s > 0)
    0    NaN
    1    1.0
    2    2.0
    3    3.0
    4    4.0
    
    >>> s.mask(s > 0)
    0    0.0
    1    NaN
    2    NaN
    3    NaN
    4    NaN
    
    >>> s.where(s > 1, 10)
    0    10.0
    1    10.0
    2    2.0
    3    3.0
    4    4.0
    
    >>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
    >>> m = df % 3 == 0
    >>> df.where(m, -df)
       A  B
    0  0 -1
    1 -2  3
    2 -4 -5
    3  6 -7
    4 -8  9
    >>> df.where(m, -df) == np.where(m, df, -df)
          A     B
    0  True  True
    1  True  True
    2  True  True
    3  True  True
    4  True  True
    >>> df.where(m, -df) == df.mask(~m, -df)
          A     B
    0  True  True
    1  True  True
    2  True  True
    3  True  True
    4  True  True
    
    See Also
    --------
    :func:`DataFrame.mask`

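From the help text above: the where method of DataFrame and Series follows the if-then idiom. Each element of the caller (a DataFrame or Series) is kept where cond is True and replaced with the corresponding value from other (NaN by default) where cond is False. inplace defaults to False, so a new DataFrame or Series with the same shape as the caller is returned; with inplace=True the caller is modified in place instead. The mask method does exactly the opposite, replacing values where cond is True.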
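A short sketch of the behavior just described, on a small Series. The data and the fill values 10, 0 and -1 are arbitrary choices for illustration, and the dtype of some results (int vs. float) may differ between pandas versions, so the comments list values rather than exact printed output.

```python
import pandas as pd

s = pd.Series(range(5))

# Keep values where cond is True; elsewhere use other (NaN by default)
print(s.where(s > 1))              # values: NaN, NaN, 2, 3, 4
print(s.where(s > 1, 10))          # values: 10, 10, 2, 3, 4

# cond may be a callable that receives the Series and returns a boolean mask
print(s.where(lambda v: v % 2 == 0, -1).tolist())   # [0, -1, 2, -1, 4]

# mask is the inverse of where: it replaces values where cond is True
print(s.mask(s > 1, 10).tolist())  # [0, 1, 10, 10, 10]

# With inplace=True the caller itself is modified and the method returns None
t = s.copy()
ret = t.where(t > 1, 0, inplace=True)
print(ret, t.tolist())             # None [0, 0, 2, 3, 4]
```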

  • Differences between np.where and the where method of DataFrame and Series:

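(1) In NumPy, where is a module-level function; ndarray objects do not have a where method. pandas, conversely, has no module-level where function; where can only be called as a method of DataFrame or Series objects.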

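(2) In np.where, condition can be an array or a boolean value; for DataFrame.where and Series.where, cond can be a boolean array or NDFrame and can also be a callable (other may be a callable too).
(3) np.where takes an explicit source x for the positions where condition is True, whereas the pandas method follows the if-then idiom and only supplies replacement values (other) for the positions where cond is False; values where cond is True come from the caller itself.
(4) The shape of np.where's result depends on condition, x and y: it is their broadcast shape. The pandas method always returns an object with the same shape as the caller.
(5) The pandas method has an inplace parameter that decides whether to return a new object or modify the caller in place; np.where always constructs a new array, so it has no inplace parameter.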
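Several of these differences can be checked side by side. The sketch below uses a small made-up DataFrame and assumes a reasonably recent NumPy and pandas (to_numpy needs pandas 0.24 or later); the dtype of the pandas result may vary between versions, which does not affect the comparisons.

```python
import numpy as np
import pandas as pd

# (1) np.where is a module-level function; in pandas, where exists only as a method
print(hasattr(np.ndarray, "where"), hasattr(pd, "where"))   # False False

# Made-up sample data
df = pd.DataFrame({"A": [1, -2, 3], "B": [-4, 5, -6]}, index=["r0", "r1", "r2"])
m = df > 0

# (3) np.where needs both x and y to select values; DataFrame.where takes the
#     "True" values from the caller and only needs the replacement value
via_numpy = np.where(m, df, 0)    # plain ndarray: index and column labels are lost
via_pandas = df.where(m, 0)       # DataFrame with the same index and columns as df
print(type(via_numpy).__name__, type(via_pandas).__name__)  # ndarray DataFrame
print((via_pandas.to_numpy() == via_numpy).all())           # True

# (4) np.where follows broadcasting: a (2,)-shaped condition plus scalars gives shape (2,)
print(np.where(np.array([True, False]), 1, 0).shape)        # (2,)
#     while the pandas result always has the caller's shape
print(via_pandas.shape == df.shape)                         # True

# (2) in pandas, cond may also be a callable
print(df.where(lambda d: d > 0, 0).equals(via_pandas))     # True
```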

posted @ 2021-02-27 18:27  JohnYang819