An introduction to using pandas.DataFrame.reindex

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Conform Series/DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters
keywords for axes : array-like, optional

New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}

Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

  • None (default): don’t fill gaps

  • pad / ffill: Propagate last valid observation forward to next valid.

  • backfill / bfill: Use next valid observation to fill gap.

  • nearest: Use nearest valid observations to fill gap.

copy : bool, default True

Return a new object, even if the passed indexes are the same.

level : int or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value : scalar, default np.NaN

Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

limit : int, default None

Maximum number of consecutive elements to forward or backward fill.

tolerance : optional

Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.
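
The limit and tolerance parameters are easiest to see on a small example. The sketch below is not from the pandas documentation; the Series, its integer index, and the target labels are made up for illustration, and it assumes both the original index and the target are monotonically increasing, as the method parameter requires.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three values on a monotonically increasing integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
target = [0, 1, 2, 3, 5, 9, 10]

# Forward fill, but fill at most 2 consecutive new labels from one original value.
# Labels 1 and 2 take the value at 0; label 3 would be the third fill, so it stays NaN.
limited = s.reindex(target, method="ffill", limit=2)

# Nearest match, accepted only when the original label is at most 1 away.
# Labels 2 and 3 are 2 away from their nearest original labels, so they stay NaN.
near = s.reindex(target, method="nearest", tolerance=1)

print(limited)
print(near)
```

Here limit caps how far a single original value is propagated, while tolerance rejects matches whose labels are too far apart, regardless of how many there are.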

 

DataFrame.reindex supports two calling conventions

  • (index=index_labels, columns=column_labels, ...)

  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

 

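From what I found, reindex takes an externally defined index and returns a new df object; for labels in the new index that have no counterpart in the old one, a default value can be set.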

Arguments can be passed in two ways; the first is recommended.

In the version I tested, the col_level parameter has been renamed to level.

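The following is sample code from the book. The method is mainly used to reset the index and to supply default values for the entries of the new index.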

In [123]: index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror'] 
     ...: df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301], 
     ...:                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]}, 
     ...:                   index=index)                                                                    

In [124]: df                                                                                                
Out[124]: 
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00


  This defines a df object together with an index.

Next we define a new index object and call reindex with the default parameters.

In [130]: new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 
     ...:              'Chrome']                                                                            

In [131]: df                                                                                                
Out[131]: 
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

In [132]: df.reindex(index=new_index)                                                                       
Out[132]: 
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

  This produces a new df object on the new index; the labels that were added hold NaN.
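
The output above shows http_status as 404.0 rather than 404. A minimal sketch of why, reusing the df and new_index from the session: the original object is left untouched, and the integer column is converted to float in the result so that it can hold NaN. The variable name result is my own.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

result = df.reindex(index=new_index)

# reindex returns a new object; df keeps its original index and dtypes.
print(result is df)
print(list(df.index))

# The integer column becomes float in the result because NaN is a float value.
print(df['http_status'].dtype, result['http_status'].dtype)
```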

We can also set the default value with the fill_value option.

In [133]: df.reindex(index=new_index, fill_value='missing')                                                 
Out[133]: 
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02
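
One side effect worth checking is the column dtype. The sketch below, again reusing the session's df and new_index, compares a string fill_value with a numeric one; the names as_text and as_zero, and the choice of 0 as the numeric default, are assumptions of mine and not part of the book's example.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

# A string default forces both columns to the generic object dtype.
as_text = df.reindex(index=new_index, fill_value='missing')

# A numeric default (0 is an arbitrary choice here) keeps the columns numeric.
as_zero = df.reindex(index=new_index, fill_value=0)

print(as_text.dtypes)
print(as_zero.dtypes)
```

If the columns are used in arithmetic later, a numeric default is probably the safer choice, since object columns mixing numbers and strings do not support it.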

  The column index can also be reset in either of the following two ways.

In [134]: df.reindex(columns=['http_status', 'user_agent'])                                                 
Out[134]: 
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

In [135]: df.reindex(['http_status', 'user_agent'], axis="columns")                                         
Out[135]: 
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

  To further illustrate reindex on an ordered index, the next example uses the method parameter to fill in default values.

First, create a df object with a date index.

In [137]: date_index = pd.date_range('1/1/2010', periods=6, freq='D') 
     ...: df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, 
     ...:                    index=date_index) 
     ...:                                                                                                   

In [138]: df2                                                                                               
Out[138]: 
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

  Then use reindex to switch to a longer date range, first with the default parameters and then with the method parameter.

In [139]: date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')                                   

In [140]: df2.reindex(index=date_index2)                                                                    
Out[140]: 
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

In [141]: df2.reindex(index=date_index2, method='bfill')                                                    
Out[141]: 
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN


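  The output shows that the default fill is still NaN. With method='bfill', the next valid value is used as the default: the newly added leading dates are filled with 100.0, but the NaN that already existed under the old index (2010-01-03) is not modified. The trailing date 2010-01-07 also stays NaN, because there is no later value to fill it from.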

If those values need to be changed as well, use the fillna method.
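
A minimal sketch of that follow-up step, reusing df2 and date_index2 from the session. The constant 0 is an arbitrary placeholder of mine. For the second option I use DataFrame.bfill() rather than fillna(method='bfill'), since the latter form is deprecated in recent pandas versions; that substitution is my own and not from the book.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')

reindexed = df2.reindex(index=date_index2, method='bfill')

# Option 1: replace every remaining NaN with a constant.
filled_const = reindexed.fillna(0)

# Option 2: reindex first, then back-fill on values instead of labels.
# This also fills the NaN that was already present on 2010-01-03.
filled_bfill = df2.reindex(index=date_index2).bfill()

print(filled_const)
print(filled_bfill)
```

With option 2 the last date, 2010-01-07, is still NaN, because back-filling has no later observation to draw from.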

 

posted @ 2021-02-03 15:34  就是想学习