pandas对时间列分组求diff遇到的问题

例子:

df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df['C'] = df.groupby('A').B.diff()
df['C'] = df.C.dt.days

 

报错:

Traceback (most recent call last):
  File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py", line 2820, in _make_dt_accessor
    return maybe_to_datetimelike(self)
  File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\indexes\accessors.py", line 84, in maybe_to_datetimelike
    "datetimelike index".format(type(data)))
TypeError: cannot convert an object of type <class 'pandas.core.series.Series'> to a datetimelike index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/学习/pandas_test/pandas_learn_20190102.py", line 49, in <module>
    test2()
  File "D:/学习/pandas_test/pandas_learn_20190102.py", line 32, in test2
    df['C'] = df.C.dt.days
  File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\generic.py", line 3077, in __getattr__
    return object.__getattribute__(self, name)
  File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\base.py", line 243, in __get__
    return self.construct_accessor(instance)
  File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py", line 2822, in _make_dt_accessor
    raise AttributeError("Can only use .dt accessor with datetimelike "
AttributeError: Can only use .dt accessor with datetimelike values

 

原因:
分组求diff后的结果是:

A B C
0 1 2018-01-02 NaT
1 1 2018-01-03 1 days 00:00:00
2 2 2018-01-03 NaN

类型是:

A int64
B object
C object
dtype: object

预想的类型是:

A int64
B object
C timedelta64[ns]
dtype: object

解决:
原本尝试使用astype强制将object列,转成timedelta列

df['C'] = df.C.astype(pd.Timedelta)

这句代码不会报错,但是C列的类型不会改变,没有作用。

最后有两种处理方式:
提前定义B列为时间列:

df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df.B = pd.to_datetime(df.B)
df['C'] = df.groupby('A').B.diff()
df['C'] = df.C.dt.days

增加类型转换:

df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df['C'] = df.groupby('A').B.diff()
df['C'] = pd.to_timedelta(df.C, unit='d').dt.days
posted @ 2019-01-02 11:42  桑胡  阅读(3188)  评论(0编辑  收藏  举报