pandas对时间列分组求diff遇到的问题
例子:
df = pd.DataFrame() df['A'] = [1, 1, 2] df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)] df['C'] = df.groupby('A').B.diff() df['C'] = df.C.dt.days
报错:
Traceback (most recent call last): File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py", line 2820, in _make_dt_accessor return maybe_to_datetimelike(self) File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\indexes\accessors.py", line 84, in maybe_to_datetimelike "datetimelike index".format(type(data))) TypeError: cannot convert an object of type <class 'pandas.core.series.Series'> to a datetimelike index During handling of the above exception, another exception occurred: Traceback (most recent call last): File "D:/学习/pandas_test/pandas_learn_20190102.py", line 49, in <module> test2() File "D:/学习/pandas_test/pandas_learn_20190102.py", line 32, in test2 df['C'] = df.C.dt.days File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\generic.py", line 3077, in __getattr__ return object.__getattribute__(self, name) File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\base.py", line 243, in __get__ return self.construct_accessor(instance) File "D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py", line 2822, in _make_dt_accessor raise AttributeError("Can only use .dt accessor with datetimelike " AttributeError: Can only use .dt accessor with datetimelike values
原因:
分组求diff后的结果是:
A B C 0 1 2018-01-02 NaT 1 1 2018-01-03 1 days 00:00:00 2 2 2018-01-03 NaN
类型是:
A int64 B object C object dtype: object
预想的类型是:
A int64 B object C timedelta64[ns] dtype: object
解决:
原本尝试使用astype强制将object列,转成timedelta列
df['C'] = df.C.astype(pd.Timedelta)
这句代码不会报错,但是C列的类型不会改变,没有作用。
最后有两种处理方式:
提前定义B列为时间列:
df = pd.DataFrame() df['A'] = [1, 1, 2] df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)] df.B = pd.to_datetime(df.B) df['C'] = df.groupby('A').B.diff() df['C'] = df.C.dt.days
增加类型转换:
df = pd.DataFrame() df['A'] = [1, 1, 2] df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)] df['C'] = df.groupby('A').B.diff() df['C'] = pd.to_timedelta(df.C, unit='d').dt.days