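Splitting a pandas DataFrame into blocks by time continuity
When time-series data has gaps, rows with consecutive dates need to be grouped into the same block. The pandas DataFrame-based approach below does this in two steps: first flag the rows that have a neighbor exactly one day away, then number the runs of flagged rows.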
>>> df
  DateAnalyzed       Val
1   2018-03-18  0.470253
2   2018-03-19  0.470253
3   2018-03-20  0.470253
4   2017-01-20  0.485949  # < watch out for this
5   2018-09-25  0.467729
6   2018-09-26  0.467729
7   2018-09-27  0.467729
>>> df.dtypes
DateAnalyzed    datetime64[ns]
Val                    float64
dtype: object
>>> dt = df['DateAnalyzed']
>>> day = pd.Timedelta('1d')
>>> in_block = ((dt - dt.shift(-1)).abs() == day) | (dt.diff() == day)
>>> in_block
1     True
2     True
3     True
4    False
5     True
6     True
7     True
Name: DateAnalyzed, dtype: bool
>>> filt = df.loc[in_block]
>>> breaks = filt['DateAnalyzed'].diff() != day
>>> groups = breaks.cumsum()
>>> groups
1    1
2    1
3    1
5    2
6    2
7    2
Name: DateAnalyzed, dtype: int64
>>> for _, frame in filt.groupby(groups):
...     print(frame, end='\n\n')
...
  DateAnalyzed       Val
1   2018-03-18  0.470253
2   2018-03-19  0.470253
3   2018-03-20  0.470253

  DateAnalyzed       Val
5   2018-09-25  0.467729
6   2018-09-26  0.467729
7   2018-09-27  0.467729
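The session above can be wrapped into a reusable helper. The following is a minimal sketch, not a definitive implementation: the function name `split_consecutive_days` and its `col` and `step` parameters are hypothetical names introduced here for illustration, and the sample frame is rebuilt from the values printed above. Like the original session, it keeps the rows in their existing order and drops isolated rows such as row 4.

```python
import pandas as pd


def split_consecutive_days(df, col="DateAnalyzed", step=pd.Timedelta("1D")):
    """Split df into a list of frames whose `col` values advance by `step`.

    Hypothetical helper: the name and parameters are illustrative only.
    Rows without a neighbor exactly one `step` away are dropped.
    """
    dt = df[col]
    # A row belongs to a block if the next row is exactly one step away
    # (in either direction, because of abs) or if it is exactly one step
    # after the previous row.
    in_block = ((dt - dt.shift(-1)).abs() == step) | (dt.diff() == step)
    filt = df.loc[in_block]
    # Among the kept rows, a new block starts wherever the gap to the
    # previous kept row is not exactly one step (the first row always is).
    breaks = filt[col].diff() != step
    groups = breaks.cumsum()
    return [frame for _, frame in filt.groupby(groups)]


# Sample data rebuilt from the session above (index 1..7).
df = pd.DataFrame(
    {
        "DateAnalyzed": pd.to_datetime(
            [
                "2018-03-18", "2018-03-19", "2018-03-20",
                "2017-01-20",
                "2018-09-25", "2018-09-26", "2018-09-27",
            ]
        ),
        "Val": [0.470253, 0.470253, 0.470253, 0.485949,
                0.467729, 0.467729, 0.467729],
    },
    index=range(1, 8),
)

blocks = split_consecutive_days(df)
for frame in blocks:
    print(frame, end="\n\n")
```

If the rows are not already in chronological order, sorting with `df.sort_values(col)` before calling the helper is probably what you want, since both `diff()` and `shift()` compare each row only with its immediate neighbors.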