Welcome to RankFan's Blogs


Python time data: daily, monthly, and quarterly

How pandas handles time data: a very detailed document


df['Time'] = pd.date_range(end='2021-11-10', periods=230, freq='MS').strftime("%Y-%b")
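The line above builds month-start dates backwards from an end date and formats them as strings. A minimal sketch of the same idea on a short range follows; the variable names are illustrative, and `%Y-%m` is used instead of `%Y-%b` only so the labels do not depend on the locale.

```python
import pandas as pd

# Three month-start dates ending at or before 2021-11-10.
# 'MS' means "month start", so every generated date falls on day 1.
months = pd.date_range(end='2021-11-10', periods=3, freq='MS')

# Format as strings; '%Y-%b' would give abbreviated month names instead.
labels = months.strftime('%Y-%m')
print(list(labels))
```

Because the result has exactly `periods` entries, the DataFrame it is assigned to must have the same number of rows.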

Converting data to monthly frequency: turning daily data into weekly, monthly, or quarterly data

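benchmark_.index.to_period('M')  # convert the DatetimeIndex to monthly periods
df_chn_usa_link_m = df_chn_usa_link_m.resample('M').mean()  # resample to monthly data using the mean; last() or first() also work (newer pandas versions prefer the alias 'ME')
df_chn_usa_link_m.index.strftime('%Y-%m')  # convert the index to str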
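As a rough illustration of the conversions above, here is a sketch on synthetic daily data. The series `daily` and its values are made up for the example. Monthly and quarterly aggregation is done by grouping on `to_period`, which sidesteps the `'M'` versus `'ME'` alias difference between pandas versions; `resample` with the appropriate alias gives the same aggregates, labelled by period-end timestamps.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative data only): values 0, 1, 2, ...
idx = pd.date_range('2021-01-01', '2021-06-30', freq='D')
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx, name='value')

# Daily -> weekly mean (weeks end on Sunday by default).
weekly_mean = daily.resample('W').mean()

# Daily -> monthly mean, grouped by calendar month.
monthly_mean = daily.groupby(daily.index.to_period('M')).mean()

# Daily -> quarterly, keeping the last observation of each quarter.
quarterly_last = daily.groupby(daily.index.to_period('Q')).last()

# Period labels as strings such as '2021-01'.
month_labels = monthly_mean.index.strftime('%Y-%m')
```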

Getting the data for the last trading day of each month with pandas (see also pandas.tseries.offsets)

benchmark_ = benchmark.copy()
benchmark_.index = pd.to_datetime(benchmark_.index)
benchmark_[benchmark_.index.day == benchmark_.index.days_in_month] # only rows where the calendar month-end is itself a trading day
benchmark_.loc[benchmark_.groupby(benchmark_.index.to_period('M')).apply(lambda x: x.index.min())] # only the first trading day of each month
benchmark_.loc[benchmark_.groupby(benchmark_.index.to_period('M')).apply(lambda x: x.index.max())] # only the last trading day of each month
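An alternative way to pick the first and last trading day of each month is `head(1)` and `tail(1)` on a monthly groupby. The sketch below assumes a sorted DatetimeIndex and uses business days as a stand-in for real trading days (actual exchange holidays are not modelled); the `close` column is made up.

```python
import pandas as pd

# Business days stand in for trading days here (an assumption for illustration).
idx = pd.bdate_range('2021-01-01', '2021-03-31')
benchmark = pd.DataFrame({'close': range(len(idx))}, index=idx)

# Group rows by calendar month; the index must already be sorted by date.
month = benchmark.index.to_period('M')
first_days = benchmark.groupby(month).head(1)  # first trading day of each month
last_days = benchmark.groupby(month).tail(1)   # last trading day of each month

print(last_days)
```

Unlike the `apply`-based version, `head` and `tail` return rows of the original frame directly, so no second `.loc` lookup is needed.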

Sorting key-value pairs by value in descending order

[(k, v) for k, v in sorted(zip(del_var, del_variance), key=lambda x: -x[-1])]
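When the pairs already live in a dictionary, the same ordering can be obtained from `dict.items()`. A small sketch with a made-up dictionary `scores`:

```python
# Made-up example data: variable name -> variance.
scores = {'a': 0.2, 'b': 0.9, 'c': 0.5}

# Sort the (key, value) pairs by value, largest first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# dicts preserve insertion order, so this keeps the sorted order.
ranked_dict = dict(ranked)
print(ranked)
```

Using `reverse=True` rather than negating the value also works for values that cannot be negated, such as strings.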

Applying a time window to data indexed by company and time:

df = df.set_index(['cid', 'time'])  # set_index returns a new DataFrame, so assign the result
df_query = df.loc[df.index.isin(dateindexes, level=1), :].groupby(level=[0]).sum().astype(bool).astype('int8')
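To make the one-liner above concrete, here is a sketch on a tiny made-up panel. The column `event` and the dates in `window` are assumptions for the example; the result flags, per company, whether any event occurred inside the window.

```python
import pandas as pd

# Made-up panel: two companies observed on three dates.
df = pd.DataFrame({
    'cid': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'event': [0, 1, 0, 0, 0, 1],
}).set_index(['cid', 'time'])

# The time window of interest (plays the role of `dateindexes`).
window = pd.to_datetime(['2021-01-01', '2021-01-02'])

# Keep rows whose second index level (time) is inside the window,
# sum per company, then turn "any positive count" into a 0/1 flag.
in_window = df.index.isin(window, level=1)
flags = df.loc[in_window].groupby(level=0).sum().astype(bool).astype('int8')
print(flags)
```

Company B's only event falls on 2021-01-03, outside the window, so its flag is 0.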

Generating data for the current moment and the past 10 days:

from datetime import datetime, timedelta

values = range(10)
dates = [datetime.now()-timedelta(days=_) for _ in range(10)]
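One way to use these two sequences is to combine them into a time-indexed pandas Series. This is a sketch that assumes this is the intended use; the name `ts` is illustrative.

```python
from datetime import datetime, timedelta

import pandas as pd

# Take "now" once so that all timestamps are exactly one day apart.
now = datetime.now()
values = range(10)
dates = [now - timedelta(days=d) for d in range(10)]

# Index the values by date and sort into chronological order.
ts = pd.Series(list(values), index=pd.DatetimeIndex(dates)).sort_index()
print(ts)
```

Calling `datetime.now()` once, instead of inside the list comprehension, avoids microsecond drift between entries.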

Common pandas methods for processing time-series data, which can be used to build fine-grained features

# Import the required libraries
import pandas as pd
import numpy as np
import datetime
import time
import random
from calendar import monthrange

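# Make up sample data
# Columns: 店铺名称 = store name, 统计日期 = statistics date, 大促开始时间 = promotion start time,
# 店长出生日期 = store manager's birth date, 店铺所在城市 = store city, 销量 = sales volume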

if __name__ == '__main__':
    df = pd.DataFrame(
          [['零售店01', '2021-10-01', '2021-10-01 11:47:34', '1993-11-03', '深圳', 100],
           ['零售店01', '2021-10-02', '2021-10-02 12:47:34', '1993-11-04', '深圳', 120],
           ['零售店01', '2021-10-03', '2021-10-03 11:47:34', '1993-10-03', '深圳', 140],
           ['零售店01', '2021-10-04', '2021-10-04 08:47:34', '1993-02-03', '深圳', 170],
           ['零售店01', '2021-10-05', '2021-10-05 11:47:34', '1993-02-03', '深圳', 190],
           ['零售店01', '2021-10-06', '2021-10-06 15:47:34', '1993-04-03', '深圳', 10],
           ['零售店01', '2021-10-07', '2021-10-07 17:47:34', '1993-02-03', '深圳', 20],
           ['零售店01', '2021-10-08', '2021-10-08 19:47:34', '1993-06-03', '深圳', 420],
           ['零售店01', '2021-10-09', '2021-10-09 11:47:34', '1993-03-03', '深圳', 230],
           ['零售店01', '2021-10-10', '2021-10-10 20:47:34', '1993-02-20', '深圳', 80]
          ], columns=['店铺名称', '统计日期', '大促开始时间', '店长出生日期', '店铺所在城市', '销量'])

    df.head()

    # The column is originally a string; convert it to datetime
    df['datetime64'] = pd.to_datetime(df['统计日期'])
    df['year'] = df['datetime64'].dt.year
    df['quarter'] = df['datetime64'].dt.quarter
    df['month'] = df['datetime64'].dt.month
    df['week'] = df['datetime64'].dt.isocalendar().week
    df['day'] = df['datetime64'].dt.day
    df['hour'] = df['datetime64'].dt.hour
    df['minute'] = df['datetime64'].dt.minute
    df['second'] = df['datetime64'].dt.second
    df['weekday'] = df['datetime64'].dt.weekday
    df['dayofyear'] = df['datetime64'].dt.dayofyear
    df['dayofweek'] = df['datetime64'].dt.dayofweek

    # df['weekofyear'] = df['datetime64'].dt.weekofyear  # removed in pandas 2.0; use dt.isocalendar().week as above

    ################################
    # 0-1 (binary) features
    ################################
    df['is_work_day'] = np.where(df['dayofweek'].isin([5, 6]), 0, 1)  # 1 on weekdays (Mon-Fri), 0 on weekends
    df['is_month_start'] = np.where(df['datetime64'].dt.is_month_start, 1, 0)
    df['is_month_end'] = np.where(df['datetime64'].dt.is_month_end, 1, 0)

    # Special days / public holidays
    special_day = ['2021-10-01', '2021-10-02']
    df['is_special_day'] = np.where(df['统计日期'].isin(special_day), 1, 0)
    # Whether the time falls before dawn (00:00-03:59)
    # Note: 'hour' is derived from the date-only column '统计日期', so it is always 0 here
    df['is_before_dawn'] = np.where(df['hour'].isin([0, 1, 2, 3]), 1, 0)

    ################################
    # Time differences
    ################################

    # Get the previous day's date
    df['yesterday'] = df['datetime64'] - datetime.timedelta(days=1)
    # Date difference in days
    df['day_dif'] = (df['datetime64'] - df['yesterday']).dt.days
    # Date difference in hours
    df['hour_dif'] = (df['datetime64'] - df['yesterday']).values / np.timedelta64(1, 'h')  # use 'D' instead of 'h' to get days

    ################################
    # Derived features
    ################################
    df = df.loc[:, ['店铺名称', '统计日期', '销量']]
    df['date'] = pd.to_datetime(df['统计日期'])

    # Remember to sort before deriving time-series features
    df.sort_values(['店铺名称', '统计日期'], ascending=[True, True], inplace=True)

    # Derive sliding-window statistics over time (window of 3 rows per store)
    f_min = lambda x: x.rolling(window=3, min_periods=1).min()
    f_max = lambda x: x.rolling(window=3, min_periods=1).max()
    f_mean = lambda x: x.rolling(window=3, min_periods=1).mean()
    f_std = lambda x: x.rolling(window=3, min_periods=1).std()
    f_median = lambda x: x.rolling(window=3, min_periods=1).median()
    function_list = [f_min, f_max, f_mean, f_std, f_median]
    function_name = ['min', 'max', 'mean', 'std', 'median']
    for name, func in zip(function_name, function_list):
        # transform keeps the original row index, so the result aligns on assignment
        # (groupby.apply can return a MultiIndex in recent pandas versions)
        df['stat_%s' % name] = df.sort_values('统计日期', ascending=True).groupby(['店铺名称'])['销量'].transform(func)

    # Derive lag variables (this shift is not grouped by store; the sample data has a single store)
    for i in [1, 2, 3]:
        df["lag_{}".format(i)] = df['销量'].shift(i)

To get the exchange's trading-day dates, you can use tushare; if you have not registered yet, sign up first.

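# pro is a tushare pro API client, e.g. pro = ts.pro_api() after import tushare as ts and setting a token
pro.query('trade_cal', start_date='20180101', end_date='20181231')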

END

posted on 2021-12-26 15:27  RankFan
