Implementing a Multi-Column Sliding Window with NumPy
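I recently ran into a problem on a project: computing a beta value for every day in a date range, where the input for each day is the time series covering the one year leading up to that day.
Beta is a financial metric that measures systematic risk. It is obtained as the regression coefficient when the portfolio return Y is regressed on the benchmark return X: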
\[Y = \alpha + \beta X
\]
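For a single regressor, the least-squares slope and intercept have a closed form: beta is cov(X, Y) / var(X), and alpha is mean(Y) minus beta times mean(X). The following is only a small sketch of that identity on made-up numbers; the function name `ols_alpha_beta` and the sample returns are hypothetical and not taken from the project.

```python
import numpy as np

def ols_alpha_beta(x, y):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Slope = sum of centered cross products / sum of centered squares.
    beta = np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Made-up daily returns: the portfolio is exactly 0.001 + 1.5 * benchmark,
# so the fit should recover alpha = 0.001 and beta = 1.5.
bench = np.array([0.01, -0.02, 0.015, 0.003, -0.007])
port = 0.001 + 1.5 * bench
alpha, beta = ols_alpha_beta(bench, port)
print(alpha, beta)
```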
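The approach: take the data from one year before the start date through the end date, then slide a window of 365 rows over it, starting from the end date.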
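As a rough illustration of how much data that means, the sketch below pads a made-up reporting range by one year and counts the resulting windows. It assumes one row per calendar day; the dates and variable names are hypothetical.

```python
import pandas as pd

# Made-up reporting range.
start, end = pd.Timestamp("2021-03-01"), pd.Timestamp("2021-03-31")
window = 365

# Load from one year before the start date so the first reported day has a full window.
load_start = start - pd.DateOffset(years=1)
dates = pd.date_range(load_start, end, freq="D")

# A window of `window` rows ending on row i needs i >= window - 1.
n_windows = len(dates) - window + 1
n_requested = len(pd.date_range(start, end, freq="D"))
print(load_start.date(), len(dates), n_windows, n_requested)
```

Under that assumption the padded range always yields at least one full window per requested day; any extra leading window can be dropped by filtering the result on the start date.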
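At the time of writing, pandas' rolling(...).apply only supports simple single-column window calculations. My first attempt was a for loop that sliced the data by date for each calculation, but it was far too slow.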
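The original loop is not shown, so the following is only a guess at what such a baseline might look like: one slice and one fit per end date. The function name `beta_by_loop` is hypothetical, and `np.polyfit` stands in for whatever regression routine was actually used.

```python
import numpy as np
import pandas as pd

def beta_by_loop(df, window):
    """Slice one window per end date and fit it separately (slow baseline)."""
    df = df.sort_values("THE_DATE").reset_index(drop=True)
    rows = []
    for end in range(window - 1, len(df)):
        chunk = df.iloc[end - window + 1 : end + 1]
        # A degree-1 polyfit returns (slope, intercept), i.e. (beta, alpha).
        beta, alpha = np.polyfit(chunk["X"].to_numpy(), chunk["Y"].to_numpy(), 1)
        rows.append((chunk["THE_DATE"].iloc[-1], beta, alpha))
    return pd.DataFrame(rows, columns=["THE_DATE", "beta", "alpha"])

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "THE_DATE": pd.date_range("2000-01-01", "2000-01-20"),
    "X": rng.standard_normal(20),
})
# Y lies exactly on a line, so every window should give beta = 2 and alpha = 0.5.
demo["Y"] = 0.5 + 2.0 * demo["X"]
out = beta_by_loop(demo, window=6)
print(out.head())
```

Each iteration builds a new DataFrame slice, which is where most of the time goes once the window is 365 rows and the range spans many days.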
After a fair amount of reading, I ended up with a NumPy version of the sliding-window functionality.
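The core idea is easiest to see on a tiny array. The sketch below uses `numpy.lib.stride_tricks.as_strided` to reuse the row stride for a new leading "window" axis, so every window is a view on the same memory rather than a copy. The array and the variable names are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(12, dtype=float).reshape(6, 2)  # 6 rows, 2 columns
window = 3
n_rows, n_cols = a.shape
row_stride, col_stride = a.strides

windows = as_strided(
    a,
    shape=(n_rows - window + 1, window, n_cols),
    # Moving to the next window and moving to the next row inside a window
    # both advance by one row of the original array.
    strides=(row_stride, row_stride, col_stride),
    writeable=False,  # the windows overlap in memory, so keep the view read-only
)
print(windows.shape)  # (4, 3, 2)
print(windows[1])     # rows 1..3 of a
```

`as_strided` does no bounds checking, so the shape has to be computed carefully; a wrong shape reads memory outside the array.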
Below is a simplified example:
import pandas as pd
import numpy as np
from numpy.random import randn
from sklearn.linear_model import LinearRegression
from numpy.lib.stride_tricks import as_strided as stride

linreg = LinearRegression()

# Demo data: 20 calendar days of random benchmark (X) and portfolio (Y) returns.
s = '2000-01-01'
e = '2000-01-20'
date_list = pd.date_range(s, e).tolist()
l = len(date_list)
df = pd.DataFrame()
df['THE_DATE'] = date_list
df['X'] = randn(l)
df['Y'] = randn(l)


def roll_np(df: pd.DataFrame, apply_func: callable, window: int, columns: list, **kwargs):
    """Call apply_func on every block of `window` consecutive rows (all columns at once)."""
    # Newest row first, so row 0 of each window holds that window's end date.
    # This relies on the index already being in chronological order.
    df = df.sort_index(ascending=False)
    v = df.values  # mixed dtypes give a 2-D object array
    dim0, dim1 = v.shape
    stride0, stride1 = v.strides
    # Zero-copy view of shape (n_windows, window, n_columns); window i starts at row i.
    stride_values = stride(v, (dim0 - (window - 1), window, dim1), (stride0, stride0, stride1))
    # One result row per window; pandas infers a dtype for each output column.
    rows = [apply_func(values, **kwargs) for values in stride_values]
    res = pd.DataFrame(data=rows, columns=columns)
    res['THE_DATE'] = pd.to_datetime(res['THE_DATE'])
    return res.sort_values('THE_DATE').reset_index(drop=True)


def rolling_beta(df):
    # A window of 6 rows keeps the demo small; the real use case uses 365.
    res = roll_np(df, cal_beta, 6, ['THE_DATE', 'beta', 'alpha'])
    return res


def cal_beta(narr):
    _date = narr[0, 0]  # end date of this window (rows are newest first)
    X = narr[:, 1].astype(float).reshape(-1, 1)
    Y = narr[:, 2].astype(float).reshape(-1, 1)
    linreg.fit(X, Y)
    beta = linreg.coef_[0, 0]
    alpha = linreg.intercept_[0]  # intercept_ has shape (1,) because Y is 2-D
    return _date, beta, alpha


if __name__ == '__main__':
    res = rolling_beta(df)
    print(res)
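If the per-window function is as simple as a one-variable regression, the Python loop over windows can be avoided entirely. The sketch below is an alternative, not the approach used above: it assumes NumPy 1.20 or later for `sliding_window_view` and computes alpha and beta for all windows at once with the closed-form least-squares formulas. The function name `rolling_alpha_beta` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_alpha_beta(x, y, window):
    """Closed-form OLS alpha and beta for every run of `window` consecutive points."""
    xw = sliding_window_view(np.asarray(x, dtype=float), window)  # (n_windows, window)
    yw = sliding_window_view(np.asarray(y, dtype=float), window)
    x_dev = xw - xw.mean(axis=1, keepdims=True)
    y_dev = yw - yw.mean(axis=1, keepdims=True)
    beta = (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)
    alpha = yw.mean(axis=1) - beta * xw.mean(axis=1)
    return alpha, beta

rng = np.random.default_rng(42)
x = rng.standard_normal(20)
y = rng.standard_normal(20)
alpha, beta = rolling_alpha_beta(x, y, window=6)

# Cross-check the first window against an ordinary degree-1 fit.
slope, intercept = np.polyfit(x[:6], y[:6], 1)
print(np.isclose(beta[0], slope), np.isclose(alpha[0], intercept))
```

Here entry i of the result belongs to the window covering positions i through i + window - 1 of chronologically ordered input, so it lines up with the date at position i + window - 1.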