金融量化学习---Python, MySQL, Pandas

这里用来记录一些在金融领域,尤其是银行相关的资金、债券、票据中应用到的数据管理与分析, 编程等心得或笔记,以及个人的一点小小兴趣(易经八卦、藏密禅修)等

导航

Date运算之二 --- pandas中的日期运算

日期获取

pandas读取excel文件时指定列的格式

import pandas as pd
file_name ='RB30A1.xlsx'
jsh_list =pd.read_excel(file_name, dtype={'交易编号':str,'交易日期':'datetime64'},header =2)

dt.strftime 获取年月日字符型数据

对于datetime64[ns]类型的数据,可以直接用dt.strftime提取;如果是object数据类型,则要先用pd.to_datetime()转换一下,例如:

pd.to_datetime(df.日期).dt.strftime('%Y-%m-%d')
Out[1]: 
20       2020-12-07
21       2020-12-07
22       2020-12-07
23       2020-12-07
24       2020-12-07
...
Name: 转贴日期, Length: 3723, dtype: object

pd.to_datetime(df.日期).dt.strftime('%Y%m')
Out[2]: 
20       202012
21       202012
22       202012
23       202012
24       202012

#以下是手工转的方式:
f = lambda x: str(x)[0:7]   #设置规则,把日期列转为str再提取前8个字符,即年和月
df['YM'] = df.日期.apply(f)  #插入一列

pd.to_datetime(df.日期).dt.strftime('%Y/%m/%d')
Out[3]: 
20       2020/12/07
21       2020/12/07
22       2020/12/07
23       2020/12/07
24       2020/12/07
   

dt还有其它用法,比如,dt.year,dt.month,dt.day,dt.hour,dt.time,dt.quarter,dt.weekday,dt.dayofyear ... 结尾没有()

日期计算

pandas 两列日期(object)相减求期限

#两列日期相减,得到天数的数字
df['tenor'] = ((df['date_maturity'])-df['date_discount']).dt.days  #不用dt.days计算出来的结果是timedelta64[ns]类型,加上dt.days就是数值型int64

#两日期相减,再用astype方法把天数换成年
df['days'] = df['date_maturity'])-df['date_discount']
df['years'] = df['days'].astype('timedelta64[Y]')

#麻烦一点的,先用to_datetime把该列转为日期
df['new_tenor'] = (pd.to_datetime(df['date_maturity'])-pd.to_datetime(df['date_discount'])).dt.days + df['adjust_day']
#一列日期和一个固定日期相减
import time,datetime
date_query= datetime.date.today()
df['tenor'] = ((df['date_maturity'])-date_query).dt.days

日期生成

Pandas时间序列:生成指定范围的日期

pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html

import pandas as pd
pd.date_range('11/1/2018','11/9/2018')

从指定日期生成指定天数的日期序列

import pandas as pd
import numpy as np
print pd.date_range(end='11/1/2018',periods=7)

以每月最后一个工作日为周期

import pandas as pd
print pd.date_range('1/1/2018','12/1/2018',freq='BM')

日期格式转换

numpy.datetime64 转为 pandas的日期格式

bond_dates =
array(['2012-09-10T00:00:00.000000000', '2013-01-11T00:00:00.000000000',
       '2013-02-08T00:00:00.000000000', '2013-03-18T00:00:00.000000000',
       '2014-06-01T00:00:00.000000000', '2014-12-07T00:00:00.000000000',
       '2015-04-25T00:00:00.000000000', '2015-12-02T00:00:00.000000000',
       '2016-02-16T00:00:00.000000000', '2016-06-08T00:00:00.000000000',
       '2016-10-19T00:00:00.000000000', '2017-02-15T00:00:00.000000000',
       '2017-07-21T00:00:00.000000000', '2018-01-27T00:00:00.000000000',
       '2018-07-06T00:00:00.000000000', '2018-10-12T00:00:00.000000000',
       '2019-03-07T00:00:00.000000000', '2020-05-12T00:00:00.000000000',
       '2020-06-24T00:00:00.000000000', '2021-01-19T00:00:00.000000000',
       '2021-03-16T00:00:00.000000000', '2021-06-16T00:00:00.000000000',
       '2021-08-18T00:00:00.000000000', '2021-11-17T00:00:00.000000000',
       '2022-02-23T00:00:00.000000000'], dtype='datetime64[ns]')

这种日期太长, 怎么转短一些?

import pandas as pd
pd_dates = pd.to_datetime(bond_dates)
#得到:
pd_dates
Out[1]: 
DatetimeIndex(['2012-09-10', '2013-01-11', '2013-02-08', '2013-03-18',
               '2014-06-01', '2014-12-07', '2015-04-25', '2015-12-02',
               '2016-02-16', '2016-06-08', '2016-10-19', '2017-02-15',
               '2017-07-21', '2018-01-27', '2018-07-06', '2018-10-12',
               '2019-03-07', '2020-05-12', '2020-06-24', '2021-01-19',
               '2021-03-16', '2021-06-16', '2021-08-18', '2021-11-17',
               '2022-02-23'],
              dtype='datetime64[ns]', freq=None)

python datetime 和 Quantlib 里的datetime的转换

import QuantLib as ql
from datetime import datetime,date,timedelta
today = (date.today() + timedelta(days = 0)).strftime("%Y-%m-%d")       #可以通过调节days = 正负天数,来得到自今日起的日期位移
Today = ql.Date(today,'%Y-%m-%d')

datetime.datetime转为datetime.date

from datetime import datetime,date
today = date.today()
#today: datetime.date(2020, 12, 24)

date_valuation = '2020-12-22'    #str
date_value = datetime.strptime(date_valuation,"%Y-%m-%d")      #类型是:datetime.datetime(2020, 12, 22, 0, 0)
date_value = datetime.strptime(date_valuation,"%Y-%m-%d").date()      #type: datetime.date(2020, 12, 22)

强行把一列转化为日期格式(在原dataframe里已改变)

#针对'2020-01-01' 这种str有效, 以下两句等效
dataframe.column_name.astype('datetime64[ns]')  
df.column_name = pd.to_datetime(df.column_name)

把一列长日期转换为短日期:

#bond_sql.deal_time:

0     2018-04-09 14:34:41
1     2018-04-10 14:23:56
2     2018-04-16 09:53:42
3     2018-04-23 14:21:50
4     2018-05-07 13:57:13

from datetime import datetime
bond_sql.deal_time = list(map(lambda t: datetime.date(t),bond_sql.deal_time))

如果直接用import datetime,则上面的datetime.date(t)要写成:datetime.datetime.date(t)

0      2018-04-09
1      2018-04-10
2      2018-04-16
3      2018-04-23
4      2018-05-07

另外一种情况,源表中的8位数字的日期20170110转为 2017-01-10的日期格式 :

data_list.日期
0      20170110
1      20170111
2      20170112
3      20170113
4      20170119
Name: 日期, Length: 5, dtype: object

pd.to_datetime(data_list.日期,format='%Y%m%d')
Out: 
0     2017-01-10
1     2017-01-11
2     2017-01-12
3     2017-01-13
4     2017-01-19
Name: 日期, Length: 5, dtype: datetime64[ns]
#赋值:
data_list.日期 = pd.to_datetime(data_list.日期,format='%Y%m%d')

把一个字符串转化为日期格式

import datetime
deal_date = '2021/01/05'
t1=datetime.datetime.strptime(deal_date,'%Y/%m/%d')
#Out: datetime.datetime(2021, 1, 5, 0, 0)

t1=datetime.datetime.strptime(deal_date,'%Y/%m/%d').date()
#Out: datetime.date(2021, 1, 5)

参考资料:

计算两个日期间隔年月日(带闰年判断)
日期天数差计算(Python)
利用Python中的pandas(date_range)库生成时间序列(time series)
https://www.cnblogs.com/OliverQin/p/12283897.html
获取中国日历
https://www.pypandas.cn/docs/user_guide/timeseries.html#时间戳-vs-时间段
https://www.cnblogs.com/lemonbit/p/6896499.html
https://www.jianshu.com/p/f29dddce3a9a

posted on 2020-11-02 09:32  chengjon  阅读(2567)  评论(0编辑  收藏  举报