Date运算之二 --- pandas中的日期运算
日期获取
pandas读取excel文件时指定列的格式
import pandas as pd
file_name ='RB30A1.xlsx'
jsh_list =pd.read_excel(file_name, dtype={'交易编号':str,'交易日期':'datetime64'},header =2)
dt.strftime 获取年月日字符型数据
对于datetime64[ns]类型的数据,可以直接用dt.strftime提取;如果是object数据类型,则要先用pd.to_datetime()转换一下,例如:
pd.to_datetime(df.日期).dt.strftime('%Y-%m-%d')
Out[1]:
20 2020-12-07
21 2020-12-07
22 2020-12-07
23 2020-12-07
24 2020-12-07
...
Name: 转贴日期, Length: 3723, dtype: object
pd.to_datetime(df.日期).dt.strftime('%Y%m')
Out[2]:
20 202012
21 202012
22 202012
23 202012
24 202012
#以下是手工转的方式:
f = lambda x: str(x)[0:7] #设置规则,把日期列转为str再提取前8个字符,即年和月
df['YM'] = df.日期.apply(f) #插入一列
pd.to_datetime(df.日期).dt.strftime('%Y/%m/%d')
Out[3]:
20 2020/12/07
21 2020/12/07
22 2020/12/07
23 2020/12/07
24 2020/12/07
dt还有其它用法,比如,dt.year,dt.month,dt.day,dt.hour,dt.time,dt.quarter,dt.weekday,dt.dayofyear ... 结尾没有()
日期计算
pandas 两列日期(object)相减求期限
#两列日期相减,得到天数的数字
df['tenor'] = ((df['date_maturity'])-df['date_discount']).dt.days #不用dt.days计算出来的结果是timedelta64[ns]类型,加上dt.days就是数值型int64
#两日期相减,再用astype方法把天数换成年
df['days'] = df['date_maturity'])-df['date_discount']
df['years'] = df['days'].astype('timedelta64[Y]')
#麻烦一点的,先用to_datetime把该列转为日期
df['new_tenor'] = (pd.to_datetime(df['date_maturity'])-pd.to_datetime(df['date_discount'])).dt.days + df['adjust_day']
#一列日期和一个固定日期相减
import time,datetime
date_query= datetime.date.today()
df['tenor'] = ((df['date_maturity'])-date_query).dt.days
日期生成
Pandas时间序列:生成指定范围的日期
pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
import pandas as pd
pd.date_range('11/1/2018','11/9/2018')
从指定日期生成指定天数的日期序列
import pandas as pd
import numpy as np
print pd.date_range(end='11/1/2018',periods=7)
以每月最后一个工作日为周期
import pandas as pd
print pd.date_range('1/1/2018','12/1/2018',freq='BM')
日期格式转换
numpy.datetime64 转为 pandas的日期格式
bond_dates =
array(['2012-09-10T00:00:00.000000000', '2013-01-11T00:00:00.000000000',
'2013-02-08T00:00:00.000000000', '2013-03-18T00:00:00.000000000',
'2014-06-01T00:00:00.000000000', '2014-12-07T00:00:00.000000000',
'2015-04-25T00:00:00.000000000', '2015-12-02T00:00:00.000000000',
'2016-02-16T00:00:00.000000000', '2016-06-08T00:00:00.000000000',
'2016-10-19T00:00:00.000000000', '2017-02-15T00:00:00.000000000',
'2017-07-21T00:00:00.000000000', '2018-01-27T00:00:00.000000000',
'2018-07-06T00:00:00.000000000', '2018-10-12T00:00:00.000000000',
'2019-03-07T00:00:00.000000000', '2020-05-12T00:00:00.000000000',
'2020-06-24T00:00:00.000000000', '2021-01-19T00:00:00.000000000',
'2021-03-16T00:00:00.000000000', '2021-06-16T00:00:00.000000000',
'2021-08-18T00:00:00.000000000', '2021-11-17T00:00:00.000000000',
'2022-02-23T00:00:00.000000000'], dtype='datetime64[ns]')
这种日期太长, 怎么转短一些?
import pandas as pd
pd_dates = pd.to_datetime(bond_dates)
#得到:
pd_dates
Out[1]:
DatetimeIndex(['2012-09-10', '2013-01-11', '2013-02-08', '2013-03-18',
'2014-06-01', '2014-12-07', '2015-04-25', '2015-12-02',
'2016-02-16', '2016-06-08', '2016-10-19', '2017-02-15',
'2017-07-21', '2018-01-27', '2018-07-06', '2018-10-12',
'2019-03-07', '2020-05-12', '2020-06-24', '2021-01-19',
'2021-03-16', '2021-06-16', '2021-08-18', '2021-11-17',
'2022-02-23'],
dtype='datetime64[ns]', freq=None)
python datetime 和 Quantlib 里的datetime的转换
import QuantLib as ql
from datetime import datetime,date,timedelta
today = (date.today() + timedelta(days = 0)).strftime("%Y-%m-%d") #可以通过调节days = 正负天数,来得到自今日起的日期位移
Today = ql.Date(today,'%Y-%m-%d')
datetime.datetime转为datetime.date
from datetime import datetime,date
today = date.today()
#today: datetime.date(2020, 12, 24)
date_valuation = '2020-12-22' #str
date_value = datetime.strptime(date_valuation,"%Y-%m-%d") #类型是:datetime.datetime(2020, 12, 22, 0, 0)
date_value = datetime.strptime(date_valuation,"%Y-%m-%d").date() #type: datetime.date(2020, 12, 22)
强行把一列转化为日期格式(在原dataframe里已改变)
#针对'2020-01-01' 这种str有效, 以下两句等效
dataframe.column_name.astype('datetime64[ns]')
df.column_name = pd.to_datetime(df.column_name)
把一列长日期转换为短日期:
#bond_sql.deal_time:
0 2018-04-09 14:34:41
1 2018-04-10 14:23:56
2 2018-04-16 09:53:42
3 2018-04-23 14:21:50
4 2018-05-07 13:57:13
from datetime import datetime
bond_sql.deal_time = list(map(lambda t: datetime.date(t),bond_sql.deal_time))
如果直接用import datetime,则上面的datetime.date(t)要写成:datetime.datetime.date(t)
0 2018-04-09
1 2018-04-10
2 2018-04-16
3 2018-04-23
4 2018-05-07
另外一种情况,源表中的8位数字的日期20170110转为 2017-01-10的日期格式 :
data_list.日期
0 20170110
1 20170111
2 20170112
3 20170113
4 20170119
Name: 日期, Length: 5, dtype: object
pd.to_datetime(data_list.日期,format='%Y%m%d')
Out:
0 2017-01-10
1 2017-01-11
2 2017-01-12
3 2017-01-13
4 2017-01-19
Name: 日期, Length: 5, dtype: datetime64[ns]
#赋值:
data_list.日期 = pd.to_datetime(data_list.日期,format='%Y%m%d')
把一个字符串转化为日期格式
import datetime
deal_date = '2021/01/05'
t1=datetime.datetime.strptime(deal_date,'%Y/%m/%d')
#Out: datetime.datetime(2021, 1, 5, 0, 0)
t1=datetime.datetime.strptime(deal_date,'%Y/%m/%d').date()
#Out: datetime.date(2021, 1, 5)
参考资料:
计算两个日期间隔年月日(带闰年判断)
日期天数差计算(Python)
利用Python中的pandas(date_range)库生成时间序列(time series)
https://www.cnblogs.com/OliverQin/p/12283897.html
获取中国日历
https://www.pypandas.cn/docs/user_guide/timeseries.html#时间戳-vs-时间段
https://www.cnblogs.com/lemonbit/p/6896499.html
https://www.jianshu.com/p/f29dddce3a9a