Time series in pandas (1): pd.Timestamp(), pd.Timedelta(), pd.datetime(), pd.Period(), pd.to_timestamp(), datetime.strftime(), pd.to_datetime(), pd.to_period()

pandas is a powerful tool for working with time series. Its date handling lets you filter, display, and aggregate data by date.

The main time-related types in pandas are:

Timestamp (a point in time)
Period (a span of time)
Timedelta (a duration)

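Commonly used date-handling functions and methods:

pd.to_datetime()
to_period() (a method of Timestamp, Series.dt and DataFrame, not a top-level pandas function)
pd.date_range()
pd.period_range()
resample()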
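
The last three entries in the list above are not demonstrated elsewhere in this post, so here is a minimal sketch of them. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# Six consecutive daily timestamps starting on 2019-01-01
days = pd.date_range('2019-01-01', periods=6, freq='D')
s = pd.Series([1, 2, 3, 4, 5, 6], index=days)

# Three monthly periods: 2019-01, 2019-02, 2019-03
months = pd.period_range('2019-01', '2019-03', freq='M')
print(months)

# resample groups the series into 2-day bins and sums each bin
two_day_sum = s.resample('2D').sum()
print(two_day_sum)
```

resample only works on data indexed by dates (or periods), which is why the series is built on top of the date_range result.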

I. Defining time formats

1. pd.Timestamp()、pd.Timedelta()

(1) Timestamp

import pandas as pd

# Define timestamps
t1 = pd.Timestamp('2019-01-10')
t2 = pd.Timestamp('2018-12-10')
print(f't1 = {t1}')
print(f't2 = {t2}')
print(f'Interval between t1 and t2: {(t1 - t2).days} days')

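# Get the current time
# (pd.datetime was removed in pandas 2.0; pd.Timestamp.now() works in old and new versions)
now = pd.Timestamp.now()
print(now)
print(now.strftime('%Y-%m-%d'))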
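
Beyond printing, a Timestamp exposes its components as attributes. The following is a small sketch with a made-up timestamp `ts`; it is meant as an illustration, not an exhaustive list.

```python
import pandas as pd

ts = pd.Timestamp('2019-01-10 08:30:00')

# Individual components are available as attributes
print(ts.year, ts.month, ts.day, ts.hour)

# Name of the weekday, and its index counted from Monday=0
weekday_name = ts.day_name()
weekday_index = ts.dayofweek
print(weekday_name, weekday_index)

# Subtracting two timestamps yields a pd.Timedelta
delta = ts - pd.Timestamp('2018-12-10')
print(type(delta).__name__, delta.days)
```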

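(2) Timedelta: adding to and subtracting from datetimes

Adding or subtracting time moves a datetime forward or backward and produces a new datetime. The + and - operators can be used directly. With pandas, pd.Timedelta is available out of the box; the standard-library equivalent, datetime.timedelta, has to be imported first: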

# A time interval
pd.Timedelta(days=5, minutes=50, seconds=20, milliseconds=10, microseconds=10, nanoseconds=10)

# Compute the date 100 days after the current time
dt = now + pd.Timedelta(days=100)
# Show only year, month and day
dt.strftime('%Y-%m-%d')
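
Because the snippet above depends on the current time, its output changes on every run. A reproducible sketch with a fixed, made-up start date looks like this:

```python
import pandas as pd

start = pd.Timestamp('2019-01-10')

# Move 100 days forward
later = start + pd.Timedelta(days=100)
print(later.strftime('%Y-%m-%d'))

# Move two weeks and twelve hours backward
earlier = start - pd.Timedelta(weeks=2, hours=12)
print(earlier)

# A Timedelta can be converted to a plain number of seconds
seconds = pd.Timedelta(days=1, hours=6).total_seconds()
print(seconds)
```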

2. pd.Period()

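# Define periods. A bare year gets an annual frequency ending in December
# (A-DEC; spelled Y-DEC in pandas 2.2 and later)
p1 = pd.Period('2019')
p2 = pd.Period('2018')
print(f'p1 = {p1}')
print(f'p2 = {p2}')
# In pandas 0.24 and later, subtracting two periods returns a DateOffset; .n is the number of years
print(f'p1 and p2 are {(p1 - p2).n} year(s) apart')
# Integers can be added or subtracted directly (one unit is one year here)
print(f'Ten years earlier: {p1 - 10}')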

# Convert the period frequency with asfreq
# Take the first month of the year (p1 was set to 2019 above)
p1.asfreq('M', 'start')

# Take the last month of the year
p1.asfreq('M', 'end')

# A fiscal quarter
p = pd.Period('2019Q3', freq='Q-DEC')
# First day of the quarter
print(p.asfreq('D', 'start'))
# Last day of the quarter
print(p.asfreq('D', 'end'))
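
A Period also exposes its boundaries and supports integer arithmetic in its own frequency. The sketch below additionally shows a fiscal year ending in March (Q-MAR), which is an illustrative choice and not something used elsewhere in this post.

```python
import pandas as pd

p = pd.Period('2019Q3', freq='Q-DEC')

# Boundaries of the period as timestamps
print(p.start_time)
print(p.end_time)

# Which quarter it is, and the following quarter
print(p.quarter)
next_q = p + 1
print(next_q)

# With a fiscal year ending in March, fiscal 2019 Q3 covers Oct-Dec 2018
fy = pd.Period('2019Q3', freq='Q-MAR')
fy_start = fy.asfreq('D', 'start')
print(fy_start)
```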

3. pd.to_timestamp()

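Converts a period to a timestamp. Note that to_timestamp() is a method of Period (and of Series/DataFrame objects with a PeriodIndex), not a top-level pandas function.

# Convert a period to a timestamp at the end or start of the period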
print(p1.to_timestamp(how='end'))
print(p1.to_timestamp(how='start'))
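
The same conversion is available for a whole Series indexed by periods. This is a minimal sketch with made-up monthly values:

```python
import pandas as pd

# A series indexed by three monthly periods
idx = pd.period_range('2019-01', periods=3, freq='M')
s = pd.Series([10, 20, 30], index=idx)

# Replace the PeriodIndex with a DatetimeIndex at the start of each period
ts_start = s.to_timestamp(how='start')
print(ts_start)
```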

4. pd.to_period()

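Converts a timestamp to a period. to_period() is a method of Timestamp; for a datetime column, use Series.dt.to_period().

# t1 was set to '2019-01-10' above
# Convert to a monthly period
print(t1.to_period('M'))
# Convert to a daily period
print(t1.to_period('D'))
# Convert to a weekly period
print(t1.to_period('W'))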
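
A typical use of this conversion is grouping rows by month. The DataFrame below, including its column names, is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-10', '2019-02-15', '2019-02-20']),
    'value': [1, 2, 3],
})

# Convert every timestamp in the column to its monthly period
df['month'] = df['date'].dt.to_period('M')

# Sum the values per month
monthly = df.groupby('month')['value'].sum()
print(monthly)
```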

5. pd.to_datetime()

pandas.to_datetime(arg, errors='raise', utc=None, format=None, unit=None)

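(1) Getting a specified date and time

When there is a lot of data and the date formats are inconsistent, to_datetime can convert the dates in a DataFrame to a single standard representation.

For example, if df['date'] has dtype object, pd.to_datetime converts the column to a datetime type.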

df['date_formatted']=pd.to_datetime(df['date'],format='%Y-%m-%d')
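
The line above assumes a DataFrame `df` already exists. A self-contained sketch, with made-up data that includes one invalid entry, shows how the errors parameter changes the outcome: with the default errors='raise' the invalid string would raise a ValueError, while errors='coerce' turns it into NaT.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2019-01-01', '2019-01-02', 'not a date']})
print(df['date'].dtype)

# Invalid entries become NaT instead of raising an error
df['date_formatted'] = pd.to_datetime(
    df['date'], format='%Y-%m-%d', errors='coerce'
)
print(df['date_formatted'])
```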

Commonly used times:

(2) to_datetime can handle values that are considered missing (None, empty strings)
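
A short sketch of this behaviour, using a made-up list: both None and the empty string come back as NaT (Not a Time), the datetime counterpart of NaN.

```python
import pandas as pd

result = pd.to_datetime(['2019-01-01', None, ''])
print(result)

# NaT entries are reported as missing by isna()
missing = result.isna().tolist()
print(missing)
```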

(3) Converting str and unicode strings to datetimes
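
As a minimal sketch (the strings are made up): a single string yields a Timestamp, a list of strings yields a DatetimeIndex, and dayfirst=True tells pandas to read an ambiguous date as day/month/year. In Python 3 every str is already Unicode, so no separate handling is needed.

```python
import pandas as pd

# A single string becomes a Timestamp
single = pd.to_datetime('2019-04-05 22:41')
print(single)

# A list of strings becomes a DatetimeIndex
many = pd.to_datetime(['2019-01-01', '2019-01-02'])
print(many)

# Read 05/04/2019 as 5 April rather than May 4
day_first = pd.to_datetime('05/04/2019', dayfirst=True)
print(day_first)
```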

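6. strptime and strftime

(1) Converting a string to datetime: strptime

Dates and times entered by users are strings. To work with them, you first have to convert str to datetime. This is done with datetime.strptime(), which takes the string and a format string describing its layout: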

from datetime import datetime
import numpy as np
import pandas as pd

df_data1 = pd.DataFrame(columns=['date', 'values'])
df_data1['date'] = ['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05']
df_data1['values'] = np.random.randn(5)
df_data1

df_data1['date'] = df_data1['date'].map(lambda x:datetime.strptime(x,'%Y-%m-%d'))
df_data1

Note that the resulting datetime values carry no time zone information.
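
To make that concrete, the sketch below checks that a parsed value is time-zone naive and then attaches a zone with tz_localize. The choice of UTC and Asia/Shanghai is only an example.

```python
from datetime import datetime
import pandas as pd

# strptime returns a naive datetime: tzinfo is None
naive = datetime.strptime('2019-01-01', '%Y-%m-%d')
print(naive.tzinfo)

s = pd.Series(pd.to_datetime(['2019-01-01', '2019-01-02']))
print(s.dt.tz)

# Declare the naive values to be UTC, then convert to another zone
s_utc = s.dt.tz_localize('UTC')
s_shanghai = s_utc.dt.tz_convert('Asia/Shanghai')
print(s_shanghai)
```

tz_localize interprets the existing wall-clock values as belonging to the given zone, while tz_convert shifts already localized values to a different zone.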

Example: combine separate year, month, day and hour columns and set the result as the index

Dataset:

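from datetime import datetime
import pandas as pd

# Parse a string such as '2010 1 1 0' (year month day hour)
def parse(x):
    return datetime.strptime(x, '%Y %m %d %H')

# Load the data. Older pandas could combine the columns inside read_csv with
# parse_dates=[['year', 'month', 'day', 'hour']] and date_parser=parse;
# both options are deprecated in pandas 2.x, so the columns are combined explicitly here.
dataset = pd.read_csv('raw.csv')
parts = dataset[['year', 'month', 'day', 'hour']].astype(str)
dataset.index = parts.agg(' '.join, axis=1).map(parse)
dataset.drop(['No', 'year', 'month', 'day', 'hour'], axis=1, inplace=True)
# Manually specify column names
dataset.columns = ['pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain']
dataset.index.name = 'date'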

(2) Converting datetime back to a string: strftime

If you already have a datetime object and want to present it to the user, convert it to str with strftime(), which likewise takes a format string:

# Define a DataFrame named df_data
df_data  = pd.DataFrame(columns=['date','values'])
df_data['date'] = pd.date_range('2019/01/01',periods=5)
df_data['values'] = np.random.randn(5)
df_data

Use strftime to convert the datetime column to strings:

df_data['date'] = df_data['date'].apply(lambda x: x.strftime('%Y/%m'))  # datetime to str
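
For a datetime column, pandas also offers a vectorized form through the .dt accessor, which avoids the per-row lambda. This is a sketch with a made-up frame `df` and a hypothetical output column `month_str`:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019/01/01', periods=5)})

# Format the whole column at once; the result holds plain strings
df['month_str'] = df['date'].dt.strftime('%Y/%m')
print(df)
```

Writing the result to a new column keeps the original datetime values available for further date arithmetic.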

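The format codes are defined as follows:

Code  Meaning
%Y  4-digit year
%y  2-digit year
%m  2-digit month [01, 12]
%d  2-digit day [01, 31]
%H  Hour, 24-hour clock [00, 23]
%I  Hour, 12-hour clock [01, 12]
%M  2-digit minute [00, 59]
%S  Second [00, 61] (allows for leap seconds)
%w  Weekday as an integer [0 (Sunday), 6]
%F  Shorthand for %Y-%m-%d, e.g. 2017-06-27 (platform-dependent)
%D  Shorthand for %m/%d/%y (platform-dependent)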
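
A quick way to check the codes in the table is to format one fixed datetime with several of them. The date below is arbitrary; %F and %D are left out because they depend on the platform's C library.

```python
from datetime import datetime

dt = datetime(2017, 6, 27, 15, 4, 5)

full = dt.strftime('%Y-%m-%d %H:%M:%S')   # 4-digit year, 24-hour clock
short = dt.strftime('%y/%m/%d')           # 2-digit year
hour12 = dt.strftime('%I')                # hour on the 12-hour clock
weekday = dt.strftime('%w')               # 0 is Sunday

print(full, short, hour12, weekday)
```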

posted @ 2019-04-05 22:41 nxf_rabbit75