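Time Series Study Notes, Part 3
4. Time Zone Handling
Time zone handling is tricky, so the usual practice is to work in UTC.
UTC (Coordinated Universal Time) is the successor to Greenwich Mean Time and is now the international standard.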
In [1]: import pytz
In [2]: import numpy as np
In [3]: import pandas as pd; from pandas import Series
In [4]: pytz.common_timezones[-5:]
Out[4]: ['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
In [5]: tz = pytz.timezone('Asia/Shanghai')
In [6]: tz
Out[6]: <DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>
4.1 Localization and Conversion
By default, pandas time series are time zone naive: the index carries no time zone information.
In [11]: rng = pd.date_range('2/19/2017 9:30', periods=4, freq='D')
In [12]: ts = Series(np.random.randn(4),index=rng)
In [13]: ts.index.tz # returns None, so nothing is printed
In [14]: ts
Out[14]:
2017-02-19 09:30:00 0.530722
2017-02-20 09:30:00 1.459262
2017-02-21 09:30:00 -0.038216
2017-02-22 09:30:00 -0.671159
Freq: D, dtype: float64
# a time zone can be set at creation time with the tz argument
In [15]: pd.date_range('2/19/2017 9:30', periods=4, freq='D', tz='UTC')
Out[15]:
DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00',
'2017-02-21 09:30:00+00:00', '2017-02-22 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
# use tz_localize to go from naive to time zone aware
In [16]: tz_utc = ts.tz_localize('UTC')
In [17]: tz_utc
Out[17]:
2017-02-19 09:30:00+00:00 0.530722
2017-02-20 09:30:00+00:00 1.459262
2017-02-21 09:30:00+00:00 -0.038216
2017-02-22 09:30:00+00:00 -0.671159
Freq: D, dtype: float64
In [18]: tz_utc.index.tz
Out[18]: <UTC>
# use tz_convert to change the time zone of an aware series
In [20]: tz_utc.tz_convert('Asia/Shanghai')
Out[20]:
2017-02-19 17:30:00+08:00 0.530722
2017-02-20 17:30:00+08:00 1.459262
2017-02-21 17:30:00+08:00 -0.038216
2017-02-22 17:30:00+08:00 -0.671159
Freq: D, dtype: float64
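The difference between the two methods is easy to mix up, so here is a minimal sketch of it. It assumes a small series with made-up values (`np.arange` instead of random numbers); the variable names are illustrative only. `tz_localize` attaches a zone without changing the clock readings, while `tz_convert` keeps the instants and changes the clock readings. Calling `tz_convert` on a naive series raises a `TypeError`.

```python
import numpy as np
import pandas as pd

# A naive daily series, similar to the one in the session above (values are made up).
rng = pd.date_range("2017-02-19 09:30", periods=4, freq="D")
ts = pd.Series(np.arange(4.0), index=rng)

# tz_localize: attach a time zone; the clock readings stay at 09:30.
ts_utc = ts.tz_localize("UTC")

# tz_convert: same instants, expressed in another zone (17:30 in Shanghai).
ts_sh = ts_utc.tz_convert("Asia/Shanghai")

# tz_convert needs an aware index, so it fails on the naive series.
try:
    ts.tz_convert("Asia/Shanghai")
    naive_convert_failed = False
except TypeError:
    naive_convert_failed = True
```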
4.2 Timestamp Objects
# create a Timestamp object
In [25]: stamp = pd.Timestamp('2017-2-19 12:10')
# localize the naive timestamp to UTC
In [26]: stamp_utc = stamp.tz_localize('UTC')
# convert to another time zone
In [29]: stamp_cn = stamp_utc.tz_convert('Asia/Shanghai')
# value is the number of nanoseconds since the Unix epoch (1970-01-01 UTC)
In [30]: stamp_utc.value
Out[30]: 1487506200000000000
In [31]: stamp_cn.value
Out[31]: 1487506200000000000
In [32]: stamp.value # all three are identical; the naive timestamp is read as if it were UTC
Out[32]: 1487506200000000000
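The three values match only because the naive timestamp was localized to UTC. As a rough sketch of the contrast, the snippet below also localizes the same clock reading directly to Asia/Shanghai (`stamp_local` is a name introduced here for illustration); that denotes an instant eight hours earlier, so its `value` differs.

```python
import pandas as pd

stamp = pd.Timestamp("2017-02-19 12:10")
stamp_utc = stamp.tz_localize("UTC")
stamp_cn = stamp_utc.tz_convert("Asia/Shanghai")

# Localizing straight to Shanghai means "12:10 Shanghai time",
# which is 04:10 UTC, a different instant.
stamp_local = stamp.tz_localize("Asia/Shanghai")

eight_hours_ns = 8 * 3600 * 10**9
difference_ns = stamp_utc.value - stamp_local.value
```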
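4.3 Operations Between Different Time Zones
When two time series in different time zones are combined, the result is in UTC, because the timestamps are stored internally as UTC.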
In [33]: ts
Out[33]:
2017-02-19 09:30:00 0.530722
2017-02-20 09:30:00 1.459262
2017-02-21 09:30:00 -0.038216
2017-02-22 09:30:00 -0.671159
Freq: D, dtype: float64
In [34]: ts.index
Out[34]:
DatetimeIndex(['2017-02-19 09:30:00', '2017-02-20 09:30:00',
'2017-02-21 09:30:00', '2017-02-22 09:30:00'],
dtype='datetime64[ns]', freq='D')
In [35]: ts1 = ts[:2].tz_localize('Europe/London')
In [36]: ts2 = ts1.tz_convert('Europe/Moscow')
In [37]: result = ts1 + ts2 # ts1 and ts2 are in different time zones
In [38]: result.index # the result index is converted to UTC
Out[38]: DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00'], dtype='datetime64[ns, UTC]', freq='D')
In [39]: result
Out[39]:
2017-02-19 09:30:00+00:00 1.061445
2017-02-20 09:30:00+00:00 2.918524
Freq: D, dtype: float64
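The same behavior can be reproduced with fixed values, which makes the alignment easier to check. This is a small sketch with made-up data: both series hold the same instants, so every row aligns and each value is doubled, and the result index comes back in UTC.

```python
import pandas as pd

rng = pd.date_range("2017-02-19 09:30", periods=2, freq="D")
ts = pd.Series([1.0, 2.0], index=rng)

ts1 = ts.tz_localize("Europe/London")   # 09:30 London time
ts2 = ts1.tz_convert("Europe/Moscow")   # same instants, Moscow clock readings

# Alignment happens on the underlying instants, not on the clock readings.
result = ts1 + ts2
result_tz = str(result.index.tz)
```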
5. Periods and Period Arithmetic
A Period represents a span of time, such as a day, a month, a quarter, or a year.
In [4]: p = pd.Period(2017)
In [5]: p
Out[5]: Period('2017', 'A-DEC')
In [6]: p + 1
Out[6]: Period('2018', 'A-DEC')
In [7]: pd.Period(2018) - p
Out[7]: 1
In [8]: rng = pd.period_range('1/1/2001','6/30/2001', freq='M')
In [9]: rng
Out[9]: PeriodIndex(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06'], dtype='int64', freq='M')
In [10]: Series(np.random.randn(6), index=rng)
Out[10]:
2001-01 1.146489
2001-02 2.112800
2001-03 0.292746
2001-04 -0.841383
2001-05 -0.845565
2001-06 1.207504
Freq: M, dtype: float64
# a PeriodIndex can also be built from a list of strings
In [11]: values = ['2001Q3','2002Q2','2003Q1']
In [13]: index = pd.PeriodIndex(values, freq='Q-DEC') # quarters of a fiscal year that ends in December
In [14]: index
Out[14]: PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='int64', freq='Q-DEC')
In [26]: index.asfreq('Q-JUN') # re-express the quarters for a fiscal year that ends in June
Out[26]: PeriodIndex(['2002Q1', '2002Q4', '2003Q3'], dtype='int64', freq='Q-JUN')
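A short sketch of period arithmetic and the fiscal-year relabeling shown above, using monthly and quarterly periods. One caveat, stated as an assumption about the installed version: subtracting two periods returns a plain integer in older pandas and an offset object with an `n` attribute in newer releases, so the sketch reads the count in a way that works for both.

```python
import pandas as pd

jan = pd.Period("2017-01", freq="M")
mar = pd.Period("2017-03", freq="M")

# Adding an integer shifts the period by that many units of its frequency.
feb = jan + 1

# Period difference: an int in old pandas, an offset with .n in newer ones.
diff = mar - jan
months_apart = getattr(diff, "n", diff)

# 2001Q3 in a December fiscal year covers Jul-Sep 2001, which is the first
# quarter of the fiscal year that ends in June 2002.
q_dec = pd.Period("2001Q3", freq="Q-DEC")
q_jun = q_dec.asfreq("Q-JUN")
```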
5.1 Period Frequency Conversion
In [15]: p
Out[15]: Period('2017', 'A-DEC') # an annual period covering one year, ending on December 31
In [16]: p.asfreq('M', how='start') # first month of the year
Out[16]: Period('2017-01', 'M')
In [17]: p.asfreq('M', how='end')
Out[17]: Period('2017-12', 'M')
In [18]: p = pd.Period('2017',freq='A-JUN') # fiscal year 2017, ending at the end of June
In [19]: p.asfreq('M',how='end')
Out[19]: Period('2017-06', 'M')
In [20]: rng = pd.period_range('2006','2009',freq='A-DEC') # one annual period per year, 2006 through 2009
In [21]: ts = Series(np.random.randn(len(rng)), index=rng)
In [22]: ts
Out[22]:
2006 -0.627032
2007 -1.409714
2008 0.072737
2009 1.240899
Freq: A-DEC, dtype: float64
In [23]: ts.asfreq('M', how='start') # convert to monthly frequency, taking the first month of each year
Out[23]:
2006-01 -0.627032
2007-01 -1.409714
2008-01 0.072737
2009-01 1.240899
Freq: M, dtype: float64
In [24]: ts.asfreq('B', how='end') # convert to business-day frequency, taking the last business day of each year
Out[24]:
2006-12-29 -0.627032
2007-12-31 -1.409714
2008-12-31 0.072737
2009-12-31 1.240899
Freq: B, dtype: float64
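As a compact recap of `how='start'` versus `how='end'`, here is a sketch on a single annual period. It relies on pandas parsing a bare year string as an annual period ending in December, which avoids spelling out the annual frequency alias (the alias name varies between pandas versions).

```python
import pandas as pd

# A bare year is parsed as an annual period that ends in December.
p = pd.Period("2017")

first_month = p.asfreq("M", how="start")  # first sub-period of the year
last_month = p.asfreq("M", how="end")     # last sub-period of the year
last_day = p.asfreq("D", how="end")       # the same idea at daily frequency
```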
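5.2 Quarterly Period Frequencies
In [28]: rng = pd.period_range('2011Q3','2012Q4',freq='Q-JAN')
In [29]: rs = Series(np.arange(len(rng)), index=rng)
# 4 PM on the second-to-last business day of each quarter:
# last business day, minus one, converted to minutes at the start of that day, plus 16 hours
In [30]: new_rng = (rng.asfreq('B','e') - 1).asfreq('T','s') + 16*60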
In [35]: rs.index = new_rng.to_timestamp()
In [36]: rs
Out[36]:
2010-10-28 16:00:00 0
2011-01-28 16:00:00 1
2011-04-28 16:00:00 2
2011-07-28 16:00:00 3
2011-10-28 16:00:00 4
2012-01-30 16:00:00 5
dtype: int64
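The one-liner above is dense, so here is an alternative sketch that reaches the same timestamps step by step with Timestamp and offset arithmetic instead of business-day periods. This is a swapped-in technique, not the method used in the session; the helper name `second_to_last_bday_4pm` is hypothetical, and like the original it treats Monday through Friday as business days and ignores holidays.

```python
import pandas as pd

def second_to_last_bday_4pm(quarter):
    """Return 4 PM on the second-to-last business day of a quarterly Period."""
    # Calendar date of the quarter's last day (normalize drops any time of day).
    last_day = quarter.to_timestamp(how="end").normalize()
    # If the quarter ends on a weekend, roll back to the preceding Friday.
    last_bday = pd.offsets.BDay().rollback(last_day)
    # Step back one more business day, then move to 16:00.
    return last_bday - pd.offsets.BDay(1) + pd.Timedelta(hours=16)

quarters = pd.period_range("2011Q3", "2012Q4", freq="Q-JAN")
stamps = [second_to_last_bday_4pm(q) for q in quarters]
```

With a fiscal year ending in January, 2011Q3 covers August through October 2010, which is why the first timestamp falls in October 2010.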
5.3 Converting Between Timestamps and Periods
In [38]: rng = pd.date_range('1/1/2001', periods=3, freq='M')
In [40]: ts = Series(np.random.randn(3), index=rng)
In [41]: pts = ts.to_period() # convert to periods
In [42]: ts
Out[42]:
2001-01-31 0.619856
2001-02-28 -2.117066
2001-03-31 1.152329
Freq: M, dtype: float64
In [43]: pts
Out[43]:
2001-01 0.619856
2001-02 -2.117066
2001-03 1.152329
Freq: M, dtype: float64
In [45]: pts.to_timestamp(how='end') # convert back to timestamps
Out[45]:
2001-01-31 0.619856
2001-02-28 -2.117066
2001-03-31 1.152329
Freq: M, dtype: float64
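A round-trip sketch with fixed values. Two assumptions to flag: the month-end dates are listed explicitly rather than generated with a frequency alias, and the end-of-period timestamps are normalized to dates, because newer pandas versions may return the very last instant of the period for `how='end'` rather than midnight of its last day.

```python
import pandas as pd

dates = pd.to_datetime(["2001-01-31", "2001-02-28", "2001-03-31"])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# Timestamps -> monthly periods: each date is mapped to the month containing it.
pts = ts.to_period("M")

# Periods -> timestamps at the start of each period.
start_index = pts.to_timestamp(how="start").index

# Periods -> timestamps at the end of each period, reduced to the calendar date.
end_dates = pts.to_timestamp(how="end").index.normalize()
```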
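5.4 Creating a PeriodIndex from Data
In [47]: q = Series(list(range(1,5)) * 7) # quarters 1-4 repeated 7 times (list() is required on Python 3)
In [48]: y = Series(np.arange(1988,2016)) # years 1988 through 2015
In [49]: index = pd.PeriodIndex(year=y,quarter=q, freq='Q-DEC') # build the index from the year and quarter columns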
In [50]: data = Series(np.random.randn(28), index=index)
In [51]: data
Out[51]:
1988Q1 -0.127187
1989Q2 -1.757196
1990Q3 0.826757
...
2013Q2 0.540955
2014Q3 0.531101
2015Q4 0.751739
Freq: Q-DEC, dtype: float64
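Newer pandas releases may warn about the `year=`/`quarter=` keyword form of the constructor. As an alternative sketch that sidesteps it, the year and quarter values can be combined into labels such as `1988Q1` and parsed, the same way the string list was parsed in section 5. The `labels` list is a name introduced here for illustration.

```python
import pandas as pd

years = list(range(1988, 2016))   # 28 years
quarters = [1, 2, 3, 4] * 7       # 28 quarter numbers, cycling 1-4

# Pair each year with its quarter number and format as "<year>Q<quarter>".
labels = [f"{y}Q{q}" for y, q in zip(years, quarters)]

index = pd.PeriodIndex(labels, freq="Q-DEC")
```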
To be continued...