Pandas
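pandas is a library of high-level data structures and tools built on NumPy that make working with data fast and simple.
Data structures with labeled axes that support automatic or explicit data alignment
Integrated time series functionality
The same data structures handle both time series and non-time-series data
Arithmetic operations that carry metadata (axis labels) along with the values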
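The alignment and label-aware arithmetic listed above can be sketched with two small Series. The labels and values here are made up for illustration and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# Two Series whose index labels only partly overlap (illustrative data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition matches values by label, not by position.
# Labels present on only one side ("a" and "d") produce NaN.
total = s1 + s2
print(total)
```

The result keeps the axis labels, so "b" and "c" hold the sums while "a" and "d" are NaN.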
>>> import numpy as np
>>> import pandas as pd
>>> a = pd.Series([1, 3, 5, np.nan, 6, 8])  # build a Series; np.nan marks a missing value (NaN), not an empty string
>>> a
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
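Since the Series above contains a NaN, here is a minimal sketch of a few common ways to deal with it. Which one fits depends on the data; filling with 0 is only an assumption made for this example.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 3, 5, np.nan, 6, 8])

missing_mask = a.isna()   # boolean Series, True only where the value is NaN
without_nan = a.dropna()  # drops the NaN entry, keeps the original labels
filled = a.fillna(0)      # replaces NaN with 0 (an arbitrary choice here)
total = a.sum()           # NaN is skipped by default (skipna=True)

print(missing_mask.tolist())
print(without_nan.index.tolist())
print(total)
```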
>>> dates = pd.date_range('20160102', periods=6)  # six consecutive days starting 2016-01-02; periods sets how many dates to generate
>>> dates
DatetimeIndex(['2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05',
               '2016-01-06', '2016-01-07'],
              dtype='datetime64[ns]', freq='D')
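Besides periods, date_range also accepts an end date and a freq step. The sketch below reuses the same start date; the end date and the two-day step are chosen only for illustration.

```python
import pandas as pd

# Same six days as above, written with an explicit end date instead of periods.
daily = pd.date_range("2016-01-02", "2016-01-07")

# Three dates, stepping two days at a time.
every_other = pd.date_range("2016-01-02", periods=3, freq="2D")

print(len(daily))
print(list(every_other.strftime("%Y-%m-%d")))
```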
>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))  # values are random, so they differ on every run
>>> df
                   A         B         C         D
2016-01-02  0.461499 -0.935497 -1.008590 -0.438713
2016-01-03 -0.566233 -1.614755  1.207207 -1.286580
2016-01-04  2.002371  1.333078  0.264322  1.215232
2016-01-05  0.242900 -1.508960  1.651483  0.229316
2016-01-06 -0.365214 -0.518801 -0.141358 -0.051713
2016-01-07  0.539730 -0.235725  1.101934 -1.360333
>>> df.head()  # first five rows by default
                   A         B         C         D
2016-01-02  0.461499 -0.935497 -1.008590 -0.438713
2016-01-03 -0.566233 -1.614755  1.207207 -1.286580
2016-01-04  2.002371  1.333078  0.264322  1.215232
2016-01-05  0.242900 -1.508960  1.651483  0.229316
2016-01-06 -0.365214 -0.518801 -0.141358 -0.051713
>>> df.tail()  # last five rows by default
                   A         B         C         D
2016-01-03 -0.566233 -1.614755  1.207207 -1.286580
2016-01-04  2.002371  1.333078  0.264322  1.215232
2016-01-05  0.242900 -1.508960  1.651483  0.229316
2016-01-06 -0.365214 -0.518801 -0.141358 -0.051713
2016-01-07  0.539730 -0.235725  1.101934 -1.360333
>>> df.T  # transpose: rows become columns and columns become rows
   2016-01-02  2016-01-03  2016-01-04  2016-01-05  2016-01-06  2016-01-07
A    0.461499   -0.566233    2.002371    0.242900   -0.365214    0.539730
B   -0.935497   -1.614755    1.333078   -1.508960   -0.518801   -0.235725
C   -1.008590    1.207207    0.264322    1.651483   -0.141358    1.101934
D   -0.438713   -1.286580    1.215232    0.229316   -0.051713   -1.360333
>>> df.sort_values(by='B')  # sort rows by column B, ascending by default
                   A         B         C         D
2016-01-03 -0.566233 -1.614755  1.207207 -1.286580
2016-01-05  0.242900 -1.508960  1.651483  0.229316
2016-01-02  0.461499 -0.935497 -1.008590 -0.438713
2016-01-06 -0.365214 -0.518801 -0.141358 -0.051713
2016-01-07  0.539730 -0.235725  1.101934 -1.360333
2016-01-04  2.002371  1.333078  0.264322  1.215232
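Sorting can also run in descending order or work on the axis labels instead of the values. The sketch below rebuilds a frame of the same shape with a seeded generator so it is repeatable; the seed is an assumption of this example, and the numbers will not match the output shown above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20160102", periods=6)
rng = np.random.default_rng(0)  # seeded only so this sketch is repeatable
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

by_b_desc = df.sort_values(by="B", ascending=False)   # largest B first
by_date_desc = df.sort_index(ascending=False)         # newest date first
by_col_desc = df.sort_index(axis=1, ascending=False)  # columns in order D, C, B, A

print(by_b_desc)
```

Each call returns a new DataFrame, so df itself keeps its original row order.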