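Learning pandas
There are two ways to define a DataFrame.
The first builds it from a dict of sequences; the second builds it from a NumPy array.
The second way: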
>>> import numpy as np
>>> import pandas as pd
>>> datas = pd.date_range('20190101', periods=6)  # the NumPy-based approach
>>> datas
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')
>>> df = pd.DataFrame(np.random.randn(6, 4), index=datas, columns=['x', 'y', 'z', 'w'])
>>> df
                   x         y         z         w
2019-01-01 -2.124954 -1.515395 -0.638061 -1.357047
2019-01-02  0.574289 -0.487437 -0.560849 -1.208926
2019-01-03  0.789728 -0.596398 -0.177524  1.077749
2019-01-04 -0.787758 -0.852675  0.500035 -1.831362
2019-01-05 -0.427867 -1.682430  0.409840 -0.171332
2019-01-06 -0.046703 -0.703367  1.387183 -1.858081
>>> df2 = pd.DataFrame(np.random.randn(3, 4))
The first way:
>>> df2 = pd.DataFrame({'A': 1., 'B': pd.Timestamp('20130102'), 'C': 1., 'D': np.array([3] * 4, dtype='int32'), 'E': pd.Categorical(["test", "train", "bus", "car"])})
>>> df2
     A          B    C  D      E
0  1.0 2013-01-02  1.0  3   test
1  1.0 2013-01-02  1.0  3  train
2  1.0 2013-01-02  1.0  3    bus
3  1.0 2013-01-02  1.0  3    car
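The two construction styles can also be put side by side in a short script. This is only a minimal sketch: the names `from_array` and `from_dict` and the seeded random generator are assumptions added for illustration and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# Way 2: from a NumPy array, with explicit row and column labels.
rng = np.random.default_rng(0)  # seeded only so reruns give the same numbers
dates = pd.date_range("20190101", periods=6)
from_array = pd.DataFrame(
    rng.standard_normal((6, 4)), index=dates, columns=["x", "y", "z", "w"]
)

# Way 1: from a dict. Scalars are repeated to match the length
# of the array-like columns (here, 4 rows).
from_dict = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "bus", "car"]),
})

print(from_array.shape)  # (6, 4)
print(from_dict.shape)   # (4, 4)
```

In the dict form each key becomes a column name, so every column can have its own dtype; in the array form all cells share the array's dtype.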
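You can print df2's column names, row labels, dtypes, and values:
>>> df2.columns  # print the column names
Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
>>> df2.index  # print the row labels
RangeIndex(start=0, stop=4, step=1)
>>> df2.dtypes  # data type of each column
A           float64
B    datetime64[ns]
C           float64
D             int32
E          category
dtype: object
>>> df2.values  # print the values as a NumPy array
>>> df2.T  # transpose: rows become columns and columns become rows, like a matrix transpose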
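The inspection attributes can be checked on a small frame. The sketch below uses a hypothetical three-column frame named `frame` (a cut-down version of df2, chosen here for illustration) so the results are easy to verify by eye.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "A": 1.0,
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "bus", "car"]),
})

print(list(frame.columns))  # column labels: ['A', 'D', 'E']
print(list(frame.index))    # row labels: [0, 1, 2, 3]
print(frame.dtypes)         # one dtype per column
print(frame.values.shape)   # (4, 3): rows x columns
print(frame.T.shape)        # (3, 4): transposing swaps rows and columns
```

After the transpose, the old column labels become the row labels, which is why `frame.T.index` holds 'A', 'D', 'E'.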
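Sorting by index:
>>> df2.sort_index(axis=1, ascending=False)
       E  D    C          B    A
0   test  3  1.0 2013-01-02  1.0
1  train  3  1.0 2013-01-02  1.0
2    bus  3  1.0 2013-01-02  1.0
3    car  3  1.0 2013-01-02  1.0
# axis=1 sorts by the column labels (reordering the columns); axis=0 sorts by the row labels.
# ascending=False sorts in descending order, largest first.
# ascending=True sorts in ascending order.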
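Which axis `sort_index` works on is easy to mix up: in pandas, axis=0 refers to the row labels and axis=1 to the column labels. A minimal sketch on a hypothetical frame (its labels are deliberately out of order; the variable names are illustrative only):

```python
import pandas as pd

# Rows are labelled 2, 0, 1 and columns 'b', 'a' so both axes start unsorted.
frame = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

by_columns = frame.sort_index(axis=1)                      # reorder columns by label
by_rows = frame.sort_index(axis=0)                         # reorder rows by label
by_rows_desc = frame.sort_index(axis=0, ascending=False)   # rows, largest label first

print(list(by_columns.columns))  # ['a', 'b']
print(list(by_rows.index))       # [0, 1, 2]
print(list(by_rows_desc.index))  # [2, 1, 0]
```

`sort_index` returns a new frame and leaves `frame` unchanged; only the order of labels moves, and the data travels with its label.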
Sorting by values:
>>> df2.sort_values(by='E')  # sort the rows by the values in column E
     A          B    C  D      E
2  1.0 2013-01-02  1.0  3    bus
3  1.0 2013-01-02  1.0  3    car
0  1.0 2013-01-02  1.0  3   test
1  1.0 2013-01-02  1.0  3  train
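`sort_values` also accepts a list of columns with a per-column direction. The sketch below is an illustration on a hypothetical frame; the column `n` and the variable names are assumptions, not part of the notes.

```python
import pandas as pd

frame = pd.DataFrame({
    "E": pd.Categorical(["test", "train", "bus", "car"]),
    "n": [3, 1, 2, 1],
})

# A categorical column sorts by its category order, which pandas
# infers here as alphabetical: bus, car, test, train.
by_e = frame.sort_values(by="E")

# Sort by n descending, then break ties by E ascending.
by_n_then_e = frame.sort_values(by=["n", "E"], ascending=[False, True])

print(list(by_e["E"]))          # ['bus', 'car', 'test', 'train']
print(list(by_n_then_e.index))  # [0, 2, 3, 1]
```

The original row labels are kept after sorting, which is why the index appears shuffled in the output.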
Note: comments in Python start with #, not //.