Python pandas and DataFrame (Part 2)
Basic operations
Hierarchical indexing (MultiIndex)
>>> import numpy as np
>>> import pandas as pd
>>> data = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','c','c','d','d','d'],[1,2,3,1,2,1,2,3,1,2]])
>>> data
a  1   -0.168871
   2    0.828841
   3    0.786215
b  1    0.506081
   2   -2.304898
c  1    0.864875
   2    0.183091
d  3   -0.678791
   1   -1.241735
   2    0.778855
dtype: float64
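Once a Series has two index levels, rows can be selected by the outer label, the inner label, or both. The following is a minimal sketch with fixed values instead of random ones; the numbers and the variable names `by_outer`, `by_inner`, and `single` are made up for illustration.

```python
import pandas as pd

# Illustrative two-level index: outer level is a letter, inner level is a number.
data = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=[['a', 'a', 'b', 'b', 'c'], [1, 2, 1, 2, 1]],
)

# Outer label only: returns the rows under 'a', indexed by the inner level.
by_outer = data['a']

# Inner label only: returns every row whose inner label is 1,
# indexed by the outer level.
by_inner = data.loc[:, 1]

# Both labels: returns a single scalar value.
single = data.loc[('b', 2)]

print(by_outer)
print(by_inner)
print(single)
```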
Converting between a hierarchically indexed Series and a DataFrame
>>> data.unstack()
          1         2         3
a -0.168871  0.828841  0.786215
b  0.506081 -2.304898       NaN
c  0.864875  0.183091       NaN
d -1.241735  0.778855 -0.678791
>>> type(data.unstack())
<class 'pandas.core.frame.DataFrame'>
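The conversion also works in the other direction: `stack()` moves the columns back into the inner index level. A small sketch with fixed, made-up values; the explicit `dropna()` is an assumption added so that the result does not depend on how a given pandas version treats the NaN cells.

```python
import pandas as pd

# Illustrative data; the pair ('a', 3) is deliberately missing.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=[['a', 'a', 'b', 'b', 'b'], [1, 2, 1, 2, 3]],
)

# unstack() moves the inner index level to the columns;
# index pairs that do not exist become NaN.
wide = s.unstack()

# stack() moves the columns back into the inner index level.
# NaN cells are dropped explicitly to recover the original entries.
long_again = wide.stack().dropna()

print(wide)
print(long_again)
```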
Merge, join, concatenate
>>> df2 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['hangzhou','najing'])
>>> df1 = pd.DataFrame({'apts':[55000,60000],'cars':[20000,30000]},index=['shanghai','beijing'])
>>> df3 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['guangzhou','chongqing'])
>>> [df1,df2,df3]
[           apts   cars
shanghai  55000  20000
beijing   60000  30000,            apts   cars
hangzhou  55000  15000
najing    60000  12000,             apts   cars
guangzhou  55000  15000
chongqing  60000  12000]
>>> pd.concat([df1,df2,df3])
            apts   cars
shanghai   55000  20000
beijing    60000  30000
hangzhou   55000  15000
najing     60000  12000
guangzhou  55000  15000
chongqing  60000  12000
>>> frames = [df1,df2,df3]
>>> result2 = pd.concat(frames,keys=['x','y','z'])
>>> result2
              apts   cars
x shanghai   55000  20000
  beijing    60000  30000
y hangzhou   55000  15000
  najing     60000  12000
z guangzhou  55000  15000
  chongqing  60000  12000
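The `keys` argument is useful because the resulting outer index level records which input each row came from, so a block can be pulled back out later. A minimal sketch reusing two of the frames above; the names `combined`, `from_y`, and `beijing` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
df2 = pd.DataFrame({'apts': [55000, 60000], 'cars': [15000, 12000]},
                   index=['hangzhou', 'najing'])

# keys= adds an outer index level labelling the source of each row.
combined = pd.concat([df1, df2], keys=['x', 'y'])

# Select every row that came from the second frame.
from_y = combined.loc['y']

# Select one row by (key, original label); the result is a Series.
beijing = combined.loc[('x', 'beijing')]

print(from_y)
print(beijing['cars'])
```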
Concatenating along columns with concat
>>> df4 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000]},index=['suzhou','beijing','shanghai','guanghzou','tianjin'])
>>> result = pd.concat([df1,df2,df3])  # assumed: `result` is not defined in the session shown; inferred from the output below
>>> result3 = pd.concat([result,df4],axis=1)
>>> result3
              apts     cars  salaries
beijing    60000.0  30000.0   30000.0
chongqing  60000.0  12000.0       NaN
guanghzou      NaN      NaN   20000.0
guangzhou  55000.0  15000.0       NaN
hangzhou   55000.0  15000.0       NaN
najing     60000.0  12000.0       NaN
shanghai   55000.0  20000.0   30000.0
suzhou         NaN      NaN   10000.0
tianjin        NaN      NaN   15000.0
Combining two DataFrames and keeping only the intersection of their indexes
>>> result3 = pd.concat([result,df4],axis=1,join='inner')
>>> result3
           apts   cars  salaries
shanghai  55000  20000     30000
beijing   60000  30000     30000
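To see what `join='inner'` does to the row labels, it helps to compare it with the default outer join on a small case. A sketch with two made-up frames (`prices` and `salaries` are hypothetical names, not from the session above):

```python
import pandas as pd

prices = pd.DataFrame({'apts': [55000, 60000, 55000]},
                      index=['shanghai', 'beijing', 'hangzhou'])
salaries = pd.DataFrame({'salaries': [10000, 30000, 30000]},
                        index=['suzhou', 'beijing', 'shanghai'])

# Default (outer): keeps the union of row labels and fills gaps with NaN.
outer = pd.concat([prices, salaries], axis=1)

# join='inner': keeps only the labels present in both frames.
inner = pd.concat([prices, salaries], axis=1, join='inner')

# The inner result should carry the same labels as the index intersection.
common = prices.index.intersection(salaries.index)

print(outer)
print(inner)
print(set(inner.index) == set(common))
```

Because the inner join drops every row with a missing side, the integer columns stay integers instead of being converted to float to hold NaN.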
Concatenating a Series with a DataFrame
>>> s1 = pd.Series([60,50],index=['shanghai','beijing'],name='meal')
>>> s1
shanghai    60
beijing     50
Name: meal, dtype: int64
>>> type(s1)
<class 'pandas.core.series.Series'>
>>> df1
           apts   cars
shanghai  55000  20000
beijing   60000  30000
>>> type(df1)
<class 'pandas.core.frame.DataFrame'>
>>> pd.concat([df1,s1],axis=1)
           apts   cars  meal
shanghai  55000  20000    60
beijing   60000  30000    50
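append can add a Series to a DataFrame as a new row: the Series index is matched against the column names and the Series name becomes the row label. Passing the same Series directly to concat does not give that row, with either axis=0 or axis=1. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append call below only runs on older versions.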
>>> s2 = pd.Series([18000,12000],index=['apts','cars'],name='xiamen')
>>> s2
apts    18000
cars    12000
Name: xiamen, dtype: int64
>>> df1.append(s2)
           apts   cars
shanghai  55000  20000
beijing   60000  30000
xiamen    18000  12000
>>> pd.concat([df1,s2],axis=0)
                0     apts     cars
shanghai      NaN  55000.0  20000.0
beijing       NaN  60000.0  30000.0
apts      18000.0      NaN      NaN
cars      12000.0      NaN      NaN
>>> pd.concat([df1,s2],axis=1)
             apts     cars   xiamen
apts          NaN      NaN  18000.0
beijing   60000.0  30000.0      NaN
cars          NaN      NaN  12000.0
shanghai  55000.0  20000.0      NaN
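concat can still add the Series as a row if the Series is first turned into a one-row DataFrame. This is one possible workaround, not the only one, and it also works on pandas versions where `DataFrame.append` no longer exists. The names `row` and `appended` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s2 = pd.Series([18000, 12000], index=['apts', 'cars'], name='xiamen')

# to_frame() gives a one-column DataFrame named 'xiamen';
# .T transposes it into a one-row DataFrame with columns 'apts' and 'cars'.
row = s2.to_frame().T

# concat now aligns on the column names and adds 'xiamen' as a new row.
appended = pd.concat([df1, row])

print(appended)
```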
Merging with merge
>>> df1 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000],'cities':['suzhou','beijing','shanghai','guanghzou','tianjin']})
>>> df4 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000],'cities':['shanghai','beijing']})
>>> result = pd.merge(df1,df4,on='cities')  # on= names the column to merge on
>>> result
     cities  salaries   apts   cars
0   beijing     30000  60000  12000
1  shanghai     30000  55000  15000
>>> result = pd.merge(df1,df4,on='cities',how='right')  # keep every key found in df4
>>> result
     cities  salaries   apts   cars
0   beijing     30000  60000  12000
1  shanghai     30000  55000  15000
>>> result = pd.merge(df1,df4,on='cities',how='left')  # keep every key found in df1
>>> result
      cities  salaries     apts     cars
0     suzhou     10000      NaN      NaN
1    beijing     30000  60000.0  12000.0
2   shanghai     30000  55000.0  15000.0
3  guanghzou     20000      NaN      NaN
4    tianjin     15000      NaN      NaN
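Besides 'right' and 'left', merge also accepts how='outer', which keeps the keys from both sides. The sketch below combines it with `indicator=True`, which adds a `_merge` column saying where each row came from. It reuses the two frames above; `outer` and `only_left` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'salaries': [10000, 30000, 30000, 20000, 15000],
                    'cities': ['suzhou', 'beijing', 'shanghai', 'guanghzou', 'tianjin']})
df4 = pd.DataFrame({'apts': [55000, 60000], 'cars': [15000, 12000],
                    'cities': ['shanghai', 'beijing']})

# how='outer' keeps keys from both frames.
# indicator=True adds a '_merge' column with the values
# 'left_only', 'right_only', or 'both'.
outer = pd.merge(df1, df4, on='cities', how='outer', indicator=True)

# Cities that appear in df1 but have no match in df4.
only_left = sorted(outer.loc[outer['_merge'] == 'left_only', 'cities'])

print(outer)
print(only_left)
```

Here every city in df4 also appears in df1, so the outer merge has the same rows as the left merge, and the `_merge` column makes the unmatched rows easy to filter.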