pandas (8): Reshaping and Pivoting
Reshaping with hierarchical indexing
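Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. There are two main methods:
stack: pivots the columns of the data into rows
unstack: pivots the rows of the data into columns
Take a DataFrame as an example (the session assumes `import numpy as np`, `import pandas as pd`, and `from pandas import Series, DataFrame`):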
In [4]: data = DataFrame(np.arange(6).reshape((2, 3)),
   ...:                  index=pd.Index(['Ohio', 'Colorado'], name='state'),
   ...:                  columns=pd.Index(['one', 'two', 'three'], name='number'))

In [5]: data
Out[5]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [6]: data.stack()  # pivot the column labels into the row index
Out[6]:
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

In [7]: data.unstack()  # pivot the row labels into the column index
Out[7]:
number  state
one     Ohio        0
        Colorado    3
two     Ohio        1
        Colorado    4
three   Ohio        2
        Colorado    5
dtype: int32

In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['number', 'state'])
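For a DataFrame like this one, whose row index and column index each have a single level, both stack and unstack return a Series. (When either axis has a MultiIndex, the result can still be a DataFrame.)
A Series object has only the unstack method; it has no stack method.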
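The return types described above can be checked directly. The following is a minimal sketch that rebuilds the same `data` frame; the variable names `stacked` and `unstacked` are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

stacked = data.stack()      # Series indexed by (state, number)
unstacked = data.unstack()  # Series indexed by (number, state)

# Both results are Series because each axis of `data` has one level.
print(type(stacked).__name__, list(stacked.index.names))
print(type(unstacked).__name__, list(unstacked.index.names))

# A Series offers unstack but not stack.
print(hasattr(stacked, 'unstack'), hasattr(stacked, 'stack'))
```

Some pandas 2.x releases may emit a FutureWarning when `stack` is called without `future_stack=True`; the result for this single-level frame should be the same either way.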
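By default, unstack operates on the innermost level. Pass a level number or a level name to operate on a different level. In the session below, `result` is the stacked Series, that is, `result = data.stack()` (the assignment itself is not shown).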
In [21]: result.unstack(0)
Out[21]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5

In [22]: result.unstack()
Out[22]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [23]: result.unstack('state')
Out[23]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
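As a self-contained sketch of the level selection shown above, the snippet below defines `result` explicitly and confirms that level number 0 and the level name 'state' refer to the same level. The names `by_number`, `by_name`, and `restored` are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
result = data.stack()  # Series indexed by (state, number)

# Level 0 is the outer level, which is named 'state'.
by_number = result.unstack(0)
by_name = result.unstack('state')
print(by_number.equals(by_name))

# With no argument, the innermost level ('number') is unstacked,
# which takes the data back to the shape of the original frame.
restored = result.unstack()
print(restored)
```

Referring to levels by name tends to be more readable than by position, and it keeps working if the order of the levels changes.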
If not every value of a level appears in each of the subgroups, unstack introduces missing values (NaN):
In [24]: s1 = Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

In [25]: s2 = Series([4, 5, 6], index=['c', 'd', 'e'])

In [26]: data2 = pd.concat([s1, s2], keys=['one', 'two'])

In [27]: data2
Out[27]:
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

In [28]: data2.unstack()
Out[28]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

In [29]: data2.unstack(0)
Out[29]:
   one  two
a  0.0  NaN
b  1.0  NaN
c  2.0  4.0
d  3.0  5.0
e  NaN  6.0
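stack, by contrast, filters out missing values by default in the pandas version used here, so stacking the unstacked result recovers the original entries. Recent pandas releases are changing this default, so check how your installed version behaves.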
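Because the default handling of missing values in stack differs between pandas versions, a version-independent way to get the filtered result is to call `dropna()` explicitly after stacking. The sketch below assumes only the `data2` series from the session above; `wide`, `long`, and `cleaned` are illustrative names.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()  # 2 x 5 frame with NaN where a label is absent
long = wide.stack()     # may or may not still contain NaN, depending on version

# Dropping NaN explicitly gives the same entries on every version.
cleaned = long.dropna()

print(len(long), len(cleaned))
print(cleaned)
```

Note that the values come back as floats, since the intermediate frame needed a float dtype to hold NaN.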
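When you pivot a DataFrame with stack or unstack, the level being pivoted becomes the lowest (innermost) level of the index on the resulting axis.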
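To make the "innermost level" rule concrete, here is a small sketch. The two-column frame `df` and its column names 'left' and 'right' are made up for illustration and are not part of the session above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
result = data.stack()

# Hypothetical frame with a (state, number) row index and a 'side' column level.
df = pd.DataFrame(
    {'left': result, 'right': result + 5},
    columns=pd.Index(['left', 'right'], name='side'),
)

# Unstacking 'state' moves it under 'side' as the innermost column level.
wide = df.unstack('state')
print(list(wide.columns.names))

# Stacking 'side' moves it under 'number' as the innermost row level.
back = wide.stack('side')
print(list(back.index.names))
```

In both directions the pivoted level is appended below the levels that were already on the target axis.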