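The set_index() and reset_index() functions
I. The set_index() function
1 The key is understanding the drop and append parameters, and how they differ from the parameters of reset_index().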
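import pandas as pd

df = pd.DataFrame({'a': range(4),
                   'b': range(4, 0, -1),
                   'c': ['one', 'one', 'two', 'two'],
                   'd': ['a', 'b', 'c', 'd']})
print(df)
#    a  b    c  d
# 0  0  4  one  a
# 1  1  3  one  b
# 2  2  2  two  c
# 3  3  1  two  d

# The drop parameter of set_index() defaults to True: after the regular column c
# becomes the index, the original column c is removed from the columns.
# Note the difference from reset_index(): there drop defaults to False, and
# drop=True discards the original index instead of turning it into a column.
df.set_index(['c'], inplace=True)
print(df)
#      a  b  d
# c
# one  0  4  a
# one  1  3  b
# two  2  2  c
# two  3  1  d

# append=True keeps the existing index and adds the new column as an extra level.
# With append=False (the default) the new index replaces the old one, which is
# discarded; in that sense it is loosely comparable to drop in reset_index().
df.set_index(['b'], inplace=True, append=True)
print(df)
#        a  d
# c   b
# one 4  0  a
#     3  1  b
# two 2  2  c
#     1  3  d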
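The example above only shows the default drop=True. A minimal sketch of the other settings, using the same DataFrame: drop=False keeps the column, append=False discards the previous index, and append=True stacks the new column as a second level. The variable names kept, replaced and appended are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': range(4, 0, -1),
                   'c': ['one', 'one', 'two', 'two'],
                   'd': ['a', 'b', 'c', 'd']})

# drop=False: 'c' becomes the index but also stays as a regular column
kept = df.set_index('c', drop=False)
print(kept.columns.tolist())          # ['a', 'b', 'c', 'd']

# append=False (default): setting 'b' replaces the 'c' index, and 'c' is lost
replaced = df.set_index('c').set_index('b')
print(list(replaced.index.names))     # ['b']
print(replaced.columns.tolist())      # ['a', 'd']

# append=True: 'b' is added as a second level, giving a MultiIndex (c, b)
appended = df.set_index('c').set_index('b', append=True)
print(list(appended.index.names))     # ['c', 'b']
```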
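II. The reset_index() function
1 When resetting the index, drop defaults to False, so the old index is kept as a regular column; set drop=True to discard it instead. To modify the original data, pass inplace=True. This is essential when you do not assign the result: without it, reset_index() returns a new object and the original is left unchanged, so drop appears to have no effect.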
all_user_repay.reset_index(drop=True,inplace=True)
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'],
                    'C': ['C0', 'C1'], 'D': ['D0', 'D1']})
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5'],
                    'C': ['C4', 'C5'], 'D': ['D4', 'D5']})
frames = [df1, df2]
result = pd.concat(frames)

print(result.reset_index())
#    index   A   B   C   D
# 0      0  A0  B0  C0  D0
# 1      1  A1  B1  C1  D1
# 2      0  A4  B4  C4  D4
# 3      1  A5  B5  C5  D5

print(result.reset_index(drop=True))
#     A   B   C   D
# 0  A0  B0  C0  D0
# 1  A1  B1  C1  D1
# 2  A4  B4  C4  D4
# 3  A5  B5  C5  D5
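To illustrate the inplace point from item 1, here is a minimal sketch on a trimmed-down version of the concatenated frame: calling reset_index() without assigning the result leaves the duplicate index untouched, while assigning it or passing inplace=True both produce 0..n-1.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']})
result = pd.concat([df1, df2])

# No assignment and no inplace: the new frame is thrown away
result.reset_index(drop=True)
print(result.index.tolist())   # [0, 1, 0, 1]  (unchanged)

# Option 1: assign the returned DataFrame to a name
result2 = result.reset_index(drop=True)
print(result2.index.tolist())  # [0, 1, 2, 3]

# Option 2: modify in place (the call itself returns None)
result.reset_index(drop=True, inplace=True)
print(result.index.tolist())   # [0, 1, 2, 3]
```

For this particular case, pd.concat(frames, ignore_index=True) also yields a fresh 0..n-1 index directly.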
Series.reset_index()
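Note that the level parameter defaults to None, which removes all index levels: every level of the original index becomes a regular column.
If level is given, only the specified level(s) become regular columns; the remaining levels stay in the index.
Reference: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reset_index.html?highlight=reset_index#pandas.Series.reset_index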
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
s2 = pd.Series(range(4), name='foo',
               index=pd.MultiIndex.from_arrays(arrays, names=['a', 'b']))
print(s2)
# To keep the result here you cannot use inplace=True (with drop=False the result
# is a DataFrame, which pandas refuses to do in place); assign it to another name.
print(s2.reset_index(level='a'))
print(s2.reset_index())
print(type(s2))  # s2 itself is unchanged
# a    b
# bar  one    0
#      two    1
# baz  one    2
#      two    3
# Name: foo, dtype: int64
#        a  foo
# b
# one  bar    0
# two  bar    1
# one  baz    2
# two  baz    3
#      a    b  foo
# 0  bar  one    0
# 1  bar  two    1
# 2  baz  one    2
# 3  baz  two    3
# <class 'pandas.core.series.Series'>
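The inplace restriction applies only when drop=False. A minimal sketch with the same s2: inplace=True together with the default drop=False raises a TypeError, because the result would be a DataFrame; with drop=True the level is discarded, the result is still a Series, and the in-place change is accepted. The raised flag is just for illustration.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
s2 = pd.Series(range(4), name='foo',
               index=pd.MultiIndex.from_arrays(arrays, names=['a', 'b']))

# drop=False (default) would turn s2 into a DataFrame, so inplace=True is rejected
raised = False
try:
    s2.reset_index(level='a', inplace=True)
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# drop=True discards level 'a'; the result is still a Series, so inplace works
s2.reset_index(level='a', drop=True, inplace=True)
print(type(s2).__name__)  # Series
print(s2.index.name)      # b
```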
2 Set a column as the index
df.set_index('column_name', inplace=True)
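As a short round-trip sketch, assuming a hypothetical DataFrame with columns name and score: set_index() moves a column into the index (useful for label lookups with .loc), and reset_index() with its default drop=False moves it back to a regular column.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'name': ['x', 'y', 'z'], 'score': [90, 85, 77]})

# Move the 'name' column into the index, modifying df in place
df.set_index('name', inplace=True)
print(df.loc['y', 'score'])   # 85

# Default drop=False turns the 'name' index back into a regular column
df.reset_index(inplace=True)
print(df.columns.tolist())    # ['name', 'score']
```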