pandas 索引占用内存大小显示 pd.index.memory_usage()
>>> s = pd.Series(np.zeros(10**6))
>>> s.index
RangeIndex(start=0, stop=1000000, step=1)
>>> s.index.memory_usage() # in bytes
128 # the same as for Series([0.])
现在,如果我们删除一个元素,索引隐式地转换为类似于dict的结构,如下所示:
>>> s.drop(1, inplace=True)
>>> s.index
Int64Index([ 0, 2, 3, 4, 5, 6, 7,
...
999993, 999994, 999995, 999996, 999997, 999998, 999999],
dtype='int64', length=999999)
>>> s.index.memory_usage()
7999992
该结构消耗8Mb内存!为了摆脱它,回到轻量级的类range结构,添加如下代码:
>>> s.reset_index(drop=True, inplace=True)
>>> s.index
RangeIndex(start=0, stop=999999, step=1)
>>> s.index.memory_usage()
128