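pandas: Indexing, Selection, and Filtering
Series indexing works much like NumPy array indexing, except that the index values of a Series need not be integers. For example: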
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

obj = Series(np.arange(4), index=['a', 'b', 'c', 'd'])
obj
Out[10]:
a    0
b    1
c    2
d    3
dtype: int32

obj['b']
Out[11]: 1

# positional access; obj.iloc[1] is the explicit form
obj[1]
Out[12]: 1

obj[2:4]
Out[13]:
c    2
d    3
dtype: int32

obj[['b', 'a', 'd']]
Out[14]:
b    1
a    0
d    3
dtype: int32

obj[[1, 3]]
Out[15]:
b    1
d    3
dtype: int32

obj[obj < 2]
Out[17]:
a    0
b    1
dtype: int32
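
The boolean filter obj[obj<2] can be broken into its two steps. The following is a minimal sketch, assuming a recent pandas version; the names mask, small and middle are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Comparing a Series with a scalar yields a boolean Series on the same index.
mask = obj < 2
print(mask.tolist())          # [True, True, False, False]

# Indexing with the mask keeps only the entries where it is True.
small = obj[mask]
print(small.index.tolist())   # ['a', 'b']

# Conditions combine with & (and), | (or) and ~ (not).
# Each comparison needs its own parentheses.
middle = obj[(obj >= 1) & (obj < 3)]
print(middle.index.tolist())  # ['b', 'c']
```

The mask is aligned on the index, so it has to carry the same labels as the Series it filters.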
# Slicing with labels differs from ordinary Python slicing
# because the end point is included
obj['b':'c'] = 5
obj
Out[24]:
a    0
b    5
c    5
d    3
dtype: int32
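
To make the difference in end points concrete, the sketch below compares a label slice with a positional slice on the same Series. It uses the explicit loc and iloc accessors, which is an assumption about style rather than what the session above typed; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label slice: both end points are included, so this returns b and c.
by_label = obj.loc['b':'c']
print(by_label.index.tolist())      # ['b', 'c']

# Positional slice: the stop position is excluded, as in plain Python,
# so 1:2 returns only the element at position 1.
by_position = obj.iloc[1:2]
print(by_position.index.tolist())   # ['b']

# Assigning through a label slice updates every label in the range.
obj.loc['b':'c'] = 5
print(obj.tolist())                 # [0, 5, 5, 3]
```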
Indexing into a DataFrame retrieves one or more columns:
Selecting columns: pass the column name, or a list of column names.
data = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['Ohio', 'Colorado', 'Utah', 'New York'],
                 columns=['one', 'two', 'three', 'four'])
data
Out[26]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

data['two']
Out[27]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

data[['three', 'one']]
Out[28]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
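
One detail worth checking is the type that column selection returns. A minimal sketch, with variable names chosen here for illustration: a single name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A single column name returns a Series named after the column.
col = data['two']
print(type(col).__name__, col.name)      # Series two

# A list of names returns a DataFrame with the columns in the order given.
sub = data[['three', 'one']]
print(sub.columns.tolist())              # ['three', 'one']

# A one-element list still returns a DataFrame, not a Series.
one_col_frame = data[['two']]
print(one_col_frame.shape)               # (4, 1)
```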
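Selecting rows:
(1) by slicing or with a boolean array;
(2) by indexing with a boolean DataFrame;
(3) by label indexing on the rows, using the indexing field ix, which selects a subset of rows and columns from a DataFrame with NumPy-style notation plus axis labels. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0; loc (label-based) and iloc (position-based) replace it in current versions.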
# Select rows with a slice
data[:2]
Out[29]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

# Select rows with a boolean array
data[data['three'] > 5]
Out[30]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

# Index with a boolean DataFrame
data < 5
Out[31]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

# Set every value of data that is less than 5 to 0
data[data < 5] = 0
data
Out[33]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

# Label indexing on the rows, using the indexing field ix
data.ix['Colorado', ['two', 'three']]
Out[34]:
two      5
three    6
Name: Colorado, dtype: int32

data.ix[['Colorado', 'Utah'], [3, 0, 1]]
Out[35]:
          four  one  two
Colorado     7    0    5
Utah        11    8    9

# Select the row at position 2, which is the row Utah
data.ix[2]
Out[36]:
one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

data.ix[:'Utah', 'two']
Out[37]:
Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

# Select the rows where data.three > 5, and the first three columns
data.ix[data.three > 5, :3]
Out[38]:
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14
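
On pandas 1.0 and later the ix calls above raise an AttributeError. The sketch below shows one possible way to write the same selections with loc and iloc. It is not taken from the original session: the variable names are illustrative, and translating positional column lists through data.columns is just one of several workable choices.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data[data < 5] = 0  # same state as after Out[33]

# ix['Colorado', ['two', 'three']]: both arguments are labels, so use loc.
row_cols = data.loc['Colorado', ['two', 'three']]

# ix[['Colorado', 'Utah'], [3, 0, 1]]: row labels mixed with column positions.
# loc accepts labels only, so the positions are first mapped to column labels.
mixed = data.loc[['Colorado', 'Utah'], data.columns[[3, 0, 1]]]

# ix[2]: a purely positional row lookup, so use iloc.
third_row = data.iloc[2]

# ix[:'Utah', 'two']: a label slice (end point included) and a column label.
upto_utah = data.loc[:'Utah', 'two']

# ix[data.three > 5, :3]: a boolean row mask with the first three columns.
filtered = data.loc[data['three'] > 5, data.columns[:3]]

print(mixed)
print(filtered)
```

Keeping labels in loc and positions in iloc removes the ambiguity ix had when an index contained integers.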
Indexing options for a DataFrame
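# Select a single column or a group of columns from the DataFrame
obj[val]
# Select a single row or a group of rows
obj.ix[val]
# Select a single column or a subset of columns
obj.ix[:, val]
# Select rows and columns at the same time
obj.ix[val1, val2]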