pandas: indexing, selection, and filtering

Series indexing works much like NumPy array indexing, except that the index values of a Series do not have to be integers. For example:

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
obj=Series(np.arange(4),index=['a','b','c','d'])
obj
Out[10]: 
a    0
b    1
c    2
d    3
dtype: int32
obj['b']
Out[11]: 1

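#The index holds strings, so the integer key is treated as a position;
#recent pandas versions deprecate this, and obj.iloc[1] is the explicit positional form
obj[1]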
Out[12]: 1

obj[2:4]
Out[13]: 
c    2
d    3
dtype: int32

obj[['b','a','d']]
Out[14]: 
b    1
a    0
d    3
dtype: int32

obj[[1,3]]
Out[15]: 
b    1
d    3
dtype: int32

obj[obj<2]
Out[17]:
a    0
b    1
dtype: int32

#Slicing with labels differs from ordinary Python slicing
#because the end label is included
obj['b':'c']=5

obj
Out[24]: 
a    0
b    5
c    5
d    3
dtype: int32
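
The inclusive end of a label slice is easiest to see next to a positional slice. The following is a minimal sketch, assuming a fresh Series named s (a name chosen here for illustration) and the explicit loc and iloc accessors:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label-based slice: both end labels are included.
by_label = s.loc['b':'c']

# Position-based slice: the stop position is excluded, as in plain Python.
by_position = s.iloc[1:2]

print(list(by_label.index))     # ['b', 'c']
print(list(by_position.index))  # ['b']
```

The label slice returns two rows while the positional slice with the same starting point returns one, which is why an assignment such as obj['b':'c']=5 changes both b and c.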
 
Indexing a DataFrame with [] retrieves one or more columns:

Selecting columns: pass a column name, or a list of column names.
data=DataFrame(np.arange(16).reshape((4,4)),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four'])

data
Out[26]: 
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15


data['two']
Out[27]: 
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

data[['three','one']]
Out[28]: 
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
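
One detail worth checking when selecting columns is the type of the result. This sketch, which assumes a DataFrame named frame built the same way as data above, suggests that a single name gives a Series while a list of names gives a DataFrame, even when the list has one element:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
                     columns=['one', 'two', 'three', 'four'])

single = frame['two']    # one column name: the result is a Series
subset = frame[['two']]  # list of names: the result is a DataFrame

print(type(single).__name__)  # Series
print(type(subset).__name__)  # DataFrame
print(subset.shape)           # (4, 1)
```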

 

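Selecting rows:
(1) with a slice or a boolean array;
(2) with a boolean DataFrame;
(3) with label-based indexing on the rows, using the ix indexing field, which selects a subset of rows and columns from a DataFrame with NumPy-style notation plus axis labels. Note that ix was deprecated in pandas 0.20 and removed in 1.0; loc (labels) and iloc (integer positions) are its replacements.
#Select rows with a slice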
data[:2]
Out[29]: 
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

#Select rows with a boolean array
data[data['three']>5]
Out[30]: 
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

#Index with a boolean DataFrame
data<5
Out[31]: 
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

#Set every value where data<5 to 0
data[data<5]=0

data
Out[33]: 
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
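
The assignment above modifies data in place. When the original values should be kept, DataFrame.mask can produce a modified copy instead. A minimal sketch, assuming a fresh DataFrame named frame (an illustrative name) with the same contents data had before the assignment:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
                     columns=['one', 'two', 'three', 'four'])

# mask() replaces the values where the condition is True and returns a new object.
masked = frame.mask(frame < 5, 0)

print(masked)
print(frame.loc['Ohio', 'two'])  # the original frame is unchanged
```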

#Label-based indexing on rows (written with ix before pandas 1.0; loc is the current equivalent)
data.loc['Colorado',['two','three']]
Out[34]: 
two      5
three    6
Name: Colorado, dtype: int32

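#loc accepts labels only, so the column positions 3, 0, 1 are converted to labels first
data.loc[['Colorado','Utah'],data.columns[[3,0,1]]]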
Out[35]: 
          four  one  two
Colorado     7    0    5
Utah        11    8    9

#Select the row at integer position 2, which is the Utah row
data.iloc[2]
Out[36]: 
one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

data.loc[:'Utah','two']
Out[37]: 
Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

#Select the rows where data.three>5, keeping the first three columns
data.loc[data.three>5,data.columns[:3]]
Out[38]: 
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14

 

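DataFrame indexing options

#ix, used for these selections before pandas 1.0, has been removed;
#loc takes labels and iloc takes integer positions
#Select a single column or a group of columns
obj[val]
#Select a single row or a group of rows by label
obj.loc[val]
#Select a single column or a subset of columns by label
obj.loc[:,val]
#Select rows and columns together
obj.loc[val1,val2]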
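
A minimal sketch of the selections listed above, written with loc (labels) and iloc (integer positions), the accessors available in current pandas. The name frame and the chosen labels are illustrative, and the DataFrame holds the original, unmodified values:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
                     columns=['one', 'two', 'three', 'four'])

# A group of rows, by label and by integer position.
rows_by_label = frame.loc[['Colorado', 'Utah']]
rows_by_position = frame.iloc[[1, 2]]

# A subset of columns, by label and by integer position.
cols_by_label = frame.loc[:, ['four', 'one']]
cols_by_position = frame.iloc[:, [3, 0]]

# Rows and columns together: a boolean row filter plus column labels.
both = frame.loc[frame['three'] > 5, ['one', 'two']]

print(rows_by_label.equals(rows_by_position))
print(cols_by_label.equals(cols_by_position))
print(both)
```

Keeping labels in loc and positions in iloc avoids the ambiguity that ix had when an index itself contained integers.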

 

 
posted @ 2018-07-10 12:43  平淡才是真~~