Using Series in pandas
(1) Getting familiar with pandas' two workhorse data structures: Series and DataFrame
Series
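A Series is a one-dimensional array-like object. It holds a sequence of values (of types similar to NumPy's) together with an associated array of data labels, called its index.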
Creating a Series object
    import pandas as pd
    obj = pd.Series([4, 7, -5, 3])
    obj
    0    4
    1    7
    2   -5
    3    3
    dtype: int64
Working with a Series
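- The index is shown on the left and the values on the right. By default the index runs from 0 to N-1, where N is the length of the data.
- The values and the index are available through the values and index attributes:

    obj.values
    array([ 4,  7, -5,  3], dtype=int64)
    obj.index
    RangeIndex(start=0, stop=4, step=1)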
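As a small sketch of the default index described above (the names `vals` and `idx` are made up for illustration), the following checks that the index of an unlabeled Series runs from 0 to N-1 and that `values` is a NumPy array:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

vals = obj.values   # underlying data as a NumPy array
idx = obj.index     # default integer index created by pandas

print(isinstance(vals, np.ndarray))      # True
print(list(idx))                         # [0, 1, 2, 3]
print(list(idx) == list(range(len(obj))))  # True: 0 to N-1
```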
- You can also create a Series with an explicit index, so that each data point is identified by a label:

    obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
    obj2
    d    4
    b    7
    a   -5
    c    3
    dtype: int64
    obj2.index
    Index(['d', 'b', 'a', 'c'], dtype='object')
    obj2['a']
    -5
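Labels can be used for more than reading a single value. The sketch below is not from the original post; it assumes the same `obj2` and uses a hypothetical name `subset` to show selecting several labels at once and assigning through a label:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Pass a list of labels to select several values; order follows the list.
subset = obj2[['c', 'a', 'd']]

# Assign through a label; this changes obj2 itself.
obj2['d'] = 6

print(subset.tolist())      # [3, -5, 4]
print(list(subset.index))   # ['c', 'a', 'd']
print(obj2['d'])            # 6
```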
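- Filtering with a boolean array keeps the link between index and value, and so does applying a NumPy function such as np.exp:

    import numpy as np
    obj2[obj2 > 2]
    d    4
    b    7
    c    3
    dtype: int64
    np.exp(obj2)
    d      54.598150
    b    1096.633158
    a       0.006738
    c      20.085537
    dtype: float64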
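To make the filtering step more concrete, here is a minimal sketch that splits it into its two parts: the comparison produces a boolean Series, and indexing with it keeps only the rows marked True. The names `mask`, `filtered` and `doubled` are illustrative and do not appear in the original:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

mask = obj2 > 2         # boolean Series with the same index as obj2
filtered = obj2[mask]   # keeps only the entries where mask is True
doubled = obj2 * 2      # scalar arithmetic leaves the labels untouched

print(mask.tolist())          # [True, True, False, True]
print(list(filtered.index))   # ['d', 'b', 'c']
print(doubled['a'])           # -10
```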
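- A Series can be built from a dict, in which case the keys become the index. If an index is passed explicitly, labels that are missing from the dict ('California' here) get NaN, and keys that are not listed ('Utah') are left out:

    sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
    obj3 = pd.Series(sdata)
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    states = ['California', 'Ohio', 'Oregon', 'Texas']
    obj4 = pd.Series(sdata, index=states)
    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64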
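The effect of passing an explicit index alongside a dict can be checked directly. This is a small sketch reusing `sdata` and `states` from the example; the `in` test on a Series looks at its index labels:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj4 = pd.Series(sdata, index=states)

print('Utah' in obj4)                # False: not in states, so left out
print(pd.isnull(obj4['California'])) # True: no value in sdata, so NaN
print(obj4.dtype)                    # float64, since NaN is a float
```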
- pandas provides the isnull and notnull functions for detecting missing data. They are also available as Series methods:

    pd.isnull(obj4)
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
    pd.notnull(obj4)
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
    obj4.isnull()
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
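Because isnull and notnull return boolean Series, they combine naturally with the boolean filtering shown earlier. A minimal sketch, with the illustrative names `n_missing` and `present`:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(obj4.isnull().sum())

# Use the notnull mask as a filter to keep only entries that have a value.
present = obj4[obj4.notnull()]

print(n_missing)              # 1
print(list(present.index))    # ['Ohio', 'Oregon', 'Texas']
```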
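- Arithmetic between two Series aligns automatically on index labels. A label that appears in only one of the operands produces NaN in the result:

    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    obj3 + obj4
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64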
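The alignment can be inspected step by step. In this sketch (the names `total` and `unmatched` are illustrative), the result index is the union of both indexes, and the NaN entries are exactly the labels that lack a value in at least one operand:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

total = obj3 + obj4   # values are matched by label, not by position

# Labels whose sum is NaN: missing or NaN on one side of the addition.
unmatched = total[total.isnull()].index.tolist()

print(list(total.index))  # ['California', 'Ohio', 'Oregon', 'Texas', 'Utah']
print(unmatched)          # ['California', 'Utah']
print(total['Ohio'])      # 70000.0
```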
- Both the Series object itself and its index have a name attribute:

    obj4.name = 'population'
    obj4.index.name = 'state'
    obj4
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64