Pandas: Using Series

(1) Getting to know pandas' two workhorse data structures: Series and DataFrame

Series

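A Series is a one-dimensional array-like object. It holds a sequence of values (with types similar to NumPy's dtypes) together with an associated array of data labels, called its index.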

Creating a Series object

import numpy as np
import pandas as pd

obj = pd.Series([4,7,-5,3])
obj
0    4
1    7
2   -5
3    3
dtype: int64
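Since the values are described as NumPy-like, a minimal sketch can check which dtype pandas infers in a few cases. The variable names (s_int, s_float, s_i32) are illustrative only, and the sketch assumes a reasonably recent pandas (1.x or 2.x).

```python
import numpy as np
import pandas as pd

# A list of Python ints is stored with an integer NumPy dtype
s_int = pd.Series([4, 7, -5, 3])
print(s_int.dtype)    # int64

# Mixing in one float upcasts every value to float64, as NumPy would
s_float = pd.Series([4, 7.5, -5, 3])
print(s_float.dtype)  # float64

# Passing a NumPy array keeps that array's own dtype
s_i32 = pd.Series(np.array([4, 7, -5, 3], dtype=np.int32))
print(s_i32.dtype)    # int32

# With no index given, the labels are 0..N-1
print(list(s_int.index))  # [0, 1, 2, 3]
```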

Working with a Series

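  • The index is shown on the left and the values on the right. By default the index runs from 0 to N-1, where N is the length of the data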

  • The values and the index can be retrieved through the values and index attributes

    obj.values
    array([ 4,  7, -5,  3], dtype=int64)
    obj.index
    RangeIndex(start=0, stop=4, step=1)
    
  • You can also pass an index that labels each data point, and then select values by label

    obj2 = pd.Series([4,7,-5,3],index=['d','b','a','c'])
    
    obj2
    d    4
    b    7
    a   -5
    c    3
    dtype: int64
    
    obj2.index
    Index(['d', 'b', 'a', 'c'], dtype='object')
    
    obj2['a']
    -5
    
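  • Filter with a boolean array, or apply NumPy functions such as np.exp element-wise; both keep each value linked to its label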

    obj2[obj2 > 2]
    d    4
    b    7
    c    3
    dtype: int64
    
    np.exp(obj2)
    d      54.598150
    b    1096.633158
    a       0.006738
    c      20.085537
    dtype: float64
    
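  • A Series can also be created from a dict. When an index is passed as well, only those labels are used: a label with no matching key (California) becomes NaN, and keys not in the index (Utah) are left out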

    sdata = {'Ohio':35000,'Texas':71000,'Oregon':16000,'Utah':5000}
    obj3 = pd.Series(sdata)
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    states = ['California','Ohio','Oregon','Texas']
    obj4 = pd.Series(sdata, index=states)
    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    
  • pandas detects missing data with the isnull and notnull functions; Series also provides them as instance methods

    pd.isnull(obj4)
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
    
    pd.notnull(obj4)
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
    
    
    obj4.isnull()
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
    
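  • Arithmetic automatically aligns values by index label; a label present in only one of the Series yields NaN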

    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
        
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    
    obj3+obj4
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64
    
  • Both the Series object itself and its index have a name attribute

    obj4.name = 'population'
    obj4.index.name = 'state'
    obj4
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64
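Label lookup with a single label such as obj2['a'] also accepts a list of labels, and the in operator tests whether a label exists in the index. A minimal sketch, rebuilding obj2 so it runs on its own; the name subset is illustrative only.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# A list of labels selects several values, in the order requested
subset = obj2[['c', 'a', 'd']]
print(subset.tolist())  # [3, -5, 4]

# `in` checks the index labels, not the values
print('b' in obj2)  # True
print(7 in obj2)    # False: 7 is a value, not a label
```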
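Boolean filtering and np.exp are two examples of NumPy-style operations on a Series; scalar arithmetic behaves the same way, and in each case the labels stay attached to their values. A short sketch (mask, filtered, doubled and exp_vals are illustrative names):

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# The comparison produces a boolean Series with the same index
mask = obj2 > 2
print(mask.tolist())  # [True, True, False, True]

# Indexing with the mask keeps only the entries where it is True
filtered = obj2[mask]

# Scalar arithmetic is applied element-wise; labels are unchanged
doubled = obj2 * 2
print(doubled['b'])  # 14

# NumPy ufuncs return a Series, not a bare ndarray
exp_vals = np.exp(obj2)
print(type(exp_vals).__name__)  # Series
```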
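Going the other way, to_dict() turns a Series back into a dict keyed by its labels. The sketch below rebuilds obj3 and obj4 from the dict example and checks the behaviour of the explicit states index.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)

# Round trip: the dict keys become the index, and back again
print(obj3.to_dict() == sdata)  # True

states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

# Only labels from `states` are kept, so 'Utah' is dropped
print('Utah' in obj4)  # False
# The NaN for 'California' forces a float64 dtype
print(obj4.dtype)      # float64
```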
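Because isnull() returns a boolean Series, it combines with ordinary reductions, for example to count the missing entries, while dropna() removes them. A brief sketch using the same obj4; n_missing and present are illustrative names.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

# True counts as 1, so the sum is the number of missing values
n_missing = obj4.isnull().sum()
print(n_missing)  # 1

# Keep only the labels whose value is present
present = obj4.dropna()
print(present.index.tolist())  # ['Ohio', 'Oregon', 'Texas']
```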
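If the NaN produced for labels that appear in only one Series is unwanted, the add method accepts a fill_value that stands in for the missing operand. A label missing on both sides (here California, absent from obj3 and NaN in obj4) still comes out as NaN. A minimal sketch; plain and filled are illustrative names.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# The + operator aligns on the union of the two indexes
plain = obj3 + obj4

# add() with fill_value=0 treats a value missing on one side as 0
filled = obj3.add(obj4, fill_value=0)
print(filled)
```

With fill_value=0, Utah keeps its 5000 from obj3 instead of becoming NaN.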
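The two name attributes carry over when the Series is turned into a DataFrame: to_frame() uses the Series name as the column label and keeps the index name. A small sketch, assuming the same obj4; df is an illustrative name.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])
obj4.name = 'population'
obj4.index.name = 'state'

# The Series name becomes the single column label
df = obj4.to_frame()
print(df.columns.tolist())  # ['population']
print(df.index.name)        # state
```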
    
posted @ 2020-08-08 16:08  Techoc