Using pandas

I. Series

1. Creating a Series

import numpy as np

import pandas as pd

(1) Creating directly

pddata_1 = pd.Series(data=[1,2,3],index = [4,17,'a'],dtype = np.float64)

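pddata_1 = pd.Series(data=[1],index = [4],dtype = np.float64) A Series with a single element: keep the square brackets. index must be list-like (a bare index = 4 raises a TypeError), and bracketing data keeps it in one of the four forms listed below.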

(2) Creating from a dict

In [68]: dicttna={1:'foo',3:'drt',8:'tyue'}

In [69]: serie_12=pd.Series(dicttna)

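The data argument accepts four forms: an array, a list, a tuple, or a generator.

The index labels can be anything you choose, but the index must have the same length as data.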
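The four forms of data can be compared side by side. The following is a minimal sketch, assuming a reasonably recent pandas; the labels 'x', 'y', 'z' are hypothetical and appear nowhere else in these notes.

```python
import numpy as np
import pandas as pd

labels = ['x', 'y', 'z']  # hypothetical labels, chosen only for this example

# The same three values supplied in each of the four forms.
from_array = pd.Series(np.array([1, 2, 3]), index=labels, dtype=np.float64)
from_list = pd.Series([1, 2, 3], index=labels, dtype=np.float64)
from_tuple = pd.Series((1, 2, 3), index=labels, dtype=np.float64)
from_generator = pd.Series((n for n in range(1, 4)), index=labels, dtype=np.float64)

# All four hold the same values under the same labels.
all_equal = (from_array.equals(from_list)
             and from_list.equals(from_tuple)
             and from_tuple.equals(from_generator))
print(all_equal)

# An index whose length differs from data is rejected with a ValueError.
try:
    pd.Series([1, 2, 3], index=['x', 'y'])
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)
```

Passing dtype explicitly, as the notes do, keeps the result independent of the platform's default integer width.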

2. Getting the values and the index (with values and index)

(1) Getting all the values

pddata_1.values
Out[19]: array([ 1., 2., 3.]) returns a NumPy array

(2) Getting the index

pddata_1.index
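Out[21]: Int64Index([4, 5, 6], dtype='int64') returns a pandas Index object (an Int64Index here, captured from a Series whose index was [4, 5, 6]; the mixed index [4, 17, 'a'] defined above would give an Index with dtype='object')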

(3) Getting a single value by its index label

pddata_1['a']
Out[25]: 3.0
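The three accessors above can be tried together on the Series from section 1. This is a small sketch; the positional lookup with iloc is an addition for comparison and is not used in the original notes.

```python
import numpy as np
import pandas as pd

pddata_1 = pd.Series(data=[1, 2, 3], index=[4, 17, 'a'], dtype=np.float64)

values = pddata_1.values        # NumPy array holding the data
index = pddata_1.index          # pandas Index holding the labels
by_label = pddata_1['a']        # lookup by label
by_position = pddata_1.iloc[2]  # lookup by position (third element)

print(type(values).__name__, list(index), by_label, by_position)
```

Because the index mixes integers and a string, the labels are stored with dtype object, which is why label lookup and positional lookup are kept separate here.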

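3. Adding, deleting, updating, and querying Series elements

Add: assign to a new label directly, as with a dict, or use append; see 5(3) below.

Delete: drop returns a new Series without the element and leaves the original data unchanged; del removes the element from the original for good.

    pddata_1.drop('a') note the parentheses (drop is a method call)

    del pddata_1['a']

    pddata_1.pop('a') removes the element and returns its value

Update: use update; see 5(2) below.

Query: see section 2, getting the values and the index.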
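The difference between the three ways of deleting is easiest to see on a throwaway Series. A minimal sketch follows; the Series s and its labels are hypothetical.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'a'])  # hypothetical data

# drop returns a new Series; s itself still contains 'a'.
dropped = s.drop('a')
still_there = 'a' in s

# pop removes the element from s and hands back its value.
popped = s.pop('a')

# del also removes in place, but returns nothing.
del s['x']

print(list(dropped.index), still_there, popped, list(s.index))
```

To keep the result of drop, assign it back, for example s = s.drop('y').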

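4. Filtering and computing on Series data

(1) Filtering

Note that logical AND is written & and logical OR is written | (unlike plain Python, which uses and / or).

Each comparison on either side of the operator must be wrapped in parentheses, because & and | bind more tightly than the comparison operators.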

seriesdata_2[(seriesdata_2>1) & (seriesdata_2<10)]
Out[34]:
t 9
o 3
dtype: int64
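The notes never show how seriesdata_2 was defined. The sketch below assumes a definition reconstructed from the outputs in this section (the values are those outputs divided by 3), so treat it as a plausible stand-in, not the author's original code.

```python
import pandas as pd

# Assumed definition, reconstructed from the outputs shown in this section.
seriesdata_2 = pd.Series([34, -4, -45, -37, 32, 9, 1, 3],
                         index=list('adeyftuo'))

# AND: both conditions must hold for an element to be kept.
between = seriesdata_2[(seriesdata_2 > 1) & (seriesdata_2 < 10)]

# OR: either condition is enough.
outside = seriesdata_2[(seriesdata_2 < 0) | (seriesdata_2 > 30)]

# Without parentheses, & is evaluated before the comparisons, which turns the
# expression into a chained comparison and raises a ValueError.
try:
    seriesdata_2[seriesdata_2 > 1 & seriesdata_2 < 10]
    paren_error = None
except ValueError as exc:
    paren_error = exc

print(between.to_dict(), list(outside.index), type(paren_error).__name__)
```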

(2) Computing

seriesdata_2*3
Out[35]:
a 102
d -12
e -135
y -111
f 96
t 27
u 3
o 9
dtype: int64

In [38]: np.sin(seriesdata_2)
Out[38]:
a 0.529083
d 0.756802
e -0.850904
y 0.643538
f 0.551427
t 0.412118
u 0.841471
o 0.141120
dtype: float64
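Both computations are applied element by element and keep the index. The sketch below again assumes a definition of seriesdata_2 reconstructed from the outputs above, which the notes themselves do not show.

```python
import math

import numpy as np
import pandas as pd

# Assumed definition, reconstructed from the outputs shown in this section.
seriesdata_2 = pd.Series([34, -4, -45, -37, 32, 9, 1, 3],
                         index=list('adeyftuo'))

tripled = seriesdata_2 * 3     # scalar arithmetic is applied to every element
sines = np.sin(seriesdata_2)   # NumPy functions return a Series, not a bare array

# The labels survive both operations unchanged.
same_labels = list(tripled.index) == list(sines.index) == list(seriesdata_2.index)

# Each result matches the scalar computation on the corresponding element.
matches_scalar = math.isclose(sines['u'], math.sin(1))

print(tripled['a'], same_labels, matches_scalar)
```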

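5. The dict-like structure of a Series

The index of a Series plays the role of a dict's keys, and the data plays the role of its values.

(1) The in operator on a Series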

datas_pys=pd.Series(range(4),index=['i','want','to','do'])

datas_pys
Out[40]:
i 0
want 1
to 2
do 3
dtype: int32

'want' in datas_pys returns True, just like checking whether a key exists in a dict
Out[41]: True
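Since in tests the index labels, a value stored in the Series is not found this way. A short sketch of the distinction follows; the use of get with a default is an addition that mirrors dict.get and is not part of the original notes.

```python
import pandas as pd

datas_pys = pd.Series(range(4), index=['i', 'want', 'to', 'do'])

label_found = 'want' in datas_pys         # True: 'want' is a label
value_as_label = 3 in datas_pys           # False: 3 is a value, not a label
value_found = 3 in datas_pys.values       # True: search the data explicitly

# Like dict.get, Series.get returns a default for a missing label.
fallback = datas_pys.get('missing', -1)

print(label_found, value_as_label, value_found, fallback)
```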

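(2) Series update

Note: the argument passed to update should be a Series (or an object that can be coerced into one); its index labels select which entries are overwritten.

Updating several values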

datas_pys.update(pd.Series([2,3,4],index = ['want','to','do']))

datas_pys
Out[44]:
i 0
want 2
to 3
do 4
dtype: int32

Updating a single value

datas_pys.update(pd.Series([11],index=['to']))

datas_pys
Out[49]:
i 0
want 2
to 11
do 4
dtype: int32
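Two details of update are easy to miss: it modifies the Series in place and returns None, and labels that do not already exist are ignored instead of being added. A minimal sketch, where the label 'absent' is hypothetical:

```python
import pandas as pd

datas_pys = pd.Series(range(4), index=['i', 'want', 'to', 'do'])

# update works in place, so there is no return value to assign.
result = datas_pys.update(pd.Series([2, 3, 4], index=['want', 'to', 'do']))

# A label missing from datas_pys is skipped; no new entry is created.
datas_pys.update(pd.Series([99], index=['absent']))

print(result, datas_pys.to_dict())
```

To add a new label, use direct assignment as described in 5(3).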

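(3) Series append

Note: Series.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions pd.concat does the same job. The transcripts here were recorded on an older version.

Elements can also be added directly, as with a dict.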

data_1

Out[58]:
i 55
want 11
to 2
do 22
dtype: int64

data_1['wang']=66

data_1
Out[60]:
i 55
want 11
to 2
do 22
wang 66
dtype: int64

s1 = pd.Series([1,2,3])

s2 =pd.Series([4,5,6,7])

s3 = pd.Series([4,5,6],index=[5,6,7])

s1.append(s2)
Out[57]:
0 1
1 2
2 3
0 4 note that the appended data keeps its own index, which starts again from 0
1 5
2 6
3 7
dtype: int64

In [62]: s1.append(s3)
Out[62]:
0 1
1 2
2 3
5 4 the appended entries keep the index already defined on s3
6 5
7 6
dtype: int64

In [63]: s1.append(s2, ignore_index=True)
Out[63]:
0 1
1 2
2 3
3 4 with ignore_index=True, the index is renumbered sequentially
4 5
5 6
6 7
dtype: int64
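On pandas 2.0 and later, where Series.append no longer exists, the same three results can be obtained with pd.concat. This is a sketch of the equivalent calls, not the code the notes were originally run with.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6, 7])
s3 = pd.Series([4, 5, 6], index=[5, 6, 7])

# Equivalent of s1.append(s2): each piece keeps its own index.
kept = pd.concat([s1, s2])

# Equivalent of s1.append(s3): the labels defined on s3 are preserved.
with_s3 = pd.concat([s1, s3])

# Equivalent of s1.append(s2, ignore_index=True): renumbered from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)

print(list(kept.index), list(with_s3.index), list(renumbered.index))
```

Like append, pd.concat returns a new Series and leaves s1, s2, and s3 untouched.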

(4) The isnull and notnull functions can be used to detect missing data.

In [79]: dit_113={'lin':139,'zhang':134,'wang':173,'tan':None}

In [80]: serie_123=pd.Series(dit_113)

In [81]: serie_123
Out[81]:
lin 139.0
tan NaN
wang 173.0
zhang 134.0
dtype: float64

In [82]: pd.isnull(serie_123)
Out[82]:
lin False
tan True
wang False
zhang False
dtype: bool

In [83]: pd.notnull(serie_123)
Out[83]:
lin True
tan False
wang True
zhang True
dtype: bool

serie_123.notnull()
Out[85]:
lin True
tan False
wang True
zhang True
dtype: bool
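The boolean Series returned by isnull and notnull is mostly useful as a mask. The sketch below shows one way to count and filter missing entries; the variable names are illustrative. Recent pandas versions keep the dict's insertion order, so the row order may differ from the sorted output above.

```python
import pandas as pd

dit_113 = {'lin': 139, 'zhang': 134, 'wang': 173, 'tan': None}
serie_123 = pd.Series(dit_113)

missing_mask = serie_123.isnull()            # True where the value is missing
n_missing = int(missing_mask.sum())          # True counts as 1
present = serie_123[serie_123.notnull()]     # keep only the non-missing rows
missing_labels = list(serie_123[missing_mask].index)

print(n_missing, missing_labels, sorted(present.index))
```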

posted on 2018-03-31 22:42 风过竹影