The pandas Library: DataFrame

2. DataFrame

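A DataFrame is a tabular data structure, similar to an Excel spreadsheet or a SQL table.

It contains an ordered collection of columns, and each column can hold a different value type (numeric, string, boolean, and so on).

A DataFrame has both a row index and a column index. It can be thought of as a dict of Series that all share the same index.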

Creating a DataFrame from a dict of ndarrays or a dict of lists

 
import numpy as np
import pandas as pd
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)
print(frame)
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
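
The example above uses a dict of lists. A dict of NumPy arrays works the same way; the following is a minimal sketch, where the names `arrays` and `frame_from_arrays` are made up for illustration and the values are a subset of the data above.

```python
import numpy as np
import pandas as pd

# Each dict value is a 1-D NumPy array; all arrays must have the same length.
arrays = {
    'year': np.array([2000, 2001, 2002]),
    'pop': np.array([1.5, 1.7, 3.6]),
}
frame_from_arrays = pd.DataFrame(arrays)

# Each column keeps its own dtype (an integer dtype for 'year', float64 for 'pop').
print(frame_from_arrays.dtypes)
```

The dict keys become the column labels, and because no index is passed, the rows get a default integer index starting at 0.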

 

# If a column order is specified, the DataFrame's columns are arranged in that order
frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'])
print(frame1)
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9

  

As with Series, if a requested column is not found in the data, it is filled with NaN values

frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'], index=['one', 'two', 'three', 'four', 'five'])
print(frame2)
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
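
One way to confirm which columns came out empty is to count the missing values per column. This is a small sketch on a shortened copy of the data; `frame_missing` is a name invented for this example.

```python
import pandas as pd

data = {'state': ['Ohio', 'Nevada'], 'year': [2000, 2001]}

# 'debt' is not a key in data, so the whole column is filled with NaN.
frame_missing = pd.DataFrame(data, columns=['year', 'state', 'debt'])

# True only if every value in the column is missing.
print(frame_missing['debt'].isna().all())

# Number of missing values in each column.
print(frame_missing.isna().sum())
```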

  

Creating a DataFrame from a dict of Series or a dict of dicts

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
print(pd.DataFrame(d))
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
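
The heading also mentions a dict of dicts, which the example above does not cover. A minimal sketch, assuming the same population figures as earlier (the names `nested` and `frame_nested` are illustrative): the outer keys become columns and the inner keys become the row index.

```python
import pandas as pd

# Outer keys -> column labels, inner keys -> row labels.
nested = {
    'Nevada': {2001: 2.4, 2002: 2.9},
    'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6},
}
frame_nested = pd.DataFrame(nested)
print(frame_nested)

# 2000 only appears under 'Ohio', so the 'Nevada' entry for that row is NaN.
print(frame_nested.loc[2000])
```

The row index is the union of the inner keys. Its order may differ between pandas versions, so call `sort_index()` if a particular order matters.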

# A column can be retrieved as a Series either with dict-like notation or as an attribute; the returned Series has the same index as the original DataFrame

print(frame2['state'])
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
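
Only the dict-like form is shown above. The sketch below adds the attribute form on a smaller frame built for this example, and shows one case where it does not work: a column whose name collides with an existing DataFrame method, such as `pop`.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]},
                     index=['one', 'two'])

by_key = frame['state']    # dict-like notation
by_attr = frame.state      # attribute access
print(by_key.equals(by_attr))   # True

# 'pop' is also the name of a DataFrame method, so frame.pop is the method,
# not the column. Dict-like notation always returns the column.
print(callable(frame.pop))      # True
print(frame['pop'].tolist())    # [1.5, 2.4]
```

Dict-like notation works for any column name, so it is the safer default when names contain spaces or clash with methods.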

  

Columns can be modified by assignment. For example, the empty 'debt' column can be given a scalar value or an array of values

 
frame2['debt'] = 16.5
print(frame2)
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5

  

print(frame2)
# A new column can be computed element-wise from existing columns
frame2['new'] = frame2['debt'] * frame2['pop']
print(frame2)
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5
       year   state  pop  debt    new
one    2000    Ohio  1.5  16.5  24.75
two    2001    Ohio  1.7  16.5  28.05
three  2002    Ohio  3.6  16.5  59.40
four   2001  Nevada  2.4  16.5  39.60
five   2002  Nevada  2.9  16.5  47.85

  

# When assigning an array, its length must match the number of rows
frame2['debt'] = np.arange(5.)
print(frame2)
       year   state  pop  debt    new
one    2000    Ohio  1.5   0.0  24.75
two    2001    Ohio  1.7   1.0  28.05
three  2002    Ohio  3.6   2.0  59.40
four   2001  Nevada  2.4   3.0  39.60
five   2002  Nevada  2.9   4.0  47.85
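
Besides scalars and arrays, a Series can also be assigned to a column. The sketch below, using a small frame invented for this example, shows that a Series is aligned on index labels while a plain list of the wrong length is rejected.

```python
import pandas as pd

frame = pd.DataFrame({'pop': [1.5, 1.7, 3.6]}, index=['one', 'two', 'three'])

# A Series is aligned on index labels; rows without a matching label get NaN.
frame['debt'] = pd.Series([-1.2, -1.5], index=['two', 'three'])
print(frame)

# A plain list or array must have exactly one value per row.
mismatch_raised = False
try:
    frame['other'] = [1.0, 2.0]   # 2 values for 3 rows
except ValueError as exc:
    mismatch_raised = True
    print('ValueError:', exc)
```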

# Selecting rows from a DataFrame by index

>>> print(frame.iloc[list(range(1, 3))])
  state  year  pop
1  Ohio  2001  1.7
2  Ohio  2002  3.6
>>> print(frame.loc[list(range(1, 3))])
  state  year  pop
1  Ohio  2001  1.7
2  Ohio  2002  3.6
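
The two calls above return the same rows only because `frame` has the default integer index, where positions and labels coincide. A minimal sketch with string labels (the frame here is built just for this example) makes the difference visible: `iloc` selects by position, `loc` selects by label.

```python
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002]},
                     index=['one', 'two', 'three'])

# iloc: integer positions, end of the slice excluded.
by_position = frame.iloc[1:3]
print(by_position)

# loc: index labels, end of the slice included.
by_label = frame.loc['two':'three']
print(by_label)

print(by_position.equals(by_label))   # True
```

With this index, `frame.loc[[1, 2]]` would raise a `KeyError`, since 1 and 2 are not labels in the index.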

# Getting the DataFrame's index

>>> print(frame.index)
RangeIndex(start=0, stop=5, step=1)
>>> print(frame.index.tolist())
[0, 1, 2, 3, 4]

# Getting the DataFrame's columns

>>> print(frame.columns)
Index(['state', 'year', 'pop'], dtype='object')
>>> print(frame.columns.tolist())
['state', 'year', 'pop']

  

 

 

posted @ 2021-03-18 11:24  华小电