pandas.DataFrame: building a two-dimensional, size-mutable tabular data structure

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Syntax

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

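The commonly used parameters:

  • data: the input data; accepts several types, such as an ndarray, an iterable, a dict, or another DataFrame;
  • index: the row labels; defaults to RangeIndex(0, 1, 2, …, n) when none are provided;
  • columns: the column labels; defaults to RangeIndex(0, 1, 2, …, n) when none are provided;
  • dtype: the data type to force; only a single dtype is allowed, and it is inferred when None;
  • copy: bool or None; whether to copy the input data.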
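
To make the parameters above concrete, here is a minimal sketch that passes each of them explicitly. The array values, the labels "r1", "r2", "a", "b", and the variable names are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Illustrative 2x2 input; values and labels are made up for this sketch.
raw = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data=raw,
    index=["r1", "r2"],   # row labels
    columns=["a", "b"],   # column labels
    dtype="float64",      # force a single dtype for all columns
    copy=True,            # copy the input so later edits to `raw` do not leak in
)

raw[0, 0] = 99  # modify the source array after construction
print(df)        # the DataFrame still holds the original value in row "r1", column "a"
print(df.dtypes)
```

Passing copy=True is the conservative choice when the source array will be modified later; leaving it as None lets pandas decide based on the input type.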

Code examples

import pandas as pd
import numpy as np

# Create a DataFrame from a list of lists
d1 = [[3,"negative",2],[4,"negative",6],[11,"positive",0],[12,"positive",2]]
df1 = pd.DataFrame(d1, columns=["xuhao","result","value"])
print(df1)
#    xuhao    result  value
# 0      3  negative      2
# 1      4  negative      6
# 2     11  positive      0
# 3     12  positive      2
print(df1.dtypes)
# xuhao      int64
# result    object
# value      int64
# dtype: object

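# Create a DataFrame from a dict of lists
# Note: the string column "result" cannot be cast to int8. In the output recorded
# below it stays object; behavior may differ across pandas versions.
d2 = {
    "xuhao": [3, 4, 11, 12],
    "result": ["negative", "negative", "positive", "positive"],
    "value": [2, 6, 0, 2],
}
df2 = pd.DataFrame(d2, dtype=np.int8)
print(df2)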
#    xuhao    result  value
# 0      3  negative      2
# 1      4  negative      6
# 2     11  positive      0
# 3     12  positive      2
print(df2.dtypes)
# xuhao       int8
# result    object
# value       int8
# dtype: object

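# Create a DataFrame from a dict that contains a Series
# The Series is aligned on its index (2, 3); rows 0 and 1 get NaN, so "value" becomes float.
d3 = {
    "xuhao": [3, 4, 11, 12],
    "result": ["negative", "negative", "positive", "positive"],
    "value": pd.Series([2, 3], index=[2, 3]),
}
df3 = pd.DataFrame(d3, index=[0, 1, 2, 3])
print(df3)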
#    xuhao    result  value
# 0      3  negative    NaN
# 1      4  negative    NaN
# 2     11  positive    2.0
# 3     12  positive    3.0

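# Create a DataFrame from a NumPy ndarray
# (mixing ints and strings makes NumPy store every element as a string)
df4 = pd.DataFrame(
    np.array([[3, "negative", 2], [4, "negative", 6], [11, "positive", 0], [12, "positive", 2]]),
    columns=["xuhao", "result", "value"],
)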
print(df4)
#   xuhao    result value
# 0     3  negative     2
# 1     4  negative     6
# 2    11  positive     0
# 3    12  positive     2

# Create a DataFrame from a NumPy structured array with named fields
d5 = np.array([(1,3,2),(2,4,6),(3,1,0),(4,3,2)],
    dtype=[("xuhao", "i4"), ("result", "i4"), ("value", "i4")])
df5 = pd.DataFrame(d5)
df6 = pd.DataFrame(d5, columns=["result","value"])
print(df5)
#    xuhao  result  value
# 0      1       3      2
# 1      2       4      6
# 2      3       1      0
# 3      4       3      2
print(df6)
#    result  value
# 0       3      2
# 1       4      6
# 2       1      0
# 3       3      2

# Create a DataFrame from a list of dataclass instances
from dataclasses import make_dataclass
mydata = make_dataclass("mydata", [("result", str), ("value", int)])
df7 = pd.DataFrame([mydata("positive", 0), mydata("negative", 3), mydata("positive", 3)])
print(df7)
#      result  value
# 0  positive      0
# 1  negative      3
# 2  positive      3
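
One detail in the ndarray example above (df4) is easy to miss: a NumPy array holds a single dtype, so mixing integers and strings in np.array turns every element into a string. The sketch below checks this and converts the numeric columns back with pd.to_numeric. The name df_mixed and the shortened two-row input are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[3, "negative", 2], [4, "negative", 6]])
print(arr.dtype.kind)  # 'U' means NumPy stored everything as Unicode strings

df_mixed = pd.DataFrame(arr, columns=["xuhao", "result", "value"])
first = df_mixed.loc[0, "xuhao"]
print(type(first))  # a str, not an int

# Convert the columns that should be numeric back to numbers.
for col in ["xuhao", "value"]:
    df_mixed[col] = pd.to_numeric(df_mixed[col])
print(df_mixed.dtypes)
```

When the columns have different types, a dict of lists or a list of rows is often a better input than a single ndarray, because the types are then inferred per column.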

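One thing to note: a one-dimensional list or NumPy array becomes a single column, whereas wrapping it in an extra pair of [] makes it a single row, which changes the shape of the resulting DataFrame.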

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.array([3, "negative", 2]))  # creates a 3x1 DataFrame
print(df1)
#           0
# 0         3
# 1  negative
# 2         2
df2 = pd.DataFrame([np.array([3, "negative", 2])])  # creates a 1x3 DataFrame
print(df2)
#    0         1  2
# 0  3  negative  2

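If you get an error such as "ValueError: Shape of passed values is (6, 1), indices imply (6, 6)", the data was interpreted as a single column while six column labels were supplied. Wrapping the data in [] usually resolves it.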
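
A minimal sketch of how this error can arise and how the extra brackets fix it, assuming a 1-D array of six values passed together with six column names. The values and names are made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6])
names = ["a", "b", "c", "d", "e", "f"]

# A 1-D array is treated as one column with six rows, so six column names do not fit.
try:
    pd.DataFrame(values, columns=names)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Wrapping the array in [] makes it a single row with six columns.
df_row = pd.DataFrame([values], columns=names)
print(df_row.shape)

# Alternative: reshape the array to one row explicitly.
df_row2 = pd.DataFrame(values.reshape(1, -1), columns=names)
print(df_row2.shape)
```

The reshape variant states the intended shape explicitly, which can be easier to read than an extra pair of brackets.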

posted @ 2023-04-25 20:32  yayagogogo