pandas.DataFrame: a two-dimensional, size-mutable tabular data structure
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
Syntax
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
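Commonly used parameters:
- data: the input data; several types are accepted, including an ndarray (structured or homogeneous), an iterable, a dict, or another DataFrame;
- index: row labels; defaults to RangeIndex(0, 1, 2, …, n) when the input carries no index information and none is given;
- columns: column labels; defaults to RangeIndex(0, 1, 2, …, n) when the input carries no column labels and none are given;
- dtype: the data type to force on the data; only a single dtype is allowed, and it is inferred when None;
- copy: bool or None (default None); whether to copy the input data.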
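The parameters above can be sketched together in one call. This is a minimal illustration, not part of the original examples: the array `values` and the labels "a", "b", "c", "x", "y" are made-up sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    values,
    index=["a", "b", "c"],   # row labels
    columns=["x", "y"],      # column labels
    dtype=np.float32,        # a single dtype, applied to every column
    copy=True,               # build the frame from a copy of `values`
)

print(df.shape)          # (3, 2)
print(list(df.index))    # ['a', 'b', 'c']
print(list(df.columns))  # ['x', 'y']
print(df["x"].dtype)     # float32

# Because of copy=True, changing the source array does not change the frame.
values[0, 0] = 100
print(df.loc["a", "x"])  # 1.0

# Without index/columns, pandas falls back to a RangeIndex on both axes.
default_df = pd.DataFrame([[1, 2], [3, 4]])
print(isinstance(default_df.index, pd.RangeIndex))    # True
print(isinstance(default_df.columns, pd.RangeIndex))  # True
```

Note that dtype takes one type for the whole frame; per-column types are usually set afterwards with astype.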
Code examples
import pandas as pd
import numpy as np
# Create a DataFrame from a list of lists
d1 = [[3,"negative",2],[4,"negative",6],[11,"positive",0],[12,"positive",2]]
df1 = pd.DataFrame(d1, columns=["xuhao","result","value"])
print(df1)
# xuhao result value
# 0 3 negative 2
# 1 4 negative 6
# 2 11 positive 0
# 3 12 positive 2
print(df1.dtypes)
# xuhao int64
# result object
# value int64
# dtype: object
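# Create a DataFrame from a dict
d2 = {'xuhao': [3,4,11,12], 'result': ["negative","negative","positive",
"positive"],"value":[2,6,0,2]}
# Cast only the numeric columns. Passing dtype=np.int8 to the constructor applies it
# to every column; older pandas silently left the string column "result" as object,
# while newer versions are expected to raise an error instead.
df2 = pd.DataFrame(d2).astype({"xuhao": np.int8, "value": np.int8})
print(df2)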
# xuhao result value
# 0 3 negative 2
# 1 4 negative 6
# 2 11 positive 0
# 3 12 positive 2
print(df2.dtypes)
# xuhao int8
# result object
# value int8
# dtype: object
# Create a DataFrame from a dict that contains a Series
# (the Series is aligned on its index labels, so rows 0 and 1 get NaN)
d3 = {'xuhao': [3,4,11,12], 'result': ["negative","negative","positive",
"positive"],"value": pd.Series([2,3], index=[2,3])}
df3 = pd.DataFrame(d3,index=[0, 1, 2, 3])
print(df3)
# xuhao result value
# 0 3 negative NaN
# 1 4 negative NaN
# 2 11 positive 2.0
# 3 12 positive 3.0
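# Create a DataFrame from a NumPy ndarray
# (NumPy stores this mixed input as strings, so every column holds strings, not integers)
df4 = pd.DataFrame(np.array([[3,"negative",2],[4,"negative",6],[11,"positive",0],
[12,"positive",2]]), columns=["xuhao","result","value"])
print(df4)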
# xuhao result value
# 0 3 negative 2
# 1 4 negative 6
# 2 11 positive 0
# 3 12 positive 2
# Create a DataFrame from a NumPy structured array (an ndarray with named fields)
d5 = np.array([(1,3,2),(2,4,6),(3,1,0),(4,3,2)],
dtype=[("xuhao", "i4"), ("result", "i4"), ("value", "i4")])
df5 = pd.DataFrame(d5)
df6 = pd.DataFrame(d5, columns=["result","value"])
print(df5)
# xuhao result value
# 0 1 3 2
# 1 2 4 6
# 2 3 1 0
# 3 4 3 2
print(df6)
# result value
# 0 3 2
# 1 4 6
# 2 1 0
# 3 3 2
# Create a DataFrame from dataclass instances
from dataclasses import make_dataclass
mydata = make_dataclass("mydata", [("result", str), ("value", int)])
df7 = pd.DataFrame([mydata("positive", 0), mydata("negative", 3), mydata("positive", 3)])
print(df7)
# result value
# 0 positive 0
# 1 negative 3
# 2 positive 3
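Note that when creating a DataFrame from a list or NumPy array, wrapping the list or array in an extra pair of [] changes the shape of the resulting DataFrame: a flat sequence becomes a single column, while the wrapped sequence becomes a single row.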
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array([3,"negative",2])) # creates a 3x1 DataFrame
print(df1)
# 0
# 0 3
# 1 negative
# 2 2
df2 = pd.DataFrame([np.array([3,"negative",2])]) # creates a 1x3 DataFrame
print(df2)
# 0 1 2
# 0 3 negative 2
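An error such as "ValueError: Shape of passed values is (6, 1), indices imply (6, 6)" means that a flat sequence of 6 values was read as 6 rows by 1 column while 6 column labels were supplied; wrapping the data in [] so that it becomes a single row usually resolves it.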
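The error and its fix can be reproduced with a small sketch. The list `row` and the column names in `cols` are hypothetical values picked for illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

row = [1, 2, 3, 4, 5, 6]               # six values meant to form one row
cols = ["a", "b", "c", "d", "e", "f"]  # hypothetical column names

message = None
try:
    # A flat list is read as 6 rows x 1 column, which conflicts with 6 column labels.
    pd.DataFrame(row, columns=cols)
except ValueError as err:
    message = str(err)
    print(message)

# Wrapping the list in [] turns it into a single row with 6 columns.
fixed = pd.DataFrame([row], columns=cols)
print(fixed.shape)  # (1, 6)
```

If the values are instead meant to be a single column, the alternative is to keep the flat list and pass only one column label.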