Pandas

DataFrame objects

1. Iterating over a DataFrame


# 1. iterrows()
# Iterates row by row, yielding each row of the DataFrame as an (index, Series) pair; access a value with row[name].

for index, row in df.iterrows():
    print(row["c1"], row["c2"])
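A minimal runnable sketch of iterrows(), assuming a small hypothetical frame with columns c1 and c2 (the names used above). It also shows a caveat from the pandas documentation: each row is turned into a single Series, so per-column dtypes are not preserved inside the row.

```python
import pandas as pd

# Hypothetical sample data; c1/c2 match the column names used above
df = pd.DataFrame({"c1": [10, 20], "c2": ["a", "b"]}, index=["x", "y"])

rows = []
for index, row in df.iterrows():
    # index is the row label; row is a Series indexed by the column labels
    rows.append((index, row["c1"], row["c2"]))

# Caveat: in an all-numeric frame, an int column is upcast to float inside each row
nums = pd.DataFrame({"int": [1, 2], "float": [1.5, 2.5]})
_, first = next(nums.iterrows())
print(first["int"])        # 1.0, not 1
print(nums["int"].dtype)   # int64: the column itself is unchanged
```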


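# 2. itertuples()
# Iterates row by row, yielding each row as a namedtuple whose first field is the index; access elements by attribute, e.g. row.c1 or getattr(row, "c1"), or by position, e.g. row[1]. Generally faster than iterrows().

for row in df.itertuples(index=True, name='Pandas'):
    print(getattr(row, "c1"), getattr(row, "c2"))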
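A short sketch of the namedtuple fields that itertuples() produces, again on a hypothetical c1/c2 frame. Field 0 is named Index; passing name=None returns plain tuples instead of namedtuples.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 20], "c2": ["a", "b"]}, index=["x", "y"])

for row in df.itertuples(index=True, name="Pandas"):
    # Attribute access (row.Index, row.c1) and positional access (row[2] is c2)
    print(row.Index, row.c1, row[2])

# index=False drops the Index field; name=None yields plain tuples
rows = list(df.itertuples(index=False, name=None))
print(rows)
```

Note that column names which are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional names, so attribute access only works for "clean" column names.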


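# 3. items() (named iteritems() in older pandas; iteritems() was deprecated in 1.5 and removed in 2.0)
# Iterates column by column, yielding each column as a (column name, Series) pair; access values in the Series by label, or by position with .iloc.

for col_name, col in df.items():
    print(col_name)                                # column label
for col_name, col in df.items():
    print(col)                                     # the whole column as a Series
for col_name, col in df.items():
    print(col.iloc[0], col.iloc[1], col.iloc[2])   # first three values by position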
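A minimal sketch of column-wise iteration with items(), using a hypothetical three-row frame, that computes one aggregate per column and contrasts positional (.iloc) with label-based access.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 20, 30], "c2": [1.5, 2.5, 3.5]}, index=["x", "y", "z"])

col_sums = {}
for col_name, col in df.items():
    # col is a Series indexed by the row labels "x", "y", "z"
    col_sums[col_name] = col.sum()
    # .iloc[0] is positional access; col["x"] is label access (same value here)
    print(col_name, col.iloc[0], col["x"])
```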

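2. Filtering data

2.1. DataFrame.filter(items=None, like=None, regex=None, axis=None)

Parameters:

items: list-like. Keep the labels on the chosen axis that are in items.

like: str. Keep the labels on the axis for which `like in label` is True.

regex: str (regular expression). Keep the labels on the axis for which re.search(regex, label) finds a match.

axis: integer or string axis name, the axis to filter on. By default this is the info axis: 'index' for Series, 'columns' for DataFrame.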

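Returns:

An object of the same type as the input.

Notes

The items, like, and regex parameters are mutually exclusive.

axis defaults to the info axis, i.e. the axis that is used when indexing with [].

Examples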

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])

>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6

>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6

>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
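A sketch of the two points from the Notes, reusing the example frame: omitting axis filters the columns (the info axis of a DataFrame), and combining items, like, or regex raises a TypeError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
                  index=["mouse", "rabbit"],
                  columns=["one", "two", "three"])

# axis omitted: filters column labels; "one" and "two" contain "o", "three" does not
by_like = df.filter(like="o")
# axis=0 (or "index"): filters row labels instead
rows_m = df.filter(regex="^m", axis=0)

# items, like and regex are mutually exclusive; passing two raises TypeError
error = None
try:
    df.filter(items=["one"], like="t")
except TypeError as exc:
    error = exc
print(error)
```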

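2.2. Filtering by value (boolean indexing)

dataframe = dataframe[dataframe["column_name"] > 20]   # keep only the rows whose value in this column is greater than 20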
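A sketch of a few common variations of boolean indexing, assuming hypothetical columns age and city in place of the placeholder column name above. Conditions are combined with & and | (not and/or), and each condition needs its own parentheses; DataFrame.query expresses the same filter as a string.

```python
import pandas as pd

# Hypothetical data standing in for the placeholder column above
df = pd.DataFrame({"age": [18, 25, 40, 22], "city": ["A", "B", "A", "C"]})

adults = df[df["age"] > 20]                              # single condition
older_in_a = df[(df["age"] > 20) & (df["city"] == "A")]  # combined with &, each in parentheses
in_ab = df[df["city"].isin(["A", "B"])]                  # membership test
via_query = df.query("age > 20 and city == 'A'")         # same filter as older_in_a
```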

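Common DataFrame attributes and methods:

Name	Description
T	Transposes rows and columns.
axes	Returns a list whose only members are the row axis labels and the column axis labels.
dtypes	Returns the data type of each column.
empty	True if the DataFrame contains no data or if any axis has length 0.
ndim	Number of axes, i.e. the number of array dimensions.
shape	Returns a tuple giving the dimensions of the DataFrame.
size	Number of elements in the DataFrame.
values	Returns the DataFrame's values as a NumPy array.
head()	Returns the first n rows.
tail()	Returns the last n rows.
shift()	Shifts rows or columns by the specified number of periods.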
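A quick sketch exercising most of the attributes and methods in the table on a small hypothetical frame with one int column and one float column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(df.shape)    # (3, 2)
print(df.size)     # 6
print(df.ndim)     # 2
print(df.empty)    # False
print(df.T.shape)  # (2, 3): rows and columns swapped
print(df.axes)     # [row index, column index]
print(df.dtypes)   # int64 for "a", float64 for "b"
arr = df.values    # 2-D NumPy array; float64 because int and float columns are combined
print(df.head(2))  # first 2 rows
print(df.tail(1))  # last row
shifted = df.shift(1)  # every column moves down one row; the first row becomes NaN
print(shifted)
```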

Converting a DataFrame to a dictionary

Introduction to DataFrame.to_dict()
In pandas, DataFrame.to_dict() is commonly used to convert a DataFrame into a dictionary (dictionary lookups are fast).

Signature: DataFrame.to_dict(orient='dict', into=<class 'dict'>)

orient='dict' (the default): {column: {index: value}};
orient='list': {column: [values]};
orient='series': {column: Series(values)};
orient='split': {'index': [index], 'columns': [columns], 'data': [[row values], ...]};
orient='records': a list, [{column: value}, ..., {column: value}], one dictionary per row;
orient='index': {index: {column: value}};
With the default orient='dict', the outer keys are the column labels and each inner dictionary maps index labels to values; to use the index labels as the outer keys, pass orient='index'.
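A sketch showing the shape each orient value produces on a small two-row frame, plus the into parameter, which selects the mapping class used for the result (OrderedDict here).

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 0.75]}, index=["row1", "row2"])

as_dict = df.to_dict()                     # default orient="dict": {column: {index: value}}
as_list = df.to_dict(orient="list")        # {column: [values]}
as_series = df.to_dict(orient="series")    # {column: Series}
as_split = df.to_dict(orient="split")      # {"index": [...], "columns": [...], "data": [[...], ...]}
as_records = df.to_dict(orient="records")  # [{column: value}, ...], one dict per row
as_index = df.to_dict(orient="index")      # {index: {column: value}}
ordered = df.to_dict(into=OrderedDict)     # default layout, built with OrderedDict
```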

For details, see:

https://blog.csdn.net/m0_43609475/article/details/125328938

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html

Use case: look up the value for each entry of a column in a dictionary

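import pandas as pd

dict_map = {'a': 1, 'b': 2, 'c': 3}
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'c'], 'col2': [1, 2, 3, 4]})

# Series.map replaces each value in col1 with dict_map[value]; values with no matching key become NaN
df['new_col'] = df['col1'].map(dict_map)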
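A sketch connecting this use case to to_dict(): the lookup dictionary is built from a hypothetical lookup DataFrame (set_index, then Series.to_dict()), and a key missing from the dictionary ("d") shows the NaN behavior, with fillna supplying a default.

```python
import pandas as pd

# Hypothetical lookup table stored as a DataFrame
lookup = pd.DataFrame({"code": ["a", "b", "c"], "value": [1, 2, 3]})
# Series.to_dict() maps index labels to values, giving {code: value}
dict_map = lookup.set_index("code")["value"].to_dict()

df = pd.DataFrame({"col1": ["a", "b", "d", "c"]})
df["new_col"] = df["col1"].map(dict_map)                   # "d" has no key, so NaN
df["new_col_filled"] = df["col1"].map(dict_map).fillna(0)  # default of 0 for missing keys
```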

Adapted from:

http://c.biancheng.net/pandas/dataframe.html

posted @ 2023-01-17 15:30 钟鼎山林
