One way to use DataFrame.iterrows
import pandas as pd
import numpy as np
help(pd.DataFrame.iterrows)
Help on function iterrows in module pandas.core.frame:

iterrows(self)
    Iterate over DataFrame rows as (index, Series) pairs.

    Notes
    -----

    1. Because ``iterrows`` returns a Series for each row,
       it does **not** preserve dtypes across the rows (dtypes are
       preserved across columns for DataFrames). For example,

       >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
       >>> row = next(df.iterrows())[1]
       >>> row
       int      1.0
       float    1.5
       Name: 0, dtype: float64
       >>> print(row['int'].dtype)
       float64
       >>> print(df['int'].dtype)
       int64

       To preserve dtypes while iterating over the rows, it is better
       to use :meth:`itertuples` which returns namedtuples of the values
       and which is generally faster than ``iterrows``.

    2. You should **never modify** something you are iterating over.
       This is not guaranteed to work in all cases. Depending on the
       data types, the iterator returns a copy and not a view, and writing
       to it will have no effect.

    Returns
    -------
    it : generator
        A generator that iterates over the rows of the frame.

    See also
    --------
    itertuples : Iterate over DataFrame rows as namedtuples of the values.
    iteritems : Iterate over (column name, Series) pairs.
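The two notes in the help text are easy to check by hand. Here is a minimal sketch that reuses the small mixed-dtype frame from the docstring; the variable names `tup` and `r` are just placeholders for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# iterrows: the row is a single float64 Series, so the integer 1 comes back as 1.0
_, row = next(df.iterrows())
print(type(row['int']), row['int'])

# itertuples: each field is taken from its own column, so the integer stays an integer
tup = next(df.itertuples(index=False))
print(type(tup[0]), tup[0])

# Note 2: for this mixed-dtype frame the yielded Series is a copy,
# so writing to it does not touch the frame
for _, r in df.iterrows():
    r['float'] = 99.0
print(df.loc[0, 'float'])  # the frame still holds the original value
```

Whether the yielded row is a copy or a view depends on the data types, which is exactly why the docstring warns against relying on writes to it.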
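Using iterrows()
iterrows() yields (index, row) pairs, where index is the row label and row is a Series holding that row's data (the row itself is a Series, not an iterator). With this method you can fill in a column that has to follow some special rule one row at a time, provided you initialize that column first.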
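To make the shape of each pair concrete, here is a small sketch on a made-up two-row frame. The frame `demo` and its labels 'a' and 'b' are assumptions for illustration only.

```python
import pandas as pd

# A tiny fixed frame, used only for illustration
demo = pd.DataFrame({'one': [3, 1], 'two': [0, 0]}, index=['a', 'b'])

pairs = list(demo.iterrows())
first_index, first_row = pairs[0]

print(first_index)            # the row label
print(type(first_row))        # a pandas Series
print(list(first_row.index))  # the frame's column names
print(first_row.name)         # the Series is named after the row label
```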
xx=np.random.randint(9,size=(6,3))
tests=pd.DataFrame(xx,columns=['one','two','three']);tests
| | one | two | three |
|---|---|---|---|
| 0 | 3 | 0 | 4 |
| 1 | 1 | 0 | 3 |
| 2 | 1 | 4 | 4 |
| 3 | 7 | 3 | 2 |
| 4 | 7 | 5 | 0 |
| 5 | 5 | 8 | 8 |
Now we want to add a column with the following rule: if 'one' + 'two' + 'three' in the same row is odd, write '奇数' (odd); if it is even, write '偶数' (even).
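tests['special'] = 'ini'  # initialize the new column first
for index, row in tests.iterrows():
    num = row.values[:-1].sum()  # sum of every value except the last one ('special')
    if num % 2:
        row['special'] = '奇数'  # odd
    else:
        row['special'] = '偶数'  # even
    tests.loc[index] = row  # write the modified Series back to row `index` of tests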
tests
| | one | two | three | special |
|---|---|---|---|---|
| 0 | 3 | 0 | 4 | 奇数 |
| 1 | 1 | 0 | 3 | 偶数 |
| 2 | 1 | 4 | 4 | 奇数 |
| 3 | 7 | 3 | 2 | 偶数 |
| 4 | 7 | 5 | 0 | 偶数 |
| 5 | 5 | 8 | 8 | 奇数 |
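Since the help text warns against modifying what you iterate over, the same column can also be built without a loop. This is a sketch of one alternative, not the method used above: it hard-codes the values from the table so the result is reproducible, and uses `np.where` on the row sums.

```python
import numpy as np
import pandas as pd

# Same values as the table above, hard-coded so the result is reproducible
tests = pd.DataFrame(
    [[3, 0, 4], [1, 0, 3], [1, 4, 4], [7, 3, 2], [7, 5, 0], [5, 8, 8]],
    columns=['one', 'two', 'three'],
)

# Row-wise sum of the three numeric columns
row_sums = tests[['one', 'two', 'three']].sum(axis=1)

# np.where picks '奇数' (odd) where the sum is odd, otherwise '偶数' (even)
tests['special'] = np.where(row_sums % 2 == 1, '奇数', '偶数')
print(tests)
```

This version needs no placeholder value such as 'ini', because the whole column is assigned in one step.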