pandas: Traversal and Iteration

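Iterating over data is one of the most common operations, and pandas supports it as well.

iterrows() and itertuples(): both methods iterate over the rows of a DataFrame.

iterrows() returns an iterator that yields (index, row) tuples, where each row is a Series. itertuples() returns an iterator that yields one namedtuple per row.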

iterrows()

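DataFrame.iterrows()

Yields: index: label or tuple of labels. The index of the row; a tuple for a MultiIndex.
        data: Series. The data of the row as a Series.

Because iterrows() returns a Series for each row, it does not preserve dtypes across a row: a row that mixes column dtypes is upcast to a single common dtype. Building a Series for every row also makes it relatively slow.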
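The dtype point is easiest to see on a frame that mixes integer and float columns. The following is a minimal sketch; the frame `df_mixed` and its column names `qty` and `price` are made up for illustration.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one float column
df_mixed = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# Inside the DataFrame every column keeps its own dtype
print(df_mixed.dtypes)

# iterrows() packs each row into a single Series, so the integer
# value is upcast to float to share one dtype with "price"
_, first_row = next(df_mixed.iterrows())
print(first_row.dtype)      # float64
print(first_row["qty"])     # 1.0

# itertuples() hands back the per-column values without this upcast
first_tuple = next(df_mixed.itertuples(index=False))
print(first_tuple.qty, first_tuple.price)
```

If the original dtypes matter inside the loop, itertuples() is usually the safer choice.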

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(4,5),
                  columns=list("abcde"))
print(df)

#    a   b   c   d   e
#0   0   1   2   3   4
#1   5   6   7   8   9
#2  10  11  12  13  14
#3  15  16  17  18  19


for index, row in df.iterrows():
    print(row)
# a    0
# b    1
# c    2
# d    3
# e    4
# Name: 0, dtype: int32
# a    5
# b    6
# c    7
# d    8
# e    9
# Name: 1, dtype: int32
# a    10
# b    11
# c    12
# d    13
# e    14
# Name: 2, dtype: int32
# a    15
# b    16
# c    17
# d    18
# e    19
# Name: 3, dtype: int32
# (the dtype is platform-dependent: int32 here, int64 on most Linux/macOS setups)

for index, row in df.iterrows():
    print(row["a"])
#0
#5
#10
#15

for index, row in df.iterrows():
    print(row[["c","d"]],row.e)
#c    2
#d    3
#Name: 0, dtype: int32 4
#c    7
#d    8
#Name: 1, dtype: int32 9
#c    12
#d    13
#Name: 2, dtype: int32 14
#c    17
#d    18
#Name: 3, dtype: int32 19 
 
for index, row in df.iterrows():
    print(row.e)   
#4
#9
#14
#19

itertuples()

DataFrame.itertuples(index=True, name='Pandas')
Iterates over the rows, yielding each DataFrame row as a namedtuple. It is usually faster than iterrows().

import pandas as pd
import numpy as np
df = pd.DataFrame({"code":["python","c+++c#","java++","golang"],
                    "year":[1997,1978,1994,2008],
                    "home":["micsoft","zijiejt","meituan","tengxun"]
})
for row in df.itertuples():
    print(row)
    
# Pandas(Index=0, code='python', year=1997, home='micsoft')
# Pandas(Index=1, code='c+++c#', year=1978, home='zijiejt')
# Pandas(Index=2, code='java++', year=1994, home='meituan')
# Pandas(Index=3, code='golang', year=2008, home='tengxun')

for row in df.itertuples():
    print(type(row))
    print(row.code)
    
# <class 'pandas.core.frame.Pandas'>
# python
# <class 'pandas.core.frame.Pandas'>
# c+++c#
# <class 'pandas.core.frame.Pandas'>
# java++
# <class 'pandas.core.frame.Pandas'>
# golang
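The index and name parameters in the signature change what each row looks like. The sketch below uses a made-up frame `df_demo`; the column "release year" is deliberately not a valid Python identifier, to show how the namedtuple renames such fields by position.

```python
import pandas as pd

# Hypothetical frame; "release year" contains a space on purpose
df_demo = pd.DataFrame({"code": ["alpha", "beta"],
                        "release year": [2001, 2002]})

# Default: namedtuple called "Pandas" with the index as first field.
# Column names that are not valid identifiers become positional names.
row = next(df_demo.itertuples())
print(row._fields)            # ('Index', 'code', '_2')

# index=False drops the index, name=... sets the namedtuple class name
named = next(df_demo.itertuples(index=False, name="Lang"))
print(type(named).__name__)   # Lang
print(named._fields)          # ('code', '_1')

# name=None yields plain tuples instead of namedtuples
plain = next(df_demo.itertuples(index=False, name=None))
print(plain)
```

Plain tuples (name=None) carry no field names, so values have to be read by position.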

Using zip()

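# General pattern; assumes df is the numeric a..e DataFrame from the iterrows() section
for tup in zip(df['a'], df['b']):
    print(tup, type(tup[1:]))  # each tup is a plain tuple, and so is any slice of it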

import pandas as pd
import numpy as np
df = pd.DataFrame({"code":["python","c+++c#","java++","golang"],
                    "year":[1997,1978,1994,2008],
                    "home":["micsoft","zijiejt","meituan","tengxun"]
})

for tup in zip(df.index, df.code, df.year,df.home):
    print(tup)
    
#(0, 'python', 1997, 'micsoft')
#(1, 'c+++c#', 1978, 'zijiejt')
#(2, 'java++', 1994, 'meituan')
#(3, 'golang', 2008, 'tengxun')
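Since zip() only pairs up the columns it is given, it works well when just a few columns are needed per row. A small sketch, assuming a reduced two-column version of the frame above (the name `df_langs` is made up here):

```python
import pandas as pd

# Reduced version of the example frame, with only the columns we need
df_langs = pd.DataFrame({"code": ["python", "golang"],
                         "year": [1997, 2008]})

# Unpack each zipped pair directly in the loop header
labels = [f"{code} ({year})"
          for code, year in zip(df_langs["code"], df_langs["year"])]
print(labels)   # ['python (1997)', 'golang (2008)']
```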

for i in df

# for i in df: iterates over the column labels, not the rows
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(4,5),
                  columns=list("abcde"))
print (df)
for i in df:
    print(i)
    
#a
#b
#c
#d
#e
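When the column data is needed along with the label, DataFrame.items() yields (label, Series) pairs. A minimal sketch with a made-up frame `df_small`:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: rows [0, 1, 2] and [3, 4, 5]
df_small = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list("abc"))

# Plain iteration gives only the column labels
print(list(df_small))    # ['a', 'b', 'c']

# items() gives each label together with its column as a Series
column_sums = {}
for label, column in df_small.items():
    column_sums[label] = int(column.sum())
print(column_sums)       # {'a': 3, 'b': 5, 'c': 7}
```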

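Notes

iterrows() and itertuples() are typically used to process a DataFrame row by row inside a loop.

However, avoid them on large DataFrames: looping row by row in Python is slow, and iterrows() additionally builds a new Series object for every row. Prefer vectorized column operations where possible.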
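As a rough illustration of what to use instead on large frames, many row loops can be written as a single column-wise expression. The sketch below computes the same result both ways on the small a..e frame used earlier; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df_nums = pd.DataFrame(np.arange(20).reshape(4, 5), columns=list("abcde"))

# Row-by-row version: one Python-level iteration per row
loop_totals = []
for _, row in df_nums.iterrows():
    loop_totals.append(row["a"] + row["b"])

# Vectorized version: one operation over whole columns
vector_totals = df_nums["a"] + df_nums["b"]
print(vector_totals.tolist())   # [1, 11, 21, 31]
```

Both versions give the same values; the vectorized one avoids the per-row loop, which is where most of the cost sits on large inputs.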

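apply(): the apply() method applies a custom function along the rows or columns of a DataFrame, or to each element of a Series.

On a DataFrame, the function receives each column (axis=0, the default) or each row (axis=1) as a Series; on a Series, it receives each element.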

import pandas as pd  
  
# Create a simple DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

print(df)
# Custom function that prints one row as a list
def print_row(row):
    row = row.tolist()
    print(row)

# Use apply() with axis=1 to call the function once per row
df.apply(print_row, axis=1)

#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
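apply() is more often used for the values the function returns than for side effects like printing. A short sketch on the same A/B/C data (the names `df_abc`, `row_totals`, `col_ranges`, and `squares` are made up for illustration):

```python
import pandas as pd

df_abc = pd.DataFrame({"A": [1, 2, 3],
                       "B": [4, 5, 6],
                       "C": [7, 8, 9]})

# axis=1: the function receives each row; results form a Series
row_totals = df_abc.apply(lambda row: row["A"] + row["B"] + row["C"], axis=1)
print(row_totals.tolist())   # [12, 15, 18]

# axis=0 (default): the function receives each column
col_ranges = df_abc.apply(lambda col: col.max() - col.min())
print(col_ranges.tolist())   # [2, 2, 2]

# On a Series, the function receives each element
squares = df_abc["A"].apply(lambda x: x * x)
print(squares.tolist())      # [1, 4, 9]
```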