【843】dataframe相关操作 (match merge)

Reference:

29-April-2024

use to_string() to print the entire DataFrame.
Check the number of maximum returned rows: print(pd.options.display.max_rows)ref: https://www.w3schools.com/python/pandas/pandas_csv.asp
Increase the maximum number of rows to display the entire DataFrame: pd.options.display.max_rows = 9999ref: https://www.w3schools.com/python/pandas/pandas_csv.asp
The DataFrames object has a method called info(), that gives you more information about the data set.
Remove all rows with NULL values: dropna()
The fillna() method allows us to replace empty cells with a value
Pandas uses the mean() median() and mode() methods to calculate the respective values for a specified column
- Mean = the average value (the sum of all values divided by number of values).
- Median = the value in the middle, after you have sorted all values ascending.
- Mode = the value that appears most frequently.
Pandas has a to_datetime() method to convert all cells in the 'Date' column into dates: df['Date'] = pd.to_datetime(df['Date'])
Remove rows with a NULL value in the "Date" column: df.dropna(subset=['Date'], inplace = True)

pandas.DataFrame.merge (match columns)

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=None, indicator=False, validate=None)[source]
Merge DataFrame or named Series objects with a database-style join.

Parameters:

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’

Type of merge to be performed.

left: use only keys from left frame, similar to a SQL left outer join; preserve key order. (Keep all left dataframe's records, and no matched cells will be set with "NaN")

right: use only keys from right frame, similar to a SQL right outer join; preserve key order. (Keep all right dataframe's records, and no matched cells will be set with "NaN")

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically. (Keep both of the records, and no matched cells will be set with "NaN")

inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys. (Only keep the matched records)

cross: creates the cartesian product from both frames, preserves the order of the left keys.

import pandas as pd # Create two sample DataFrames df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'], 'value1': [1, 2, 3, 4, 5, 6]}) df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'], 'value2': [5, 6, 7, 8]}) print("Left DataFrame:") print(df1) print() print("Right DataFrame:") print(df2) print() # Merge the DataFrames based on the 'key' column merged_df1 = pd.merge(df1, df2, on='key', how='left') # Display the merged DataFrame print("how='left': Keep all left dataframe's records, and no matched cells will be set with \"NaN\"") print(merged_df1) print() # Merge the DataFrames based on the 'key' column merged_df2 = pd.merge(df1, df2, on='key', how='right') # Display the merged DataFrame print("how='right': Keep all right dataframe's records, and no matched cells will be set with \"NaN\"") print(merged_df2) print() # Merge the DataFrames based on the 'key' column merged_df3 = pd.merge(df1, df2, on='key', how='outer') # Display the merged DataFrame print("how='outer': Keep both of the records, and no matched cells will be set with \"NaN\"") print(merged_df3) print() # Merge the DataFrames based on the 'key' column merged_df4 = pd.merge(df1, df2, on='key', how='inner') # Display the merged DataFrame print("how='inner': Only keep the matched records") print(merged_df4)

Output:

Left DataFrame: key value1 0 A 1 1 B 2 2 C 3 3 D 4 4 E 5 5 F 6 Right DataFrame: key value2 0 A 5 1 B 6 2 C 7 3 G 8 how='left': Keep all left dataframe's records, and no matched cells will be set with "NaN" key value1 value2 0 A 1 5.0 1 B 2 6.0 2 C 3 7.0 3 D 4 NaN 4 E 5 NaN 5 F 6 NaN how='right': Keep all right dataframe's records, and no matched cells will be set with "NaN" key value1 value2 0 A 1.0 5 1 B 2.0 6 2 C 3.0 7 3 G NaN 8 how='outer': Keep both of the records, and no matched cells will be set with "NaN" key value1 value2 0 A 1.0 5.0 1 B 2.0 6.0 2 C 3.0 7.0 3 D 4.0 NaN 4 E 5.0 NaN 5 F 6.0 NaN 6 G NaN 8.0 how='inner': Only keep the matched records key value1 value2 0 A 1 5 1 B 2 6 2 C 3 7

With two columns:

# Create two sample DataFrames df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'], 'value': [1, 2, 3, 4, 5, 6]}) df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'], 'value': [1, 6, 3, 8]}) print("Left DataFrame:") print(df1) print() print("Right DataFrame:") print(df2) print() # Merge the DataFrames based on the 'key' column merged_df4 = pd.merge(df1, df2, on=['key', 'value'], how='inner') # Display the merged DataFrame print("how='inner': Only keep the matched records for both columns of 'key' and 'value'") print(merged_df4)

Output:

Left DataFrame: key value 0 A 1 1 B 2 2 C 3 3 D 4 4 E 5 5 F 6 Right DataFrame: key value 0 A 1 1 B 6 2 C 3 3 G 8 how='inner': Only keep the matched records for both columns of 'key' and 'value' key value 0 A 1 1 C 3

pandas与SQL对比：Comparison with SQL

按照列排序：pandas.DataFrame.sort_values

创建dataframe：pandas读取字典（dict）数据

数据表join，生成新列：PANDAS 数据合并与重塑（join/merge篇）

pandas的merge方法提供了一种类似于SQL的内存链接操作，官网文档提到它的性能会比其他开源语言的数据操作（例如R）要高效。和SQL语句的对比可以看这里
merge的列需要有同样的名称
merge的参数
- on：列名，join用来对齐的那一列的名字，用到这个参数的时候一定要保证左表和右表用来对齐的那一列都有相同的列名。
- left_on：左表对齐的列，可以是列名，也可以是和dataframe同样长度的arrays。
- right_on：右表对齐的列，可以是列名，也可以是和dataframe同样长度的arrays。
- left_index/ right_index: 如果是True的haunted以index作为对齐的key。
- how：数据融合的方法。
- sort：根据dataframe合并的keys按字典顺序排序，默认是，如果置false可以提高表现。

Example:

 pd.merge(gdf_africa, df_af_mp, how='left', on='country')

posted on 2023-06-15 18:31 McDelfino 阅读(18) 评论(0) 编辑收藏举报

刷新页面返回顶部

登录后才能查看或发表评论，立即登录或者逛逛博客园首页

相关博文：

· [966] Merge two DataFrames horizontally

· 【810】pandas的DataFrame的append方法详细介绍

· Pandas

· 【Python实战】Pandas：让你像写SQL一样做数据分析（一）

· Pandas的使用

阅读排行：
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布：重大改进与新特性概览！
· .NET10 - 预览版1新功能体验（一）

历史上的今天：
2020-06-15 【469】RNN, LSTM参考博客
2018-06-15 【318】C# 学习笔记
2016-06-15 【206】Firefox 扩展收集

alex_bn_lee

导航

公告

统计

搜索

常用链接

最新随笔

我的标签

积分与排名

随笔分类 (1762)

随笔档案 (1207)

相册 (9)

阅读排行榜

评论排行榜

推荐排行榜

最新评论

【843】dataframe相关操作 (match merge)

pandas.DataFrame.merge (match columns)

	import pandas as pd

	# Create two sample DataFrames
	df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'],
	'value1': [1, 2, 3, 4, 5, 6]})
	df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'],
	'value2': [5, 6, 7, 8]})

	print("Left DataFrame:")
	print(df1)
	print()
	print("Right DataFrame:")
	print(df2)
	print()

	# Merge the DataFrames based on the 'key' column
	merged_df1 = pd.merge(df1, df2, on='key', how='left')

	# Display the merged DataFrame
	print("how='left': Keep all left dataframe's records, and no matched cells will be set with \"NaN\"")
	print(merged_df1)
	print()

	# Merge the DataFrames based on the 'key' column
	merged_df2 = pd.merge(df1, df2, on='key', how='right')

	# Display the merged DataFrame
	print("how='right': Keep all right dataframe's records, and no matched cells will be set with \"NaN\"")
	print(merged_df2)
	print()

	# Merge the DataFrames based on the 'key' column
	merged_df3 = pd.merge(df1, df2, on='key', how='outer')

	# Display the merged DataFrame
	print("how='outer': Keep both of the records, and no matched cells will be set with \"NaN\"")
	print(merged_df3)
	print()

	# Merge the DataFrames based on the 'key' column
	merged_df4 = pd.merge(df1, df2, on='key', how='inner')

	# Display the merged DataFrame
	print("how='inner': Only keep the matched records")
	print(merged_df4)

	Left DataFrame:
	key value1
	0 A 1
	1 B 2
	2 C 3
	3 D 4
	4 E 5
	5 F 6

	Right DataFrame:
	key value2
	0 A 5
	1 B 6
	2 C 7
	3 G 8

	how='left': Keep all left dataframe's records, and no matched cells will be set with "NaN"
	key value1 value2
	0 A 1 5.0
	1 B 2 6.0
	2 C 3 7.0
	3 D 4 NaN
	4 E 5 NaN
	5 F 6 NaN

	how='right': Keep all right dataframe's records, and no matched cells will be set with "NaN"
	key value1 value2
	0 A 1.0 5
	1 B 2.0 6
	2 C 3.0 7
	3 G NaN 8

	how='outer': Keep both of the records, and no matched cells will be set with "NaN"
	key value1 value2
	0 A 1.0 5.0
	1 B 2.0 6.0
	2 C 3.0 7.0
	3 D 4.0 NaN
	4 E 5.0 NaN
	5 F 6.0 NaN
	6 G NaN 8.0

	how='inner': Only keep the matched records
	key value1 value2
	0 A 1 5
	1 B 2 6
	2 C 3 7

	# Create two sample DataFrames
	df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'],
	'value': [1, 2, 3, 4, 5, 6]})
	df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'],
	'value': [1, 6, 3, 8]})

	print("Left DataFrame:")
	print(df1)
	print()
	print("Right DataFrame:")
	print(df2)
	print()

	# Merge the DataFrames based on the 'key' column
	merged_df4 = pd.merge(df1, df2, on=['key', 'value'], how='inner')

	# Display the merged DataFrame
	print("how='inner': Only keep the matched records for both columns of 'key' and 'value'")
	print(merged_df4)

	Left DataFrame:
	key value
	0 A 1
	1 B 2
	2 C 3
	3 D 4
	4 E 5
	5 F 6

	Right DataFrame:
	key value
	0 A 1
	1 B 6
	2 C 3
	3 G 8

	how='inner': Only keep the matched records for both columns of 'key' and 'value'
	key value
	0 A 1
	1 C 3