[909] Remove duplicated rows based on multiple columns in Pandas

In a Pandas DataFrame, you can remove duplicated rows based on multiple columns using the drop_duplicates() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {
    'A': [1, 2, 3, 2, 1],
    'B': ['apple', 'banana', 'cherry', 'banana', 'apple'],
    'C': [10, 20, 30, 20, 10]
}
 
df = pd.DataFrame(data)
 
# Remove duplicates based on columns A and B
df = df.drop_duplicates(subset=['A', 'B'])
 
# Display the resulting DataFrame
print(df)

In this example, the DataFrame has three columns, and duplicates are identified by columns 'A' and 'B' only. The subset parameter takes a list of column names (here ['A', 'B']); two rows count as duplicates when they match in every listed column, regardless of the values in the other columns. Rows 3 and 4 repeat the ('A', 'B') pairs of rows 1 and 0, so they are dropped, and the result keeps rows 0, 1, and 2 with their original index labels.
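To see why subset matters, here is a minimal sketch using a hypothetical variation of the sample data in which one repeated ('A', 'B') pair carries a different value in 'C'. The changed value 99 and the names all_columns and by_a_and_b are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical variation on the sample data:
# rows 1 and 3 agree on A and B but differ in C (20 vs. 99).
df = pd.DataFrame({
    'A': [1, 2, 3, 2, 1],
    'B': ['apple', 'banana', 'cherry', 'banana', 'apple'],
    'C': [10, 20, 30, 99, 10],
})

# Without subset, every column is compared,
# so only the exact repeat (row 4) is dropped.
all_columns = df.drop_duplicates()
print(all_columns.index.tolist())  # [0, 1, 2, 3]

# With subset, only A and B are compared,
# so row 3 is dropped as well.
by_a_and_b = df.drop_duplicates(subset=['A', 'B'])
print(by_a_and_b.index.tolist())  # [0, 1, 2]
```

Note that the surviving row for (2, 'banana') keeps C = 20, the value from its first occurrence; the 99 is discarded along with row 3.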

You can also use the keep parameter to control which of the duplicated rows is kept. The default is 'first', which keeps the first occurrence and removes later ones. Set it to 'last' to keep the last occurrence and remove earlier ones, or to False (the boolean, not the string 'False') to remove every row that has a duplicate. For example:

# Remove duplicates based on columns A and B, keeping the last occurrence
# (run this on the original DataFrame, before the earlier drop_duplicates call)
df = df.drop_duplicates(subset=['A', 'B'], keep='last')

Applied to the original sample data (before any rows were dropped), this call keeps the last occurrence of each ('A', 'B') combination, which leaves rows 2, 3, and 4. Adjust the subset and keep parameters according to your specific requirements.
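The three keep settings can be compared side by side. The following is a sketch on the sample data; the variable names first, last, none_kept, and mask are illustrative. It also uses DataFrame.duplicated, which accepts the same subset and keep arguments and returns a boolean mask, so you can preview which rows a call would drop.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 2, 1],
    'B': ['apple', 'banana', 'cherry', 'banana', 'apple'],
    'C': [10, 20, 30, 20, 10],
})

# Each call starts from the original df, so the results are independent.
first = df.drop_duplicates(subset=['A', 'B'], keep='first')
last = df.drop_duplicates(subset=['A', 'B'], keep='last')
none_kept = df.drop_duplicates(subset=['A', 'B'], keep=False)

print(first.index.tolist())      # [0, 1, 2]
print(last.index.tolist())       # [2, 3, 4]
print(none_kept.index.tolist())  # [2]

# Preview: True marks the rows that keep='first' would remove.
mask = df.duplicated(subset=['A', 'B'], keep='first')
print(mask.tolist())  # [False, False, False, True, True]
```

Assigning each result to its own name, rather than back to df, avoids the situation where a second drop_duplicates call has nothing left to remove.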

posted on 2023-10-17 13:11 McDelfino
