[909] Remove duplicated rows based on multiple columns in Pandas
In a Pandas DataFrame, you can remove duplicated rows based on multiple columns using the drop_duplicates() method. Here's how you can do it:
import pandas as pd

# Sample DataFrame
data = {
    'A': [1, 2, 3, 2, 1],
    'B': ['apple', 'banana', 'cherry', 'banana', 'apple'],
    'C': [10, 20, 30, 20, 10]
}
df = pd.DataFrame(data)

# Remove duplicates based on columns A and B
df = df.drop_duplicates(subset=['A', 'B'])

# Display the resulting DataFrame
print(df)
In this example, the DataFrame has three columns, and duplicates are removed based on columns 'A' and 'B'. The subset parameter takes a list of column names (here ['A', 'B']) that specifies which columns are compared when checking for duplicates. Any row that repeats an earlier combination of values in those columns is dropped; the remaining columns are ignored in the comparison. drop_duplicates() returns a new DataFrame, which is why the result is assigned back to df.
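In the sample above the repeated rows happen to be identical in every column, so the effect of subset is easy to miss. The following is a minimal sketch with hypothetical data (the value 99 in column 'C' is made up for illustration) in which two rows agree on 'A' and 'B' but differ in 'C'. It also uses duplicated(), which returns the boolean mask that drop_duplicates() acts on.

```python
import pandas as pd

# Hypothetical data: rows 0 and 3 match on A and B but differ in C.
df = pd.DataFrame({
    'A': [1, 2, 3, 1],
    'B': ['apple', 'banana', 'cherry', 'apple'],
    'C': [10, 20, 30, 99],
})

# Boolean mask: True for rows that repeat an earlier (A, B) pair.
mask = df.duplicated(subset=['A', 'B'])

# Comparing whole rows finds no duplicates, because C differs.
full_row = df.drop_duplicates()

# Comparing only A and B drops row 3.
by_ab = df.drop_duplicates(subset=['A', 'B'])

print(mask.tolist())
print(len(full_row), len(by_ab))
print(by_ab)
```

Inspecting the mask first can be a useful check before dropping anything, since it shows exactly which rows would be removed.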
You can also use the keep parameter to control which of the duplicated rows to keep. By default it is set to 'first', which keeps the first occurrence and removes subsequent duplicates. You can set it to 'last' to keep the last occurrence and remove earlier duplicates, or to False (the boolean value, not the string 'False') to drop every row that has a duplicate. For example:
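# Remove duplicates based on columns A and B, keeping the last occurrence
# (rebuild df from data first, because df above has already been deduplicated)
df = pd.DataFrame(data)
df = df.drop_duplicates(subset=['A', 'B'], keep='last')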
This code will keep the last occurrence of a duplicated row based on columns 'A' and 'B'. Adjust the subset and keep parameters according to your specific requirements.
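To make the difference between the keep options concrete, here is a small sketch that applies all three to the same sample data and compares which row labels survive. The variable names first, last and unique_only are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 2, 1],
    'B': ['apple', 'banana', 'cherry', 'banana', 'apple'],
    'C': [10, 20, 30, 20, 10],
})

# Keep the first row of each (A, B) group.
first = df.drop_duplicates(subset=['A', 'B'], keep='first')

# Keep the last row of each (A, B) group.
last = df.drop_duplicates(subset=['A', 'B'], keep='last')

# Keep only rows whose (A, B) pair occurs exactly once.
unique_only = df.drop_duplicates(subset=['A', 'B'], keep=False)

print(first.index.tolist())        # [0, 1, 2]
print(last.index.tolist())         # [2, 3, 4]
print(unique_only.index.tolist())  # [2]
```

Note that the original row labels are preserved in each result, so the index is no longer consecutive after rows are dropped.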