[1080] Remove duplicated records based on a specific column in GeoPandas
To remove duplicated records based on a specific column in GeoPandas, you can use the drop_duplicates
method, which GeoDataFrame inherits from pandas.DataFrame. Here's how you can do it:
Example Script
import geopandas as gpd
from shapely.geometry import Point

# Sample GeoDataFrame
data = {
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5),
                 Point(5, 6), Point(6, 7), Point(7, 8)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Remove duplicated records based on the 'ID' column
gdf_cleaned = gdf.drop_duplicates(subset='ID', keep='first')
print(gdf_cleaned)
Explanation:
- Import Libraries: Import geopandas and Point from shapely.geometry.
- Create a Sample GeoDataFrame: Define a GeoDataFrame with a column (ID) that contains duplicate values.
- Drop Duplicates: Use the drop_duplicates method with the subset parameter set to the column of interest (ID in this case). The keep='first' parameter retains only the first occurrence of each ID. You can also use keep='last' to keep the last occurrence, or keep=False to drop every row whose ID appears more than once.
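To make the difference between the three keep options concrete, here is a minimal sketch that runs all of them on the same sample data. The variable names first, last and unique_only are illustrative only and do not appear in the original script.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script above
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(i, i + 1) for i in range(1, 8)],
    },
    crs="EPSG:4326",
)

# keep='first': retain the first row seen for each ID
first = gdf.drop_duplicates(subset="ID", keep="first")

# keep='last': retain the last row seen for each ID
last = gdf.drop_duplicates(subset="ID", keep="last")

# keep=False: drop every row whose ID occurs more than once
unique_only = gdf.drop_duplicates(subset="ID", keep=False)

print(first["Name"].tolist())        # ['Alice', 'Bob', 'David', 'Eve']
print(last["Name"].tolist())         # ['Alice', 'Charlie', 'David', 'Grace']
print(unique_only["Name"].tolist())  # ['Alice', 'David']
```

Note that keep=False removes IDs 2 and 4 entirely, so it suits cases where any repeated ID is considered unreliable.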
Result:
This script prints a GeoDataFrame with one row per ID value. The remaining rows keep their original index labels.
Example Output:
   ID   Name                 geometry
0   1  Alice  POINT (1.00000 2.00000)
1   2    Bob  POINT (2.00000 3.00000)
3   3  David  POINT (4.00000 5.00000)
4   4    Eve  POINT (5.00000 6.00000)
In this example, only the first occurrence of each ID
is kept, and all subsequent duplicates are removed.
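If you want to see which rows would be removed before dropping them, one possible approach is the duplicated method, which returns a boolean mask. The sketch below also calls reset_index(drop=True) to renumber the result, since the rows otherwise keep their original index labels (0, 1, 3, 4). The names dup_mask, dropped and cleaned are assumptions made for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script above
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(i, i + 1) for i in range(1, 8)],
    },
    crs="EPSG:4326",
)

# True for every row whose ID has already appeared earlier in the frame
dup_mask = gdf.duplicated(subset="ID", keep="first")

# Rows that drop_duplicates(subset='ID', keep='first') would remove
dropped = gdf[dup_mask]
print(dropped["Name"].tolist())  # ['Charlie', 'Frank', 'Grace']

# Keep the non-duplicates and renumber the index from 0
cleaned = gdf[~dup_mask].reset_index(drop=True)
print(cleaned.index.tolist())    # [0, 1, 2, 3]
```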