[1080] Remove duplicated records based on a specific column in GeoPandas

To remove duplicated records based on a specific column in GeoPandas, use the drop_duplicates method, which GeoDataFrame inherits from the pandas DataFrame. Pass the column name through the subset parameter:

Example Script

import geopandas as gpd
from shapely.geometry import Point

# Sample GeoDataFrame
data = {
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5), Point(5, 6), Point(6, 7), Point(7, 8)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Remove duplicated records based on the 'ID' column
gdf_cleaned = gdf.drop_duplicates(subset='ID', keep='first')

print(gdf_cleaned)

Explanation:

Import Libraries: Import geopandas and Point from shapely.geometry.
Create a Sample GeoDataFrame: Define a GeoDataFrame with a column (ID) that contains duplicate values.
Drop Duplicates: Use the drop_duplicates method with the subset parameter set to the column of interest (ID in this case). The keep='first' parameter retains only the first row for each repeated ID. Use keep='last' to retain the last row instead, or keep=False to drop every row whose ID occurs more than once, including the first occurrence.
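To make the three keep options concrete, here is a minimal sketch that applies each of them to the same sample data as the example script. The variable names keep_first, keep_last and keep_none are illustrative only and are not part of the original script.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script (points (1, 2) through (7, 8))
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(x, x + 1) for x in range(1, 8)],
    },
    crs="EPSG:4326",
)

# Keep the first row of each repeated ID
keep_first = gdf.drop_duplicates(subset="ID", keep="first")
# Keep the last row of each repeated ID
keep_last = gdf.drop_duplicates(subset="ID", keep="last")
# Drop every row whose ID appears more than once
keep_none = gdf.drop_duplicates(subset="ID", keep=False)

print(keep_first["Name"].tolist())  # ['Alice', 'Bob', 'David', 'Eve']
print(keep_last["Name"].tolist())   # ['Alice', 'Charlie', 'David', 'Grace']
print(keep_none["Name"].tolist())   # ['Alice', 'David']
```

Note that keep=False leaves only IDs 1 and 3, because IDs 2 and 4 each occur more than once.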

Result:

This script prints a GeoDataFrame with one row per distinct value in the specified column. The remaining rows keep their original index labels, so the index is no longer consecutive.

Example Output:

    ID     Name                 geometry
0   1    Alice  POINT (1.00000 2.00000)
1   2      Bob  POINT (2.00000 3.00000)
3   3    David  POINT (4.00000 5.00000)
4   4      Eve  POINT (5.00000 6.00000)

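In this example, only the first occurrence of each ID is kept (Bob for ID 2, Eve for ID 4), and all subsequent duplicates (Charlie, Frank and Grace) are removed. The exact coordinate formatting in the geometry column may differ between GeoPandas versions.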
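If you want to see which rows are about to be removed, or to renumber the result afterwards, the following sketch may help. It assumes the same sample data as the example script; the names mask, dropped and gdf_renumbered are hypothetical, and duplicated and reset_index are the standard pandas methods that a GeoDataFrame inherits.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(x, x + 1) for x in range(1, 8)],
    },
    crs="EPSG:4326",
)

# Boolean mask: True for every row whose ID already appeared in an earlier row
mask = gdf.duplicated(subset="ID", keep="first")
dropped = gdf[mask]
print(dropped["Name"].tolist())  # ['Charlie', 'Frank', 'Grace']

# drop_duplicates keeps the original index labels of the surviving rows
gdf_cleaned = gdf.drop_duplicates(subset="ID", keep="first")
print(gdf_cleaned.index.tolist())  # [0, 1, 3, 4]

# Optionally renumber the rows from 0 after cleaning
gdf_renumbered = gdf_cleaned.reset_index(drop=True)
print(gdf_renumbered.index.tolist())  # [0, 1, 2, 3]
```

Whether to reset the index depends on whether later steps need to trace rows back to the original data; keeping the original labels makes that easier.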


posted on 2024-12-04 14:10 McDelfino
