[1081] The syntax and usage for the drop_duplicates and duplicated functions in a GeoDataFrame in GeoPandas.
Certainly! Here's the syntax and usage for the drop_duplicates and duplicated methods of a GeoDataFrame in GeoPandas. Both are inherited from pandas.DataFrame, so they behave the same way they do in pandas.
drop_duplicates Function
The drop_duplicates method removes duplicate rows based on one or more columns. It returns a new GeoDataFrame unless inplace=True is passed.
Syntax:
GeoDataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Parameters:
- subset (optional): Column label or sequence of labels to consider for identifying duplicates. Defaults to all columns.
- keep (optional): Determines which duplicates (if any) to keep:
  - 'first': Drop duplicates except for the first occurrence.
  - 'last': Drop duplicates except for the last occurrence.
  - False: Drop all duplicates.
- inplace (optional): If True, modify the GeoDataFrame in place and return None.
- ignore_index (optional): If True, relabel the resulting index as 0, 1, …, n - 1.
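A quick way to see how the three keep options differ is to run them side by side. The sketch below assumes a recent GeoPandas with Shapely installed; the toy data (ID 2 repeated twice, ID 4 three times, one point per row) is made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy data for illustration only: ID 2 appears twice, ID 4 three times
gdf = gpd.GeoDataFrame(
    {"ID": [1, 2, 2, 3, 4, 4, 4],
     "geometry": [Point(i, i) for i in range(7)]},
    crs="EPSG:4326",
)

first = gdf.drop_duplicates(subset="ID", keep="first")      # first row of each ID
last = gdf.drop_duplicates(subset="ID", keep="last")        # last row of each ID
unique_only = gdf.drop_duplicates(subset="ID", keep=False)  # only IDs that occur once

print(first.index.tolist())        # [0, 1, 3, 4]
print(last.index.tolist())         # [0, 2, 3, 6]
print(unique_only.index.tolist())  # [0, 3]
print(type(first).__name__)        # GeoDataFrame
```

The original index labels are preserved, which makes it easy to tell which rows survived, and the result is still a GeoDataFrame.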
Example:
```python
import geopandas as gpd
from shapely.geometry import Point

# Sample GeoDataFrame
data = {'ID': [1, 2, 2, 3, 4, 4, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5),
                     Point(5, 6), Point(6, 7), Point(7, 8)]}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Drop duplicates based on the 'ID' column, keeping the first occurrence
gdf_unique = gdf.drop_duplicates(subset='ID', keep='first')
print(gdf_unique)
```
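The example above uses a single column. The following sketch, again with hypothetical data, shows a multi-column subset (a row counts as a duplicate only if all listed columns match an earlier row), plus the ignore_index and inplace options. It assumes pandas 1.0 or later, where ignore_index is available.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: rows 0 and 1 match on both ID and Name; row 2 matches on ID only
gdf = gpd.GeoDataFrame(
    {"ID": [1, 1, 1, 2],
     "Name": ["Alice", "Alice", "Bob", "Carol"],
     "geometry": [Point(0, 0), Point(1, 1), Point(2, 2), Point(3, 3)]},
    crs="EPSG:4326",
)

# Duplicate only when BOTH 'ID' and 'Name' repeat, so only row 1 is dropped
by_id_and_name = gdf.drop_duplicates(subset=["ID", "Name"])
print(by_id_and_name.index.tolist())  # [0, 2, 3]

# ignore_index=True renumbers the surviving rows 0..n-1
renumbered = gdf.drop_duplicates(subset="ID", ignore_index=True)
print(renumbered.index.tolist())      # [0, 1]

# inplace=True modifies gdf itself and returns None
result = gdf.drop_duplicates(subset="ID", inplace=True)
print(result)                         # None
print(gdf.index.tolist())             # [0, 3]
```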
duplicated Function
The duplicated method returns a Boolean Series indicating duplicate rows.
Syntax:
GeoDataFrame.duplicated(subset=None, keep='first')
Parameters:
- subset (optional): Column label or sequence of labels to consider for identifying duplicates. Defaults to all columns.
- keep (optional): Determines which duplicates to mark:
  - 'first': Mark duplicates as True except for the first occurrence.
  - 'last': Mark duplicates as True except for the last occurrence.
  - False: Mark all duplicates as True.
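To compare the three keep modes of duplicated directly, here is a sketch on the same kind of toy data (hypothetical values, one point per row).

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy data for illustration only: ID 2 appears twice, ID 4 three times
gdf = gpd.GeoDataFrame(
    {"ID": [1, 2, 2, 3, 4, 4, 4],
     "geometry": [Point(i, i) for i in range(7)]},
    crs="EPSG:4326",
)

first = gdf.duplicated(subset="ID", keep="first").tolist()
last = gdf.duplicated(subset="ID", keep="last").tolist()
every = gdf.duplicated(subset="ID", keep=False).tolist()

print(first)  # [False, False, True, False, False, True, True]
print(last)   # [False, True, False, False, True, True, False]
print(every)  # [False, True, True, False, True, True, True]
```

With keep=False every member of a duplicate group is flagged, including the occurrence that keep='first' or keep='last' would spare.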
Example:
```python
# Identify duplicate rows based on the 'ID' column
# (reuses the gdf created in the drop_duplicates example)
duplicates = gdf.duplicated(subset='ID', keep='first')
print(duplicates)
```
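Because duplicated returns a Boolean Series aligned with the GeoDataFrame's index, it works directly as a row mask. The sketch below rebuilds the sample data from the drop_duplicates example so it runs on its own. It pulls out every row involved in a duplicate group, counts the rows drop_duplicates would remove, and checks that keeping the unflagged rows gives the same result as drop_duplicates.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the drop_duplicates example
data = {'ID': [1, 2, 2, 3, 4, 4, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5),
                     Point(5, 6), Point(6, 7), Point(7, 8)]}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# keep=False flags every row whose ID occurs more than once
conflicts = gdf[gdf.duplicated(subset='ID', keep=False)]
print(conflicts['Name'].tolist())  # ['Bob', 'Charlie', 'Eve', 'Frank', 'Grace']

# Summing the default mask counts the rows drop_duplicates would remove
n_removed = int(gdf.duplicated(subset='ID').sum())
print(n_removed)  # 3

# Keeping the rows NOT flagged matches drop_duplicates with the same keep
via_mask = gdf[~gdf.duplicated(subset='ID', keep='first')]
via_drop = gdf.drop_duplicates(subset='ID', keep='first')
print(via_mask.index.equals(via_drop.index))  # True
```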
Explanation:
- drop_duplicates: Removes duplicate rows from the GeoDataFrame based on specified columns.
- duplicated: Identifies duplicate rows and returns a Boolean Series indicating which rows are duplicates.
These methods are useful for data cleaning and ensuring that your GeoDataFrame contains unique entries as required.
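One common cleaning pattern combines sorting with keep='last' to retain only the most recent record per key. This is a sketch with hypothetical survey data: the column name surveyed and its values are made up, and the dates are stored as ISO-8601 strings, which sort chronologically as plain text.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical records: sites 1 and 2 were each surveyed twice
gdf = gpd.GeoDataFrame(
    {"ID": [1, 2, 1, 2, 3],
     "surveyed": ["2023-01-05", "2023-02-01", "2023-03-10", "2022-12-20", "2023-01-15"],
     "geometry": [Point(0, 0), Point(1, 1), Point(0.1, 0.1), Point(1.1, 1.1), Point(2, 2)]},
    crs="EPSG:4326",
)

# Sort oldest to newest, keep the last (newest) row per ID, then restore original row order
latest = (
    gdf.sort_values("surveyed")
       .drop_duplicates(subset="ID", keep="last")
       .sort_index()
)
print(latest["ID"].tolist())        # [2, 1, 3]
print(latest["surveyed"].tolist())  # ['2023-02-01', '2023-03-10', '2023-01-15']
```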
Feel free to try these examples and let me know if you need any further assistance!