alex_bn_lee

[1081] The syntax and usage for the drop_duplicates and duplicated functions in a GeoDataFrame in GeoPandas.

This post summarizes the syntax and usage of the drop_duplicates and duplicated methods on a GeoDataFrame in GeoPandas. Both are inherited from pandas.DataFrame, so they behave the same way on a GeoDataFrame.

drop_duplicates Function

The drop_duplicates method removes duplicate rows based on one or more columns.

Syntax:

GeoDataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

Parameters:

  • subset (optional): Column label or sequence of labels to consider for identifying duplicates. Defaults to all columns.
  • keep (optional): Determines which duplicates to keep:
    • 'first': Keep the first occurrence and drop the rest.
    • 'last': Keep the last occurrence and drop the rest.
    • False: Drop every row that has a duplicate, including the first occurrence.
  • inplace (optional): If True, modify the GeoDataFrame in place and return None; otherwise return a new GeoDataFrame.

Example:

import geopandas as gpd
from shapely.geometry import Point

# Sample GeoDataFrame
data = {
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5), Point(5, 6), Point(6, 7), Point(7, 8)],
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Drop duplicates based on the 'ID' column, keeping the first occurrence
gdf_unique = gdf.drop_duplicates(subset='ID', keep='first')
print(gdf_unique)
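
To make the effect of keep concrete, here is a small sketch that runs the same call with each of the three settings on the sample data. The variable names kept_first, kept_last and kept_none are only illustrative and are not part of the original example.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as above, rebuilt so this snippet runs on its own
gdf = gpd.GeoDataFrame(
    {
        'ID': [1, 2, 2, 3, 4, 4, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    },
    geometry=[Point(i, i + 1) for i in range(1, 8)],
    crs="EPSG:4326",
)

kept_first = gdf.drop_duplicates(subset='ID', keep='first')
kept_last = gdf.drop_duplicates(subset='ID', keep='last')
kept_none = gdf.drop_duplicates(subset='ID', keep=False)

print(kept_first['Name'].tolist())  # ['Alice', 'Bob', 'David', 'Eve']
print(kept_last['Name'].tolist())   # ['Alice', 'Charlie', 'David', 'Grace']
print(kept_none['Name'].tolist())   # ['Alice', 'David']
```

The result keeps the original row labels (0, 1, 3, 4 for keep='first'). If a fresh 0..n-1 index is wanted, chaining .reset_index(drop=True) is one option.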

duplicated Function

The duplicated method returns a Boolean Series indicating duplicate rows.

Syntax:

GeoDataFrame.duplicated(subset=None, keep='first')

Parameters:

  • subset (optional): Column label or sequence of labels to consider for identifying duplicates. Defaults to all columns.
  • keep (optional): Determines which duplicates to mark:
    • 'first': Mark duplicates as True except for the first occurrence.
    • 'last': Mark duplicates as True except for the last occurrence.
    • False: Mark every row that has a duplicate as True, including the first and last occurrences.

Example:

# Identify duplicate rows based on the 'ID' column
duplicates = gdf.duplicated(subset='ID', keep='first')
print(duplicates)
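
The Boolean Series is most useful as a mask. The sketch below shows three common uses: counting repeats, inspecting every member of a duplicate group, and filtering. The names repeat_mask, all_dupes and unique_rows are illustrative, and the sample data is rebuilt so the snippet runs on its own.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {'ID': [1, 2, 2, 3, 4, 4, 4]},
    geometry=[Point(i, i + 1) for i in range(1, 8)],
    crs="EPSG:4326",
)

# keep='first' flags only the repeats, so the sum is the number of rows
# that drop_duplicates(subset='ID') would remove.
repeat_mask = gdf.duplicated(subset='ID', keep='first')
print(repeat_mask.tolist())    # [False, False, True, False, False, True, True]
print(int(repeat_mask.sum()))  # 3

# keep=False flags every member of a duplicate group, handy for inspection.
all_dupes = gdf[gdf.duplicated(subset='ID', keep=False)]
print(all_dupes['ID'].tolist())  # [2, 2, 4, 4, 4]

# Negating the mask selects the same rows as drop_duplicates(subset='ID').
unique_rows = gdf[~repeat_mask]
print(unique_rows['ID'].tolist())  # [1, 2, 3, 4]
```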

Explanation:

  • drop_duplicates: Removes duplicate rows from the GeoDataFrame based on specified columns.
  • duplicated: Identifies duplicate rows and returns a Boolean Series indicating which rows are duplicates.

These methods are useful for data cleaning and ensuring that your GeoDataFrame contains unique entries as required.
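
Since subset accepts any column label, the geometry column can be used as well, which is a common cleaning step for features recorded twice at the same location. The following is a minimal sketch with made-up data, assuming a GeoPandas version whose geometry column supports this comparison. As far as I understand, the comparison is made on the stored representation of each geometry, so shapes that are spatially equal but encoded differently (for example, a polygon with a different starting vertex) may not be detected as duplicates.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: rows 'A' and 'B' share the same point
gdf = gpd.GeoDataFrame(
    {'Name': ['A', 'B', 'C']},
    geometry=[Point(1, 2), Point(1, 2), Point(3, 4)],
    crs="EPSG:4326",
)

# Flag rows whose geometry repeats an earlier row
same_location = gdf.duplicated(subset='geometry', keep='first')
print(same_location.tolist())

# Keep one row per distinct geometry
deduped = gdf.drop_duplicates(subset='geometry')
print(deduped['Name'].tolist())
```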


Posted by McDelfino