Python for Data Science - Removing duplicates

Chapter 2 - Data Preparation Basics

Segment 3 - Removing duplicates

import numpy as np
import pandas as pd

from pandas import Series, DataFrame

Removing duplicates

DF_obj = DataFrame({'column 1':[1,1,2,2,3,3,3],
                    'column 2':['a','a','b','b','c','c','c'],
                    'column 3':['A','A','B','B','C','C','C']})
DF_obj
column 1 column 2 column 3
0 1 a A
1 1 a A
2 2 b B
3 2 b B
4 3 c C
5 3 c C
6 3 c C
DF_obj.duplicated()
0    False
1     True
2    False
3     True
4    False
5     True
6     True
dtype: bool
DF_obj.drop_duplicates()
column 1 column 2 column 3
0 1 a A
2 2 b B
4 3 c C
DF_obj = DataFrame({'column 1':[1,1,2,2,3,3,3],
                    'column 2':['a','a','b','b','c','c','c'],
                    'column 3':['A','A','B','B','C','D','C']})
DF_obj
column 1 column 2 column 3
0 1 a A
1 1 a A
2 2 b B
3 2 b B
4 3 c C
5 3 c D
6 3 c C
DF_obj.drop_duplicates(['column 3'])
column 1 column 2 column 3
0 1 a A
2 2 b B
4 3 c C
5 3 c D

posted @ 2021-01-02 15:03  晨风_Eric  阅读(64)  评论(0编辑  收藏  举报