Data cleaning in Python (using pandas)

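For the given sample data (printed first in the output below), the script fills the missing value with the column mean, splits the food name into two columns, and removes duplicate rows: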

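import pandas as pd

# Load the sample spreadsheet (read_excel already returns a DataFrame)
df = pd.read_excel("F:\\python入门\\数据1\\food.xlsx")
print('Original data:\n', df)
# Fill missing values in 'ounces' with the column mean.
# Assigning the result back avoids calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about.
df['ounces'] = df['ounces'].fillna(df['ounces'].mean())
print('Data after mean imputation:\n', df)
# Split the 'food' column on whitespace into two columns
df[['first_name', 'last_name']] = df['food'].str.split(expand=True)
df.drop('food', axis=1, inplace=True)
print('Data after splitting the food name:\n', df)
# Drop rows that repeat an earlier (first_name, last_name) pair
df.drop_duplicates(['first_name', 'last_name'], inplace=True)
print('Data after removing duplicates:\n', df)
#df.to_excel("F:\\python入门\\数据1\\food_new.xlsx")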
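
The Excel file itself is not included with this post, so the following is a minimal sketch that rebuilds the same rows in memory, copied from the printed output, and runs the same three steps. The variable name mean_ounces is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Rows copied from the printed sample data; this stands in for food.xlsx.
df = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4.0, 3.0, np.nan, 6.0, 7.5, 8.0, -3.0, 5.0, 6.0],
    "animal": ["pig", "pig", "pig", "cow", "cow", "pig", "cow", "pig", "salmon"],
})

# Mean of the non-missing values (NaN is skipped by default).
mean_ounces = df["ounces"].mean()
df["ounces"] = df["ounces"].fillna(mean_ounces)

# Split on whitespace; single-word names get a missing last_name.
df[["first_name", "last_name"]] = df["food"].str.split(expand=True)
df = df.drop(columns="food")

# Keep the first row for each (first_name, last_name) pair.
df = df.drop_duplicates(subset=["first_name", "last_name"])
print(df)
```

Here mean_ounces should come out as 36.5 / 8 = 4.5625, the value filled into row 2 by the script, and the final frame should keep 8 of the 9 rows.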

Output:

Original data:
           food  ounces  animal
0        bacon     4.0     pig
1  pulled pork     3.0     pig
2        bacon     NaN     pig
3     Pastrami     6.0     cow
4  corned beef     7.5     cow
5        Bacon     8.0     pig
6     pastrami    -3.0     cow
7    honey ham     5.0     pig
8     nova lox     6.0  salmon
Data after mean imputation:
           food  ounces  animal
0        bacon  4.0000     pig
1  pulled pork  3.0000     pig
2        bacon  4.5625     pig
3     Pastrami  6.0000     cow
4  corned beef  7.5000     cow
5        Bacon  8.0000     pig
6     pastrami -3.0000     cow
7    honey ham  5.0000     pig
8     nova lox  6.0000  salmon
Data after splitting the food name:
    ounces  animal first_name last_name
0  4.0000     pig      bacon      None
1  3.0000     pig     pulled      pork
2  4.5625     pig      bacon      None
3  6.0000     cow   Pastrami      None
4  7.5000     cow     corned      beef
5  8.0000     pig      Bacon      None
6 -3.0000     cow   pastrami      None
7  5.0000     pig      honey       ham
8  6.0000  salmon       nova       lox
Data after removing duplicates:
    ounces  animal first_name last_name
0     4.0     pig      bacon      None
1     3.0     pig     pulled      pork
3     6.0     cow   Pastrami      None
4     7.5     cow     corned      beef
5     8.0     pig      Bacon      None
6    -3.0     cow   pastrami      None
7     5.0     pig      honey       ham
8     6.0  salmon       nova       lox
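
Note that the comparison in drop_duplicates is case-sensitive, which is why both "bacon" and "Bacon" (and both "Pastrami" and "pastrami") survive in the output above. If those should count as the same food, one possible approach is to deduplicate on lowercase keys. The sketch below uses a small hypothetical frame that reuses a few names from the output; it is an illustration, not part of the original script.

```python
import pandas as pd

# Hypothetical mini-frame reusing a few names from the output above.
df = pd.DataFrame({
    "first_name": ["bacon", "Pastrami", "Bacon", "pastrami", "honey"],
    "last_name": [None, None, None, None, "ham"],
    "ounces": [4.0, 6.0, 8.0, -3.0, 5.0],
})

# Build lowercase comparison keys without altering the original columns.
keys = df[["first_name", "last_name"]].apply(lambda col: col.str.lower())

# duplicated() flags every repeat after the first occurrence;
# missing last names are treated as equal to each other.
deduped = df[~keys.duplicated()]
print(deduped)
```

The original capitalization is preserved in the rows that remain, because only the temporary keys are lowercased.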

 

posted @ 2020-08-07 11:05  夏日的向日葵