Python data cleaning (using pandas)
Given the following sample data:
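For readers who do not have the Excel file, the sample can be rebuilt in memory. This is only a sketch: the values are copied from the "original data" printout further down in this post, and the assumption is that the spreadsheet holds nothing beyond these three columns.

```python
import numpy as np
import pandas as pd

# Values copied from the original-data printout in this post.
# Assumption: the Excel sheet contains only these three columns.
df = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4.0, 3.0, np.nan, 6.0, 7.5, 8.0, -3.0, 5.0, 6.0],
    "animal": ["pig", "pig", "pig", "cow", "cow", "pig", "cow", "pig", "salmon"],
})
print(df)
```

With this frame in place of the `pd.read_excel` call, the rest of the steps can be run unchanged.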
We fill in the missing values, split the name column, and drop duplicate rows:
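import pandas as pd

# read_excel already returns a DataFrame, so no extra wrapping is needed
df = pd.read_excel("F:\\python入门\\数据1\\food.xlsx")
print('Original data:\n', df)

# Fill missing values with the column mean.
# Assigning the result back avoids the chained inplace fillna,
# which newer pandas versions warn about.
df['ounces'] = df['ounces'].fillna(df['ounces'].mean())
print('Data after filling with the mean:\n', df)

# Split the food column into two columns (split on whitespace)
df[['first_name', 'last_name']] = df['food'].str.split(expand=True)
df.drop('food', axis=1, inplace=True)
print('Data after splitting the food name:\n', df)

# Drop rows whose (first_name, last_name) pair has already appeared
df.drop_duplicates(['first_name', 'last_name'], inplace=True)
print('Data after dropping duplicates:\n', df)

# df.to_excel("F:\\python入门\\数据1\\food_new.xlsx")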
Output:
Original data:
          food  ounces  animal
0        bacon     4.0     pig
1  pulled pork     3.0     pig
2        bacon     NaN     pig
3     Pastrami     6.0     cow
4  corned beef     7.5     cow
5        Bacon     8.0     pig
6     pastrami    -3.0     cow
7    honey ham     5.0     pig
8     nova lox     6.0  salmon

Data after filling with the mean:
          food  ounces  animal
0        bacon  4.0000     pig
1  pulled pork  3.0000     pig
2        bacon  4.5625     pig
3     Pastrami  6.0000     cow
4  corned beef  7.5000     cow
5        Bacon  8.0000     pig
6     pastrami -3.0000     cow
7    honey ham  5.0000     pig
8     nova lox  6.0000  salmon

Data after splitting the food name:
   ounces  animal first_name last_name
0  4.0000     pig      bacon      None
1  3.0000     pig     pulled      pork
2  4.5625     pig      bacon      None
3  6.0000     cow   Pastrami      None
4  7.5000     cow     corned      beef
5  8.0000     pig      Bacon      None
6 -3.0000     cow   pastrami      None
7  5.0000     pig      honey       ham
8  6.0000  salmon       nova       lox

Data after dropping duplicates:
   ounces  animal first_name last_name
0     4.0     pig      bacon      None
1     3.0     pig     pulled      pork
3     6.0     cow   Pastrami      None
4     7.5     cow     corned      beef
5     8.0     pig      Bacon      None
6    -3.0     cow   pastrami      None
7     5.0     pig      honey       ham
8     6.0  salmon       nova       lox
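In the final table, "bacon" and "Bacon" (and likewise "Pastrami" and "pastrami") both survive, because `drop_duplicates` compares strings exactly. Whether they should count as the same food is a judgment call the original post does not make. If they should, one possible approach is to compare on a lowercase key while keeping the original spelling. The sketch below uses a hypothetical miniature of the split table, not the full dataset.

```python
import pandas as pd

# Hypothetical miniature of the split table, using names from the output above.
df = pd.DataFrame({
    "ounces": [4.0, 6.0, 8.0, -3.0, 5.0],
    "animal": ["pig", "cow", "pig", "cow", "pig"],
    "first_name": ["bacon", "Pastrami", "Bacon", "pastrami", "honey"],
    "last_name": [None, None, None, None, "ham"],
})

# Build lowercase keys for comparison only, so the original spelling is preserved.
keys = pd.DataFrame({
    "first": df["first_name"].str.lower(),
    "last": df["last_name"].str.lower(),
})

# duplicated() marks every later repeat of a (first, last) pair;
# missing last names are treated as equal to each other.
deduped = df[~keys.duplicated()]
print(deduped)
```

Here only the first occurrence of each case-insensitive name is kept, so the rows for "Bacon" and "pastrami" are dropped.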