Kaggle Study Notes: Handling Missing Values

Source: the Missing Values lesson of Kaggle's Intermediate Machine Learning course.

Three approaches to handling missing values

  1. Drop columns with missing values
  2. Imputation
  3. An extension to imputation

Example implementations

1. Drop columns with missing values

# Get names of columns with missing values
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]

# Drop columns in training and validation data
reduced_X_train = X_train.drop(cols_with_missing, axis=1)
reduced_X_valid = X_valid.drop(cols_with_missing, axis=1)
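
To make the effect of this step concrete, here is a minimal sketch on a tiny made-up DataFrame. The names `X_train_demo`, `X_valid_demo`, and the column names are assumptions for illustration only, not the Kaggle data. Note that the columns to drop are chosen from the training data and then removed from both splits, so the two frames keep the same columns.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the Kaggle dataset)
X_train_demo = pd.DataFrame({
    "rooms": [3, 2, 4, 3],
    "area": [120.0, np.nan, 150.0, 98.0],
    "year_built": [1990.0, 2005.0, np.nan, 1978.0],
})
X_valid_demo = pd.DataFrame({
    "rooms": [2, 5],
    "area": [80.0, 200.0],
    "year_built": [np.nan, 2011.0],
})

# Columns with at least one missing value in the training data
demo_cols_with_missing = X_train_demo.columns[X_train_demo.isnull().any()].tolist()

# Drop the same columns from both splits
reduced_train_demo = X_train_demo.drop(columns=demo_cols_with_missing)
reduced_valid_demo = X_valid_demo.drop(columns=demo_cols_with_missing)

print(demo_cols_with_missing)
print(reduced_train_demo.shape, reduced_valid_demo.shape)
```

In this toy case two of the three columns are dropped, which shows how much information this approach can throw away.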

2. Imputation

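SimpleImputer's default strategy is mean, which replaces each missing value with the mean of its column. The strategy can also be set to most_frequent, median, or constant.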

import pandas as pd
from sklearn.impute import SimpleImputer

# Imputation
my_imputer = SimpleImputer()
imputed_X_train = pd.DataFrame(my_imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(my_imputer.transform(X_valid))

# Imputation removed column names; put them back
imputed_X_train.columns = X_train.columns
imputed_X_valid.columns = X_valid.columns
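
As a rough illustration of how the strategies differ, the sketch below imputes a single made-up column with each of them. The frame `demo` and the choice of `fill_value=0` for the constant strategy are assumptions for this example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# One toy column with a single missing value at row 2 (hypothetical data)
demo = pd.DataFrame({"a": [1.0, 2.0, np.nan, 2.0, 10.0]})

filled = {}
for strategy in ["mean", "median", "most_frequent", "constant"]:
    # fill_value is only used by the "constant" strategy
    kwargs = {"fill_value": 0} if strategy == "constant" else {}
    imputer = SimpleImputer(strategy=strategy, **kwargs)
    # Record the value that replaced the missing entry
    filled[strategy] = float(imputer.fit_transform(demo)[2, 0])

print(filled)
# mean -> 3.75, median -> 2.0, most_frequent -> 2.0, constant -> 0.0
```

The outlier 10.0 pulls the mean up to 3.75 while the median stays at 2.0, which is one reason to consider median for skewed columns.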

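3. An extension to imputation

Impute the missing values as before, and additionally add one column per affected column to record which entries were imputed. The indicator columns have to be created before imputation, because the missing positions can no longer be seen afterwards.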

# Make copy to avoid changing original data (when imputing)
X_train_plus = X_train.copy()
X_valid_plus = X_valid.copy()

# Make new columns indicating what will be imputed
for col in cols_with_missing:
    X_train_plus[col + '_was_missing'] = X_train_plus[col].isnull()
    X_valid_plus[col + '_was_missing'] = X_valid_plus[col].isnull()

# Imputation
my_imputer = SimpleImputer()
imputed_X_train_plus = pd.DataFrame(my_imputer.fit_transform(X_train_plus))
imputed_X_valid_plus = pd.DataFrame(my_imputer.transform(X_valid_plus))

# Imputation removed column names; put them back
imputed_X_train_plus.columns = X_train_plus.columns
imputed_X_valid_plus.columns = X_valid_plus.columns
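
A possible shortcut, shown here only as a sketch: `SimpleImputer` accepts `add_indicator=True`, which appends the missing-value indicator columns to the imputed output. The toy frames and the `_was_missing` naming below are assumptions chosen to mirror the code above; this is not how the Kaggle lesson does it.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data (hypothetical, not the Kaggle dataset)
train_demo = pd.DataFrame({
    "rooms": [3.0, 2.0, 4.0, 3.0],
    "area": [120.0, np.nan, 150.0, 98.0],
    "year_built": [1990.0, 2005.0, np.nan, 1978.0],
})
valid_demo = pd.DataFrame({
    "rooms": [2.0, 5.0],
    "area": [80.0, 200.0],
    "year_built": [np.nan, 2011.0],
})

# Columns that have missing values in the training data, in column order
missing_cols = [c for c in train_demo.columns if train_demo[c].isnull().any()]

# add_indicator=True appends one 0/1 column per feature that had
# missing values when the imputer was fitted
imputer = SimpleImputer(add_indicator=True)
train_out = imputer.fit_transform(train_demo)
valid_out = imputer.transform(valid_demo)

# Rebuild DataFrames: original columns first, then the indicator columns
out_columns = list(train_demo.columns) + [c + "_was_missing" for c in missing_cols]
train_plus_demo = pd.DataFrame(train_out, columns=out_columns, index=train_demo.index)
valid_plus_demo = pd.DataFrame(valid_out, columns=out_columns, index=valid_demo.index)

print(train_plus_demo)
```

Passing `index=` when rebuilding the DataFrame keeps the original row labels, which a bare `pd.DataFrame(array)` would reset.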

Analysis of results

MAE from Approach 1 (Drop columns with missing values):
183550.22137772635

MAE from Approach 2 (Imputation):
178166.46269899711

MAE from Approach 3 (An Extension to Imputation):
178927.503183954
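
These notes do not show how the MAE values are computed. One plausible way to compare the approaches is a helper that fits the same model on each version of the data and reports the mean absolute error on the validation set. The sketch below is an assumption: the function name `score_dataset`, the choice of `RandomForestRegressor`, its parameters, and the synthetic data are all illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def score_dataset(X_train, X_valid, y_train, y_valid):
    """Fit a fixed model and return the validation MAE (hypothetical helper)."""
    model = RandomForestRegressor(n_estimators=10, random_state=0)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    return mean_absolute_error(y_valid, preds)

# Synthetic regression data, only to show the call pattern
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=200)

demo_mae = score_dataset(X_demo[:150], X_demo[150:], y_demo[:150], y_demo[150:])
print(f"Demo MAE: {demo_mae:.3f}")
```

Keeping the model and its random seed fixed means that differences in MAE reflect the preprocessing rather than the model.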

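The training data has 10864 rows and 12 columns. Only three columns contain missing values, and in each of them fewer than half of the entries are missing. Dropping those columns therefore throws away a lot of useful information, which is why imputation performs better. In this run, adding the indicator columns (Approach 3) gave a slightly higher MAE than plain imputation (Approach 2).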
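
Figures like the ones in this analysis (data shape, which columns have missing values, and what fraction is missing) can be obtained with a few pandas calls. The sketch below uses a small made-up frame named `demo`; on the real data the same calls would be applied to `X_train`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the Kaggle dataset)
demo = pd.DataFrame({
    "rooms": [3, 2, 4, 3],
    "area": [120.0, np.nan, 150.0, 98.0],
    "year_built": [1990.0, 2005.0, np.nan, 1978.0],
})

# Shape of the data: (rows, columns)
print(demo.shape)

# Number of missing values per column, keeping only affected columns
missing_count = demo.isnull().sum()
missing_only = missing_count[missing_count > 0]
print(missing_only)

# Fraction of each affected column that is missing
missing_fraction = missing_only / len(demo)
print(missing_fraction)
```

If a column were missing in most of its rows, dropping it could be the more reasonable choice, so checking these fractions first helps decide between the approaches.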

Posted by ikventure