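[Task 4 (2 days)] Model Evaluation
Record a score table for five models (logistic regression, SVM, decision tree, random forest, XGBoost) covering accuracy, precision, recall, F1-score, and AUC, and plot the ROC curves. Time: 2 days.
The following format can be used as a reference:
Note: this is a financial dataset (not the raw data; it has already been preprocessed). The goal is to predict whether a loan customer will become overdue. The "status" column in the table is the label: 0 means not overdue, 1 means overdue.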
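As a quick reminder of what the requested scores measure, here is a minimal pure-Python sketch that derives accuracy, precision, recall, and F1 from confusion-matrix counts, treating 1 (overdue) as the positive class. The labels `y_true` and `y_pred` and the helper names `confusion_counts` and `basic_metrics` are invented for illustration; they are not part of the loan dataset or of scikit-learn.

```python
# Toy labels, made up purely for illustration (not taken from the loan data).
# 1 = overdue (positive class), 0 = not overdue.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    """Compute the four threshold-based scores from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted overdue, how many were
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual overdue, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1': f1}


scores = basic_metrics(y_true, y_pred)
print(scores)
```

On these toy labels precision (0.75) and recall (0.6) differ, and calling `basic_metrics(y_pred, y_true)` swaps the two values. That is why argument order matters: scikit-learn's metric functions expect the true labels first and the predictions second.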
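1. Plotting and table-building functions
This section directly reuses the processed data and the already-defined models from the previous post.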
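The `metrics` function in this section treats `models` as a dict that maps a display name to an estimator. The actual definitions live in the previous post and are not repeated here, so the following is only a hypothetical stand-in: the keys, the default hyperparameters, and the optional XGBoost import are all assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the `models` dict from the previous post.
# The hyperparameters actually used there are not shown in this post.
models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'SVM': SVC(),
    'DecisionTree': DecisionTreeClassifier(random_state=0),
    'RandomForest': RandomForestClassifier(random_state=0),
}

# XGBoost is a separate package; add it only if it is installed.
try:
    from xgboost import XGBClassifier
    models['XGBoost'] = XGBClassifier()
except ImportError:
    pass
```

Keeping the models in a dict lets a single loop fit and score all of them, and the keys double as row labels in the score tables and as plot titles.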
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import (recall_score, precision_score, f1_score,
                             accuracy_score, roc_curve, roc_auc_score)


def plot_roc_curve(fpr_train, tpr_train, fpr_test, tpr_test, name=None):
    """Plot the train and test ROC curves of one model on the same axes."""
    plt.plot(fpr_train, tpr_train, linewidth=2, c='r', label='train')
    plt.plot(fpr_test, tpr_test, linewidth=2, c='b', label='test')
    plt.plot([0, 1], [0, 1], 'k--')  # chance-level diagonal
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(name)
    plt.legend(loc='best')
    plt.show()


def get_scores(model, X):
    """Continuous scores for ROC/AUC: positive-class probability if available."""
    if hasattr(model, 'predict_proba'):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)  # e.g. SVC without probability=True


def metrics(models, X_train_scaled, X_test_scaled, y_train, y_test):
    """Fit each model, tabulate train/test scores, and plot its ROC curves."""
    columns = ['recall_score', 'precision_score', 'f1_score', 'accuracy_score', 'AUC']
    results_test = pd.DataFrame(columns=columns)
    results_train = pd.DataFrame(columns=columns)
    for name, model in models.items():
        name = str(name)
        model.fit(X_train_scaled, y_train)
        y_pre_test = model.predict(X_test_scaled)
        y_pre_train = model.predict(X_train_scaled)
        # ROC and AUC need continuous scores, not hard 0/1 predictions.
        y_score_test = get_scores(model, X_test_scaled)
        y_score_train = get_scores(model, X_train_scaled)
        # scikit-learn metrics take (y_true, y_pred), in that order.
        results_test.loc[name] = [
            round(recall_score(y_test, y_pre_test), 2),
            round(precision_score(y_test, y_pre_test), 2),
            round(f1_score(y_test, y_pre_test), 2),
            round(accuracy_score(y_test, y_pre_test), 2),
            round(roc_auc_score(y_test, y_score_test), 2),
        ]
        results_train.loc[name] = [
            round(recall_score(y_train, y_pre_train), 2),
            round(precision_score(y_train, y_pre_train), 2),
            round(f1_score(y_train, y_pre_train), 2),
            round(accuracy_score(y_train, y_pre_train), 2),
            round(roc_auc_score(y_train, y_score_train), 2),
        ]
        fpr_train, tpr_train, _ = roc_curve(y_train, y_score_train)
        fpr_test, tpr_test, _ = roc_curve(y_test, y_score_test)
        plot_roc_curve(fpr_train, tpr_train, fpr_test, tpr_test, name)
    return results_test, results_train

results_test,results_train = metrics(models,X_train_scaled,X_test_scaled,y_train,y_test)
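To make the ROC and AUC numbers easier to interpret, here is a small pure-Python sketch of how an ROC curve is traced by sweeping a threshold over the scores, and how AUC follows from the trapezoidal rule. The helpers `roc_points` and `auc_trapezoid` and the toy data are invented for illustration; this is not scikit-learn's implementation, only a simplified version of the same idea.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by lowering a threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy data, made up for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]          # continuous scores
y_hard = [1 if s >= 0.5 else 0 for s in y_score]   # hard 0/1 predictions

auc_from_scores = auc_trapezoid(*roc_points(y_true, y_score))
auc_from_labels = auc_trapezoid(*roc_points(y_true, y_hard))
print(round(auc_from_scores, 3))  # 0.778
print(round(auc_from_labels, 3))  # 0.667
```

With hard 0/1 predictions there is only one usable threshold, so the "curve" collapses to a single interior point and the AUC changes. Passing probabilities or decision scores to `roc_curve` and `roc_auc_score` keeps the full curve.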
The results are as follows.
Training set: (most of the models overfit severely!)
Test set:
Model ROC curves: