skill

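1. Baseline model development: this approach should let us quickly understand the problem and the data and get an initial sense of the data.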
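import pandas as pd
# df: a pandas DataFrame loaded beforehand
# Overview: first few rows
df.head()
# Descriptive statistics
df.describe()
# Handle missing values: first count them per column
df.isnull().sum()
# Handle non-numeric variables (col is a placeholder for a categorical column name)
pd.Categorical(df[col]).codes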
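
The baseline steps can be run end to end on a tiny DataFrame. The data and column names are hypothetical, and median imputation is just one simple way to fill numeric gaps, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],
    "sex": ["male", "female", "female", "male"],
    "survived": [0, 1, 1, 0],
})

print(df.head())            # overview: first rows
print(df.describe())        # descriptive statistics of the numeric columns
print(df.isnull().sum())    # missing values per column

# Missing values: fill numeric gaps with the column median (one simple choice)
df["age"] = df["age"].fillna(df["age"].median())

# Non-numeric variables: map each category to an integer code
# (categories are sorted, so "female" -> 0 and "male" -> 1)
df["sex"] = pd.Categorical(df["sex"]).codes
print(df)
```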

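2. The data is then studied and enriched through exploratory data analysis (EDA) and feature extraction. Exploring the data is a very important step in data analysis.
The main reason for excluding irrelevant features is to prevent overfitting: we do not want the model to learn features that appear in the training set but are useless and absent from real-world data.
Methods:
1) Examine the relationship between each feature and the dependent variable
2) Feature combination + feature extraction (select the relevant features)
3. Build the machine learning model
4. Draw conclusions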
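
A minimal sketch of the two EDA methods with pandas. The toy DataFrame and its columns (`sex`, `sibsp`, `parch`, `survived`, and the derived `family_size`) are hypothetical; ranking features by absolute Pearson correlation with the target is one simple relevance heuristic, not the only one.

```python
import pandas as pd

# Hypothetical toy data; "survived" plays the role of the dependent variable
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "sibsp": [1, 1, 0, 0, 2, 0],
    "parch": [0, 0, 0, 0, 1, 0],
    "survived": [0, 1, 1, 0, 1, 1],
})

# 1) Feature vs. dependent variable: mean target value per category
rate_by_sex = df.groupby("sex")["survived"].mean()
print(rate_by_sex)

# 2) Feature combination: derive a new feature from two existing ones
df["family_size"] = df["sibsp"] + df["parch"] + 1

# Shortlist numeric features by absolute correlation with the target
corr = df[["sibsp", "parch", "family_size", "survived"]].corr()["survived"].drop("survived")
print(corr.abs().sort_values(ascending=False))
```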

1. (Build) Quickly build the whole model end to end
2. (Measure) Evaluate model performance
3. (Learn) Make corrections to improve performance
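
One pass through the Build / Measure / Learn loop might look like the sketch below. The dataset (scikit-learn's bundled breast cancer data), the baseline (logistic regression), the metric (5-fold cross-validated accuracy) and the single change tried (feature scaling) are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Build: a quick end-to-end baseline
baseline = LogisticRegression(max_iter=5000)
# Measure: mean 5-fold cross-validated accuracy
base_score = cross_val_score(baseline, X, y, cv=5).mean()

# Learn: apply one change (feature scaling) and measure again
improved = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
new_score = cross_val_score(improved, X, y, cv=5).mean()

print(f"baseline={base_score:.3f}  with scaling={new_score:.3f}")
```

Keeping each iteration to a single change makes it clear which correction moved the score.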

Learning curves allow us to diagnose whether the model is overfitting or underfitting.
Remedies for overfitting:
1. Reduce model complexity
2. Collect more data
Remedies for underfitting:
1. Improve the model
2. Improve data quality
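
The "reduce model complexity" remedy can be seen by comparing training and validation accuracy for a fully grown decision tree and a depth-limited one. The dataset, the model and the `max_depth=3` cap are assumptions for illustration; a single train/validation split is enough to expose the gap, while a learning curve adds the training-set-size dimension.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (None, 3):  # None grows the tree until its leaves are pure (high complexity)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    train_acc, val_acc = results[depth]
    # A large train/validation gap points to overfitting;
    # low scores on both sets point to underfitting
    print(f"max_depth={depth}: train={train_acc:.3f} val={val_acc:.3f} gap={train_acc - val_acc:.3f}")
```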

Validation curves are a tool we can use to improve the performance of our model; they are one way of tuning hyperparameters.

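Validation curve

The x-axis is the value of a hyperparameter and the y-axis is the loss. The curve shows the loss obtained for each value the hyperparameter takes.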
from sklearn.model_selection import validation_curve
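
A minimal sketch of `validation_curve`, assuming a decision tree on scikit-learn's bundled breast cancer dataset with `max_depth` as the hyperparameter (illustrative choices, not from the original). scikit-learn returns scores where higher is better, so with `scoring="accuracy"` the y-axis is accuracy rather than a loss; to plot a loss, pass a negated loss scorer such as `"neg_log_loss"` and flip the sign.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
depths = np.arange(1, 11)  # hyperparameter values on the x-axis

# Returned arrays have one row per hyperparameter value and one column per CV fold
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy",
)

val_mean = val_scores.mean(axis=1)
best_depth = depths[np.argmax(val_mean)]
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_mean):
    print(f"max_depth={d}: train={tr:.3f} val={va:.3f}")
print("best max_depth by validation accuracy:", best_depth)
```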

A learning curve shows the loss obtained for different amounts of training data.
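
A learning curve can be computed with scikit-learn's `learning_curve`. The sketch assumes a scaled logistic regression on the bundled breast cancer dataset and uses `scoring="neg_log_loss"`, negating it back so the printed numbers are log loss.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# x-axis: number of training samples; y-axis: negated log loss (higher is better)
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_log_loss", shuffle=True, random_state=0,
)

# Flip the sign back to get the loss itself
for n, tr, va in zip(train_sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n={n}: train loss={tr:.3f} val loss={va:.3f}")
```

As a rough reading: if validation loss keeps dropping as n grows while a gap to training loss remains, more data may help; if both losses level off high and close together, a more expressive model is the likelier fix.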

from sklearn.metrics import classification_report  # per-class precision, recall and F1 report
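
A small usage sketch of `classification_report` on hypothetical labels; the `target_names` strings are illustrative.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels for a binary classifier
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Human-readable table: precision, recall, F1 and support per class
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))

# output_dict=True returns the same numbers as a nested dict keyed by label
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["precision"], report["accuracy"])
```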

posted @ 2018-06-09 18:03 blog_hfg
