Some concepts (and explanations) encountered while learning ML
Random Forest
a set of decision trees that makes the classification by voting (possibly with per-tree weights)
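A minimal sketch of the voting idea, using hypothetical hand-written "stumps" in place of real trained trees (the stump thresholds and the `forest_predict` helper are made up for illustration):

```python
from collections import Counter

# Hypothetical "trees": each is just a function mapping a sample to a class label.
trees = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.3 else 0,
    lambda x: 1 if x[0] + x[1] > 1.0 else 0,
]

def forest_predict(trees, x, weights=None):
    """Majority vote over the trees' predictions, optionally weighted."""
    weights = weights or [1.0] * len(trees)
    votes = Counter()
    for tree, w in zip(trees, weights):
        votes[tree(x)] += w
    return votes.most_common(1)[0][0]

print(forest_predict(trees, [0.6, 0.1]))  # two of three stumps vote 0 -> 0
```

Passing a `weights` list skews the vote, which is the "maybe with some weight" variant mentioned above.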
Bagging & Boosting
combining multiple weak models into a strong model
Bagging
randomly sample, with replacement, a sample set the same size as the training data
about 63.2% (≈ 1 − 1/e) of the original samples appear in it without repetition
train a separate classifier (e.g. SVM, decision tree) on each such sample
Advantage: reduces variance
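The ≈63.2% figure can be checked empirically: draw a bootstrap sample the same size as the data and count the unique indices (a pure-Python sketch, not tied to any particular library):

```python
import random

random.seed(0)
n = 100_000
# One bootstrap sample: n indices drawn uniformly with replacement.
sample = [random.randrange(n) for _ in range(n)]
unique_fraction = len(set(sample)) / n
print(round(unique_fraction, 3))  # close to 1 - 1/e ≈ 0.632
```

Each index is missed with probability (1 − 1/n)^n → 1/e, hence the 1 − 1/e fraction of unique samples.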
Boosting (here directly introducing adaptive boosting, AdaBoost)
train the first classifier with default (uniform) sample weights
increase the weights of the misclassified samples when training the next classifier
repeat the step above
assign each classifier a weight based on its accuracy
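One round of the weight updates described above can be sketched as follows, using the standard discrete AdaBoost formulas (the `adaboost_update` helper and the toy numbers are my own illustration):

```python
import math

def adaboost_update(weights, miss):
    """One AdaBoost round: miss[i] is True if sample i was misclassified.
    Returns (new_weights, alpha), where alpha is the classifier's vote weight."""
    err = sum(w for w, m in zip(weights, miss) if m)   # weighted error rate
    alpha = 0.5 * math.log((1 - err) / err)            # more accurate -> larger alpha
    # Misclassified samples are up-weighted, correct ones down-weighted.
    new = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, miss)]
    total = sum(new)
    return [w / total for w in new], alpha

# Four samples with uniform weights; only sample 3 is misclassified.
weights, alpha = adaboost_update([0.25] * 4, [False, False, False, True])
print(alpha > 0, weights[3] > weights[0])  # True True: the missed sample gains weight
```

The final ensemble then combines classifiers by summing their votes scaled by their `alpha` values, which is the "weight based on accuracy" step.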
PDP & ICE
Personally this feels a bit like the condition number \(cond_f(A)\) of a matrix in linear algebra, except that here we want to know how sensitive the prediction (over the whole data, or for a single sample) is to each feature
Partial Dependence Plot
vary the feature of interest across its range of values, while keeping all the other features constant
observe how much that one feature affects the prediction (averaged over the data)
useful for visualizing and understanding the relationship between features and the target variable
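The procedure above can be sketched in a few lines; the `model` function and data here are purely illustrative stand-ins for a trained model and a real dataset:

```python
# Toy "model": prediction depends on both features (purely illustrative).
def model(x0, x1):
    return 2.0 * x0 + 0.5 * x1

data = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]  # (x0, x1) samples

def partial_dependence(grid, data):
    """For each grid value v, set feature 0 to v for every sample and
    average the predictions: one point of the PDP curve per grid value."""
    return [sum(model(v, x1) for _, x1 in data) / len(data) for v in grid]

print(partial_dependence([0.0, 1.0], data))  # [1.0, 3.0]
```

The slope of the resulting curve (here 2.0 per unit of feature 0) is exactly the marginal effect of the varied feature on the averaged prediction.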
Individual Conditional Expectation
shows how a change in a feature affects the prediction for a single data point (one curve per sample, rather than the average)
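ICE is the same computation as the PDP sketch but without the averaging step: one curve per data point (again with a made-up toy model and data):

```python
def model(x0, x1):
    return 2.0 * x0 + 0.5 * x1  # toy model, purely illustrative

data = [(0.0, 1.0), (1.0, 3.0)]

def ice_curves(grid, data):
    """One curve per data point: vary feature 0 over the grid while
    holding that point's other feature fixed (no averaging, unlike PDP)."""
    return [[model(v, x1) for v in grid] for _, x1 in data]

print(ice_curves([0.0, 1.0], data))  # [[0.5, 2.5], [1.5, 3.5]]
```

Averaging the ICE curves pointwise recovers the PDP curve, which is why the two plots are usually shown together.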
Permutation Importance
randomly permute one of the feature columns and check whether the accuracy drops
the size of the drop indicates the feature's importance
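A sketch of that check, using a toy model that only looks at feature 0 (so permuting feature 0 should hurt accuracy while permuting feature 1 should not); the model, data, and `permutation_importance` helper are all hypothetical:

```python
import random

random.seed(1)

# Toy model that only uses feature 0.
def model(x):
    return 1 if x[0] > 0.5 else 0

X = [[i / 100, random.random()] for i in range(100)]
y = [model(x) for x in X]  # labels the model predicts perfectly

def accuracy(X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature):
    """Accuracy drop after randomly permuting one feature's column."""
    col = [x[feature] for x in X]
    random.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return accuracy(X, y) - accuracy(X_perm, y)

print(permutation_importance(X, y, 0) > permutation_importance(X, y, 1))
```

Permuting the unused feature 1 leaves accuracy unchanged (importance 0), while permuting feature 0 destroys the information the model relies on, so its accuracy drop is large.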