Some concepts (and explanations) picked up while learning ML

Random Forest

an ensemble of decision trees that makes a classification by voting (the votes may carry weights)
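The voting step can be sketched as follows. This is a minimal illustration with hand-written per-tree predictions, not a real library API; `vote` is a made-up helper:

```python
from collections import Counter

def vote(predictions, weights=None):
    """Majority vote over per-tree predictions, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

# three hypothetical trees vote on one sample
print(vote(["cat", "dog", "cat"]))             # cat wins 2-1
print(vote(["cat", "dog", "dog"], [3, 1, 1]))  # weighted: cat wins 3-2
```

With weights, a single accurate tree can outvote several weaker ones, which is exactly the "votes may carry weights" idea above.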

Bagging & Boosting

combining multiple weak models into one strong model

Bagging

randomly sample from the training data with replacement, drawing a bootstrap sample as large as the original set

on average only about 63.2% (≈ 1 − 1/e) of the distinct original samples appear in each bootstrap sample

repeat the operation above for each base classifier (e.g. SVM, decision tree, ...)

advantage: reduces variance
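The fraction of distinct originals in a bootstrap sample is about 1 − 1/e ≈ 63.2%, and a quick simulation (not part of the original notes) confirms it: draw n indices with replacement and count how many are distinct.

```python
import random

random.seed(0)
n = 100_000
# bootstrap: n draws with replacement from n items
sample = [random.randrange(n) for _ in range(n)]
unique_fraction = len(set(sample)) / n
print(round(unique_fraction, 3))  # close to 1 - 1/e ≈ 0.632
```

The leftover ~36.8% that never get drawn are the "out-of-bag" samples, which random forests can use as a free validation set.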

Boosting (here we go straight to AdaBoost, i.e. adaptive boosting)

train the first classifier as usual

raise the weights of the misclassified samples when training the next classifier

repeat the operation above

assign each classifier a weight based on its accuracy
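One round of those weight updates can be sketched like this. It is a simplified illustration of the standard AdaBoost formulas; `adaboost_step` is a hypothetical helper, not a library function:

```python
import math

def adaboost_step(sample_weights, correct):
    """One AdaBoost round: weight the classifier by its error rate,
    then raise the weights of misclassified samples and renormalise."""
    err = sum(w for w, c in zip(sample_weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / err)  # classifier weight: higher = more accurate
    new = [w * math.exp(alpha if not c else -alpha)
           for w, c in zip(sample_weights, correct)]
    total = sum(new)
    return alpha, [w / total for w in new]

# four equally weighted samples; the weak learner misclassifies the last one
alpha, weights = adaboost_step([0.25] * 4, [True, True, True, False])
# after the update the misclassified sample carries half of the total weight
```

The returned `alpha` is exactly the "weight based on accuracy" given to the classifier in the final weighted vote.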

PDP & ICE

Personally, this feels a bit like the condition number \(cond_f(A)\) of a matrix in linear algebra, except that here the goal is to understand how sensitive the prediction (over all samples, or for a single sample) is to each feature.

Partial Dependence Plot

vary the feature of interest across its range of values, while keeping all the other features constant

to observe how much that one feature affects the predictions

useful for visualizing and understanding the relationship between a feature and the target variable
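The recipe above, computed by hand on a toy model (this is a hypothetical `partial_dependence` helper, not the sklearn API):

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """For each grid value: fix the chosen feature to that value for ALL
    rows, keep the other features as-is, and average the predictions."""
    values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        values.append(float(predict(Xv).mean()))
    return values

# toy model: prediction = 2 * feature0, feature1 is ignored
predict = lambda X: 2.0 * X[:, 0]
X = np.random.default_rng(0).normal(size=(50, 2))
pd_curve = partial_dependence(predict, X, feature=0, grid=[0.0, 1.0, 2.0])
# the curve [0.0, 2.0, 4.0] recovers the model's slope of 2 for feature 0
```

Plotting `pd_curve` against the grid is the Partial Dependence Plot.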

Individual Conditional Expectation

shows how a change in one feature affects the prediction for a single data point
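An ICE curve is the per-sample version of the PDP computation. A toy sketch (the `ice_curve` helper is made up for illustration), using a model with an interaction so that different samples get different curves, which is exactly what ICE reveals and a PDP would average away:

```python
import numpy as np

def ice_curve(predict, x_row, feature, grid):
    """Predictions for ONE sample as the chosen feature sweeps a grid."""
    rows = np.tile(x_row, (len(grid), 1))
    rows[:, feature] = grid
    return predict(rows)

# toy model with an interaction: prediction = feature0 * feature1
predict = lambda X: X[:, 0] * X[:, 1]
grid = np.array([0.0, 1.0, 2.0])
curve_a = ice_curve(predict, np.array([0.0, 3.0]), feature=0, grid=grid)
curve_b = ice_curve(predict, np.array([0.0, -1.0]), feature=0, grid=grid)
# the two samples have slopes 3 and -1; their average (the PDP) would hide both
```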

Permutation Importance

randomly permute (shuffle) one feature's values and check whether the accuracy drops

the size of the drop indicates how important that feature is
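A by-hand sketch of the shuffle-and-score idea (a simplified stand-in for sklearn's `permutation_importance`, using a trivially perfect "model" so the effect is easy to see):

```python
import numpy as np

def permutation_importance(predict, X, y, feature, rng):
    """Accuracy drop after shuffling one feature column; a larger
    drop means the model relied on that feature more."""
    accuracy = lambda y_true, y_pred: float((y_true == y_pred).mean())
    base = accuracy(y, predict(X))
    Xp = X.copy()
    # break the link between this feature and the target
    Xp[:, feature] = Xp[rng.permutation(len(Xp)), feature]
    return base - accuracy(y, predict(Xp))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)                    # label depends only on feature 0
predict = lambda X: (X[:, 0] > 0).astype(int)    # "model" that uses feature 0 only
drop0 = permutation_importance(predict, X, y, 0, rng)
drop1 = permutation_importance(predict, X, y, 1, rng)
# shuffling feature 0 hurts accuracy badly; shuffling feature 1 changes nothing
```

In practice the shuffle is repeated several times and the drops are averaged, since a single permutation is noisy.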

posted @ 2023-12-04 23:37  LacLic