COMP9517 Week 5: Machine Learning
https://echo360.org.au/lesson/29e8e4ac-4007-4e31-b682-aa9da17ddd36/classroom#sortDirection=desc
Summary:
1. Pattern recognition
2. Nearest Class Mean Classifier
1) Compute the class mean vector of each class k from the training samples.
2) An input vector x from the test set is assigned to class k if it is closer to the mean vector of class k than to any other class mean vector.
3) The distance between x and each mean vector is measured with the Euclidean distance (a minimal sketch follows this list).
4) If a class k consists of two subclasses, compute a separate mean vector for each subclass.
5) Pros: works well when the classes are far apart and clearly separated.
6) Cons: performs poorly when the sample distribution is complex or multimodal; cannot handle outliers or missing data.
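A minimal NumPy sketch of the classifier above; the function names and toy data are illustrative, not from the lecture:

```python
import numpy as np

def fit_class_means(X, y):
    """Compute one mean vector per class from the training samples."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_nearest_mean(x, means):
    """Assign x to the class whose mean vector is closest in Euclidean distance."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

X = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
y = np.array([0, 0, 1, 1])
means = fit_class_means(X, y)
print(predict_nearest_mean(np.array([2.0, 2.0]), means))  # -> 0 (closer to class 0 mean)
```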
3. K-Nearest Neighbours (KNN)
1) A sample is assigned the label of the majority class among its k nearest neighbours.
2) For continuous variables, the distance to the neighbours is usually the Euclidean distance (see the sketch after this list).
3) For discrete variables, the Hamming distance is used.
4) Pros:
1. No training step
2. Decision surfaces are non-linear
5) Cons:
1. Very slow on large datasets: each prediction compares against all N training samples, so classifying N test samples costs O(N^2) distance computations
2. Performs poorly and is hard to interpret when there are many features, i.e. in high dimensions (curse of dimensionality)
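A brute-force sketch of the voting rule above, assuming continuous features and Euclidean distance (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to all N training samples
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]
```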
4. Bayesian Decision Theory
1) A classifier's decisions may be wrong, so the output should be expressed as probabilities; an object is assigned to the class with the highest posterior probability.
2) The posterior follows from Bayes' rule: P(class k | x) = p(x | class k) · P(class k) / p(x), i.e. likelihood × prior / evidence.
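A tiny worked example of this decision rule; the priors and likelihoods are made-up numbers for illustration:

```python
# Hypothetical two-class problem: priors and class-conditional likelihoods
# p(x | class) for one observed feature value x.
priors = {"A": 0.7, "B": 0.3}
likelihoods = {"A": 0.2, "B": 0.6}

evidence = sum(priors[c] * likelihoods[c] for c in priors)              # p(x)
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}

print(posteriors)  # {'A': 0.4375, 'B': 0.5625} -> assign to class B
```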
5. Decision Tree
1) Entropy is 0 when a node is pure (contains only one class) and positive otherwise, so the goal is to minimise entropy.
2) The better a split separates the classes, the lower the entropy and the higher the information gain.
3) The feature whose split yields the largest information gain is chosen as the node (see the sketch after this list).
4) Pros:
1. Easy to interpret
2. Handles both numerical and categorical data
3. Handles outliers and missing values
4. Indicates the relative importance of features
5) Cons: tends to overfit
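A short sketch of the entropy and information-gain computations above, for a binary split (function names are illustrative):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array; 0 when the node is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left/right children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```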
6. Ensemble Learning: Random Forest
1) Error rate
1. The error rate of a random forest depends on the correlation between the trees in the forest and on the strength of the individual trees.
2. Higher correlation and lower strength both increase the error rate.
3. Both the correlation and the strength increase with m, the number of features each tree considers.
4. So m must be chosen as a trade-off (see the sketch after the pros/cons lists below).
2) Pros:
• unexcelled in accuracy among current algorithms
• works efficiently on large datasets
• handles thousands of input features without feature selection
• handles missing values effectively
3) Cons:
• less interpretable than an individual decision tree
• more complex and more time-consuming to construct than a single decision tree
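A scikit-learn sketch of the m trade-off from the error-rate notes; the dataset and parameter values are assumptions for illustration. scikit-learn exposes m as max_features and the out-of-bag error via oob_score_:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)

# Sweep m (max_features): small m lowers tree correlation but also tree strength,
# large m does the opposite; the out-of-bag (OOB) error exposes the trade-off.
for m in [2, 4, 8, 16]:
    rf = RandomForestClassifier(n_estimators=200, max_features=m,
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    print(f"m={m:2d}  OOB error={1 - rf.oob_score_:.3f}")
```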
7. SVM