(Paper Analysis) Machine Learning -- Learning from labeled and unlabeled data

Learning from labeled and unlabeled data

Main idea:

The unlabeled data provides information about the structure of the domain, for example how the data is distributed.

Main algorithms and their ideas:

1. Self-Training

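A classifier is trained on the labeled data and then used to classify the unlabeled data. The most confident unlabeled points (those whose predicted labels the classifier trusts most), together with their predicted labels, are added to the training set. This process is repeated until convergence.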
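The loop described above can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: the nearest-centroid classifier, the confidence score (the gap between the two nearest centroid distances), and the names `fit_centroids`, `predict`, `self_train`, `per_round` and `max_rounds` are all assumptions made for the example. "Convergence" is simplified to "the unlabeled pool is empty or the round limit is reached".

```python
import math

def fit_centroids(points, labels):
    """Toy classifier (an assumption of this sketch): one mean vector per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: [sum(col) / len(pts) for col in zip(*pts)]
            for y, pts in groups.items()}

def predict(model, p):
    """Return (label, confidence). Needs at least two classes.
    Confidence is the gap between the two nearest centroid distances."""
    ranked = sorted((math.dist(p, c), y) for y, c in model.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0

def self_train(labeled, labels, unlabeled, per_round=1, max_rounds=100):
    """Repeatedly move the most confident unlabeled points into the training set."""
    train, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break  # simplified stopping rule: nothing left to label
        model = fit_centroids(train, ys)
        scored = [(predict(model, p), p) for p in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _conf), p in scored[:per_round]:
            train.append(p)
            ys.append(label)  # the predicted label is trusted as if it were true
            pool.remove(p)
    return fit_centroids(train, ys), train, ys
```

Adding only a few points per round (`per_round`) keeps early mistakes from flooding the training set, at the cost of more retraining rounds.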

2. Co-Training

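The features describing the objects are split into two sets, each of which is sufficient on its own to train a good classifier, and the two sets are conditionally independent given the class. The two classifiers are trained iteratively, each on its own feature set, and they teach each other: each one contributes a subset of the unlabeled data (the examples it predicts most confidently) together with its predicted labels.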
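A rough sketch of this mutual teaching is given below. It assumes each example is a pair `(view_a, view_b)` of feature tuples and reuses a toy nearest-centroid classifier per view; the function names, the confidence score and the stopping rule are illustrative assumptions, not details taken from the paper.

```python
import math

def fit_centroids(xs, ys):
    """Toy per-view classifier (an assumption of this sketch): class means."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(pts) for col in zip(*pts)]
            for y, pts in groups.items()}

def predict(model, x):
    """Return (label, confidence); confidence is the distance gap between
    the two nearest class centroids. Needs at least two classes."""
    ranked = sorted((math.dist(x, c), y) for y, c in model.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]

def co_train(labeled, labels, unlabeled, per_round=1, max_rounds=50):
    """labeled / unlabeled: lists of (view_a, view_b) pairs."""
    train, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        for view in (0, 1):  # the two classifiers take turns teaching
            if not pool:
                break
            model = fit_centroids([ex[view] for ex in train], ys)
            ranked = sorted(pool, key=lambda ex: predict(model, ex[view])[1],
                            reverse=True)
            for ex in ranked[:per_round]:
                # This view's confident prediction becomes a training label
                # that the other view's classifier will learn from.
                train.append(ex)
                ys.append(predict(model, ex[view])[0])
                pool.remove(ex)
    models = tuple(fit_centroids([ex[v] for ex in train], ys) for v in (0, 1))
    return models, train, ys
```

The shared training set is what lets an example that is ambiguous in one view be labeled by the view in which it is clear.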

3. Transductive SVMs

4. Collective classification

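The relational structure linking the labeled and unlabeled data is used to improve classification accuracy. The assumption is that the predicted label of an example is influenced by the predicted labels of the examples related to it.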
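One very simplified way to picture this influence is an iterative neighbor vote over a link graph, sketched below. It uses only the links (no per-example features), updates all nodes synchronously, and breaks ties by whichever label was counted first; these choices, and the names `collective_classify`, `neighbors` and `known`, are assumptions for illustration rather than the method from the paper.

```python
from collections import Counter

def collective_classify(neighbors, known, max_iters=20):
    """neighbors: node -> list of linked nodes.
    known: node -> label for the labeled nodes (these never change)."""
    labels = dict(known)
    for _ in range(max_iters):
        updated = dict(labels)
        for node, linked in neighbors.items():
            if node in known:
                continue
            # Vote using the labels from the previous sweep only.
            votes = Counter(labels[n] for n in linked if n in labels)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        if updated == labels:
            break  # no prediction changed: the labeling is stable
        labels = updated
    return labels

# Two tightly linked groups joined by a single bridge between "c" and "d".
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
result = collective_classify(graph, {"a": "x", "f": "y"})
```

With one labeled node per group, the labels spread along the links, so each unlabeled node ends up with the label of the group it is most connected to.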

5. Another idea

Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

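A classifier is trained on the labeled data and then used to classify the unlabeled data, assigning each prediction a confidence weight. The unlabeled data classified by this initial classifier is called pre-labeled data. The labeled data and the pre-labeled data are then combined into a new set. When a test example arrives, k-nearest neighbors is used to find the k closest points in this new set. Every point in the new set already has a label (although our trust in the accuracy of those labels varies, so the votes must be weighted), so the k neighbors can vote to decide which class the test example belongs to.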
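The procedure above might look roughly like this in code. The notes do not say which initial classifier or which weighting formula is used, so the nearest-centroid classifier, the weight `d1 / (d0 + d1)`, the weight of 1.0 for truly labeled points, and all function names here are assumptions made for the sketch.

```python
import math
from collections import defaultdict

def fit_centroids(xs, ys):
    """Toy initial classifier (an assumption of this sketch): class means."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(pts) for col in zip(*pts)]
            for y, pts in groups.items()}

def pre_label(model, x):
    """Return (label, weight) with weight in [0.5, 1.0]. Needs two classes.
    The weight formula is an illustrative choice, not from the paper."""
    ranked = sorted((math.dist(x, c), y) for y, c in model.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    weight = d1 / (d0 + d1) if d0 + d1 > 0 else 1.0
    return y0, weight

def build_pool(labeled, labels, unlabeled):
    """Combine labeled data (weight 1.0) with weighted pre-labeled data."""
    model = fit_centroids(labeled, labels)
    pool = [(x, y, 1.0) for x, y in zip(labeled, labels)]
    for x in unlabeled:
        y, w = pre_label(model, x)
        pool.append((x, y, w))
    return pool

def weighted_knn(pool, query, k=3):
    """Each of the k nearest points votes with its confidence weight."""
    nearest = sorted(pool, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for _, y, w in nearest:
        votes[y] += w
    return max(votes, key=votes.get)
```

Because pre-labeled points near a class boundary get weights close to 0.5, a single truly labeled neighbor can outvote an uncertain one.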

posted @ 2014-02-17 08:57 Jian - Discovering Engine
