self-taught learning setting && semi-supervised learning
Excerpted from the reference above:
The more general and powerful setting is the self-taught learning setting, which does not assume that your unlabeled data x_u has to be drawn from the same distribution as your labeled data x_l. The more restrictive setting where the unlabeled data comes from exactly the same distribution as the labeled data is sometimes called the semi-supervised learning setting.
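The text illustrates this with an example:
Suppose you want to decide whether an image shows a car or a bicycle. In practice, you might collect unlabeled data in one of two ways. (1) Download a pile of images from the web, regardless of whether they contain cars or bicycles, and use them as a dataset without any labels. (2) Carefully pick from the web a set of images that each show either a car or a bicycle, and use them as a dataset, again without any labels.
The images in the former do not follow the distribution of our prediction target; this is called self-taught learning. (Note: our prediction target is the distribution of images that each show either a car or a bicycle.)
The images in the latter follow the same distribution as our prediction target; this is called semi-supervised learning.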
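
The difference between the two settings can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: images are replaced by 2-D Gaussian points, the "other" category stands in for arbitrary downloaded images, and the unsupervised step is a simple standardization fitted on the unlabeled pool rather than whatever feature learner the reference actually uses. All function names (sample, make_unlabeled, fit_scaler, and so on) are hypothetical.

```python
import random

# Toy stand-ins for image categories (assumed centers, not real image data).
CENTERS = {"car": (2.0, 0.0), "bicycle": (-2.0, 0.0), "other": (0.0, 5.0)}

def sample(kind, n, rng):
    """Draw n toy 2-D 'images' of the given kind from a Gaussian blob."""
    cx, cy = CENTERS[kind]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

def make_unlabeled(setting, n, rng):
    """Build an unlabeled pool of n points for the given setting."""
    if setting == "semi-supervised":
        kinds = ["car", "bicycle"]            # same distribution as labeled data
    elif setting == "self-taught":
        kinds = ["car", "bicycle", "other"]   # anything downloaded from the web
    else:
        raise ValueError(f"unknown setting: {setting}")
    pool = []
    for _ in range(n):
        pool.extend(sample(rng.choice(kinds), 1, rng))
    return pool

def fit_scaler(unlabeled):
    """Unsupervised step: per-dimension mean and std from unlabeled data only."""
    dims = list(zip(*unlabeled))
    means = [sum(d) / len(d) for d in dims]
    stds = [(sum((v - m) ** 2 for v in d) / len(d)) ** 0.5
            for d, m in zip(dims, means)]
    return means, stds

def transform(x, scaler):
    """Map a raw point into the standardized feature space."""
    means, stds = scaler
    return tuple((v - m) / s for v, m, s in zip(x, means, stds))

def fit_centroids(labeled, scaler):
    """Supervised step: one centroid per class, computed in feature space."""
    sums = {}
    for x, y in labeled:
        z = transform(x, scaler)
        acc = sums.setdefault(y, [0.0, 0.0, 0])
        acc[0] += z[0]
        acc[1] += z[1]
        acc[2] += 1
    return {y: (a / c, b / c) for y, (a, b, c) in sums.items()}

def predict(x, scaler, centroids):
    """Return the label of the nearest class centroid in feature space."""
    z = transform(x, scaler)
    return min(
        centroids,
        key=lambda y: (z[0] - centroids[y][0]) ** 2 + (z[1] - centroids[y][1]) ** 2,
    )

if __name__ == "__main__":
    rng = random.Random(0)
    # The labeled set contains only cars and bicycles in both settings.
    labeled = [(x, "car") for x in sample("car", 50, rng)]
    labeled += [(x, "bicycle") for x in sample("bicycle", 50, rng)]
    for setting in ("semi-supervised", "self-taught"):
        scaler = fit_scaler(make_unlabeled(setting, 500, rng))
        centroids = fit_centroids(labeled, scaler)
        print(setting, predict((2.0, 0.0), scaler, centroids))
```

In both runs the labeled data and the classifier are identical; only the source of the unlabeled pool changes, which is exactly the distinction the two terms draw.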