Paper Reading: A Brief Introduction to Weakly Supervised Learning

incomplete, 想利用未标注数据帮助训练

inexact, 笼统的数据标注，如垃圾邮件分类

inaccurate supervision，带噪声的数据，如众包

Incomplete supervision

training data set \(D=\{(x_1,y_1),\cdots,(x_l,y_l),x_{l+1},\cdots,x_m\}\)

active learning (with human intervention)

the labeling cost only depends on the number of queries

informativeness: an unlabeled instance helps reduce the uncertainty of a statistical model.

1.1 Uncertainty sampling a single learner, with the least confidence

1.2 query-by-committee multiple learners, disagree to most
representativeness : an instance helps represent the structure of input patterns

2.1 aim to exploit the cluster structure of unlabeled data

semi-supervised learning (no human intervention is assumed)

Here, although the unlabeled data points are not explicitly with label information, they implicitly convey some information about data distribution which can be helpful for predictive modelling.

two basic assumptions: the cluster assumption (data have inherent cluster structure) and the manifold assumption (data lie on a manifold).

generative methods

labels of unlabeled instances can be treated as missing values of model parameters, and estimated by approaches such as the EM .

To get good performance, one usually needs domain knowledge to determine adequate generative model.
graph based methods

the performance will heavily depends on how the graphis constructed.
low-density seperation methods

It is evident that S3VMs try to identify a classiﬁcation boundary which goes across the less dense region while keeping the labeled data correctly classiﬁed.
disagreement-based methods

generate multiple learners and let them collaborate to exploit unlabeled data.

Inexact Supervision

Multi-instance learning: predict the labels for unseen bags(\(X_i\) is a positive bag, if there exists \(x_{ip}\) which is positive, while p is unknown).

Inaccurate Supervision

For machine learning, crowdsourcing is commonly used as a cost-saving way to collect labels for training data.

posted @ 2018-03-09 14:34 Blueprintf 阅读(420) 评论(0) 收藏举报

刷新页面返回顶部

Blueprintf

为自己代码。

Paper Reading: A Brief Introduction to Weakly Supervised Learning

Incomplete supervision

Inexact Supervision

Inaccurate Supervision

公告