Intro to Online machine learning

Online vs. offline

online

Input processed piece by piece in a serial fashion
Each new piece of information generates an event
Not necessarily low latency

offline

Input processed in batches
Not necessarily high latency

NOTE: Online doesn't mean fast, and it doesn't mean streaming. It only means that information is processed as soon as it is received.

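Summary: online learning mainly means learning from data as soon as it is received, rather than waiting until all the data is available and then processing it in batches (to update the model parameters).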
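To make the contrast concrete, here is a minimal sketch that fits the same one-variable linear model both ways. It is not from the talk: the names `fit_offline`, `OnlineLinearModel`, `learn_one` and `stream`, the learning rate, and the synthetic data y = 2x + 1 are all assumptions chosen for illustration. The offline version needs the whole dataset before it can return a model; the online version applies one stochastic gradient step per example as it arrives.

```python
from typing import Iterator, List, Tuple

Example = Tuple[float, float]  # (x, y), hypothetical record layout


def fit_offline(data: List[Example]) -> Tuple[float, float]:
    """Offline: the whole dataset must be available before fitting."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


class OnlineLinearModel:
    """Online: usable at any time, updated once per incoming example."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr  # assumed learning rate, not tuned

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def learn_one(self, x: float, y: float) -> None:
        # One gradient step on the squared error of this single example.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def stream(n: int) -> Iterator[Example]:
    """Synthetic stream following y = 2x + 1 (assumed for the demo)."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


# Offline: collect everything first, then fit once.
w_off, b_off = fit_offline(list(stream(10)))

# Online: learn from each record as soon as it is received.
model = OnlineLinearModel()
for x, y in stream(5000):
    model.learn_one(x, y)

print(w_off, b_off)
print(model.w, model.b)
```

Neither loop says anything about latency: the online model could receive one record per day and still be online in the sense used above.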

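vs. Incremental Learning: In the paper "Incremental Learning from Scratch for Task-Oriented Dialogue Systems," incremental learning means that the model is also updated at test time, i.e. the trained model parameters are not frozen. In other words, whenever new data is added, incremental learning does not need to rebuild the whole knowledge base; it builds on the existing knowledge base (using the model already learned) and makes only the updates caused by the new data. However, incremental learning may process data one by one (or batch by batch).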

Lambda vs. Kappa (Machine Learning)

Lambda

Learning happens offline
Model is used by a streaming engine to make decisions online

Kappa

Learning happens online
Online decision model updates for each new record seen

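Summary: Lambda means training offline and then, at test time, testing examples one at a time (online). Kappa means training online (the model parameters are updated each time an instance is received), and the parameters are also updated at test time (incremental learning?).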
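The two architectures can be sketched as two loops over the same stream. This is only an illustration under assumptions: the "model" is a hypothetical `RunningMean` that predicts the mean of what it has learned, and `lambda_style`, `kappa_style` and the sample data are made up for the example.

```python
from typing import Iterable, List


class RunningMean:
    """Tiny stand-in model: predicts the mean of the values it has seen."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def learn_one(self, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict(self) -> float:
        return self.mean


def lambda_style(history: Iterable[float], stream: Iterable[float]) -> List[float]:
    """Learning happens offline; the streaming side only reads the model."""
    model = RunningMean()
    for y in history:  # offline training on stored data
        model.learn_one(y)
    # The model is frozen from here on: one decision per streamed record.
    return [model.predict() for _ in stream]


def kappa_style(stream: Iterable[float]) -> List[float]:
    """Learning happens online; the model is updated for each record seen."""
    model = RunningMean()
    preds = []
    for y in stream:
        preds.append(model.predict())  # decide first
        model.learn_one(y)             # then learn from the same record
    return preds


history = [1.0, 1.0, 1.0, 1.0]
stream = [1.0, 5.0, 5.0, 5.0]  # the data shifts after the first record

lambda_preds = lambda_style(history, stream)
kappa_preds = kappa_style(stream)
print(lambda_preds)
print(kappa_preds)
```

In this toy run the Lambda predictions stay at the offline mean, while the Kappa predictions drift toward the new level as records arrive.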

Statistical vs. Adversarial

Traditional

Common statistical methods: supervised and unsupervised
Graded by statistical fitness tests and out of core testing e.g. MSE, MAPE, R2
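For reference, the three metrics named above can be computed as in the following sketch. The function names and sample values are assumptions for illustration; MAPE is reported in percent here and is undefined when a true value is zero.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, only to exercise the functions.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.0, 5.0, 6.0, 7.0]

print(mse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```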

Adversarial

Algorithm versus environment e.g. vs Spammers, vs Hackers, vs Nature
Graded directionally by some tests, but really by A/B testing, since adversaries may get smarter over time

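Summary: adversarial learning refers to interaction with the environment (between two models), where learning happens by means of the other model.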

real-time

Subjective
A good buzzword for something that:

- Doesn't fall into any of the above categories cleanly

- Doesn't fall into whichever of the above categories you want it to fall into

- You're not really sure which buzzword to use, so you need a 'safe' word that no one can call you on

Summary: "real-time" may mean days or weeks, depending on when the data is received.

(from https://www.youtube.com/watch?v=O3gd6elZOlA)

Supplementary reading and understanding:

We can distinguish two learning modes: offline learning and online learning. In offline learning, the whole training data must be available at the time of model training. Only when training is completed can the model be used for predicting. In contrast, online algorithms process data sequentially. They produce a model and put it in operation without having the complete training dataset available at the beginning. The model is continuously updated during operation as more training data arrives.

Less restrictive than online algorithms are incremental algorithms that process input examples one by one (or batch by batch) and update the decision model after receiving each example. Incremental algorithms may have random access to previous examples or representative/selected examples. In such a case, these algorithms are called incremental algorithms with partial memory. Typically, in incremental algorithms, for any new presentation of data, the update operation of the model is based on the previous one. Streaming algorithms are online algorithms for processing high-speed continuous flows of data. In streaming, examples are processed sequentially as well and can be examined in only a few passes (typically just one). These algorithms use limited memory and limited processing time per item.

(from Gama, João, et al. "A survey on concept drift adaptation." ACM Computing Surveys (CSUR) 46.4 (2014): 44.)
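The difference between an incremental algorithm with partial memory and a streaming algorithm, as described in the quoted passage, can be sketched as follows. This is an illustrative sketch only: `PartialMemoryMean` and `StreamingMeanVar` are hypothetical names, and a running mean stands in for a real decision model. The first class keeps a bounded buffer of previous examples it can re-read; the second looks at each example exactly once and keeps constant-size state (Welford's update for mean and variance).

```python
from collections import deque
from typing import Deque


class PartialMemoryMean:
    """Incremental learner with partial memory: stores the last `window`
    examples and may revisit them whenever it predicts."""

    def __init__(self, window: int) -> None:
        self.memory: Deque[float] = deque(maxlen=window)

    def learn_one(self, y: float) -> None:
        self.memory.append(y)  # the oldest example is dropped automatically

    def predict(self) -> float:
        return sum(self.memory) / len(self.memory) if self.memory else 0.0


class StreamingMeanVar:
    """Streaming learner: one pass, constant memory and time per item
    (Welford's algorithm for the running mean and variance)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def learn_one(self, y: float) -> None:
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    def variance(self) -> float:
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0


stream = [1.0, 2.0, 3.0, 4.0, 10.0, 10.0]  # made-up data
windowed = PartialMemoryMean(window=2)
one_pass = StreamingMeanVar()
for y in stream:
    windowed.learn_one(y)
    one_pass.learn_one(y)

print(windowed.predict(), one_pass.mean, one_pass.variance())
```

In both cases each update is based on the previous state of the model, which is the property the quoted passage attributes to incremental algorithms in general.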

https://www.zhihu.com/question/38713098

https://blog.csdn.net/zyazky/article/details/51942135

posted @ 2020-06-12 10:36 萌新的学习之路

tristaTL