Reinforcement Learning 1

I. How it differs from supervised learning:

(1) Closed-loop: the agent's actions influence its later inputs.

(2) The learner is not told which actions to take; it must discover which ones yield the most reward by trying them.

(3) Actions affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.

(4) Exploration versus exploitation: the dilemma is that neither can be pursued exclusively without failing at the task.

(5) The setting is a goal-directed agent interacting with an uncertain environment.
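Points (1), (2), and (4) can be illustrated with a small sketch. The code below is not from any particular library; it assumes a hypothetical multi-armed bandit whose arms pay out around fixed mean rewards, and it uses epsilon-greedy action selection as one common way to mix exploration and exploitation. The function name `run_bandit` and its parameters (`true_means`, `epsilon`, `noise_std`, `seed`) are made up for illustration.

```python
import random


def run_bandit(true_means, epsilon, steps, noise_std=1.0, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    true_means: assumed mean reward of each arm (hidden from the agent).
    epsilon:    probability of taking a random (exploratory) action.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # the agent's running estimate per arm
    counts = [0] * n_arms       # how many times each arm was tried
    total_reward = 0.0

    for _ in range(steps):
        # The agent is never told the best action; it has to choose one.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore
        else:
            # exploit: pick the arm with the highest estimate so far
            action = max(range(n_arms), key=lambda a: estimates[a])

        # The environment responds with a noisy reward (closed loop).
        reward = rng.gauss(true_means[action], noise_std)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward
```

Under these assumptions, calling `run_bandit([0.2, 0.8], epsilon=0.0, steps=100, noise_std=0.0)` never leaves the first arm, because pure exploitation locks onto the first action that pays anything. Setting `epsilon=1.0` tries both arms but never uses what it learned. A small value such as `epsilon=0.1` usually finds the better arm and then mostly stays with it.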

posted on 2016-10-31 19:36 by 一动不动的葱头
