CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) No. 5: Actor-critic introduction

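In most actor-critic (AC) algorithms we fit only the state-value function V(s). Fitting the Q-function as well is less common, because Q can be approximated from V: Q(s, a) ≈ r(s, a) + γV(s'), so the advantage is A(s, a) ≈ r(s, a) + γV(s') - V(s).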
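To make the point concrete, here is a minimal sketch of turning a fitted V into one-step advantage estimates. It assumes the critic's predictions V(s) and V(s') are already available as plain lists; the function name `estimate_advantages` and the `dones` convention (terminal next states contribute V(s') = 0) are hypothetical, chosen only for illustration.

```python
def estimate_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantage estimates A(s, a) ~= r + gamma * V(s') - V(s).

    values[i] and next_values[i] are the critic's predictions V(s_i) and V(s'_i).
    dones[i] is True when s'_i is terminal; V(s'_i) is then treated as 0.
    """
    advantages = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        # No bootstrapping past the end of an episode.
        bootstrap = 0.0 if done else gamma * v_next
        advantages.append(r + bootstrap - v)
    return advantages


if __name__ == "__main__":
    adv = estimate_advantages([1.0, 0.0], [0.5, 0.2], [0.2, 0.0], [False, True], gamma=0.9)
    print(adv)  # approximately [0.68, -0.2]
```

Only V appears in the computation, which is why a single value network is enough for the critic.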

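Batch (offline) actor-critic: collect complete trajectories and fit V to Monte Carlo returns. Online actor-critic: update after every transition using bootstrapped (TD) targets.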
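The two kinds of value-function targets can be contrasted in a few lines. This is a minimal sketch, assuming a tabular critic stored in a Python dict (the lecture fits a neural-network critic instead); the helper names `monte_carlo_targets`, `bootstrapped_target` and `td0_update` and the step size `alpha` are hypothetical, introduced only for illustration.

```python
def monte_carlo_targets(rewards, gamma=0.99):
    """Batch/offline: the target for V(s_t) is the discounted return-to-go
    of one complete trajectory."""
    targets = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets


def bootstrapped_target(reward, next_value, done, gamma=0.99):
    """Online: one-step TD target r + gamma * V(s') from the current critic."""
    return reward + (0.0 if done else gamma * next_value)


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=0.99):
    """Online tabular critic update after a single transition (s, r, s_next).

    Unseen states default to a value of 0.0.
    """
    target = bootstrapped_target(r, V.get(s_next, 0.0), done, gamma)
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V[s]


if __name__ == "__main__":
    print(monte_carlo_targets([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
    V = {}
    print(td0_update(V, "a", 1.0, "b", False, alpha=0.5, gamma=0.9))  # 0.5
```

The Monte Carlo target needs the whole trajectory but is unbiased; the bootstrapped target is available after a single step and has lower variance, at the cost of bias whenever the current V is inaccurate.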

With a fast emulator, use the left-hand option in the figure.

This strategy works well at the beginning of training.

posted @ 2018-05-26 12:28 ecoflex