CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.6 Value functions introduction, NO.7 Advanced Q-learning

---------------------------------------------------------------------------------------------------------------------------

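Understand why correlated samples cause problems, and how parallel workers solve them: consecutive transitions from a single trajectory are strongly correlated, so the Q-function keeps fitting the most recent stretch of states, while collecting transitions from several workers at once makes each batch less correlated.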
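
A minimal sketch of the difference, assuming a hypothetical 1-D random-walk environment (`RandomWalkEnv`) and a hypothetical fixed `policy`; the synchronous collection loop is one simple way to use parallel workers, not the lecture's code. In the sequential batch every transition starts exactly where the previous one ended, while the parallel batch takes one transition from each independent worker.

```python
import random

class RandomWalkEnv:
    """Hypothetical toy environment: a noisy 1-D random walk on the integers."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # The action (+1 or -1) moves the agent; extra noise makes it stochastic.
        next_state = self.state + action + self.rng.choice([-1, 0, 1])
        transition = (self.state, action, -abs(next_state), next_state)
        self.state = next_state
        return transition

def policy(state):
    # Hypothetical fixed policy: step back toward the origin.
    return -1 if state > 0 else 1

def sequential_batch(env, batch_size):
    """batch_size consecutive transitions from ONE trajectory (strongly correlated)."""
    return [env.step(policy(env.state)) for _ in range(batch_size)]

def parallel_batch(envs):
    """One transition from EACH independent worker (synchronous parallel collection)."""
    return [env.step(policy(env.state)) for env in envs]

seq = sequential_batch(RandomWalkEnv(seed=0), batch_size=4)
workers = [RandomWalkEnv(seed=s) for s in range(4)]
par = parallel_batch(workers)
# In seq, each transition's start state is the previous transition's next state.
print(all(seq[i][3] == seq[i + 1][0] for i in range(len(seq) - 1)))  # True
```

The asynchronous variant mentioned alongside it would let each worker step and update at its own pace instead of in lockstep.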

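Another solution is a replay buffer, which takes full advantage of Q-learning being off-policy: transitions collected by earlier policies are still valid training data, and sampling them at random breaks the correlation between consecutive samples.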
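
A minimal replay buffer sketch; the class name, `capacity`, and the `add`/`sample` methods are illustrative assumptions, not a specific library's API. It keeps only the most recent `capacity` transitions and samples minibatches uniformly at random.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO replay buffer; capacity and method names are illustrative."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes transitions from many
        # different time steps (and older policies) into one minibatch.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=5, seed=0)
for t in range(10):
    buf.add(t, 0, 0.0, t + 1, False)
print(len(buf), buf.buffer[0][0])  # 5 5: only the 5 most recent transitions remain
```

Because the Q-learning target uses a max over actions rather than the action the old policy took, replaying old transitions needs no importance weighting.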

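There's still a problem: Q-learning is not gradient descent. The target y = r + γ max_a' Q_φ(s', a') depends on the same parameters φ, but no gradient flows through it, so every update chases a moving target.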
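
A minimal sketch with a hypothetical linear Q-function over hand-made feature vectors (all names here are illustrative). The update differentiates only Q(s, a) and holds the target fixed, so after one step the target itself has moved even though the data did not change.

```python
def q_value(weights, features):
    """Linear Q-function: Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def td_target(weights, reward, next_features, gamma):
    """y = r + gamma * max_a' Q(s', a'), computed with the CURRENT weights."""
    return reward + gamma * max(q_value(weights, phi) for phi in next_features)

def q_learning_step(weights, features, reward, next_features, gamma, lr):
    # The target depends on the weights but is treated as a constant:
    # only Q(s, a) is differentiated, so this is not the gradient of any
    # fixed objective (often called a semi-gradient update).
    y = td_target(weights, reward, next_features, gamma)
    error = q_value(weights, features) - y
    # Gradient of 0.5 * (Q(s, a) - y)^2 w.r.t. the weights, with y held fixed.
    return [w - lr * error * f for w, f in zip(weights, features)]

w0 = [0.0, 0.0]
phi_sa = [1.0, 0.0]                    # hypothetical features of (s, a)
next_phis = [[0.0, 1.0], [1.0, 1.0]]   # hypothetical features of (s', a') per action
print(td_target(w0, 1.0, next_phis, 0.9))  # 1.0
w1 = q_learning_step(w0, phi_sa, 1.0, next_phis, 0.9, lr=0.5)
print(w1)  # [0.5, 0.0]
print(td_target(w1, 1.0, next_phis, 0.9))  # the target has moved from 1.0
```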

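Divide the Q-function into two networks: the target network, which is held fixed and used to compute the targets, and the evolving (online) network, which is updated at every step.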

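This sacrifices some learning speed in exchange for more stable convergence, because the targets only change when the target network is updated.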
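
A minimal sketch, assuming the Q-function parameters are a plain list of weights; `TargetNetwork`, `record_update`, `update_period`, and `polyak_update` are hypothetical names. The class shows the periodic hard copy; `polyak_update` is the soft (Polyak-averaging) alternative, where a `tau` close to 1 makes the target drift slowly.

```python
class TargetNetwork:
    """Online weights change every step; target weights change every update_period steps."""

    def __init__(self, weights, update_period):
        self.online = list(weights)
        self.target = list(weights)  # used for y = r + gamma * max_a' Q_target(s', a')
        self.update_period = update_period
        self.steps = 0

    def record_update(self, new_online_weights):
        self.online = list(new_online_weights)
        self.steps += 1
        if self.steps % self.update_period == 0:
            self.target = list(self.online)  # hard update: copy online -> target

def polyak_update(target, online, tau):
    """Soft update: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]

net = TargetNetwork([0.0], update_period=3)
net.record_update([1.0])
net.record_update([2.0])
print(net.target)  # [0.0]: still frozen
net.record_update([3.0])
print(net.target)  # [3.0]: copied on the 3rd step
```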

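Overestimation in Natural (vanilla) DQN: the target takes a max over noisy Q estimates, and since E[max(X1, X2)] ≥ max(E[X1], E[X2]), the targets are biased upward.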
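
A small simulation of the bias, assuming every action's true value is 0 and the estimates carry Gaussian noise (all parameters are illustrative). The max-based estimate chooses and evaluates the action with the same noisy numbers; the double estimator chooses with one estimate and evaluates with an independent one. In double Q-learning for DQN, the online network plays the first role and the target network the second: y = r + γ Q_φ'(s', argmax_a' Q_φ(s', a')).

```python
import random

def overestimation_demo(num_actions=10, trials=5000, noise=1.0, seed=0):
    """Compare max-based and double estimates when every true Q-value is 0."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of the same all-zero Q-values.
        q_a = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        # Standard target: the same noisy numbers choose AND evaluate the action.
        single_total += max(q_a)
        # Double estimator: q_a chooses the action, independent q_b evaluates it.
        best = q_a.index(max(q_a))
        double_total += q_b[best]
    return single_total / trials, double_total / trials

single, double = overestimation_demo()
print(f"max estimate: {single:.3f}, double estimate: {double:.3f}")
```

Since the true value is 0, any positive average is pure overestimation: the max-based average comes out clearly positive, while the double estimate stays close to 0.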

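It gets into trouble with the left-or-right dilemma of steering around a tree: going left and going right are both good, and a method that can only represent a single preferred action struggles to capture both options.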

posted @ 2018-05-26 19:57  ecoflex