Summary: The transition probability is unknown, and we do not even need to estimate it.
posted @ 2018-05-26 23:04 ecoflex
Summary: Understand that correlated samples cause problems, and how parallelism solves them. Another solution is replay buffers, fully utilizing the advantag...
posted @ 2018-05-26 19:57 ecoflex
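As a rough illustration of the replay-buffer idea in the summary above, here is a minimal sketch. The class name `ReplayBuffer`, its methods, and the transition tuple layout are assumptions made for this example, not taken from the original post. Sampling uniformly from stored transitions is one common way to break the correlation between consecutive samples.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so a batch mixes
        # transitions from different points in time.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(2)
```

With a capacity of 3 and five insertions, only the three most recent transitions remain available for sampling.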
Summary: In most actor-critic (AC) algorithms, we actually just fit the value function; it is less common to fit the Q-function as well. Batch: offline, Monte Carlo. Online: bootstrap, TD in...
posted @ 2018-05-26 12:28 ecoflex
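The contrast between Monte Carlo and bootstrapped TD targets in the summary above can be sketched as follows. This is only an illustration under stated assumptions: the function names `monte_carlo_targets` and `td_targets` are hypothetical, and the value estimates are passed in as plain lists rather than produced by a fitted network.

```python
def monte_carlo_targets(rewards, gamma):
    """Discounted reward-to-go for each step of one finished episode."""
    targets = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def td_targets(rewards, next_values, dones, gamma):
    """One-step bootstrapped targets r + gamma * V(s'), with V(s') = 0 at terminal steps."""
    return [
        r + gamma * (0.0 if d else v)
        for r, v, d in zip(rewards, next_values, dones)
    ]


mc = monte_carlo_targets([1.0, 1.0, 1.0], gamma=0.5)
td = td_targets([1.0, 1.0], next_values=[2.0, 5.0], dones=[False, True], gamma=0.5)
```

The Monte Carlo version needs the whole episode before any target can be computed, while the TD version needs only a single transition plus the current value estimate of the next state, which is what makes it usable online.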