Summary:
The transition probability is unknown, and we do not even need to estimate it.
Summary:
Understand why correlated samples cause problems and how parallelism solves them. Another solution is replay buffers, fully utilizing the advantag…
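As a rough illustration of the replay-buffer idea mentioned in this summary, here is a minimal sketch: transitions are stored as they arrive and later drawn uniformly at random, so a training batch is no longer made of consecutive, correlated steps. The class name `ReplayBuffer`, the `capacity` parameter, and the transition layout are assumptions made for this example, not taken from the post.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions (hypothetical helper for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition as a plain tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # ordering of the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=100)
    for t in range(10):
        buf.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
    print(buf.sample(4))
```

Uniform sampling is only one possible choice here; it is used because it is the simplest way to decorrelate a batch.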
Summary:
In most actor-critic (AC) algorithms, we actually just fit the value function; it is less common to fit the Q-function as well. Batch: offline, Monte Carlo. Online: bootstrap, TD in…
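The two ways of building regression targets for the value function named in this summary can be sketched as follows. This is only an illustration under assumptions: the function names, the discount factor `gamma`, and the `next_values` list (standing in for the critic's current estimates of the next states) are hypothetical and do not come from the post.

```python
def monte_carlo_targets(rewards, gamma):
    """Batch-style targets: discounted reward-to-go over a finished episode."""
    targets = []
    g = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    targets.reverse()
    return targets


def td_targets(rewards, next_values, dones, gamma):
    """Online-style bootstrapped targets: r + gamma * V(s'), with V(s') = 0 at episode end."""
    return [
        r + gamma * (0.0 if d else v)
        for r, v, d in zip(rewards, next_values, dones)
    ]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(monte_carlo_targets(rewards, gamma=0.5))
    print(td_targets(rewards, [2.0, 2.0, 5.0], [False, False, True], gamma=0.5))
```

The Monte Carlo version needs the whole episode before it can produce a target, while the bootstrapped version can be computed after every single step, at the cost of relying on the current value estimate.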