CS294-112 Deep Reinforcement Learning, Fall semester (Berkeley), No. 5: Actor-critic introduction
In most actor-critic (AC) algorithms, we actually just fit the value function V. It is less common to fit the Q function as well.
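One reason fitting only V can be enough is that a one-step advantage estimate needs nothing else: A(s, a) ≈ r + γ V(s') − V(s). A minimal sketch, where the function name `advantage_estimate`, the `done` flag, and the toy numbers are assumptions for illustration and not taken from the lecture:

```python
def advantage_estimate(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s).

    value_s and value_next stand in for the outputs of a fitted value
    function (hypothetical here; any regressor for V would do).
    """
    # No bootstrapping past a terminal state.
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


# Toy numbers, chosen only to make the arithmetic easy to follow.
adv = advantage_estimate(reward=1.0, value_s=2.0, value_next=3.0, gamma=0.5)
adv_terminal = advantage_estimate(
    reward=1.0, value_s=2.0, value_next=3.0, gamma=0.5, done=True
)
```

With these numbers, `adv` is 1.0 + 0.5 * 3.0 − 2.0 and `adv_terminal` drops the bootstrap term.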
Batch: offline, Monte Carlo. Online: bootstrapped, TD.
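The two kinds of regression target for V can be sketched side by side. This is an illustrative sketch, not the lecture's code: `monte_carlo_targets`, `td_targets`, and the `next_values` list (standing in for the current estimates of V(s')) are assumed names.

```python
def monte_carlo_targets(rewards, gamma=0.99):
    """Monte Carlo targets: discounted reward-to-go over a finished trajectory."""
    targets = []
    running = 0.0
    # Walk backwards so each step reuses the return computed for the next step.
    for r in reversed(rewards):
        running = r + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def td_targets(rewards, next_values, gamma=0.99):
    """Bootstrapped TD targets: r_t + gamma * V(s_{t+1}), one per transition.

    next_values[t] is the current estimate of V(s_{t+1}); use 0.0 for a
    terminal next state.
    """
    return [r + gamma * v for r, v in zip(rewards, next_values)]


rewards = [1.0, 1.0, 1.0]
mc = monte_carlo_targets(rewards, gamma=0.5)
td = td_targets(rewards, next_values=[2.0, 4.0, 0.0], gamma=0.5)
```

The Monte Carlo version needs the whole trajectory before it can produce any target, while the TD version can be computed from a single transition, which is what makes it usable online.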
With a fast emulator, use the left one.
This strategy works well at the beginning of training.