CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.3 Reinforcement learning introduction

First-order Markov chain: the next state depends only on the current state, not on the earlier history.
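As a rough illustration of the first-order Markov property, here is a minimal sketch. The two-state transition table `P` and the helper names `step` and `rollout` are made up for this example and are not from the lecture.

```python
import random

# Hypothetical 2-state transition table (an assumption for illustration):
# P[s][s2] is the probability of moving from state s to state s2.
P = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

def step(state, rng):
    """Sample the next state using only the current state (first-order Markov)."""
    next_states = list(P[state])
    weights = [P[state][s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]

def rollout(start, n_steps, seed=0):
    """Return a trajectory with n_steps transitions, starting from `start`."""
    rng = random.Random(seed)
    traj = [start]
    for _ in range(n_steps):
        # Only traj[-1] is consulted; earlier states are never used.
        traj.append(step(traj[-1], rng))
    return traj
```

Calling `rollout("A", 5)` gives a list of six states. Note that `step` never looks at anything except the current state, which is exactly what "first order" means here.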

On-policy algorithms are easier to parallelize.

An off-policy algorithm has to fit a transition net and a policy net, which is much more computationally expensive.

posted @ 2018-05-24 18:13 ecoflex
