Summary: In policy gradient, "a" is usually replaced by "u". Use this new form to estimate how good the update is. If all three paths show positive reward, shou Read more
posted @ 2018-04-30 20:37 ecoflex Views (230) Comments (0) Recommended (0)
Summary: https://www.youtube.com/watch?v=fevMOp5TDQs http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html Atari is not an MDP, but MDP method wo Read more
posted @ 2018-04-30 16:41 ecoflex Views (185) Comments (0) Recommended (0)
Summary: ... Read more
posted @ 2018-04-30 16:03 ecoflex Views (124) Comments (0) Recommended (0)
Summary: https://www.youtube.com/watch?v=qaMdN6LS9rA https://drive.google.com/file/d/0BxXI_RttTZAhVXBlMUVkQ1BVVDQ/view match: a4 b1 c2 d3 a The middle one is c Read more
posted @ 2018-04-30 14:52 ecoflex Views (204) Comments (0) Recommended (0)