Summary:
In policy gradient, "a" is usually replaced by "u". Use this new form to estimate how good the update is. If all three paths show positive reward, shou Read more
Summary:
https://www.youtube.com/watch?v=fevMOp5TDQs http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html Atari is not an MDP, but MDP method wo Read more
Summary:
https://www.youtube.com/watch?v=qaMdN6LS9rA https://drive.google.com/file/d/0BxXI_RttTZAhVXBlMUVkQ1BVVDQ/view match: a4 b1 c2 d3 a The middle one is c Read more