Summary: The green bars are the reward function; the blue curve is the probability of different trajectories. If the green bars are all increased equally to the yellow bars, the res...
posted @ 2018-05-24 23:13 ecoflex Views (132) Comments (0) Recommended (0)
Summary: First-order Markov chain. On-policy algorithms are easier to parallelize. Off-policy algorithms have to fit a transition net and a policy net. Much more comp...
posted @ 2018-05-24 18:13 ecoflex Views (150) Comments (0) Recommended (0)
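Summary: I made a mistake earlier: I should have been watching the Fall 2017 course, but ended up watching the Spring course instead. A neural network controls a virtual robot by imitating human motion. Domain shift causes the failure of supervised learning in...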
posted @ 2018-05-24 16:43 ecoflex Views (1058) Comments (0) Recommended (0)
Summary: Initialization dramatically influences the trajectory. The current state depends on all past decisions. The ones reflect the dimensions being counted...
posted @ 2018-05-24 13:59 ecoflex Views (303) Comments (0) Recommended (0)