摘要:
green bar is the reward function, blue curve is the possibility of differenct trajectories if green bars are equally increased to yellow bars, the res 阅读全文
摘要:
first order markov chain on policy algorithm is easier to be paralleled off policy algorithm has to fit transition net, and policy net. much more comp 阅读全文
摘要:
前面弄错了,应该看2017的秋季课,结果看了春季课了。 neural network control a virtual robot, by imitating human motion Domain shift cause the failure of supervised learning in 阅读全文
摘要:
initialization dramatically influences the trajectory. the current state depends on all the past decision. ones reflect the dimensions being counted. 阅读全文