Summary: ...
posted @ 2018-05-29 17:24 ecoflex Views(217) Comments(0) Recommended(0)
Summary:
posted @ 2018-05-29 17:23 ecoflex Views(212) Comments(0) Recommended(0)
Summary:
posted @ 2018-05-29 17:22 ecoflex Views(180) Comments(0) Recommended(0)
Summary: Skip this lecture.
posted @ 2018-05-29 17:21 ecoflex Views(120) Comments(0) Recommended(0)
Summary: ...
posted @ 2018-05-29 17:17 ecoflex Views(144) Comments(0) Recommended(0)
Summary: ...
posted @ 2018-05-29 16:27 ecoflex Views(161) Comments(0) Recommended(0)
Summary: After the break, we'll extend our IRL to continuous spaces.
posted @ 2018-05-29 14:55 ecoflex Views(187) Comments(0) Recommended(0)
Summary: The yellow region corresponds to β, the blue region to α.
posted @ 2018-05-28 20:46 ecoflex Views(132) Comments(0) Recommended(0)
Summary: ...
posted @ 2018-05-28 17:13 ecoflex Views(139) Comments(0) Recommended(0)
Summary: Make a compromise between the learned policy and minimal cost! π̂ (π hat) uses states, while π_θ uses observations.
posted @ 2018-05-27 23:01 ecoflex Views(184) Comments(0) Recommended(0)
Summary: MPC means replanning at every step. Every N steps, rebuild the dynamics model.
posted @ 2018-05-27 18:15 ecoflex Views(236) Comments(0) Recommended(0)
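To make the MPC note above concrete, here is a minimal sketch, not the lecture's algorithm. It assumes a toy 1D linear system (`true_step`), a least-squares dynamics fit (`fit_model`), and a random-shooting planner (`plan`); all of these names, the quadratic cost, and the constants are hypothetical choices for illustration. The loop replans at every step and refits the model every `refit_every` steps.

```python
import numpy as np

def true_step(x, u):
    # Hypothetical ground-truth dynamics, unknown to the planner.
    return 0.9 * x + 0.5 * u

def fit_model(data):
    # Least-squares fit of x' ~ a*x + b*u from (x, u, x') tuples.
    arr = np.array(data)
    coef, *_ = np.linalg.lstsq(arr[:, :2], arr[:, 2], rcond=None)
    return coef  # array [a, b]

def plan(x, model, rng, horizon=5, n_candidates=200):
    # Random shooting: sample action sequences, roll each one out through
    # the learned model, and return the first action of the cheapest one.
    a, b = model
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    xs = np.full(n_candidates, x, dtype=float)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        xs = a * xs + b * seqs[:, t]
        cost += xs ** 2 + 0.01 * seqs[:, t] ** 2  # assumed quadratic cost
    return seqs[np.argmin(cost), 0]

def run_mpc(x0=5.0, steps=40, refit_every=10, seed=0):
    rng = np.random.default_rng(seed)
    data = []
    x = x0
    # Seed the dataset with a few random transitions.
    for _ in range(10):
        u = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, u)
        data.append((x, u, x_next))
        x = x_next
    model = fit_model(data)
    x = x0
    for t in range(steps):
        u = plan(x, model, rng)          # replan at every step
        x_next = true_step(x, u)
        data.append((x, u, x_next))
        x = x_next
        if (t + 1) % refit_every == 0:   # every N steps, refit the model
            model = fit_model(data)
    return x, model
```

Calling `run_mpc()` returns the final state and the fitted `[a, b]` coefficients. Only the first action of each plan is executed, which is what makes this replanning rather than open-loop control.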
Summary: The transition probabilities are unknown, and we don't even need to estimate them.
posted @ 2018-05-26 23:04 ecoflex Views(122) Comments(0) Recommended(0)
Summary: Understand that correlated samples cause problems, and how parallelism solves them. Another solution is replay buffers, fully utilizing the advantag...
posted @ 2018-05-26 19:57 ecoflex Views(211) Comments(0) Recommended(0)
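As an illustration of the replay-buffer idea mentioned above, here is a minimal sketch, assuming a fixed-capacity FIFO buffer with uniform sampling. The class name `ReplayBuffer` and its methods are hypothetical, not taken from the lecture. Sampling uniformly from a large pool of past transitions is one common way to break the correlation between consecutive samples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform sampling."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) silently evicts the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so a batch mixes
        # transitions from many different points in time.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A training loop would typically call `add` after every environment step and `sample` once the buffer holds at least one batch of transitions.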
Summary: In most actor-critic (AC) algorithms, we actually just fit the value function; it is less common to fit the Q-function as well. Batch: offline, Monte Carlo. Online: bootstrap, TD in...
posted @ 2018-05-26 12:28 ecoflex Views(206) Comments(0) Recommended(0)
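The batch/online distinction above can be sketched as two ways of building regression targets for the value function. This is a minimal illustration under stated assumptions: the function names, the discount `gamma`, and the `done` flag are hypothetical and not taken from the lecture. The Monte Carlo target needs a complete trajectory, while the TD target bootstraps from the current value estimate of the next state.

```python
def monte_carlo_targets(rewards, gamma=0.99):
    # Batch / Monte Carlo: the target for step t is the discounted sum of
    # all rewards from t to the end of the finished trajectory.
    targets = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return targets[::-1]

def td_target(reward, next_value, gamma=0.99, done=False):
    # Online / bootstrapped TD: the target uses one observed reward plus
    # the current value estimate of the next state (zero if terminal).
    return reward + (0.0 if done else gamma * next_value)
```

The TD target can be computed after a single transition, which is what allows online updates; the price is that it depends on a value estimate that may still be inaccurate.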
Summary: The green bars are the reward function; the blue curve is the probability of different trajectories. If the green bars are all increased equally to the yellow bars, the res...
posted @ 2018-05-24 23:13 ecoflex Views(130) Comments(0) Recommended(0)
Summary: First-order Markov chain. On-policy algorithms are easier to parallelize. Off-policy algorithms have to fit a transition net and a policy net, much more comp...
posted @ 2018-05-24 18:13 ecoflex Views(150) Comments(0) Recommended(0)
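Summary: I made a mistake earlier: I should have been watching the Fall 2017 course, but ended up watching the Spring course. A neural network controls a virtual robot by imitating human motion. Domain shift causes the failure of supervised learning in...
posted @ 2018-05-24 16:43 ecoflex Views(1046) Comments(0) Recommended(0)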
Summary: Initialization dramatically influences the trajectory. The current state depends on all past decisions. The ones reflect the dimensions being counted.
posted @ 2018-05-24 13:59 ecoflex Views(299) Comments(0) Recommended(0)