Post archive for May 24, 2018 - ecoflex

May 24, 2018

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.4 Policy gradients introduction

Summary: The green bars are the reward function; the blue curve is the probability of different trajectories. If the green bars are equally increased to yellow bars, the res Read more

posted @ 2018-05-24 23:13 ecoflex Views (132) Comments (0) Recommendations (0)

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.3 Reinforcement learning introduction

Summary: First-order Markov chain. On-policy algorithms are easier to parallelize. Off-policy algorithms have to fit a transition net and a policy net. much more comp Read more

posted @ 2018-05-24 18:13 ecoflex Views (150) Comments (0) Recommendations (0)

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.1 Introduction NO.2 Supervised learning and imitation

Summary: I made a mistake earlier: I should have watched the fall 2017 course, but ended up watching the spring one. A neural network controls a virtual robot by imitating human motion. Domain shift causes the failure of supervised learning in Read more

posted @ 2018-05-24 16:43 ecoflex Views (1058) Comments (0) Recommendations (0)

CS294-112 Deep Reinforcement Learning Course (UC Berkeley, 2017) NO.5 Guest lecture: Igor Mordatch (OpenAI)

Summary: Initialization dramatically influences the trajectory. The current state depends on all past decisions. Ones reflect the dimensions being counted. Read more

posted @ 2018-05-24 13:59 ecoflex Views (303) Comments (0) Recommendations (0)
