Summary: So, is the process similar to a one-to-many RNN? Learns much more efficiently than model-free methods; iteratively gets better; fewer than 300 trials (~25 min).
posted @ 2018-05-02 23:02 ecoflex Views (216) Comments (0) Recommendations (0)
Summary: You wouldn't try to explore any problem structure in DFO; low-dimensional policy; 30 degrees of freedom; 120 parameters to tune; keep the positive results i...
posted @ 2018-05-02 13:08 ecoflex Views (182) Comments (0) Recommendations (0)