Reinforcement Learning - Post Category - Shiyu_Huang

reward model learning papers

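Summary: 1. Fine-Tuning Language Models from Human Preferences. Reward model: a 774M-parameter GPT-2 that was first trained with supervised learning. Training loss: [formula missing from this listing], where r(x,y) denotes the reward model, x the input (prompt), and y the output (response) ...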
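The loss formula referred to in this summary did not survive in the listing. As a rough sketch only, assuming the softmax preference loss usually associated with this paper (a human labeler picks the best of several candidate responses, and the reward model is trained to give that candidate the highest softmax probability), the computation could look like the following. The function name `preference_loss` and the plain list of scalar rewards are illustrative assumptions, not the author's code.

```python
import math


def preference_loss(rewards, chosen):
    """Negative log-probability of the human-chosen candidate.

    rewards: scalar scores r(x, y_i), one per candidate response y_i
             (assumed to come from the reward model).
    chosen:  index b of the response the labeler preferred.
    """
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(rewards)
    log_z = m + math.log(sum(math.exp(r - m) for r in rewards))
    # -log( exp(r_b) / sum_i exp(r_i) )
    return log_z - rewards[chosen]


# Four candidates with equal scores: the loss equals log(4).
uniform_loss = preference_loss([0.0, 0.0, 0.0, 0.0], chosen=0)
# Raising the chosen candidate's score lowers the loss.
better_loss = preference_loss([2.0, 0.0, 0.0, 0.0], chosen=0)
```

Minimizing this quantity pushes r(x,y) up for preferred responses relative to the alternatives shown for the same prompt.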

posted @ 2023-03-10 14:51 Shiyu_Huang Views(472) Comments(0) Recommended(0)

Using dm_control

This post is password-protected.

posted @ 2020-11-25 19:36 Shiyu_Huang Views(214) Comments(0) Recommended(0)

Variational RL for POMDP

Summary: 1. Le, Tuan Anh, et al. "Auto-encoding sequential monte carlo." arXiv preprint arXiv:1705.10306 (2017).

posted @ 2019-04-08 15:47 Shiyu_Huang Views(336) Comments(0) Recommended(0)

Attacks for RL

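Summary: 1. http://rll.berkeley.edu/adversarial/ Adversarial Attacks on Neural Network Policies: adversarial examples are constructed against the policy at test time, using the same methods as adversarial examples for classification. For DQN, the Q values are passed through a softma ...

posted @ 2019-04-08 14:39 Shiyu_Huang Views(281) Comments(0) Recommended(0)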

Bayesian RL and PGMRL

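Summary: Introduction. PGMRL: PGMRL models the RL problem as a probabilistic graphical model (the figure is not included in this listing) and then learns it through variational inference. PGMRL provides a template for representing RL problems and offers an approach and a toolkit for tackling many new RL problems. Bayesian RL: mainly concerns the RL reward functi ...

posted @ 2019-04-04 13:50 Shiyu_Huang Views(447) Comments(0) Recommended(0)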

Policy Gradient

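Summary: Policy Gradient differs from value-based methods, of which Q-Learning is the representative example. The classic policy-gradient algorithm is also known as REINFORCE (Williams, 1992). Today's actor-critic methods are likewise built on policy gradient. This method cannot use a lookup table; for the policy it can only ...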
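As a minimal illustration of the REINFORCE update mentioned above, here is a sketch on a toy multi-armed bandit with a softmax policy. The bandit setting, the function names, and the hyperparameters (`steps`, `lr`, `seed`) are assumptions chosen for brevity and do not come from the post.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy with REINFORCE; return final action probabilities.

    arm_rewards: deterministic reward for each arm (toy assumption).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # one policy parameter per arm
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = arm_rewards[action]  # the return of a one-step episode
        for i in range(len(theta)):
            # d/d theta_i of log pi(action) for a softmax policy.
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            # Gradient ascent on expected return.
            theta[i] += lr * ret * grad_log_pi
    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
```

The parameters are adjusted directly in the direction of `return * grad log pi(action)`; no value table is involved, in contrast to Q-Learning.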

posted @ 2018-03-27 00:49 Shiyu_Huang Views(554) Comments(0) Recommended(0)

Q-Learning

Summary: I. Q-Learning. Examples: https://www.zhihu.com/question/26408259/answer/123230350 http://ml.cs.tsinghua.edu.cn:5000/demos/flappybird/ The above is the pseudocode for Q-Learning (the image is not included in this listing). Q(S,A) ...
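The pseudocode image referenced in this summary is not part of the listing. As a stand-in, here is a minimal sketch of the standard tabular Q-Learning update, Q(S,A) += alpha * (R + gamma * max_a Q(S',a) - Q(S,A)). The dictionary-based table, the state and action names, and the default `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9, done=False):
    """Apply one tabular Q-Learning update in place and return the new Q(S,A)."""
    # Greedy bootstrap value of the next state; zero if the episode ended
    # or if the next state has no recorded actions yet.
    best_next = 0.0 if done else max(Q[next_state].values(), default=0.0)
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Q-table mapping state -> action -> value, defaulting to 0.0.
Q = defaultdict(lambda: defaultdict(float))
q_learning_update(Q, "s1", "right", reward=1.0, next_state="goal", done=True)
q_learning_update(Q, "s0", "right", reward=0.0, next_state="s1")
```

The second call shows how reward propagates backward: `s0` receives value only because its successor `s1` already has a nonzero estimate.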

posted @ 2018-03-26 15:58 Shiyu_Huang Views(543) Comments(0) Recommended(0)

SEMI-PARAMETRIC TOPOLOGICAL MEMORY FOR NAVIGATION

Summary: github: https://github.com/nsavinov/SPTM

posted @ 2018-03-16 14:08 Shiyu_Huang Views(380) Comments(0) Recommended(0)

Loss is its own Reward: Self-Supervision for Reinforcement Learning

Summary: The authors use actions, rewards, states, and so on as labels for supervised training.

posted @ 2018-03-12 17:37 Shiyu_Huang Views(635) Comments(0) Recommended(0)

QMDP-Net: Deep Learning for Planning under Partial Observability

Summary: A paper that applies deep neural networks to POMDPs.

posted @ 2018-03-12 17:23 Shiyu_Huang Views(629) Comments(0) Recommended(0)

LEARNING TO NAVIGATE IN COMPLEX ENVIRONMENTS

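Summary: The task is navigation within a map: the agent must travel from a starting point to a specified location. The paper combines supervised learning, reinforcement learning, and an LSTM. Supervised learning serves as an auxiliary task to speed up RL training, and the LSTM serves as memory. Experiments show that depth constructi ...

posted @ 2018-03-12 14:55 Shiyu_Huang Views(609) Comments(0) Recommended(0)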

Playing FPS Games with Deep Reinforcement Learning

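Summary: What sets this paper apart: (1) two separate networks handle movement and shooting; (2) an LSTM is used to deal with incomplete information. Questions: (1) Why is an RNN used for shooting but not for navigation? Generally, when we see an enemy in our field of view we can shoot immediately, which does not seem to require long-term history, and in the official video I did not see any benefit from the RNN either. On the other ...

posted @ 2018-02-26 14:53 Shiyu_Huang Views(550) Comments(0) Recommended(0)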

AttributeError: 'module' object has no attribute 'RAND_LIMIT'

Summary: Fix: pip install box2d-py

posted @ 2017-09-18 10:45 Shiyu_Huang Views(885) Comments(0) Recommended(0)

Incentivizing exploration in reinforcement learning with deep predictive models

Summary: Stadie, Bradly C., Sergey Levine, and Pieter Abbeel. "Incentivizing exploration in reinforcement learning with deep predictive models." arXiv preprint ...

posted @ 2017-08-13 19:00 Shiyu_Huang Views(429) Comments(0) Recommended(0)

RL Problems

Summary: 1. Delayed, sparse reward (feedback), long-term planning: Hierarchical Deep Reinforcement Learning, sub-goal, SAMDP, options, Thompson sampling, Boltzman ...

posted @ 2017-08-13 15:47 Shiyu_Huang Views(262) Comments(0) Recommended(0)

Graying the black box: Understanding DQNs

Summary: Zahavy, Tom, Nir Ben-Zrihem, and Shie Mannor. "Graying the black box: Understanding DQNs." International Conference on Machine Learning. 2016. This paper aims to ...

posted @ 2017-08-13 14:56 Shiyu_Huang Views(384) Comments(0) Recommended(0)

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Summary: Heinrich, Johannes, and David Silver. "Deep reinforcement learning from self-play in imperfect-information games." arXiv preprint arXiv:1603.01121(201 ...

posted @ 2017-08-11 21:08 Shiyu_Huang Views(796) Comments(0) Recommended(0)

Good Sentences for RL

This post is password-protected.

posted @ 2017-08-11 20:07 Shiyu_Huang Views(5) Comments(0) Recommended(0)

Asynchronous Methods for Deep Reinforcement Learning(A3C)

Summary: Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International Conference on Machine Learning. 2016. One of DeepMind's series of RL papers.

posted @ 2017-08-10 17:36 Shiyu_Huang Views(602) Comments(0) Recommended(0)

Mastering the game of Go with deep neural networks and tree search: A Brief Analysis

Summary: Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489. Recommended slides: https://wenku.baidu ...

posted @ 2017-06-12 19:49 Shiyu_Huang Views(1524) Comments(0) Recommended(0)

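Shiyu Huang (黄世宇) @ XPeng Motors, head of OpenRL Lab. Interests: reinforcement learning, LLM, VLM, GUI Agent, Omni
[OpenRL][Zhihu][GitHub][Linkedin]
If you are interested in the frontier of artificial intelligence, feel free to get in touch and join us!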
