lee_ing

June 20, 2023

Project Summary: Reggie Takeout (瑞吉外卖)

Summary: What I learned from the Reggie Takeout (瑞吉外卖) project.

posted @ 2023-06-20 12:30 lee_ing Views(1131) Comments(0) Recommended(0)

June 4, 2023

Jan 2022-Model-augmented Prioritized Experience Replay

Summary: Computes experience scores using new learnable features derived from the components of model-based reinforcement learning (MbRL).

posted @ 2023-06-04 12:13 lee_ing Views(57) Comments(0) Recommended(0)

Apr 2021-Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

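Summary: This paper proposes Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that refreshes replayed experiences by leveraging the agent's current policy.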

posted @ 2023-06-04 10:26 lee_ing Views(23) Comments(0) Recommended(0)

May 31, 2023

April 2023-Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

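Summary: Building on the deep Q-network (DQN) algorithm, this paper proposes memory-efficient reinforcement learning algorithms to alleviate this problem. By consolidating knowledge from the target Q-network into the current Q-network (knowledge consolidation), the proposed algorithms reduce forgetting while maintaining high sample efficiency.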
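
As a rough illustration of the consolidation idea, the sketch below adds a penalty that keeps the current Q-values close to the target network's Q-values. The function name `consolidated_loss`, the weight `lam`, and the squared-error form of the penalty are assumptions made for this example, not details taken from the summary above.

```python
import numpy as np

def consolidated_loss(q, q_target, actions, td_targets, lam):
    """TD loss on the taken actions plus a penalty for drifting from the target network.

    q, q_target: arrays of shape (batch, n_actions) from the current and target networks.
    lam: hypothetical weight of the consolidation term (an assumption of this sketch).
    """
    q = np.asarray(q, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    actions = np.asarray(actions)
    td_targets = np.asarray(td_targets, dtype=float)
    # Q-values of the actions that were actually taken
    taken = q[np.arange(len(actions)), actions]
    td_loss = np.mean((td_targets - taken) ** 2)
    # Consolidation: match the target network over all actions, not only the taken ones
    consolidation = np.mean((q - q_target) ** 2)
    return td_loss + lam * consolidation

loss = consolidated_loss(
    q=[[1.0, 2.0], [3.0, 4.0]],
    q_target=[[1.0, 1.0], [3.0, 3.0]],
    actions=[0, 1],
    td_targets=[2.0, 4.0],
    lam=2.0,
)
```

The penalty covers every action so that knowledge about actions absent from the current batch is also retained.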

posted @ 2023-05-31 12:19 lee_ing Views(118) Comments(0) Recommended(0)

May 23, 2023

Feb 2023-Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

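Summary: Treats the replay memory as an empirical replay memory MDP (RM-MDP) and obtains a conservative estimate by solving this empirical MDP. The MDP is non-stationary and can be updated efficiently through sampling. Value and policy regularizers are designed from the conservative estimate and combined with experience replay (CEER) to regularize DQN learning.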
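
To make "solving the empirical MDP" concrete, here is a minimal tabular sketch. It assumes hashable states, deterministic transitions, and a fixed number of sweeps over the memory; `solve_empirical_mdp` and its parameters are hypothetical names for this example and do not describe the paper's exact update rule.

```python
from collections import defaultdict

def solve_empirical_mdp(transitions, gamma=0.9, sweeps=100):
    """Value iteration restricted to the (state, action) pairs stored in memory.

    transitions: iterable of (state, action, reward, next_state, done).
    Only actions that appear in memory are bootstrapped from, so unseen
    actions never inflate the estimate.
    """
    transitions = list(transitions)
    q = defaultdict(float)
    actions_seen = defaultdict(set)
    for s, a, _, _, _ in transitions:
        actions_seen[s].add(a)
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            if done or not actions_seen[s2]:
                best_next = 0.0  # terminal, or next state has no stored action
            else:
                best_next = max(q[(s2, b)] for b in actions_seen[s2])
            q[(s, a)] = r + gamma * best_next
    return dict(q)

memory = [
    ("A", "right", 0.0, "B", False),
    ("B", "right", 1.0, "C", True),
]
q_values = solve_empirical_mdp(memory)
```

In a DQN setting, a table like `q_values` could serve as the conservative reference that the regularizers pull the network toward.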

posted @ 2023-05-23 18:07 lee_ing Views(122) Comments(0) Recommended(0)

May 21, 2023

Sep 2022-Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

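Summary: Proposes Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique that approximately selects the training points that most reduce the model's generalization loss.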
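
The selection rule can be sketched as follows: score each candidate by its current training loss minus the loss of a separate model trained on holdout data (the "irreducible" loss), then train on the highest-scoring points. The function name `select_rho_loss` and the toy loss values are assumptions for illustration only.

```python
import numpy as np

def select_rho_loss(train_loss, irreducible_loss, k):
    """Return indices of the k candidates with the largest reducible holdout loss."""
    reducible = np.asarray(train_loss, dtype=float) - np.asarray(irreducible_loss, dtype=float)
    # argsort is ascending, so take the last k entries and reverse them
    return np.argsort(reducible)[-k:][::-1]

# Toy per-example losses for a candidate batch of four points
train_loss = np.array([2.0, 0.1, 3.0, 1.5])
irreducible_loss = np.array([1.9, 0.05, 0.5, 0.2])
selected = select_rho_loss(train_loss, irreducible_loss, k=2)
```

Point 0 has a high training loss but is skipped, because the holdout model cannot fit it either, which suggests it is noisy rather than learnable.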

posted @ 2023-05-21 17:54 lee_ing Views(104) Comments(0) Recommended(0)

June 2021-Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

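Summary: This paper proposes synthesizing new transitions for training by linearly interpolating consecutive transitions. To keep the constructed transitions authentic, a discriminator is also developed to guide the construction process automatically.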
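
A minimal sketch of the interpolation step, leaving out the discriminator entirely. The tuple layout `(state, action, reward, next_state)`, the helper name `mix_consecutive`, and the Beta(0.5, 0.5) draw for the mixing coefficient are assumptions for this example.

```python
import numpy as np

def mix_consecutive(t0, t1, lam):
    """Linearly interpolate two consecutive transitions field by field."""
    return tuple(
        lam * np.asarray(x0, dtype=float) + (1.0 - lam) * np.asarray(x1, dtype=float)
        for x0, x1 in zip(t0, t1)
    )

rng = np.random.default_rng(0)
# (state, action, reward, next_state); t1 starts where t0 ends
t0 = (np.array([0.0, 0.0]), np.array([1.0]), 1.0, np.array([1.0, 0.0]))
t1 = (np.array([1.0, 0.0]), np.array([0.0]), 3.0, np.array([2.0, 0.0]))
lam = rng.beta(0.5, 0.5)  # mixing coefficient in [0, 1]
state, action, reward, next_state = mix_consecutive(t0, t1, lam)
```

Because the two transitions are adjacent in time, the interpolated point lies on the line segment between them, which is the motivation for mixing consecutive rather than arbitrary pairs.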

posted @ 2023-05-21 10:44 lee_ing Views(28) Comments(0) Recommended(0)

May 20, 2023

May 2022-Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

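Summary: Proposes Neighborhood Mixup Experience Replay (NMER), a geometrically grounded replay buffer that interpolates transitions with their nearest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by mixing transitions only with those that have nearby state-action features.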
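
The neighbor lookup and interpolation can be sketched as below. This is a brute-force illustration that uses plain Euclidean distance on unnormalized features; the function name `nmer_sample` and the fixed coefficient `lam` are assumptions for this example.

```python
import numpy as np

def nmer_sample(states, actions, rewards, next_states, idx, lam):
    """Mix transition idx with its nearest neighbour in state-action space."""
    feats = np.concatenate([states, actions], axis=1)
    dists = np.linalg.norm(feats - feats[idx], axis=1)
    dists[idx] = np.inf  # exclude the transition itself
    nn = int(np.argmin(dists))

    def mix(x):
        return lam * x[idx] + (1.0 - lam) * x[nn]

    return nn, (mix(states), mix(actions), mix(rewards), mix(next_states))

states = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
actions = np.array([[0.0], [0.0], [1.0]])
rewards = np.array([0.0, 1.0, 10.0])
next_states = np.array([[0.1, 0.0], [0.2, 0.0], [5.0, 6.0]])
neighbor, mixed = nmer_sample(states, actions, rewards, next_states, idx=0, lam=0.5)
```

Transition 0 is mixed with transition 1 rather than the distant transition 2, so the synthetic sample stays close to the region the agent actually visited.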

posted @ 2023-05-20 11:28 lee_ing Views(60) Comments(0) Recommended(0)

May 19, 2023

May 2022-Composite Experience Replay-Based Deep Reinforcement Learning With Application in Wind Farm Control

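Summary: Designs a new composite experience replay (CER) strategy and embeds it into the deep deterministic policy gradient (DDPG) algorithm. CER provides a new sampling scheme that mines the information in stored transitions more deeply by trading off reward against temporal-difference (TD) error.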
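
One way to picture a reward/TD-error trade-off is a sampling distribution that blends two normalized scores. The blend below, including the function name `composite_probs` and the mixing parameter `weight`, is purely an assumption for illustration and is not the paper's actual sampling rule.

```python
import numpy as np

def composite_probs(rewards, td_errors, weight, eps=1e-6):
    """Sampling probabilities that blend a reward score with a TD-error score.

    weight: hypothetical trade-off in [0, 1]; 1 uses reward only, 0 uses TD error only.
    """
    r = np.asarray(rewards, dtype=float)
    r = r - r.min() + eps  # shift so every reward score is positive
    d = np.abs(np.asarray(td_errors, dtype=float)) + eps
    score = weight * r / r.sum() + (1.0 - weight) * d / d.sum()
    return score / score.sum()

probs = composite_probs(rewards=[0.0, 1.0, 3.0], td_errors=[2.0, 0.0, 0.0], weight=0.5)
rng = np.random.default_rng(0)
batch = rng.choice(len(probs), size=2, p=probs)  # indices of sampled transitions
```

With `weight=0.5`, the transition with the large TD error and the transition with the high reward both receive more probability than the unremarkable middle one.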

posted @ 2023-05-19 17:35 lee_ing Views(29) Comments(0) Recommended(0)

Jan 2022-Actor-critic with familiarity-based trajectory experience replay

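Summary: Deep reinforcement learning obtains raw input information through an agent's interaction with the environment and learns an action policy from it, gradually forming an agent with strong learning ability through continual trial and error. This paper aims to address the low sample efficiency of the well-known asynchronous advantage actor-critic (A3C) algorithm in deep reinforcement learning. First, a new off-policy actor-critic algorithm is designed; this algorithm, in the on-policy actor-crit…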

posted @ 2023-05-19 11:52 lee_ing Views(88) Comments(0) Recommended(0)
