initial_h

April 18, 2022

Encoding Human Domain Knowledge to Warm Start Reinforcement Learning

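Summary: **Published:** 2020 (AAAI 2021) **Key points:** This paper proposes Propositional Logic Nets (PROLONETS), which initialize a neural network's structure and weights by building a decision tree, so that human knowledge is embedded in the network as a warm start before reinforcement learning begins. Specifically, it first …

posted @ 2022-04-18 07:57 initial_h Views(127) Comments(0) Recommended(1)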

TREEQN AND ATREEC: DIFFERENTIABLE TREE-STRUCTURED MODELS FOR DEEP REINFORCEMENT LEARNING

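Summary: **Published:** 2018 (ICLR 2018) **Key points:** This paper designs a dedicated network architecture that embeds a tree structure in the neural network, enabling online planning with a look-ahead tree and combining model-free learning with online planning. It proposes the TreeQN and ATreeC algorithms. And …

posted @ 2022-04-18 07:54 initial_h Views(77) Comments(0) Recommended(0)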

April 12, 2022

A Framework for Reinforcement Learning and Planning

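Summary: **Published:** 2020 **Key points:** This paper is a survey. Starting from the similarities and differences between RL and planning, it summarizes the factors common to solving MDP-type problems and groups them into what it calls the framework for reinforcement learning and planning (FRAP). First, the paper proposes that RL and pla…

posted @ 2022-04-12 07:21 initial_h Views(95) Comments(0) Recommended(0)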

April 3, 2022

Forward-Backward Reinforcement Learning

Summary: **Published:** 2018 **Key points:** This paper proposes the Forward-Backward Reinforcement Learning (FBRL) algorithm. Assuming the reward function and the goal are known, it combines model-free forward steps with model-based back…

posted @ 2022-04-03 13:19 initial_h Views(111) Comments(0) Recommended(0)

March 31, 2022

Discriminator Augmented Model-Based Reinforcement Learning

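Summary: **Published:** 2021 **Key points:** This paper proposes the Discriminator Augmented MBRL (DAM) algorithm. Its premise is that the model learned in model-based RL is inaccurate and that this is hard to avoid, so the authors take a different approach: rather than correcting the model, they use importance …

posted @ 2022-03-31 09:08 initial_h Views(44) Comments(0) Recommended(0)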

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

Summary: **Published:** 2021 **Key points:** Building on the TF-Agents library (model-free RL), the paper designs a library for model-based RL with three main modules: Environment Model, Agent Classes, and Experiment Harness. Environment Model …

posted @ 2022-03-31 09:05 initial_h Views(37) Comments(0) Recommended(0)

March 19, 2022

TEMPORAL DIFFERENCE MODELS: MODEL-FREE DEEP RL FOR MODEL-BASED CONTROL

Summary: **Published:** 2018 (ICLR 2018) **Key points:** This paper proposes temporal difference models (TDMs), which connect goal-conditioned value functions with the dynamics model and establish a link between model-free and model-b…

posted @ 2022-03-19 12:09 initial_h Views(104) Comments(0) Recommended(0)

March 13, 2022

MBMF: Model-Based Priors for Model-Free Reinforcement Learning

Summary: **Published:** 2017 **Key points:** This paper proposes a Model-Based Model-Free (MBMF) algorithm that learns a dynamics model and uses it as a prior for model-free optimization. Here, model-free optimization means optimization based on Gau…

posted @ 2022-03-13 10:16 initial_h Views(292) Comments(0) Recommended(0)

March 10, 2022

Model-Based Reinforcement Learning via Latent-Space Collocation

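Summary: **Published:** 2021 (ICML 2021) **Key points:** This paper proposes the latent collocation method (LatCo), which plans over sequences of states rather than sequences of actions in order to tackle long-horizon planning problems (it is easier to solve l…

posted @ 2022-03-10 12:23 initial_h Views(73) Comments(0) Recommended(0)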

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

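Summary: **Published:** 2018 (ICRA 2018) **Key points:** This paper proposes an algorithm called model-based and model-free (Mb-Mf): a policy is first trained with a model-based method and then fine-tuned with a model-free method. Specifically, a model is learned first, and then …

posted @ 2022-03-10 12:17 initial_h Views(153) Comments(0) Recommended(0)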

February 26, 2022

Model-Based Deep Reinforcement Learning for High-Dimensional Problems, a Survey

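Summary: **Published:** 2020 **Key points:** This paper is a survey of deep model-based RL. Its central goal fits in one sentence: achieve high predictive power while maintaining low sample complexity. The methods are divided into three main categories: using …

posted @ 2022-02-26 13:10 initial_h Views(69) Comments(0) Recommended(0)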

February 19, 2022

Model-based Reinforcement Learning: A Survey

Summary: **Published:** 2021 **Key points:** A survey organized around three topics: dynamics model learning, planning-learning integration, and implicit model-based RL. Dynamics model learning covers stochastic …

posted @ 2022-02-19 12:09 initial_h Views(190) Comments(0) Recommended(0)

February 17, 2022

Collect & Infer - a fresh look at data-efficient Reinforcement Learning

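Summary: **Published:** 2021 **Key points:** A fairly short conceptual paper. Its main argument is that data-efficient RL has gone through three stages. The first is pure on-line RL, where each piece of data is used once and then discarded. The second is RL with a replay buffer, where incoming data is stored in a buffer of limited capacity, and the data can …

posted @ 2022-02-17 12:38 initial_h Views(64) Comments(0) Recommended(0)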
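
To make the second stage in the summary above concrete, here is a minimal Python sketch of a limited-capacity replay buffer. It is only an illustration: the class name `ReplayBuffer`, its methods, and the `(state, action, reward, next_state)` tuple layout are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest item when full.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Sample without replacement within one batch; a stored transition
        # can still be reused across many calls until it is evicted.
        items = list(self.storage)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.storage)


# Hypothetical transitions: (state, action, reward, next_state).
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add((step, "action", 0.0, step + 1))
```

After the loop only the three most recent transitions remain, which is the "limited capacity" behaviour the summary describes; pure on-line RL would correspond to using each transition once and never storing it.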

February 13, 2022

Imagination-Augmented Agents for Deep Reinforcement Learning

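Summary: **Published:** 2017 (NIPS 2017) **Key points:** The paper proposes an algorithm called Imagination-Augmented Agents (I2As) that combines model-free and model-based methods. The main point is not planning itself but encoding the trajectories planned inside the model into …

posted @ 2022-02-13 11:36 initial_h Views(104) Comments(0) Recommended(0)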

February 12, 2022

MODEL-ENSEMBLE TRUST-REGION POLICY OPTIMIZATION

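Summary: **Published:** 2018 (ICLR 2018) **Key points:** This paper uses an ensemble to measure model uncertainty and uses that measure to adjust training, so that the policy avoids learning from regions where the model is insufficiently trained (model bias), which would hurt training (policy optimization t…

posted @ 2022-02-12 13:24 initial_h Views(100) Comments(0) Recommended(0)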
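
The summary above only says that an ensemble is used to measure model uncertainty. One common way to do that, sketched below in Python, is to treat the disagreement among ensemble members as the uncertainty signal. This is an assumption for illustration and not necessarily the measure the paper uses; the function `ensemble_disagreement` and the toy one-dimensional models are hypothetical.

```python
from statistics import pstdev


def ensemble_disagreement(models, state, action):
    """Population standard deviation of the ensemble's next-state predictions."""
    predictions = [model(state, action) for model in models]
    return pstdev(predictions)


# Hypothetical 1-D dynamics models that agree near action = 0
# and drift apart as the action grows.
models = [
    lambda s, a: s + 1.0 * a,
    lambda s, a: s + 1.1 * a,
    lambda s, a: s + 0.9 * a,
]

low = ensemble_disagreement(models, state=0.0, action=0.1)
high = ensemble_disagreement(models, state=0.0, action=10.0)
```

Here `high` exceeds `low`, so a training loop could, for example, down-weight or stop using imagined data where the disagreement is large. How the signal is actually used in the paper is not recoverable from the truncated summary.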

https://github.com/initial-h
