initial_h

July 5, 2021

RECURRENT EXPERIENCE REPLAY IN DISTRIBUTED REINFORCEMENT LEARNING (R2D2)

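Summary: **Published:** ICLR 2019. **Key points:** The paper adds an RNN to Q-learning and addresses how the RNN hidden state should be handled when updating from replayed data. Previously, the RNN's initial hidden state was simply set to zero, which makes it deviate from the true hidden state (initial recu…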

posted @ 2021-07-05 09:11 initial_h Views (197) Comments (0) Recommendations (0)
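To make the zero-initialization problem concrete, here is a minimal numpy sketch on a toy tanh RNN. The network, sizes, and input stream are all made-up assumptions, not R2D2's actual LSTM setup. It compares three ways a learner can start a replayed segment: from zeros, from the hidden state stored by the actor, and from zeros plus a burn-in prefix whose steps receive no loss. The last two are the strategies the R2D2 paper studies. In the paper the stored state was produced by older actor parameters, so it is only approximately right; here actor and learner share weights, so the stored state is exact.

```python
import numpy as np


def rnn_step(W, U, h, x):
    # Toy tanh recurrent core (an assumption; R2D2 itself uses an LSTM): h' = tanh(W h + U x).
    return np.tanh(W @ h + U @ x)


def unroll(W, U, h0, xs):
    # Return [h0, h1, ..., hT]: the start state followed by the state after each input.
    states = [h0]
    for x in xs:
        states.append(rnn_step(W, U, states[-1], x))
    return states


def mismatch(est, true):
    # Euclidean distance between estimated and true hidden states, step by step.
    return [float(np.linalg.norm(a - b)) for a, b in zip(est, true)]


rng = np.random.default_rng(0)
n = 4                                    # hidden size (toy value)
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)          # spectral norm 0.5, so each step is a contraction
U = 2.0 * np.ones((n, 1))
xs = [np.array([1.5 + 0.5 * np.sin(t)]) for t in range(60)]  # made-up observation stream

# Actor: plays the whole episode from a zero state and stores every hidden state.
actor_states = unroll(W, U, np.zeros(n), xs)

k, L, m = 30, 10, 10                     # segment start, training length, burn-in length
segment = xs[k:k + L]
true = actor_states[k:k + L + 1]         # states the learner should reproduce

# (a) Zero start: the learner begins the replayed segment from an all-zero state.
zero_start = unroll(W, U, np.zeros(n), segment)
# (b) Stored state: the learner begins from the state the actor saved at step k.
stored = unroll(W, U, actor_states[k], segment)
# (c) Burn-in: unroll the m preceding steps from zero (no loss on them), then continue.
h_burn = unroll(W, U, np.zeros(n), xs[k - m:k])[-1]
burn_in = unroll(W, U, h_burn, segment)

for name, est in [("zero start", zero_start), ("stored state", stored), ("burn-in", burn_in)]:
    err = mismatch(est, true)
    print(f"{name:<12} start-state mismatch {err[0]:.4f}, mean mismatch {np.mean(err):.4f}")
```

Because `W` is scaled to spectral norm 0.5, this toy update forgets its initial state geometrically fast, so a short burn-in nearly removes the zero-start error. A real LSTM is not guaranteed to forget that quickly, which is why the burn-in length is a tuning choice.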

June 28, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 11,12,13

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 11,12,13. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chap…

posted @ 2021-06-28 11:20 initial_h Views (1095) Comments (0) Recommendations (0)

June 20, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 9,10

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 9,10. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chapter…

posted @ 2021-06-20 04:13 initial_h Views (1142) Comments (0) Recommendations (0)

June 1, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 7,8

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 7,8. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chapter 1…

posted @ 2021-06-01 13:44 initial_h Views (1132) Comments (0) Recommendations (0)

May 26, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 5,6

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 5,6. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chapter 1…

posted @ 2021-05-26 07:17 initial_h Views (2261) Comments (0) Recommendations (0)

March 25, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 3,4

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 3,4. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chapter 1…

posted @ 2021-03-25 06:08 initial_h Views (4107) Comments (0) Recommendations (0)

March 11, 2021

Reference Solutions to the Exercises in Reinforcement Learning: An Introduction - Chapter 1,2

Summary: Reinforcement Learning: An Introduction (second edition) - Chapter 1,2. Contents: Chapter 1,2; Chapter 3,4; Chapter 5,6; Chapter 7,8; Chapter 9,10; Chapter 1…

posted @ 2021-03-11 06:48 initial_h Views (2165) Comments (0) Recommendations (0)

May 10, 2020

PageRank Ranking Based on a Win-Rate Matrix

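Summary: When evaluating game-playing models, I ran into the question of how to compare the strength of several models. For example, suppose I have three trained Go models A, B and C; after every pair plays each other we get a win/loss relationship, so how should the three models be ranked? For human players, whose level fluctuates, board games usually compute each player's Elo rating and rank by it, while football and basketball rank by league points or number of wins; but for AI models that stay fixed…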

posted @ 2020-05-10 12:06 initial_h Views (1193) Comments (0) Recommendations (0)
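The excerpt stops before the construction, so the following is only one plausible way to rank models with PageRank, an assumption rather than necessarily the method in the post: treat each model as a node, let every model send "votes" to the models that beat it in proportion to their win rates, and run damped PageRank by power iteration. The win-rate matrix for A, B and C is made up for illustration, and the damping factor 0.85 is the conventional default, not a value from the post.

```python
import numpy as np


def pagerank_from_winrate(win, d=0.85, tol=1e-12, max_iter=1000):
    # win[i, j] = probability that model i beats model j; the diagonal is ignored.
    # Assumed graph: every model j "votes" for the models that beat it,
    # with weight win[i, j], so score flows from losers to winners.
    n = win.shape[0]
    A = np.array(win, dtype=float)
    np.fill_diagonal(A, 0.0)
    col_sums = A.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise into a column-stochastic matrix; a model that never loses
    # (all-zero column) spreads its vote uniformly, like a dangling page.
    M = np.where(col_sums > 0, A / safe, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * (M @ r)      # damped power iteration
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


# Made-up pairwise win rates for three Go models A, B, C (row beats column).
names = ["A", "B", "C"]
win = np.array([[0.0, 0.7, 0.6],
                [0.3, 0.0, 0.8],
                [0.4, 0.2, 0.0]])
scores = pagerank_from_winrate(win)
for i in np.argsort(-scores):
    print(names[i], round(float(scores[i]), 4))
```

With these made-up numbers the ranking comes out A > B > C (scores roughly 0.375, 0.330, 0.295). Because each model passes on its own score, a win against a highly ranked opponent is worth more than a win against a weak one, which simple win counting does not capture.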

January 18, 2020

Random Thoughts on Deep Reinforcement Learning

Summary: About model-based and model-free: Model-free methods cannot be the future of reinforcement learning, even though these algorithms perform better than mo…

posted @ 2020-01-18 01:08 initial_h Views (239) Comments (0) Recommendations (0)

March 12, 2019

Notes on the Paper "Population Based Training of Neural Networks"

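Summary: When I first came across this paper a long time ago, I found its idea rather plain, with nothing that really stood out, so I did not pay much attention to it. Later, many multi-agent or parallel-training papers I read mentioned this algorithm, for example the superhuman performance in the first-person multiplayer game Quake III Arena Capture the Flag, and the first NeurIPS 2018 multi-agent competition…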

posted @ 2019-03-12 20:06 initial_h Views (5530) Comments (0) Recommendations (0)

December 14, 2018

AlphaZero Parallel Gomoku AI

Summary: AlphaZero Gomoku MPI. Link: GitHub: "AlphaZero Gomoku MPI". Overview: This repo is based on "junxiaosong/AlphaZero_Gomoku", sincerely grateful for it. I…

posted @ 2018-12-14 13:34 initial_h Views (2314) Comments (0) Recommendations (4)

August 13, 2018

The Gumbel-Softmax Trick and the Gumbel Distribution

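Summary: When I was reading the MADDPG paper, the authors mentioned using a Gumbel-Softmax estimator in environments with discrete communication. I looked it up and found that the trick is widely used, for example in various GANs in deep learning and in the A2C and MADDPG algorithms in reinforcement learning. Whenever the reparameterization trick (re-parameterization) has to be applied to a discrete distribution…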

posted @ 2018-08-13 17:03 initial_h Views (63287) Comments (20) Recommendations (19)
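As a minimal numpy sketch of the two ideas in the title (forward pass only, no autograd; the probabilities and temperatures are arbitrary choices): the Gumbel-max trick turns sampling from a categorical distribution into an argmax over logits plus Gumbel noise, and Gumbel-Softmax replaces that argmax with a temperature-controlled softmax. The result is a continuous relaxation that a framework with automatic differentiation could backpropagate through.

```python
import numpy as np


def sample_gumbel(rng, shape):
    # Standard Gumbel(0, 1) noise by inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = np.clip(rng.random(shape), 1e-12, 1.0 - 1e-12)   # avoid log(0)
    return -np.log(-np.log(u))


def gumbel_max(logits, g):
    # Gumbel-max trick: argmax(logits + g) is an exact sample from softmax(logits).
    return np.argmax(logits + g, axis=-1)


def gumbel_softmax(logits, g, tau):
    # Continuous relaxation: replace the argmax with a softmax at temperature tau.
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)                 # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
probs = np.array([0.2, 0.5, 0.3])                         # arbitrary categorical distribution
logits = np.log(probs)

# Many hard samples: empirical frequencies should match probs.
samples = gumbel_max(logits, sample_gumbel(rng, (100_000, 3)))
freq = np.bincount(samples, minlength=3) / samples.size
print("empirical frequencies:", freq.round(3))

# One relaxed sample at two temperatures, sharing the same Gumbel noise.
g = sample_gumbel(rng, logits.shape)
soft = gumbel_softmax(logits, g, tau=1.0)
sharp = gumbel_softmax(logits, g, tau=0.1)
print("tau=1.0:", soft.round(3), " tau=0.1:", sharp.round(3))
```

With shared noise, the relaxed sample always has the same argmax as the hard Gumbel-max sample, and lowering `tau` pushes it towards a one-hot vector. In practice the temperature trades bias (large `tau`, smooth but far from one-hot) against gradient variance (small `tau`).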

August 6, 2018

Notes on the Paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"

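Summary: "MADDPG paper link" "OpenAI blog" "DDPG link". Contents: "1. Abstract"; "2. Results demo"; "3. Method details" ("Problem analysis", "Method", "Pseudocode", "Network architecture"); "4. Experimental results"; "5. Conclusion"; "Appendix" ("Proposition 1"). 1. Abstract: The paper explores multi-agent (multi a…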

posted @ 2018-08-06 13:15 initial_h Views (18287) Comments (16) Recommendations (7)

July 28, 2018

Notes on the Paper "Playing hard exploration games by watching YouTube"

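Summary: Paper link. YouTube link. 1. Abstract: When environment rewards are extremely sparse, reinforcement learning methods traditionally struggle to train. One effective approach is to have a human demonstrator provide trajectories to imitate, which guide the direction of exploration; the usual approach is to observe a hu…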

posted @ 2018-07-28 12:53 initial_h Views (1210) Comments (0) Recommendations (0)

July 17, 2018

Solving for Value Functions in MDPs

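Summary: MDP overview. The Markov Decision Process is the most basic modeling framework in reinforcement learning. It places many restrictions on the sequential decision process: for example, the states $S_t$ and actions $a_t$ take only finitely many values, and the reward $R_t$ corresponding to $(S_t,a_t)$…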

posted @ 2018-07-17 10:52 initial_h Views (4812) Comments (0) Recommendations (1)
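For a fixed policy $\pi$ on a finite MDP, the value function satisfies the Bellman expectation equation $V = R_\pi + \gamma P_\pi V$. It can be solved either directly as a linear system or by iterating the backup until it converges. Below is a small numpy sketch of both approaches; the 3-state chain, its transition matrix $P_\pi$, and its rewards $R_\pi$ are made-up assumptions for illustration.

```python
import numpy as np


def evaluate_policy_exact(P_pi, R_pi, gamma):
    # Bellman expectation equation in matrix form: V = R_pi + gamma * P_pi @ V,
    # i.e. (I - gamma * P_pi) V = R_pi, solved as a linear system.
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)


def evaluate_policy_iterative(P_pi, R_pi, gamma, tol=1e-10):
    # Repeated Bellman backups from V = 0; converges since the backup is a gamma-contraction.
    V = np.zeros_like(R_pi)
    while True:
        V_new = R_pi + gamma * P_pi @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


# Made-up 3-state chain under a fixed policy pi; state 2 is absorbing with zero reward.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 1.0]])
R_pi = np.array([1.0, 2.0, 0.0])    # expected immediate reward in each state under pi
gamma = 0.9

V_exact = evaluate_policy_exact(P_pi, R_pi, gamma)
V_iter = evaluate_policy_iterative(P_pi, R_pi, gamma)
print(V_exact)   # exact solution is [580/121, 40/11, 0], about [4.7934, 3.6364, 0.0]
print(V_iter)
```

The direct solve costs $O(n^3)$ and is exact. The iterative version converges because the backup is a $\gamma$-contraction in the max norm, and it is the form that generalizes to value iteration and to state spaces too large for a linear solve.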

https://github.com/initial-h
