Study Notes | Morvan - Reinforcement Learning, Part 4: Deep Q Network
Deep Q Network
Deep Q Network, or DQN for short, combines the strengths of Q-learning with neural networks.
Notes
Pseudocode
Deep Q-learning Algorithm
This gives us the final deep Q-learning algorithm with experience replay:
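The loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `ChainEnv`, `train`, and all hyperparameter values are hypothetical names and choices made for this example, and a small table of Q-values stands in for the neural network so the sketch runs with the standard library alone.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy task: start at the left end of a chain, reach the right end."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clamped at the left end).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def train(episodes=200, gamma=0.9, epsilon=0.3, lr=0.1, batch_size=16, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    n_actions = 2
    # Randomly initialised Q-function, indexed as q[state][action].
    # Assumption: a table stands in for the neural network here.
    q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
         for _ in range(env.n_states)]
    memory = deque(maxlen=1000)  # replay memory

    for _ in range(episodes):
        s = env.reset()
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s_next, r, done = env.step(a)
            memory.append((s, a, r, s_next, done))  # store the experience
            s = s_next

            # Sample a random minibatch of past transitions and train on it.
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for bs, ba, br, bs_next, bdone in batch:
                # Target: the reward alone at a terminal state, otherwise the
                # reward plus the discounted maximum Q-value of the next state.
                target = br if bdone else br + gamma * max(q[bs_next])
                # Gradient step on the squared error (target - Q(s, a))^2,
                # with the constant factor folded into lr.
                q[bs][ba] += lr * (target - q[bs][ba])
            if done:
                break
    return q
```

Calling `train()` returns the learned table. With enough exploration, the value of moving right would be expected to exceed that of moving left in each non-goal state. With a real network, the table update would be replaced by a gradient step on the network weights.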
There are many more tricks that DeepMind used to actually make it work, such as a target network, error clipping, and reward clipping, but these are out of scope for this introduction.
The most amazing part of this algorithm is that it learns anything at all. Just think about it: because our Q-function is initialized randomly, it initially outputs complete garbage. And we are using this garbage (the maximum Q-value of the next state) as targets for the network, only occasionally folding in a tiny reward. That sounds insane. How could it learn anything meaningful at all? The fact is that it does.
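In symbols, the target described here is commonly written as follows. The notation is an assumption of this note rather than something taken from the source: $\gamma$ is the discount factor, $\theta$ the network weights, and $(s, a, r, s')$ a single transition.

$$
y =
\begin{cases}
r, & \text{if } s' \text{ is terminal} \\
r + \gamma \max_{a'} Q(s', a'; \theta), & \text{otherwise}
\end{cases}
\qquad
L(\theta) = \bigl(y - Q(s, a; \theta)\bigr)^2
$$

Early in training the $\max_{a'} Q(s', a'; \theta)$ term is essentially noise, so the only reliable signal in $y$ is the reward $r$. Repeated updates then propagate that signal backward from rewarding transitions to the states that lead to them.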
Extension
- Using Keras and Deep Q-Network to Play FlappyBird | Ben Lau
- Demystifying Deep Reinforcement Learning
- The above post is a must-read for those who are interested in deep reinforcement learning.