Awesome Reinforcement Learning

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest

Maintainers: Hyunsoo Kim, Jiwon Kim

We are looking for more contributors and maintainers!

Contributing

Please feel free to send pull requests.

Table of Contents

Codes

Theory

Lectures

Books

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]
  • Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
  • David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
  • Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
  • Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

Surveys

  • Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
  • S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
  • Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics: A Survey, IJRR, 2013. [Paper]
  • Michael L. Littman, Reinforcement Learning Improves Behaviour from Evaluative Feedback, Nature 521(7553): 445-451, 2015. [Paper]
  • Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

Papers / Thesis

  • Foundational Papers

    • Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]
      • discusses issues in RL such as the "credit assignment problem".
    • Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]
      • the earliest publication on a temporal-difference (TD) learning rule.
  • Solution Methods

    • Dynamic Programming (DP):
      • Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
    • Monte Carlo:
      • Andrew Barto, Michael Duff, Monte Carlo Matrix Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
      • Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
    • Temporal-Difference:
      • Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
    • Q-Learning (Off-policy TD algorithm):
      • Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
    • Sarsa (On-policy TD algorithm):
      • G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
      • Richard S. Sutton, Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, NIPS, 1996. [Paper]
    • R-Learning (learning of relative values)
      • Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
    • Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)
      • Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
      • Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
    • Policy Search (in application to Robotics)
      • Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
      • Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
      • Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
      • Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
    • Hierarchical RL
      • Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
      • George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]

Applications

Game Playing

Robotics

  • Reinforcement Learning for Humanoid Robotics (ICHR 2003) [Paper]
  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (ICRA 2004) [Paper]
  • Robot Motor Skill Coordination with EM-based Reinforcement Learning (IROS 2010) [Paper] [Video]
  • Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (ICRA 2010) [Paper] [Video]
  • Autonomous Skill Acquisition on a Mobile Manipulator (AAAI 2011) [Paper] [Video]
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search (ICML 2011) [Paper]
  • Incremental Semantically Grounded Learning from Demonstration (RSS 2013) [Paper]
  • Efficient Reinforcement Learning for Robots using Informative Simulated Priors (ICRA 2015) [Paper] [Video]

Control

  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight (NIPS 2006) [Paper] [Video]
  • Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (ICRA 2001) [Paper]

Operations Research

  • Scaling Average-reward Reinforcement Learning for Product Delivery (AAAI 2004) [Paper]
  • Cross Channel Optimized Marketing by Reinforcement Learning (KDD 2004) [Paper]

Human Computer Interaction

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (JAIR 2002) [Paper]

Tutorials / Websites

Online Demos

Posted 2015-11-16 19:11 by 菜鸡一枚