• ICRA 2017
  • https://ieeexplore.ieee.org/abstract/document/7989186
  • parameters of policies
  • Entropy Search, BO algorithm
  • combine 2 sources: cheap but inaccurate, accurate but expensive
  • fewer experiments than standard BO on the physical system only
  • a general but naive kernel

1 Introduction

  • depend on a small set of tuning params
  • two-stage, warm-start, prior
  • transfer learning
  • ...
  • BO, GP
    • directly on the real system
  • multi-task, transfer between
  • entropy to measure, trade off
  • rename, lack of accuracy, cost
    • retrieving an evaluation? effort

2 Problem Statement

  • query the robot, take time
  • solve, more efficiently, fewer evaluations
  • choose: \(\theta_n\), and sim or real

3 Preliminaries

  • GP
    • approximate the unknown function, nonlinear map
    • random variables so that any finite number ... joint Gaussian distribution
    • observation, noisy, \(\hat J(\theta)=J(\theta)+\omega(\theta)\)
    • \(\mu_n\), \(m\): posterior mean, prior mean
      • \(\hat y_n\): deviations of the observed costs from the prior mean
  • BO, determine the global optimum
  • ES, maximally reduce the uncertainty about the location of the minimum
  • around the minima
  • most informative, \(\Delta H(\theta)\), retrieve a new cost value
  • best guess: \(\theta_{bg}\ne \theta_{n+1}\)
  • full derivation beyond the scope
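
A minimal sketch of the GP bullets above, to make the roles of the prior mean \(m\), the posterior mean \(\mu_n\), the deviations \(\hat y_n\), and the best guess concrete. The squared-exponential kernel, its hyperparameters, the noise variance, and the toy cost are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(theta_train, J_hat, theta_test, prior_mean=0.0, noise_var=1e-2):
    """Posterior mean and variance of J at theta_test, given noisy costs J_hat."""
    n = len(theta_train)
    K = sq_exp_kernel(theta_train, theta_train) + noise_var * np.eye(n)
    k_star = sq_exp_kernel(theta_test, theta_train)
    y_hat = J_hat - prior_mean  # deviations of the observations from the prior mean
    L = np.linalg.cholesky(K)   # K is symmetric positive definite thanks to the noise term
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_hat))
    mu = prior_mean + k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    prior_var = sq_exp_kernel(theta_test, theta_test).diagonal()
    var = prior_var - np.sum(v**2, axis=0)
    return mu, np.maximum(var, 0.0)  # clip tiny negative values from round-off

# Toy 1-D cost (hypothetical), observed at three parameter values.
theta_train = np.array([[0.0], [0.5], [1.0]])
J_hat = ((theta_train - 0.3) ** 2).ravel()
theta_test = np.linspace(0.0, 1.0, 5)[:, None]

mu, var = gp_posterior(theta_train, J_hat, theta_test)
mu_train, var_train = gp_posterior(theta_train, J_hat, theta_train)

# Best guess: minimizer of the posterior mean, not necessarily the next query point.
theta_bg = theta_test[np.argmin(mu)]
```

The last line mirrors the \(\theta_{bg}\ne \theta_{n+1}\) bullet: the best guess is read off the posterior mean, while the next evaluation is chosen by an acquisition criterion such as ES, which is not implemented here.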

4 Reinforcement Learning with Simulations

  • modeling the errors of the simulator in a principled way and trading off evaluation effort and information gain
  • one simulation -> arbitrary number
  • cost (being optimized) being partly explained, \(J(\theta)=J_{sim}(\theta)+J_{err}(\theta)\)
  • parameter vector, additional binary, \(\delta\)
  • sim: only covariances between, captured by \(k_{sim}\)
    • both physics: \(k_{err}\), covary strongly
  • synthetic example
    • blue: partly, red: directly
  • noise of measurement
  • quantify the goal, ES, low entropy
    • \(\delta=0\) only provides information about part of the cost, \(J_{sim}\)
  • trade off, do not require tuning, lead to more experiments on the physical...
  • entropy, consistent unit of measurement for both information sources
  • best gain per unit of effort
    • "whether the simulator is reliable enough to lead to additional information"
  • switch, not 2-stage, the quality is not known in advance
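
A rough sketch of two ideas from this section: a kernel over the augmented input \((\theta,\delta)\), and the "best gain per unit of effort" choice. The product form \(\delta\delta'\) that switches \(k_{err}\) on only when both points are physical experiments is my reading of the bullets above, and the squared-exponential components, hyperparameters, information-gain numbers, and effort values are hypothetical. In the actual method the gains \(\Delta H\) would come from Entropy Search, which is not implemented here.

```python
import numpy as np

def sq_exp(A, B, lengthscale, variance):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def mf_kernel(theta_a, delta_a, theta_b, delta_b):
    """k((theta, delta), (theta', delta')) = k_sim + delta * delta' * k_err.

    delta = 0 marks a simulation, delta = 1 a physical experiment, so the
    error term only contributes when both points are physical experiments.
    """
    k_sim = sq_exp(theta_a, theta_b, lengthscale=0.5, variance=1.0)  # assumed values
    k_err = sq_exp(theta_a, theta_b, lengthscale=0.5, variance=0.3)  # assumed values
    both_physical = np.outer(delta_a, delta_b)
    return k_sim + both_physical * k_err

def choose_next(info_gain_sim, info_gain_real, effort_sim, effort_real):
    """Return (candidate index, source) with the best information gain per unit effort."""
    rate_sim = np.asarray(info_gain_sim) / effort_sim
    rate_real = np.asarray(info_gain_real) / effort_real
    if rate_sim.max() >= rate_real.max():
        return int(rate_sim.argmax()), "sim"
    return int(rate_real.argmax()), "real"

# Same parameter value evaluated once in simulation and once on the robot.
theta = np.array([[0.2], [0.2]])
delta = np.array([0, 1])
K = mf_kernel(theta, delta, theta, delta)

# Hypothetical entropy reductions for two candidates, per information source.
cheap_real = choose_next([0.1, 0.2], [0.5, 0.9], effort_sim=1.0, effort_real=2.0)
costly_real = choose_next([0.1, 0.2], [0.5, 0.9], effort_sim=1.0, effort_real=10.0)
```

With these made-up numbers, the physical experiment wins when its effort is low and the simulation wins when the physical effort is high, which is the switching behaviour (rather than a fixed two-stage schedule) that the notes describe.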

5 Experimental Results

  • cart-pole, Quanser Linear Inverted Pendulum
  • Simulink model, manufacturer, simulator
  • static state-feedback controller
  • cost function, control
  • LQR, two parameters, prior
  • systematically, lower prior
  • GP model, hyperparam, convergence
  • MF-ES and ES
    • 10 times
    • physical experiments, sim experiments (good illustration! no word games!)