Contents
- ICRA 2017
- https://ieeexplore.ieee.org/abstract/document/7989186
- parameters of policies
- Entropy Search, BO algorithm
- combine 2 sources: cheap but inaccurate, accurate but expensive
- fewer experiments than standard BO on the physical system only
- a general but naive kernel
1 Introduction
- depend on a small set of tuning params
- two-stage, warm-start, prior
- transfer learning
- ...
- BO, GP
- directly real
- multi-task, transfer between
- entropy to measure, trade off
- rename, lack of accuracy, cost
- retrieving an evaluation? effort
2 Problem Statement
- query the robot, take time
- solve, more efficiently, fewer evaluations
- choose: \(\theta_n\), and sim or real
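The goal in these notes can be written compactly as a minimization over the controller parameters. The domain symbol \(\mathcal{D}\) is notation assumed here for illustration and is not taken from the notes:

\[
\theta^{*} = \arg\min_{\theta \in \mathcal{D}} J(\theta)
\]

At each iteration \(n\), the learner picks both the next parameters \(\theta_n\) and the source to query (simulation or physical experiment), aiming to locate \(\theta^{*}\) with as few physical evaluations as possible.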
3 Preliminaries
- GP
- approximate the unknown function, nonlinear map
- random variables so that any finite number ... joint Gaussian distribution
- observation, noisy, \(\hat J(\theta)=J(\theta)+\omega(\theta)\)
- \(\mu_n\), \(m\): posterior mean, prior mean
- \(\hat y_n\): deviations of the observed costs from the prior mean
- BO, determine the global optimum
- ES: choose evaluations that maximally reduce the entropy of the distribution over the optimum's location
- around the minima
- most informative, \(\Delta H(\theta)\), retrieve a new cost value
- best guess: \(\theta_{bg}\ne \theta_{n+1}\)
- full derivation beyond the scope
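The GP items above (noisy observations, prior mean \(m\), posterior mean \(\mu_n\), deviations \(\hat y_n\)) can be illustrated with standard GP regression. This is a minimal sketch, assuming a squared-exponential kernel, a constant prior mean, and made-up data; the function names `sq_exp_kernel` and `gp_posterior` are hypothetical and not from the paper.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel; a has shape (n, d), b has shape (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(theta_obs, j_obs, theta_query, prior_mean=0.0, noise_var=1e-4):
    """Posterior mean and variance at theta_query given noisy cost values.

    prior_mean plays the role of m, the returned mean the role of mu_n,
    and y_dev the role of the deviations y_hat_n (assumed constant prior).
    """
    theta_obs = np.asarray(theta_obs, dtype=float)
    theta_query = np.asarray(theta_query, dtype=float)
    y_dev = np.asarray(j_obs, dtype=float) - prior_mean

    # Gram matrix of the observations plus observation noise on the diagonal.
    K = sq_exp_kernel(theta_obs, theta_obs) + noise_var * np.eye(len(theta_obs))
    k_star = sq_exp_kernel(theta_query, theta_obs)

    mean = prior_mean + k_star @ np.linalg.solve(K, y_dev)
    cov = sq_exp_kernel(theta_query, theta_query) - k_star @ np.linalg.solve(K, k_star.T)
    return mean, np.diag(cov)

# Made-up one-dimensional example: three observed costs, two query points.
theta_obs = np.array([[0.0], [0.5], [1.0]])
j_obs = np.array([1.0, 0.2, 0.7])
theta_query = np.array([[0.25], [10.0]])
mean, var = gp_posterior(theta_obs, j_obs, theta_query)
print(mean, var)
```

Far from the data the posterior falls back to the prior mean and prior variance, which is why an acquisition rule such as ES still has a reason to explore there.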
4 Reinforcement Learning with Simulations
- modeling the errors of the simulator in a principled way and trading off evaluation effort and information gain
- one simulator here -> extends to an arbitrary number of information sources
- the cost being optimized is partly explained by the simulator: \(J(\theta)=J_{sim}(\theta)+J_{err}(\theta)\)
- parameter vector, additional binary, \(\delta\)
- at least one point from sim: covariance captured by \(k_{sim}\) only
- both points physical: \(k_{err}\) is added, so they covary strongly
- synthetic example
- blue: partly, red: directly
- noise of measurement
- quantify the goal, ES, low entropy
- \(\delta=0\) only provides information about part of the cost, \(J_{sim}\)
- trade off, do not require tuning, lead to more experiments on the physical...
- entropy, consistent unit of measurement for both information sources
- best gain per unit of effort
- "whether the simulator is reliable enough to lead to additional information"
- switch, not 2-stage, the quality is not known in advance
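One kernel consistent with the notes above adds \(k_{err}\) to \(k_{sim}\) only when both points are physical experiments, with \(\delta=0\) marking a simulation and \(\delta=1\) a physical run. The sketch below assumes squared-exponential forms for both kernels, placeholder hyperparameters, and a simple "gain divided by effort" rule for choosing the source; the names `multi_fidelity_kernel` and `pick_source` are hypothetical, and the entropy gains are passed in as plain numbers rather than computed by ES.

```python
import numpy as np

def sq_exp(theta_a, theta_b, lengthscale, variance):
    """Squared-exponential kernel; inputs have shape (n, d) and (m, d)."""
    sq_dist = np.sum((theta_a[:, None, :] - theta_b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def multi_fidelity_kernel(theta_a, delta_a, theta_b, delta_b,
                          sim_params=(1.0, 1.0), err_params=(1.0, 0.3)):
    """k = k_sim + delta * delta' * k_err, delta = 0 (sim) or 1 (physical).

    sim_params and err_params are (lengthscale, variance) pairs. The
    defaults are arbitrary placeholders, not values from the paper.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    k_sim = sq_exp(theta_a, theta_b, *sim_params)
    k_err = sq_exp(theta_a, theta_b, *err_params)
    # Entry (i, j) is 1 only when both evaluations are physical.
    both_physical = np.outer(delta_a, delta_b)
    return k_sim + both_physical * k_err

def pick_source(gain_sim, gain_real, effort_sim, effort_real):
    """Pick the source with the larger expected entropy gain per unit effort."""
    rate_sim = gain_sim / effort_sim
    rate_real = gain_real / effort_real
    return "sim" if rate_sim >= rate_real else "real"

# Three evaluations: one simulated, two physical (made-up parameters).
theta = np.array([[0.0], [0.0], [1.0]])
delta = np.array([0, 1, 1])
print(multi_fidelity_kernel(theta, delta, theta, delta))

# A cheap simulation wins only while it still yields enough information.
print(pick_source(gain_sim=0.1, gain_real=0.5, effort_sim=1.0, effort_real=10.0))
print(pick_source(gain_sim=0.01, gain_real=0.5, effort_sim=1.0, effort_real=10.0))
```

Because the ratio is re-evaluated at every iteration, the source can switch back and forth, which matches the note that the scheme is not a fixed two-stage procedure.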
5 Experimental Results
- cart-pole, Quanser Linear Inverted Pendulum
- Simulink model, manufacturer, simulator
- static state-feedback controller
- cost function, control
- LQR, two parameters, prior
- systematically, lower prior
- GP model, hyperparam, convergence
- MF-ES and ES
- 10 times
- physical experiments, sim experiments (good illustration! no word games!)
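To make "static state-feedback controller" and "cost function" concrete, the sketch below rolls out \(u=-Kx\) on a linear system and averages a quadratic cost. Everything numeric here is an assumption for illustration: the system is a toy double integrator rather than the Quanser pendulum model, the weights `Q` and `R` are arbitrary, and `rollout_cost` is a hypothetical helper standing in for one evaluation of \(J(\theta)\).

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, x0, steps=200):
    """Average quadratic cost of the static feedback u = -K x over a rollout."""
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for _ in range(steps):
        u = -K @ x                          # static state feedback
        total += x @ Q @ x + u @ R @ u      # stage cost x'Qx + u'Ru
        x = A @ x + B @ u                   # discrete-time linear dynamics
    return total / steps

# Toy double integrator with a 0.1 s time step (not the real pendulum model).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
x0 = np.array([1.0, 0.5])

K_feedback = np.array([[1.0, 1.5]])   # hand-picked stabilizing gain
K_zero = np.zeros((1, 2))             # no control, for comparison

print(rollout_cost(K_feedback, A, B, Q, R, x0))
print(rollout_cost(K_zero, A, B, Q, R, x0))
```

In the tuning setting of these notes, the entries of the gain would be generated from the parameters \(\theta\), and each call to a function like this (in simulation or on hardware) would supply one cost observation to the GP.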