
Lecture 6: Stochastic Approximation and Stochastic Gradient Descent

6.1 Motivating examples

Mean Estimation

Revisit the mean estimation problem:

  • Consider a random variable X.
  • Our aim is to estimate E[X].
  • Suppose that we collected a sequence of iid samples $\{x_i\}_{i=1}^{N}$.
  • The expectation of X can be approximated by

$\mathbb{E}[X] \approx \bar{x} := \frac{1}{N}\sum_{i=1}^{N} x_i.$

That is, sample N times, collect all the data, and then take the average.

We already know from the last lecture:

  • This approximation is the basic idea of Monte Carlo estimation.
  • We know that $\bar{x} \to \mathbb{E}[X]$ as $N \to \infty$; that is, $\bar{x}$ gradually approaches the true value.
    Why do we care about mean estimation so much?
  • Many values in RL, such as state and action values, are defined as means, and these means need to be estimated from data.
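
Below is a minimal sketch of this batch Monte Carlo estimate; the Gaussian distribution of X used here is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distribution of X, chosen only for illustration: Gaussian with E[X] = 5.
true_mean = 5.0
N = 10_000
samples = rng.normal(loc=true_mean, scale=2.0, size=N)  # iid samples x_1, ..., x_N

x_bar = samples.mean()  # Monte Carlo estimate of E[X]
print(f"x_bar = {x_bar:.4f}, true E[X] = {true_mean}")
```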

Computing the mean iteratively

Can we compute the mean in an incremental and iterative manner? That way we can update the estimate with whatever samples have arrived so far, which is more efficient.

Suppose:

$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i, \quad k = 1, 2, \dots$

Then we can obtain:

$w_{k+1} = w_k - \frac{1}{k}(w_k - x_k)$

$w_k \to \mathbb{E}[X]$ as $k \to \infty$
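
A minimal sketch of the incremental update, reusing the illustrative Gaussian from the sketch above; each sample is used as soon as it arrives and then discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 5.0  # E[X] of the illustrative Gaussian

w = 0.0  # w_1; its value does not matter, since the k = 1 update sets w_2 = x_1
for k in range(1, 10_001):
    x_k = rng.normal(loc=true_mean, scale=2.0)  # the k-th sample arrives
    w = w - (1.0 / k) * (w - x_k)               # w_{k+1} = w_k - (1/k)(w_k - x_k)

print(f"incremental estimate = {w:.4f}, true E[X] = {true_mean}")
```

With the coefficient $1/k$, the final estimate equals the running average $\frac{1}{k}\sum_{i=1}^{k} x_i$, so it matches the batch estimate computed over the same samples.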

6.2 Robbins-Monro algorithm

Stochastic approximation (SA)

  • SA refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems.
  • Compared to many other root-finding algorithms such as gradient-based methods, SA is powerful in the sense that it does not require knowing the expression of the objective function or its derivative.

Problem statement

Suppose we would like to find the root of the equation

$g(w) = 0,$

where $w \in \mathbb{R}$ is the variable to be solved and $g: \mathbb{R} \to \mathbb{R}$ is a function.

  • Many problems can eventually be converted to this root-finding problem. For example, suppose $J(w)$ is an objective function to be minimized. Then, the optimization problem can be converted to

$g(w) = \nabla_w J(w) = 0,$

i.e., the condition that the gradient is zero (a worked example follows this list).

  • Note that an equation like $g(w) = c$ with $c$ as a constant can also be converted to the above equation by rewriting $g(w) - c$ as a new function.
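
For instance (an illustrative objective, not one taken from the lecture), let $J(w) = \mathbb{E}[(w - X)^2]$. Then

$g(w) = \nabla_w J(w) = 2(w - \mathbb{E}[X]),$

so minimizing $J$ amounts to finding the root of $g$, and that root is exactly $w^* = \mathbb{E}[X]$, i.e., the mean estimation problem from Section 6.1.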

The RM algorithm

Solving $g(w) = 0$

The Robbins-Monro (RM) algorithm can solve this problem:

$w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k), \quad k = 1, 2, 3, \dots$

where

  • $w_k$ is the $k$th estimate of the root;
  • $\tilde{g}(w_k, \eta_k) = g(w_k) + \eta_k$ is the $k$th noisy observation;
  • $a_k$ is a positive coefficient.

The function $g(w)$ is a black box! This algorithm relies on data:

  • Input sequence: $\{w_k\}$
  • Noisy output sequence: $\{\tilde{g}(w_k, \eta_k)\}$
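
A minimal sketch of the RM iteration, assuming an illustrative black-box $g(w) = \tanh(w - 1)$ (true root $w^* = 1$) with i.i.d. Gaussian observation noise; the code only queries the noisy oracle and never uses the expression of $g$.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_g(w):
    # Assumed black-box for this sketch: g(w) = tanh(w - 1), whose root is w* = 1.
    # The RM iteration below only sees this noisy output, never the expression of g.
    return np.tanh(w - 1.0) + rng.normal(scale=0.1)  # g~(w_k, eta_k) = g(w_k) + eta_k

w = 3.0  # initial guess w_1
for k in range(1, 10_001):
    a_k = 1.0 / k                # positive coefficients: sum a_k = inf, sum a_k^2 < inf
    w = w - a_k * noisy_g(w)     # w_{k+1} = w_k - a_k * g~(w_k, eta_k)

print(f"RM estimate of the root: {w:.4f} (true root: 1.0)")
```

Here $a_k = 1/k$ is one choice satisfying the standard RM step-size conditions $\sum_k a_k = \infty$ and $\sum_k a_k^2 < \infty$.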

Stochastic gradient descent (SGD) algorithms

Suppose we aim to solve the following optimization problem:

$\min_w J(w) = \mathbb{E}[f(w, X)]$

Method 1: gradient descent (GD)

$w_{k+1} = w_k - \alpha_k \nabla_w \mathbb{E}[f(w_k, X)] = w_k - \alpha_k \mathbb{E}[\nabla_w f(w_k, X)]$

Drawback: the expected value is difficult to obtain.

Method 2: batch gradient descent (BGD)

$\mathbb{E}[\nabla_w f(w_k, X)] \approx \frac{1}{n}\sum_{i=1}^{n} \nabla_w f(w_k, x_i)$

$w_{k+1} = w_k - \alpha_k \frac{1}{n}\sum_{i=1}^{n} \nabla_w f(w_k, x_i)$

Drawback: it requires many samples in each iteration for each $w_k$.

Method 3: stochastic gradient descent (SGD)

$w_{k+1} = w_k - \alpha_k \nabla_w f(w_k, x_k)$

  • Compared to the gradient descent method: replace the true gradient $\mathbb{E}[\nabla_w f(w_k, X)]$ with the stochastic gradient $\nabla_w f(w_k, x_k)$.
  • Compared to the batch gradient descent method: let $n = 1$.
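
A minimal sketch contrasting BGD and SGD, under an assumed objective $f(w, x) = \frac{1}{2}(w - x)^2$ chosen so that the minimizer of $J$ is $\mathbb{E}[X]$; plain GD is omitted because it would need the true expectation rather than samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed objective for this sketch: f(w, x) = 0.5 * (w - x)**2, so that
# J(w) = E[f(w, X)] is minimized at w* = E[X] and grad_w f(w, x) = w - x.
true_mean = 5.0
samples = rng.normal(loc=true_mean, scale=2.0, size=10_000)

# Batch gradient descent: average the gradient over all n samples at every iteration.
w_bgd, alpha = 0.0, 0.1
for _ in range(200):
    grad = np.mean(w_bgd - samples)       # (1/n) * sum_i grad_w f(w_k, x_i)
    w_bgd -= alpha * grad

# Stochastic gradient descent: one sample (n = 1) per iteration, diminishing step size.
w_sgd = 0.0
for k, x_k in enumerate(samples, start=1):
    w_sgd -= (1.0 / k) * (w_sgd - x_k)    # w_{k+1} = w_k - alpha_k * grad_w f(w_k, x_k)

print(f"BGD: {w_bgd:.4f}, SGD: {w_sgd:.4f}, true minimizer E[X] = {true_mean}")
```

Note that with $\alpha_k = 1/k$ and this particular $f$, the SGD update coincides with the incremental mean estimation algorithm from Section 6.1.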