[RL] Lecture 6: Stochastic Approximation and Stochastic Gradient Descent
6.1 Motivating examples
Mean Estimation
Revisit the mean estimation problem:
- Consider a random variable $X$.
- Our aim is to estimate $\mathbb{E}[X]$.
- Suppose that we have collected a sequence of iid samples $\{x_i\}_{i=1}^{N}$.
- The expectation of $X$ can be approximated by
$$\mathbb{E}[X] \approx \bar{x} := \frac{1}{N}\sum_{i=1}^{N} x_i.$$
That is, sample $N$ times, collect all the data, and take the average.
We already know from the last lecture:
- This approximation is the basic idea of Monte Carlo estimation.
- We know that $\bar{x} \to \mathbb{E}[X]$ as $N \to \infty$, so the estimate gradually approaches the true value.
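A quick numerical check of this convergence (a minimal sketch; the Gaussian distribution and the sample sizes below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 5.0  # E[X] for X ~ Normal(5, 2^2); an illustrative choice

for N in (10, 100, 10_000, 1_000_000):
    samples = rng.normal(loc=true_mean, scale=2.0, size=N)
    print(f"N = {N:>9,}  sample mean = {samples.mean():.4f}")
# The sample average drifts toward E[X] = 5.0 as N grows.
```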
Why do we care about mean estimation so much? Many values in RL, such as state and action values, are defined as means, and these means must be estimated from data.
Computing the mean incrementally

Can we compute the mean in an incremental and iterative manner? Then an estimate is available as soon as the first samples arrive, which is more efficient than waiting for all the data.

Suppose
$$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad w_k = \frac{1}{k-1}\sum_{i=1}^{k-1} x_i.$$

Then we can obtain
$$w_{k+1} = w_k - \frac{1}{k}\left(w_k - x_k\right).$$
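A minimal sketch of this incremental update in Python (the Gaussian data stream is an illustrative assumption; any iid samples would do):

```python
import numpy as np

rng = np.random.default_rng(0)

w = 0.0  # w_1: the initial value is irrelevant, since w_2 = x_1 regardless
for k, x_k in enumerate(rng.normal(loc=5.0, scale=2.0, size=10_000), start=1):
    w = w - (1.0 / k) * (w - x_k)  # w_{k+1} = w_k - (1/k)(w_k - x_k)
print(w)  # close to E[X] = 5.0, computed without storing past samples
```

Each sample is processed once and discarded, so an estimate is available at every step rather than only after all $N$ samples have been collected.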
6.2 Robbins-Monro algorithm
Stochastic approximation (SA)
- SA refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems.
- Compared to many other root-finding algorithms such as gradient-based methods, SA is powerful in the sense that it does not require knowledge of the expression of the objective function or its derivative.
Problem statement
Suppose we would like to find the root of the equation
$$g(w) = 0,$$
where $w \in \mathbb{R}$ is the variable to be solved and $g: \mathbb{R} \to \mathbb{R}$ is a function.
- Many problems can eventually be converted to this root-finding problem. For example, suppose $J(w)$ is an objective function to be minimized. Then the optimization problem can be converted to
$$g(w) = \nabla_w J(w) = 0,$$
that is, setting the gradient to zero.
- Note that an equation like $g(w) = c$ with $c$ a constant can also be converted to the above form by treating $g(w) - c$ as a new function.
The RM algorithm

The problem to solve: find the root of $g(w) = 0$.

The Robbins-Monro (RM) algorithm can solve this problem:
$$w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k), \qquad k = 1, 2, 3, \dots$$
where
- $w_k$ is the $k$th estimate of the root,
- $\tilde{g}(w_k, \eta_k) = g(w_k) + \eta_k$ is the $k$th noisy observation,
- $a_k$ is a positive coefficient.

The function $g(w)$ is a black box! The algorithm relies only on data:
- Input sequence: $\{w_k\}$
- Noisy output sequence: $\{\tilde{g}(w_k, \eta_k)\}$
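A minimal sketch of the RM iteration, assuming a hypothetical black box $g(w) = \tanh(w - 1)$ (root $w^* = 1$) observed with additive Gaussian noise, and step sizes $a_k = 1/k$:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_g(w):
    # Noisy observation g~(w, eta) = g(w) + eta.
    # Both g(w) = tanh(w - 1) and the noise model are illustrative assumptions;
    # the algorithm itself never looks inside this function.
    return np.tanh(w - 1.0) + rng.normal(scale=0.1)

w = 3.0  # initial guess
for k in range(1, 10_001):
    a_k = 1.0 / k              # satisfies sum a_k = inf and sum a_k^2 < inf
    w = w - a_k * noisy_g(w)   # RM update: w_{k+1} = w_k - a_k * g~(w_k, eta_k)
print(w)  # approaches the root w* = 1
```

The step-size conditions $\sum_k a_k = \infty$ and $\sum_k a_k^2 < \infty$, which $a_k = 1/k$ satisfies, are part of the classic Robbins-Monro convergence conditions.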
Stochastic gradient descent (SGD) algorithms
Suppose we aim to solve the following optimization problem:
$$\min_{w} \; J(w) = \mathbb{E}[f(w, X)].$$

Method 1: gradient descent (GD)
$$w_{k+1} = w_k - \alpha_k \nabla_w \mathbb{E}[f(w_k, X)] = w_k - \alpha_k \mathbb{E}[\nabla_w f(w_k, X)].$$
Drawback: the expected value is difficult to obtain, since the distribution of $X$ is usually unknown.

Method 2: batch gradient descent (BGD)
$$w_{k+1} = w_k - \alpha_k \frac{1}{n}\sum_{i=1}^{n} \nabla_w f(w_k, x_i).$$
Drawback: it requires many samples in each iteration for each $w_k$.

Method 3: stochastic gradient descent (SGD)
$$w_{k+1} = w_k - \alpha_k \nabla_w f(w_k, x_k).$$
- Compared to the gradient descent method: replace the true gradient $\mathbb{E}[\nabla_w f(w_k, X)]$ by the stochastic gradient $\nabla_w f(w_k, x_k)$.
- Compared to the batch gradient descent method: let $n = 1$.
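A minimal sketch of SGD, assuming the mean-estimation objective $f(w, X) = \frac{1}{2}(w - X)^2$, so that the minimizer of $J(w)$ is $\mathbb{E}[X]$ and the stochastic gradient is $w - x_k$:

```python
import numpy as np

rng = np.random.default_rng(0)

w = 0.0  # initial estimate
for k in range(1, 10_001):
    x_k = rng.normal(loc=5.0, scale=2.0)  # one sample per iteration (n = 1)
    grad = w - x_k                        # stochastic gradient of 0.5*(w - x)^2
    w = w - (1.0 / k) * grad              # SGD update with alpha_k = 1/k
print(w)  # approaches the minimizer E[X] = 5.0
```

Note that with $\alpha_k = 1/k$ this is exactly the incremental mean-estimation update from Section 6.1, which can thus be viewed as a special case of SGD.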