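LM Algorithm Study Notes (Part 1)
"LM algorithm" is short for the Levenberg-Marquardt algorithm. Before formally introducing the algorithm, we need to study several papers that were important to its development. We start with the paper that founded it, published by Levenberg in 1944.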
A method for the solution of certain non-linear problems in least squares
------------------------------------------------------------------------------------------------------------
Introduction:
The standard method for solving least squares problems which lead to non-linear normal equations depends upon a reduction of the residuals to linear form by first order Taylor approximations taken about an initial or trial solution for the parameters. If the usual least squares procedure, performed with these linear approximations, yields new values for the parameters which are not sufficiently close to the initial values, the neglect of second and higher order terms may invalidate the process, and may actually give rise to a larger value of the sum of the squares of the residuals than that corresponding to the initial solution. This failure of the standard method to improve the initial solution has received some notice in statistical applications of least squares and has been encountered rather frequently in connection with certain engineering applications involving the approximate representation of one function by another. The purpose of this article is to show how the problem may be solved by an extension of the standard method which insures improvement of the initial solution. The process can also be used for solving non-linear simultaneous equations, in which case it may be considered an extension of Newton’s method.
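In other words: the standard approach to least squares problems with non-linear normal equations linearizes the residuals with a first-order Taylor expansion about an initial (trial) solution. If the resulting least squares step moves the parameters too far from the initial values, the neglected second- and higher-order terms can invalidate the step, and the sum of squared residuals can end up larger than it was at the initial solution. This failure has been noted in statistical applications and occurs fairly often in engineering problems where one function is approximated by another. The paper extends the standard method so that the initial solution is guaranteed to improve. The same process also applies to non-linear simultaneous equations, where it can be viewed as an extension of Newton's method.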
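Three points can be drawn from the introduction above:
1. The standard method used previously has a flaw (discussed later);
2. The author proposes an extension of that method (discussed later);
3. The extension can be regarded as an extension of Newton's method. Later developments bear this out: the method is now usually presented as a damped variant of the Gauss-Newton method, itself a least squares adaptation of Newton's method, which is why most blog posts on the LM algorithm begin with a primer on Newton's method.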
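The failure described in the introduction can be reproduced with a toy problem. The sketch below is only an illustration, not Levenberg's own example: the single residual r(x) = arctan(x), the starting point x0 = 2, the damping value lam = 1, and all function names are assumptions chosen for this note. It compares one undamped linearized step with one damped step.

```python
import math

def residuals(x):
    # Hypothetical test problem: one residual r(x) = arctan(x); the solution is x = 0.
    return [math.atan(x)]

def jacobian(x):
    # Derivative of each residual with respect to the single parameter x.
    return [1.0 / (1.0 + x * x)]

def sum_sq(x):
    # Sum of squared residuals at x.
    return sum(r * r for r in residuals(x))

def step(x, lam):
    # One-parameter form of (J^T J + lam) * delta = -J^T r.
    # lam = 0 gives the undamped step of the standard method.
    r, J = residuals(x), jacobian(x)
    jtj = sum(j * j for j in J)
    jtr = sum(j * ri for j, ri in zip(J, r))
    return -jtr / (jtj + lam)

x0 = 2.0
x_plain = x0 + step(x0, 0.0)   # undamped step: overshoots past the solution
x_damped = x0 + step(x0, 1.0)  # damped step: lam = 1 picked by hand for this demo

print("initial :", x0, sum_sq(x0))
print("undamped:", x_plain, sum_sq(x_plain))
print("damped  :", x_damped, sum_sq(x_damped))
```

With these choices the undamped step jumps from x = 2 to roughly x = -3.5 and the sum of squares gets larger, while the damped step stays between 0 and 2 and the sum of squares gets smaller. How the damping value should be chosen is not covered here.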
------------------------------------------------------------------------------------------------------------
Conclusion:
The nature of the damping which we have imposed upon the parameter variables can be given a simple geometric interpretation. For instance, if the unity weighting system is considered, the "overshooting" of the solution is prevented by damping the distance (k dimensional) from the initial solution point, since Q is then the square of this distance. By this restriction of k dimensional distance (which would appear to be a natural way to prevent overshooting), we are not obliged to decide on an arbitrary preassigned procedure restricting the variables individually, as is done, for example, by the method of Cauchy (l.c.). The greater freedom given the individual variables by the method of damped least squares may account for the fact that it has solved, with a comparatively rapid rate of convergence, types of problems which are of much greater complexity than those to which the principle of least squares is ordinarily applied.
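In other words: the damping imposed on the parameters has a simple geometric meaning. With unit weights, Q is the squared k-dimensional distance from the initial solution point, so damping limits how far the new solution can move and thereby prevents overshooting. Because the restriction acts on this overall distance, there is no need to fix in advance an arbitrary rule that restricts each variable separately, as the method of Cauchy does. Levenberg suggests that this extra freedom for the individual variables may explain why damped least squares has solved, with comparatively rapid convergence, problems much more complex than those to which least squares is ordinarily applied.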
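The geometric picture can be written out in modern notation. This is a sketch under assumptions rather than Levenberg's own formulation: the symbols r (residual vector at the initial solution), J (Jacobian of the residuals), \delta (parameter increment) and \lambda (damping weight) do not appear in the quoted text, and only the unity weighting case is shown.

```latex
% Damped least squares step, unity weighting (notation assumed for this note)
\min_{\delta \in \mathbb{R}^k} \; \lVert r + J\delta \rVert^2 + \lambda Q,
\qquad Q = \lVert \delta \rVert^2 = \sum_{i=1}^{k} \delta_i^2

% Setting the gradient with respect to \delta to zero:
2 J^{\top}(r + J\delta) + 2\lambda\delta = 0
\quad\Longrightarrow\quad
(J^{\top}J + \lambda I)\,\delta = -J^{\top} r
```

Setting \lambda = 0 recovers the step of the standard method, and increasing \lambda shrinks the distance \lVert\delta\rVert from the initial solution point, which is the damping of overshoot described in the conclusion.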