Deep Learning Notes: Chapter 4 Numerical Computation

1. Overflow and Underflow

  • Underflow occurs when numbers near zero are rounded to zero
  • Overflow occurs when numbers with large magnitude are approximated as \(\infty\) or \(-\infty\)

\(\Large\text{Note:}\) One example of a function that must be stabilized against underflow and overflow is the \(\textbf{softmax}\) function. The \(\textbf{softmax}\) function is often used to predict the probabilities associated with a multinoulli distribution:

\[\text{softmax}(\mathbf{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^n \exp(x_j)} \]
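A standard fix is to evaluate \(\text{softmax}(\mathbf{z})\) with \(\mathbf{z} = \mathbf{x} - \max_i x_i\): the result is unchanged, the largest exponent becomes \(0\) (no overflow), and at least one denominator term equals \(1\) (no division by an underflowed zero). A minimal NumPy sketch (the function name `softmax` and the test vector are illustrative, not from the book):

```python
import numpy as np

def softmax(x):
    # softmax(x) == softmax(x + c) for any constant c, so shift by -max(x):
    # the largest argument to exp is 0 (no overflow) and at least one
    # denominator term is exp(0) = 1 (no zero denominator).
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([1000.0, 1001.0, 1002.0])))  # naive exp(1000.0) would overflow
```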

2. Poor Conditioning

Consider the function \(f(x) = A^{-1}x\). When \(A\in\mathbb{R}^{n\times n}\) has an eigenvalue decomposition, its \(\textbf{condition number}\) is:

\[\max_{i,j}\left |\frac{\lambda_i}{\lambda_j} \right| \]

This is the ratio of the magnitudes of the largest and smallest eigenvalues. When this number is large, matrix inversion is particularly sensitive to error in the input.
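As a quick illustration, the definition above can be computed directly from the eigenvalues. A minimal NumPy sketch (the helper `condition_number` and the example matrix `A` are arbitrary choices for illustration):

```python
import numpy as np

def condition_number(A):
    # Ratio of the largest to smallest eigenvalue magnitude,
    # assuming A is square with nonzero eigenvalues.
    lam = np.abs(np.linalg.eigvals(A))
    return lam.max() / lam.min()

A = np.array([[1.0, 0.0],
              [0.0, 1e-6]])          # nearly singular, hence poorly conditioned
print(condition_number(A))            # ~1e6: inverting A amplifies input error
```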

3. Jacobian and Hessian Matrices

Sometimes we need to find all the partial derivatives of a function whose input and output are both vectors. The matrix containing all such partial derivatives is known as a \(\textbf{Jacobian matrix}\). Specifically, if we have a function \(f:\mathbb{R}^m\rightarrow \mathbb{R}^n\), then the Jacobian matrix \(\mathbf{J}\in\mathbb{R}^{n\times m}\) of \(f\) is defined as:

\[J_{i,j} = \frac{\partial f(x)_i}{\partial x_j} \]
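For concreteness, the Jacobian can be approximated column by column with forward differences. A rough NumPy sketch (`jacobian_fd`, the step size `eps`, and the example map are illustrative assumptions):

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    # J[i, j] ~= (f(x + eps * e_j)_i - f(x)_i) / eps  (forward differences)
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - fx) / eps
    return J

# Example f : R^2 -> R^2, so J is 2 x 2
f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
print(jacobian_fd(f, np.array([1.0, 2.0])))  # close to [[2, 1], [cos(1), 0]]
```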

The \(\textbf{Hessian}\) matrix \(\mathbf{H}(f)(x)\) is defined as:

\[\mathbf{H}(f)(x)_{i,j} = \frac{\partial^2 f(x)}{\partial x_i\partial x_j} \]

Equivalently, the \(\textbf{Hessian}\) is the Jacobian of the gradient.

\(\Large\text{Note:}\) wherever the second partial derivatives are continuous, the differential operators are commutative:

\[\frac{\partial^2 f(x)}{\partial x_i\partial x_j} = \frac{\partial^2 f(x)}{\partial x_j\partial x_i} \]

This implies that \(H_{i,j} = H_{j,i}\) \(\Rightarrow \textbf{Symmetric}\).

Because the Hessian matrix is real and symmetric, we can decompose it into a set of real \(\textbf{eigenvalues}\) and an \(\textbf{orthogonal basis}\) of eigenvectors.
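The sketch below ties these points together: it builds the Hessian of a small scalar function with central differences (the helper `hessian_fd`, the step `eps`, and the example function are illustrative assumptions), checks the symmetry noted above, and uses `np.linalg.eigh` to obtain the real eigenvalues and orthonormal eigenvectors:

```python
import numpy as np

def hessian_fd(f, x, eps=1e-5):
    # H[i, j] ~= d^2 f / (dx_i dx_j) via central differences.
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xpp = x.copy(); xpp[i] += eps; xpp[j] += eps
            xpm = x.copy(); xpm[i] += eps; xpm[j] -= eps
            xmp = x.copy(); xmp[i] -= eps; xmp[j] += eps
            xmm = x.copy(); xmm[i] -= eps; xmm[j] -= eps
            H[i, j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * eps**2)
    return H

f = lambda x: x[0]**2 * x[1] + 3 * x[1]**2     # scalar function of two variables
H = hessian_fd(f, np.array([1.0, 2.0]))         # exact Hessian here is [[4, 2], [2, 6]]
print(np.allclose(H, H.T, atol=1e-3))           # True: H is (numerically) symmetric
w, Q = np.linalg.eigh(H)                        # real eigenvalues, orthonormal eigenvectors
print(w)
```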

4. Constrained Optimization

The Karush–Kuhn–Tucker (KKT) approach provides a very general solution to constrained optimization. It introduces the generalized \(\textbf{Lagrangian}\):

\[L(x,\lambda,\alpha) = f(x)+\sum_i\lambda_ig^{(i)}(x)+\sum_j\alpha_jh^{(j)}(x) \]

where \(f(x)\) is the objective function, \(g^{(i)}(x)=0, \forall i\) are the \(\textbf{equality constraints}\), and \(h^{(j)}(x)\leq 0, \forall j\) are the \(\textbf{inequality constraints}\).

As long as at least one feasible point exists and \(f(x)\) is not permitted to take the value \(\infty\), then

\[\min_x\max_{\lambda}\max_{\alpha\ge 0}L(x,\lambda,\alpha) \]

has the same optimal objective function value and set of optimal points \(x\) as

\[\min_x f(x) \]
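In practice the KKT conditions are usually enforced by a solver rather than by constructing the Lagrangian by hand. A minimal sketch, assuming SciPy is available (its SLSQP method handles the equality and inequality constraints internally; note that SciPy's `ineq` convention is \(\text{fun}(x) \geq 0\), the opposite sign of \(h(x) \leq 0\)). The example objective and constraints are arbitrary choices for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# min_x f(x) = x0^2 + x1^2
# s.t. g(x) = x0 + x1 - 1 = 0   (equality constraint)
#      h(x) = x0 - 0.8 <= 0     (inequality constraint)
f = lambda x: x[0]**2 + x[1]**2

constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1},  # g(x) = 0
    {"type": "ineq", "fun": lambda x: 0.8 - x[0]},       # SciPy wants fun(x) >= 0, i.e. -h(x) >= 0
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x)  # roughly [0.5, 0.5]
```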
