Deep Learning Notes, Chapter 4: Numerical Computation
1. Overflow and underflow
- Underflow occurs when numbers near zero are rounded to zero
- Overflow occurs when numbers with large magnitude are approximated as \(-\infty\) or \(\infty\)
\(\Large\text{Note:}\) One example of a function that must be stabilized against underflow and overflow is the \(\textbf{softmax}\) function. The \(\textbf{softmax}\) function is often used to predict the probabilities associated with a multinoulli distribution:
\[
\mathrm{softmax}(\boldsymbol{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^{n}\exp(x_j)}
\]
When some \(x_i\) is very large, \(\exp(x_i)\) overflows; when every \(x_i\) is very negative, the denominator underflows to zero. Evaluating \(\mathrm{softmax}(\boldsymbol{x} - \max_i x_i)\) instead gives the same result analytically but is numerically stable.
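The shift-by-max stabilization can be sketched in a few lines of NumPy (assuming NumPy is available; the function name `softmax` is just illustrative):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: subtracting max(x) leaves the result
    unchanged analytically, but keeps every exponent <= 0 so exp cannot
    overflow, and the denominator contains at least one term equal to 1."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

# A naive exp(1000.0) overflows to inf, but the stabilized version is fine.
p = softmax(np.array([1000.0, 1000.0, 1000.0]))
print(p)  # each entry is 1/3
```

Note that the shift only fixes overflow in the numerator and underflow in the denominator; computing \(\log\mathrm{softmax}\) directly is the usual remedy when the log of a near-zero probability is needed.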
2. Poor Conditioning
Consider the function \(f(x) = A^{-1}x\). When \(A\in\mathbb{R}^{n\times n}\) has an eigenvalue decomposition, its \(\textbf{condition number}\) is:
\[
\max_{i,j}\left|\frac{\lambda_i}{\lambda_j}\right|
\]
This is the ratio of the magnitudes of the largest and smallest eigenvalues. When this number is large, matrix inversion is particularly sensitive to error in the input.
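A minimal sketch of this eigenvalue-ratio definition, using a hypothetical symmetric matrix chosen only for illustration:

```python
import numpy as np

# Hypothetical symmetric matrix used only for illustration.
A = np.array([[4.0, 0.0],
              [0.0, 0.001]])

# Condition number as the ratio of the largest to the smallest
# eigenvalue magnitude, matching the definition in the text.
eigvals = np.linalg.eigvals(A)
cond = np.max(np.abs(eigvals)) / np.min(np.abs(eigvals))
print(cond)  # ≈ 4000
```

With a condition number of roughly 4000, small perturbations along the second eigenvector's direction are amplified by a factor of a few thousand when applying \(A^{-1}\).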
3. Jacobian and Hessian Matrices
Sometimes we need to find all the partial derivatives of a function whose input and output are both vectors. The matrix containing all such partial derivatives is known as a \(\textbf{Jacobian matrix}\). Specifically, if we have a function \(f:\mathbb{R}^m\rightarrow \mathbb{R}^n\), then the Jacobian matrix \(\mathbf{J}\in\mathbb{R}^{n\times m}\) of \(f\) is defined as:
\[
J_{i,j} = \frac{\partial}{\partial x_j} f(\boldsymbol{x})_i
\]
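The definition can be checked numerically. Below is a sketch that approximates the Jacobian by central finite differences (the helper name `jacobian_fd` is hypothetical; real deep learning code would use automatic differentiation instead):

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Approximate J[i, j] = d f(x)_i / d x_j by central finite
    differences. A numerical sketch, not automatic differentiation."""
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

# Example: f(x) = (x0*x1, x0 + x1) has Jacobian [[x1, x0], [1, 1]].
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
print(jacobian_fd(f, np.array([2.0, 3.0])))  # ~[[3, 2], [1, 1]]
```

Note the shape convention: row index \(i\) ranges over outputs, column index \(j\) over inputs, matching \(\mathbf{J}\in\mathbb{R}^{n\times m}\) above.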
The \(\textbf{Hessian}\) matrix \(\mathbf{H}(f)(\boldsymbol{x})\) is defined as:
\[
\mathbf{H}(f)(\boldsymbol{x})_{i,j} = \frac{\partial^2}{\partial x_i \partial x_j} f(\boldsymbol{x})
\]
Equivalently, the \(\textbf{Hessian}\) is the Jacobian of the gradient.
\(\Large\textbf{Note: the differential operators are commutative}\) (wherever the second partial derivatives are continuous):
\[
\frac{\partial^2}{\partial x_i \partial x_j} f(\boldsymbol{x}) = \frac{\partial^2}{\partial x_j \partial x_i} f(\boldsymbol{x})
\]
This implies that \(H_{i,j} = H_{j,i}\), i.e. the Hessian is \(\textbf{symmetric}\).
Because the Hessian matrix is real and symmetric, we can decompose it into a set of real \(\textbf{eigenvalues}\) and an \(\textbf{orthogonal basis}\) of eigenvectors.
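This decomposition can be sketched with `np.linalg.eigh`, which exploits symmetry to return real eigenvalues and orthonormal eigenvectors (the quadratic function here is a hypothetical example; its Hessian is constant):

```python
import numpy as np

# Hessian of f(x) = x0^2 + x0*x1 + 2*x1^2 (constant for a quadratic).
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

# eigh is designed for symmetric matrices: real eigenvalues,
# orthonormal eigenvectors stacked as the columns of Q.
eigvals, Q = np.linalg.eigh(H)

# Q diag(eigvals) Q^T reconstructs H, and Q^T Q = I (orthogonal basis).
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, H))  # True
print(np.allclose(Q.T @ Q, np.eye(2)))             # True
```

Since both eigenvalues here are positive, this Hessian is positive definite, which is exactly the second-derivative test for a local minimum of the quadratic.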
4. Constrained Optimization
The Karush–Kuhn–Tucker (KKT) approach provides a very general solution to constrained optimization. In detail, we introduce the \(\textbf{generalized Lagrangian}\):
\[
L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}) = f(\boldsymbol{x}) + \sum_i \lambda_i\, g^{(i)}(\boldsymbol{x}) + \sum_j \alpha_j\, h^{(j)}(\boldsymbol{x})
\]
where \(f(x)\) is the objective function, \(g^{(i)}(x)=0, \forall i\) are the \(\textbf{equality constraints}\), and \(h^{(j)}(x)\leq 0, \forall j\) are the \(\textbf{inequality constraints}\).
As long as at least one feasible point exists and \(f(x)\) is not permitted to have the value \(\infty\), then
\[
\min_{\boldsymbol{x}} \max_{\boldsymbol{\lambda}} \max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq 0} L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha})
\]
has the same optimal objective function value and set of optimal points \(\boldsymbol{x}\) as the original constrained problem
\[
\min_{\boldsymbol{x}\in\mathbb{S}} f(\boldsymbol{x}),
\]
where \(\mathbb{S}\) is the feasible set defined by the constraints.
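As a concrete sketch, the KKT stationarity and feasibility conditions can be verified on a toy problem whose optimum is known analytically (the problem and multiplier value below are hypothetical choices for illustration):

```python
import numpy as np

# Toy problem: minimize f(x) = x0^2 + x1^2
# subject to the equality constraint g(x) = x0 + x1 - 1 = 0.
# By symmetry the analytic optimum is x* = (0.5, 0.5),
# with equality multiplier lambda* = -1.
x_star = np.array([0.5, 0.5])
lam_star = -1.0

grad_f = 2 * x_star            # gradient of the objective at x*
grad_g = np.array([1.0, 1.0])  # gradient of the constraint (constant)

# KKT stationarity: grad f(x*) + lambda* grad g(x*) = 0,
# and primal feasibility: g(x*) = 0.
print(np.allclose(grad_f + lam_star * grad_g, 0.0))  # True
print(np.isclose(x_star.sum() - 1.0, 0.0))           # True
```

With no inequality constraints active, the remaining KKT conditions (dual feasibility \(\alpha\geq 0\) and complementary slackness \(\alpha_j h^{(j)}(x)=0\)) hold vacuously here.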