The SMO Algorithm
The SMO (Sequential Minimal Optimization) algorithm is a core procedure for training SVMs.
The classification function is written as \(u = \vec{w}\cdot\vec{x} - b\).
The SMO algorithm consists of two parts. The first part minimizes \[\Psi(\vec{\alpha}) = \frac{1}{2}\sum_{i = 1}^{N}\sum_{j = 1}^{N}y_{i}y_{j}K(\vec{x_i},\vec{x_j}) \alpha_i\alpha_j - \sum_{i = 1}^{N}\alpha_i\]
subject to certain constraints; the second part uses a heuristic strategy to select the two multipliers \(\alpha_i\) to update.
Part one: finding the minimum
\[\min_{\vec{\alpha}}\Psi(\vec{\alpha}) = \min \frac{1}{2}\sum_{i = 1}^{N}\sum_{j = 1}^{N}y_{i}y_{j}K(\vec{x_i}, \vec{x_j})\alpha_i\alpha_j - \sum_{i = 1}^{N}\alpha_i\]
subject to \[0\le\alpha_i\le C, \quad \forall i\]\[\sum_{i = 1}^{N}y_i\alpha_i = 0 \]
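As a concrete sketch (my own illustration, not part of the original derivation), the dual objective \(\Psi(\vec{\alpha})\) can be evaluated directly with NumPy, here assuming a linear kernel \(K(\vec{x_i}, \vec{x_j}) = \vec{x_i}\cdot\vec{x_j}\):

```python
import numpy as np

def dual_objective(alpha, y, X):
    """Evaluate Psi(alpha) = 1/2 * sum_ij y_i y_j K_ij a_i a_j - sum_i a_i."""
    K = X @ X.T                    # Gram matrix K[i, j] = x_i . x_j (linear kernel)
    ya = alpha * y                 # elementwise y_i * alpha_i
    return 0.5 * ya @ K @ ya - alpha.sum()

# Tiny example: two orthogonal points, one per class
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])       # satisfies sum_i y_i alpha_i = 0
```

Any other kernel only changes how the Gram matrix `K` is computed.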
The corresponding KKT conditions are:
\[\alpha_i = 0 \Leftrightarrow y_{i}u_{i} \ge 1\]
\[0 < \alpha_i < C \Leftrightarrow y_{i}u_{i} = 1\]
\[\alpha_i = C \Leftrightarrow y_{i}u_{i} \le 1\]
The KKT conditions show that interior points (correctly classified, beyond the margin) have \(\alpha_i = 0\), support vectors satisfy \(0 < \alpha_i < C\), and outliers have \(\alpha_i = C\).
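These three cases can be turned into a simple violation test. The following is a hypothetical helper (the name and tolerance handling are my own, not from the text) that checks whether a single \(\alpha_i\) violates the KKT conditions above:

```python
def violates_kkt(alpha_i, y_i, u_i, C, tol=1e-3):
    """Return True if (alpha_i, y_i, u_i) violates the KKT conditions within tol."""
    r = y_i * u_i                 # the margin quantity y_i * u_i
    if alpha_i < tol:             # alpha_i == 0  requires  y_i * u_i >= 1
        return r < 1 - tol
    if alpha_i > C - tol:         # alpha_i == C  requires  y_i * u_i <= 1
        return r > 1 + tol
    return abs(r - 1) > tol       # 0 < alpha_i < C  requires  y_i * u_i == 1
```

In practice SMO's outer loop scans for multipliers for which such a test returns `True` and selects them for optimization.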
First, use the heuristic strategy of part two to select two multipliers \(\alpha_i, \alpha_j\) to optimize, abbreviated as \(\alpha_1, \alpha_2\); this fixes the remaining \(\alpha_i, i \ge 3\). Using the equality constraint, \(\alpha_1\) can be expressed in terms of \(\alpha_2\), so the problem reduces to optimizing a quadratic function of the single variable \(\alpha_2\). Denote the values to be optimized by \(\alpha_1^{new}, \alpha_2^{new}\); the symbols \(\alpha_i, i \ge 3\) denote the values before optimization and are therefore constants, and likewise the unsuperscripted \(\alpha_1, \alpha_2\) denote the old values and are also constants. Note also that for any vector \(\vec{x_j}\), whose output is denoted \(u_j\), we have \(\sum_{i = 1}^{N}y_{i}\alpha_{i}K(\vec{x_i}, \vec{x_j}) = u_j + b\).
Substituting \(\alpha_1^{new}, \alpha_2^{new}\) into the objective function gives:
\[\Psi(\vec{\alpha}) = \Psi(\alpha_1^{new}, \alpha_2^{new}, \alpha_3, \dots, \alpha_N) = \frac{1}{2}y_{1}^{2}{\alpha_{1}^{new}}^{2}K(\vec{x_1}, \vec{x_1}) + \frac{1}{2}y_{2}^{2}{\alpha_{2}^{new}}^{2}K(\vec{x_2}, \vec{x_2}) + y_{1}y_{2}{\alpha_{1}^{new}}{\alpha_{2}^{new}}K(\vec{x_1}, \vec{x_2}) + y_{1}{\alpha_{1}^{new}}\sum_{j = 3}^{N}y_{j}\alpha_{j}K(\vec{x_1}, \vec{x_j}) + y_{2}{\alpha_{2}^{new}}\sum_{j = 3}^{N}y_{j}\alpha_{j}K(\vec{x_2}, \vec{x_j}) + \frac{1}{2}\sum_{i = 3}^{N}\sum_{j = 3}^{N}y_{i}y_{j}\alpha_{i}\alpha_{j}K(\vec{x_i}, \vec{x_j}) - \alpha_{1}^{new} - \alpha_{2}^{new} - \sum_{i = 3}^{N}\alpha_{i} \]
Define
\[K_{ij} = K(\vec{x_i}, \vec{x_j})\]
\[V_{i} = \sum_{j = 3}^{N}y_{j}\alpha_{j}K_{ij} = u_{i} + b - y_{1}\alpha_{1}K_{1i} - y_{2}\alpha_{2}K_{2i}\]
\[s = y_{1}y_{2}\]
Then
\[\Psi(\vec{\alpha}) = \frac{1}{2}K_{11}{\alpha_{1}^{new}}^{2} + \frac{1}{2}K_{22}{\alpha_{2}^{new}}^{2} + sK_{12}{\alpha_{1}^{new}}{\alpha_{2}^{new}} + y_{1}{\alpha_{1}^{new}}V_{1} + y_{2}{\alpha_{2}^{new}}V_{2} - {\alpha_{1}^{new}} - {\alpha_{2}^{new}} + \Psi_{constant}\]
Also, since
\[y_{1}\alpha_{1} + y_{2}\alpha_{2} = -\sum_{i = 3}^{N}y_{i}\alpha_{i} = y_{1}{\alpha_{1}^{new}} + y_{2}{\alpha_{2}^{new}}\]
multiplying both sides by \(y_1\) gives
\[\alpha_{1} + s\alpha_{2} = -y_{1}\sum_{i = 3}^{N}y_{i}\alpha_{i} = {\alpha_{1}^{new}} + s{\alpha_{2}^{new}}\]
Let
\[w = -y_{1}\sum_{i = 3}^{N}y_{i}\alpha_{i}\]
Then
\[\alpha_{1}^{new} = w - s{\alpha_{2}^{new}}\]
Therefore
\[\Psi(\vec{\alpha}) = \frac{1}{2}K_{11}{(w - s{\alpha_{2}^{new}})}^{2} + \frac{1}{2}K_{22}{\alpha_{2}^{new}}^{2} + sK_{12}{(w - s{\alpha_{2}^{new}})}{\alpha_{2}^{new}} + y_{1}{(w - s{\alpha_{2}^{new}})}V_{1} + y_{2}{\alpha_{2}^{new}}V_{2} - {(w - s{\alpha_{2}^{new}})} - {\alpha_{2}^{new}} + \Psi_{constant}\]
Differentiating with respect to \(\alpha_{2}^{new}\) gives
\[\frac{\mathrm{d}\Psi}{\mathrm{d}\alpha_{2}^{new}} = -sK_{11}(w - s{\alpha_{2}^{new}}) + K_{22}{\alpha_{2}^{new}} - K_{12}{\alpha_{2}^{new}} + sK_{12}(w - s{\alpha_{2}^{new}}) - y_{2}V_{1} + s + y_{2}V_{2} - 1\]
If the second derivative is positive, the point where the first derivative vanishes is a minimum; here the second derivative is \(K_{11} + K_{22} - 2K_{12}\). Setting the first derivative above to \(0\) gives
\[{\alpha_{2}^{new}}(K_{11} + K_{22} - 2K_{12}) = s(K_{11} - K_{12})w + y_{2}(V_{1} - V_{2}) + 1 - s\]
Substituting \(w = \alpha_1 + s\alpha_2\) and the definition of \(V_i\), which yields
\[V_{1} - V_{2} = u_{1} - u_{2} - y_{1}\alpha_{1}(K_{11} - K_{12}) - y_{2}\alpha_{2}(K_{12} - K_{22}),\]
and using \(s^2 = y_2^2 = 1\) and \(y_{1}y_{2} = s\) (so the terms involving \(\alpha_1\) cancel and \(1 - s = y_{2}(y_{2} - y_{1})\)), we obtain
\[{\alpha_{2}^{new}}(K_{11} + K_{22} - 2K_{12}) = \alpha_{2}(K_{11} + K_{22} - 2K_{12}) + y_{2}(u_1 - u_2 + y_2 - y_1)\]
Define
\[\eta = K_{11} + K_{22} - 2K_{12}\]
\[E_i = u_i - y_i\]
Then
\[{\alpha_{2}^{new}} = \alpha_2 + \frac{y_2(E_1 - E_2)}{\eta}\]
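The update just derived is straightforward to code. A minimal sketch (function name and argument layout are my own) of the unconstrained \(\alpha_2\) update, given the errors \(E_1, E_2\) and the kernel values:

```python
def alpha2_unclipped(alpha2, y2, E1, E2, K11, K22, K12):
    """Unconstrained SMO update: alpha2 + y2 * (E1 - E2) / eta."""
    eta = K11 + K22 - 2.0 * K12   # second derivative of Psi w.r.t. alpha_2
    return alpha2 + y2 * (E1 - E2) / eta
```

A real implementation must also handle the degenerate case \(\eta \le 0\), which can occur when two training points coincide or the kernel is not strictly positive definite.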
Moreover, since
\[y_{1}\alpha_{1} + y_{2}\alpha_{2} = y_{1}{\alpha_{1}^{new}} + y_{2}{\alpha_{2}^{new}} = -\sum_{i = 3}^{N}y_{i}\alpha_{i} = \text{constant}\]
and
\[0 \le \alpha_1^{new} \le C\]
\[0 \le \alpha_2^{new} \le C\]
these constraints can be depicted as a box diagram: \(({\alpha_{1}^{new}}, {\alpha_{2}^{new}})\) must lie on a line of slope \(\pm 1\) inside the square \([0, C] \times [0, C]\).
Therefore, when \(y_1 \neq y_2\), define
\[L = \max(0, \alpha_2 - \alpha_1), H = \min(C, C + \alpha_2 - \alpha_1)\]
and when \(y_1 = y_2\), define
\[L = \max(0, \alpha_2 + \alpha_1 - C), H = \min(C, \alpha_2 + \alpha_1)\]
Therefore the clipped update is
\[\alpha_2^{new,clipped} = \begin{cases} H, & \alpha_2^{new} > H \\ \alpha_2^{new}, & L \le \alpha_2^{new} \le H \\ L, & \alpha_2^{new} < L \end{cases}\]
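The two cases for \(L, H\) and the clipping step can be sketched as follows (a minimal illustration in my own naming, not a full SMO implementation):

```python
def clip_alpha2(a2_new, a1, a2, y1, y2, C):
    """Clip the unconstrained alpha_2 update into the feasible box [L, H]."""
    if y1 != y2:
        # labels differ: alpha_1 - alpha_2 is constant along the constraint line
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        # labels equal: alpha_1 + alpha_2 is constant along the constraint line
        L, H = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    return min(max(a2_new, L), H)
```

After clipping \(\alpha_2\), the corresponding \(\alpha_1^{new}\) is recovered from \(\alpha_1^{new} = w - s\alpha_2^{new}\) so that the equality constraint stays satisfied.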