The SMO Algorithm

The SMO (Sequential Minimal Optimization) algorithm is a key component of training SVMs.

Write the classifier output for an input \(\vec{x}\) as \(u = \vec{w} \cdot \vec{x} - b\).

SMO splits into two parts. The first part minimizes \[\Psi(\vec{\alpha}) = \frac{1}{2}\sum_{i = 1}^{N}\sum_{j = 1}^{N}y_{i}y_{j}K(\vec{x_i},\vec{x_j}) \alpha_i\alpha_j - \sum_{i = 1}^{N}\alpha_i\]

subject to certain constraints; the second part uses a heuristic strategy to choose the two multipliers \(\alpha_i\) to update. A structural sketch of how the two parts interact follows.
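The sketch below shows only the skeleton of the outer loop, not Platt's full pair of nested loops; `select_pair` and `optimize_pair` are hypothetical helpers standing in for Part 2 (heuristic selection) and Part 1 (the analytic two-variable solve) derived in the rest of this post.

```python
# Structural sketch of SMO (illustrative names, not Platt's pseudocode).
def smo(alpha, select_pair, optimize_pair, max_iter=10000):
    for _ in range(max_iter):
        pair = select_pair(alpha)      # Part 2: heuristically pick two multipliers
        if pair is None:               # no KKT violators left: converged
            break
        i, j = pair
        optimize_pair(alpha, i, j)     # Part 1: analytic 2-variable minimization
    return alpha
```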

Part 1: minimizing the objective

\[\min_{\vec{\alpha}}\Psi(\vec{\alpha}) = \min \frac{1}{2}\sum_{i = 1}^{N}\sum_{j = 1}^{N}y_{i}y_{j}K(\vec{x_i}, \vec{x_j})\alpha_i\alpha_j - \sum_{i = 1}^{N}\alpha_i\]

subject to \[0 \le \alpha_i \le C, \quad \forall i\]\[\sum_{i = 1}^{N}y_i\alpha_i = 0 \]

The corresponding KKT conditions are:

\[\alpha_i = 0 \Leftrightarrow y_{i}u_{i} \ge 1\]

\[0 < \alpha_i < C \Leftrightarrow y_{i}u_{i} = 1\]

\[\alpha_i = C \Leftrightarrow y_{i}u_{i} \le 1\]

The KKT conditions say that correctly classified points outside the margin have \(\alpha_i = 0\), support vectors on the margin have \(0 < \alpha_i < C\), and outliers (margin violators) have \(\alpha_i = C\).
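In code, this test is usually applied with a small numerical tolerance (Platt's pseudocode uses \(10^{-3}\)). A minimal sketch; the name `violates_kkt` is illustrative:

```python
# Returns True if alpha_i violates the KKT conditions above, within tol.
# u_i is the current output on example i.
def violates_kkt(alpha_i, y_i, u_i, C, tol=1e-3):
    r = y_i * u_i - 1.0
    # violation: alpha_i < C but y_i*u_i < 1, or alpha_i > 0 but y_i*u_i > 1
    return (r < -tol and alpha_i < C) or (r > tol and alpha_i > 0)
```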

Now use the heuristic strategy of the second part to pick two multipliers \(\alpha_i, \alpha_j\) to optimize, abbreviated \(\alpha_1, \alpha_2\); the remaining \(\alpha_i, i \ge 3\) are held fixed. Using the equality constraint, \(\alpha_1\) can be expressed through \(\alpha_2\), so the problem reduces to optimizing a quadratic function of the single variable \(\alpha_2\). Rename the chosen pair \(\alpha_1^{new}, \alpha_2^{new}\) to mark them as the values being optimized; \(\alpha_i, i \ge 3\) denote the values before this optimization step and are therefore constants, and likewise \(\alpha_1, \alpha_2\) denote the old (pre-update) values, also constants. Note also that if the output on a vector \(\vec{x_j}\) is written \(u_j\), then \(\sum_{i = 1}^{N}y_{i}\alpha_{i}K(\vec{x_i}, \vec{x_j}) = u_j + b\); a small sketch of this output computation follows.
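A minimal sketch of that output, assuming a kernel function is available; the helper names `output` and `linear_kernel` are illustrative:

```python
import numpy as np

def output(j, X, y, alpha, b, kernel):
    # u_j = sum_i y_i * alpha_i * K(x_i, x_j) - b, so that
    # sum_i y_i * alpha_i * K(x_i, x_j) = u_j + b, as used in the text
    return sum(y[i] * alpha[i] * kernel(X[i], X[j])
               for i in range(len(y))) - b

def linear_kernel(a, c):
    # e.g. the linear kernel K(a, c) = a . c
    return float(np.dot(a, c))
```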

Substituting \(\alpha_1^{new}, \alpha_2^{new}\) into the objective gives:

\[\Psi(\vec{\alpha}) = \Psi(\alpha_1^{new}, \alpha_2^{new}, \alpha_3, ..., \alpha_N) =  \frac{1}{2}(y_{1}^{2}{\alpha_{1}^{new}}^{2}K(\vec{x_1}, \vec{x_1}) + y_{1}y_{2}{\alpha_{1}^{new}}{\alpha_{2}^{new}}K(\vec{x_1}, \vec{x_2}) + y_{1}y_{3}{\alpha_{1}^{new}}\alpha_{3}K(\vec{x_1}, \vec{x_3}) + ... + y_{1}y_{N}{\alpha_{1}^{new}}\alpha_{N}K(\vec{x_1}, \vec{x_N}) + y_{2}y_{1}{\alpha_{2}^{new}}{\alpha_{1}^{new}}K(\vec{x_2}, \vec{x_1}) + y_{2}^{2}{\alpha_{2}^{new}}^{2}K(\vec{x_2}, \vec{x_2}) + y_{2}y_{3}{\alpha_{2}^{new}}\alpha_{3}K(\vec{x_2}, \vec{x_3}) + ... + y_{2}y_{N}{\alpha_{2}^{new}}\alpha_{N}K(\vec{x_2}, \vec{x_N}) + y_{3}y_{1}\alpha_{3}{\alpha_{1}^{new}}K(\vec{x_3}, \vec{x_1}) + y_{3}y_{2}\alpha_{3}{\alpha_{2}^{new}}K(\vec{x_3}, \vec{x_2}) + y_{3}^{2}\alpha_{3}^{2}K(\vec{x_3}, \vec{x_3}) + ... + y_{3}y_{N}\alpha_{3}\alpha_{N}K(\vec{x_3}, \vec{x_N}) + ... + y_{N}y_{1}\alpha_{N}{\alpha_{1}^{new}}K(\vec{x_N}, \vec{x_1}) + y_{N}y_{2}\alpha_{N}{\alpha_{2}^{new}}K(\vec{x_N}, \vec{x_2}) + y_{N}y_{3}\alpha_{N}\alpha_{3}K(\vec{x_N}, \vec{x_3}) + ... + y_{N}^{2}\alpha_{N}^{2}K(\vec{x_N}, \vec{x_N})) - \alpha_{1}^{new} - \alpha_{2}^{new} - \sum_{i = 3}^{N}\alpha_{i} \]

Introduce the shorthand

\[K_{ij} = K(\vec{x_i}, \vec{x_j})\]

\[V_{i} = \sum_{j = 3}^{N}y_{j}\alpha_{j}K_{ij} = u_{i} + b - y_{1}\alpha_{1}K_{1i} - y_{2}\alpha_{2}K_{2i}\]

\[s = y_{1}y_{2}\]

Then the objective collapses to

\[\Psi(\vec{\alpha}) = \frac{1}{2}K_{11}{\alpha_{1}^{new}}^{2} + \frac{1}{2}K_{22}{\alpha_{2}^{new}}^{2} + sK_{12}{\alpha_{1}^{new}}{\alpha_{2}^{new}} + y_{1}{\alpha_{1}^{new}}V_{1} + y_{2}{\alpha_{2}^{new}}V_{2} - {\alpha_{1}^{new}} - {\alpha_{2}^{new}} + \Psi_{constant}\]
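As a sanity check on this grouping, the compact form can be compared numerically against the full double sum. A sketch with random data and a linear kernel; all names here are illustrative:

```python
import numpy as np

# Random toy problem: N points, labels +-1, linear kernel matrix K_ij.
rng = np.random.default_rng(0)
N = 6
X = rng.normal(size=(N, 3))
y = rng.choice([-1.0, 1.0], size=N)
alpha = rng.uniform(0.0, 1.0, size=N)
K = X @ X.T

def psi_full(a):
    # Psi = 1/2 * sum_ij y_i y_j K_ij a_i a_j - sum_i a_i
    v = y * a
    return 0.5 * v @ K @ v - a.sum()

s = y[0] * y[1]
v3 = y[2:] * alpha[2:]                 # fixed part, i >= 3
V1 = v3 @ K[2:, 0]                     # V_1 = sum_{j>=3} y_j a_j K_1j
V2 = v3 @ K[2:, 1]
psi_const = 0.5 * v3 @ K[2:, 2:] @ v3 - alpha[2:].sum()
a1, a2 = alpha[0], alpha[1]
psi_compact = (0.5 * K[0, 0] * a1**2 + 0.5 * K[1, 1] * a2**2
               + s * K[0, 1] * a1 * a2
               + y[0] * a1 * V1 + y[1] * a2 * V2
               - a1 - a2 + psi_const)
assert np.isclose(psi_full(alpha), psi_compact)
```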

Moreover, the equality constraint gives

\[y_{1}\alpha_{1} + y_{2}\alpha_{2} = -\sum_{i = 3}^{N}y_{i}\alpha_{i} = y_{1}{\alpha_{1}^{new}} + y_{2}{\alpha_{2}^{new}}\]

Multiplying both sides by \(y_1\) (and using \(y_1^2 = 1\)) gives

\[\alpha_{1} + s\alpha_{2} = -y_{1}\sum_{i = 3}^{N}y_{i}\alpha_{i} = {\alpha_{1}^{new}} + s{\alpha_{2}^{new}}\]

Writing

\[w = -y_{1}\sum_{i = 3}^{N}y_{i}\alpha_{i}\]

(a scalar constant, not to be confused with the weight vector \(\vec{w}\)),

we then have

\[\alpha_{1}^{new} = w - s{\alpha_{2}^{new}}\]

Substituting this into \(\Psi\):

\[\Psi(\vec{\alpha}) = \frac{1}{2}K_{11}{(w - s{\alpha_{2}^{new}})}^{2} + \frac{1}{2}K_{22}{\alpha_{2}^{new}}^{2} + sK_{12}{(w - s{\alpha_{2}^{new}})}{\alpha_{2}^{new}} + y_{1}{(w - s{\alpha_{2}^{new}})}V_{1} + y_{2}{\alpha_{2}^{new}}V_{2} - {(w - s{\alpha_{2}^{new}})} - {\alpha_{2}^{new}} + \Psi_{constant}\]

Differentiating with respect to \(\alpha_{2}^{new}\):

\[\frac{\mathrm{d}\Psi}{\mathrm{d}\alpha_{2}^{new}} = -sK_{11}(w - s{\alpha_{2}^{new}}) + K_{22}{\alpha_{2}^{new}} - K_{12}{\alpha_{2}^{new}} + sK_{12}(w - s{\alpha_{2}^{new}}) - y_{2}V_{1} + s + y_{2}V_{2} - 1\]

If the second derivative is positive, the point where the first derivative vanishes is the minimum. Here the second derivative is \(K_{11} + K_{22} - 2K_{12}\), which is non-negative whenever \(K\) is a valid (positive semi-definite) kernel. Setting the first derivative to \(0\) gives

\[{\alpha_{2}^{new}}(K_{11} + K_{22} - 2K_{12}) = s(K_{11} - K_{12})w + y_{2}(V_{1} - V_{2}) + 1 - s\]

Substituting the expressions for \(w\) and \(V_{1}, V_{2}\) into the right-hand side and simplifying:
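Expanding term by term, with \(w = \alpha_1 + s\alpha_2\), \(V_{i} = u_{i} + b - y_{1}\alpha_{1}K_{1i} - y_{2}\alpha_{2}K_{2i}\), \(K_{12} = K_{21}\), \(s^2 = 1\) and \(y_1 y_2 = s\):

\[s(K_{11} - K_{12})w = sK_{11}\alpha_{1} + K_{11}\alpha_{2} - sK_{12}\alpha_{1} - K_{12}\alpha_{2}\]

\[y_{2}(V_{1} - V_{2}) = y_{2}(u_{1} - u_{2}) - sK_{11}\alpha_{1} - K_{12}\alpha_{2} + sK_{12}\alpha_{1} + K_{22}\alpha_{2}\]

Adding these two lines together with \(1 - s = y_{2}(y_{2} - y_{1})\), the \(\alpha_{1}\) terms cancel and the \(\alpha_{2}\) terms collect into \(\alpha_{2}(K_{11} + K_{22} - 2K_{12})\), so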

 \[{\alpha_{2}^{new}}(K_{11} + K_{22} - 2K_{12}) = \alpha_{2}(K_{11} + K_{22} - 2K_{12}) + y_{2}(u_1 - u_2 + y_2 - y_1)\]

Define

\[\eta = K_{11} + K_{22} - 2K_{12}\]

\[E_i = u_i - y_i\]

where \(E_i\) is the error of the current output on \(\vec{x_i}\). Then

\[{\alpha_{2}^{new}} = \alpha_2 + \frac{y_2(E_1 - E_2)}{\eta}\]
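A minimal sketch of this update, assuming a precomputed kernel matrix `K` (a NumPy array) and the current threshold `b`; the function name is illustrative:

```python
import numpy as np

def unclipped_alpha2(i1, i2, y, alpha, K, b):
    # u_j = sum_i y_i * alpha_i * K_ij - b, computed for all j at once
    u = (y * alpha) @ K - b
    E1, E2 = u[i1] - y[i1], u[i2] - y[i2]        # E_i = u_i - y_i
    eta = K[i1, i1] + K[i2, i2] - 2.0 * K[i1, i2]
    # eta > 0 for a positive-definite kernel matrix; eta == 0 needs
    # separate handling (Platt evaluates the objective at L and H).
    return alpha[i2] + y[i2] * (E1 - E2) / eta
```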

Moreover, since

\[y_{1}\alpha_{1} + y_{2}\alpha_{2} = y_{1}{\alpha_{1}^{new}} + y_{2}{\alpha_{2}^{new}} = -\sum_{i = 3}^{N}y_{i}\alpha_{i} = \text{constant}\]

\[0 \le \alpha_1^{new} \le C\]

\[0 \le \alpha_2^{new} \le C\]

these constraints confine \(({\alpha_{1}^{new}}, {\alpha_{2}^{new}})\) to a line segment of slope \(\pm 1\) inside the box \([0, C] \times [0, C]\), which bounds \({\alpha_{2}^{new}}\) from below by \(L\) and from above by \(H\).

When \(y_1 \neq y_2\), set

\[L = \max(0, \alpha_2 - \alpha_1), H = \min(C, C + \alpha_2 - \alpha_1)\]

and when \(y_1 = y_2\), set

\[L = \max(0, \alpha_2 + \alpha_1 - C), H = \min(C, \alpha_2 + \alpha_1)\]

The unconstrained minimizer \({\alpha_{2}^{new}}\) is therefore clipped to \([L, H]\):

\[\alpha_{2}^{new,clipped} = \begin{cases} H & {\alpha_{2}^{new}} > H \\ {\alpha_{2}^{new}} & L \le {\alpha_{2}^{new}} \le H \\ L & {\alpha_{2}^{new}} < L \end{cases}\]
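A minimal sketch of the bounds and the clipping step; the helper name is illustrative, and `a1`, `a2` are the old values \(\alpha_1, \alpha_2\):

```python
# Compute L, H according to the two cases above, then clip alpha2_new.
def clip_alpha2(alpha2_new, a1, a2, y1, y2, C):
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    return min(max(alpha2_new, L), H)
```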

   
