线性判别器(LDA)
线性判别器(LDA)
线性判别器思想:类内小,类间大;通过选择一个投影方向将 \(p\) 维数据进行投影到 一维直接上,并且在这直线可以通过一个阈值进行分类。所以我们要找 \(w\)。
首先我们做以下定义:
数据集 $ {(x_i,y_i)}_{i=1}^{N}、x \in R^p、 y \in {+1, 0}$
假设两类数据 \(x_{c_1} = \{x_i | y_i = 0\}、 x_{c_2} = \{x_i | y_i = 0\}、 |x_{c_1} = N_1|、|x_{c_2}| = N_2、N_1 + N_2 = N\)
根据求均值公式:
\[\mu = \frac{1}{N} \sum_{i = 1}^{N}x_i
\]
得到类心:
\[\mu_1 =\bar{x_{c_1}}= \frac{1}{N_1} \sum_{i = 1}^{N_1}x_i\\
\mu_2 =\bar{x_{c_2}}= \frac{1}{N_2} \sum_{i = 1}^{N_2}x_i
\]
投影到直线上类心为:
\[z_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i \\
z_2 = \frac{1}{N_2} \sum_{i = 1}^{N_2} w^T x_i \\
\]
类心距为(类间距):
\[(z_1 - z_2)^2 = (\frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)^2 \\
= (w^T * \mu_1 - w^T * \mu_2)^2 \\
= w^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw\\
令 \quad S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T \\
(z_1 - z_2)^2 = w^T * S_b * w
\]
其中 \(S_b\) 为类间散列矩阵。
类内距离用方差进行表示:
\[S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - z_1)(w^Tx_i - z_1)^T\\
S_2 = \frac{1}{N_2} \sum_{i = 1}^{N_2}(w^Tx_i - z_2)(w^Tx_i - z_2)^T\\
S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)(w^Tx_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)^T\\
= S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - w^T \mu_1)(w^Tx_i - w^T \mu_1)^T\\
= \frac{1}{N_1} w^T\sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T w\\
= w^T[\frac{1}{N_1} \sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T] w\\
令 \quad S_{c_1} = \frac{1}{N_1} \sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T \\
S_1 = w^T S_{c_1} w\\
S_2 = w^T S_{c_2} w\\
\]
由于我们需要的是类间距离大,类内距离小,所以设目标函数:
\[J(w) = \frac{(z_1 - z_2)^2}{S_1 + S_2}\\
令 \quad S_w = S_{c_1} + S_{c_2} \\
J(w) = \frac{w^T S_b w}{w^T S_w w}
\]
其中 \(S_w\) 为类内散度矩阵。
下面我们对 \(J(w)\) 进行求导:
\[\frac{\partial}{\partial w}J(w) = 2S_bw(w^TS_ww)^{-1} - 2w^TS_bw(w^TS_ww)^{-2}S_W w = 0\\
\Longrightarrow S_bw(w^TS_ww) = (w^TS_bw)S_ww\\
\Longrightarrow S_ww = \frac{(w^TS_ww)}{(w^TS_bw)}S_bw\\
\Longrightarrow w = \frac{(w^TS_ww)}{(w^TS_sw)}S_w^{-1}S_bw\\
\frac{(w^TS_bw)}{(w^TS_ww)} \in R \\
\Longrightarrow w \propto S_w^{-1}S_bw = S_w^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw\\
\Longrightarrow w \propto S_w^{-1}(\mu_1 - \mu_2)
\]
另一种求解:
由于 \(J\) 与 \(w\) 的方向有关,确定方向后,与 \(w\) 的长度无关。求解过程中,分子分母会同时变化,所以首先固定分母为某一个非 \(0\) 常数,即:
\[w^TS_ww = c
\]
则原问题转化一个约束问题:
\[\begin{cases}
max_w \quad w^T S_b w \\
s.t. \quad w^TS_ww = c
\end{cases}
\]
引入拉格朗日乘子:
\[L(w, \lambda) = w^TS_b w - \lambda(w^TS_ww - c)
\]
转化为无约束问题:
\[\begin{cases}
max_w max_\lambda L(w,\lambda) \\
s.t. \quad \lambda > 0
\end{cases}
\]
将 \(max\) 调换位置:
\[\begin{cases}
max_\lambda max_w L(w,\lambda) \\
s.t. \quad \lambda > 0
\end{cases}
\]
将 $ L(w,\lambda)$ 对 \(w\) 进行求导:
\[\frac{\partial L(w,\lambda)}{\partial w} = (S_b + S_b^T)w - \lambda(S_w + S_w^T)w\\
= 2S_bw - 2\lambda S_ww = 0\\
S_w^{-1}S_bw = \lambda w\\
S_b w = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw = \beta(\mu_1 - \mu_2) \\
w = \frac{\beta}{\lambda} S_w^{-1}(\mu_1 - \mu_2)\\
w = S_w^{-1}(\mu_1 - \mu_2)
\]
考虑到数值解的稳定性,在实践中通常对 \(S_w\) 进行奇异值分解。至此 \(w\) 求得。