The connections between unsupervised LDA, PCA and k-means, with derivations

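\(LDA\) (linear discriminant analysis) is a common supervised method, widely used for dimensionality reduction and classification; \(PCA\) (principal component analysis) is an unsupervised dimensionality reduction technique; and \(k\)-means is one of the most widely used clustering algorithms.
The main motivation of this article is to explore, in the unsupervised setting, how the within-class and between-class scatter matrices of \(LDA\) (and their sum, the total scatter matrix) relate to \(PCA\) and \(k\)-means.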

1. Basic principles of conventional (supervised) \(LDA\):

(1) The objective function of \(LDA\):

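For the origin and theoretical derivation of \(LDA\), please refer to “线性判别分析LDA原理总结” (“A summary of the principles of linear discriminant analysis (LDA)”), which explains it in great detail, so I will not repeat it here. All \(LDA\) in this article is multi-class \(LDA\), written in matrix form.
The basic idea of \(LDA\) is: given the data matrix \(X=[x_1,\dots,x_n]\) (one sample per column, assumed centered), find a projection matrix \(W\) with orthonormal columns such that, after projection, the within-class scatter \(S_w\) is as small as possible and the between-class scatter \(S_b\) is as large as possible. That is, we optimize the following objectives: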

\[\begin{equation} \left\{\begin{array}{l} \min_{W^{T} W=I} \operatorname{Tr}\left(W^{T} S_{w} W\right) \\ \max_{W^{T} W=I} \operatorname{Tr}\left(W^{T} S_{b} W\right). \end{array} \right. \end{equation} \]
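Eq. (1) states two goals but does not say how to combine them. As an illustration only (an assumption, not necessarily the formulation the author has in mind), one common combination is the trace difference \(\max_{W^TW=I}\operatorname{Tr}(W^T(S_b-S_w)W)\), sometimes called the maximum margin criterion; its maximizer is spanned by the top eigenvectors of the symmetric matrix \(S_b-S_w\). The sketch below assumes NumPy, synthetic data with one sample per column, and a hypothetical helper name `trace_difference_lda`. Classical Fisher LDA uses a ratio of the two traces instead.

```python
import numpy as np

# Hypothetical helper name; trace-difference combination of the two goals in Eq. (1).
def trace_difference_lda(S_w, S_b, k):
    """Return an orthonormal W (d x k) maximizing Tr(W^T (S_b - S_w) W) s.t. W^T W = I,
    plus the k eigenvalues it attains. The maximizer is spanned by the top-k
    eigenvectors of the symmetric matrix S_b - S_w."""
    evals, evecs = np.linalg.eigh(S_b - S_w)     # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]          # indices of the k largest
    return evecs[:, order], evals[order]

# Synthetic data: d = 4 features, n = 60 samples in 3 classes, one sample per column.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
class_means = rng.normal(scale=3.0, size=(4, 3))
X = class_means[:, labels] + rng.normal(size=(4, 60))
X = X - X.mean(axis=1, keepdims=True)            # center, as the text assumes

# Scatter matrices from per-class sums (the global mean is 0 after centering).
S_w = np.zeros((4, 4))
S_b = np.zeros((4, 4))
for j in range(3):
    Xj = X[:, labels == j]
    mu_j = Xj.mean(axis=1, keepdims=True)
    S_w += (Xj - mu_j) @ (Xj - mu_j).T
    S_b += Xj.shape[1] * mu_j @ mu_j.T

W, top = trace_difference_lda(S_w, S_b, k=2)
print(np.allclose(W.T @ W, np.eye(2)))           # True: orthonormal columns
print(np.trace(W.T @ S_w @ W), np.trace(W.T @ S_b @ W))
```

Any other orthonormal \(W\) gives a trace difference no larger than the sum of the \(k\) largest eigenvalues of \(S_b-S_w\).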

The within-class scatter matrix \(S_w\) and the between-class scatter matrix \(S_b\) above also satisfy another relation:

\[\begin{equation} {S}_w + {S}_b = {S}_t, \end{equation} \]
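The decomposition \(S_w+S_b=S_t\) can be checked numerically. The following NumPy sketch (synthetic data, one sample per column, hypothetical helper name `scatter_matrices`) builds the three matrices from their per-class sum definitions; it subtracts the global mean first, so it also works on uncentered input.

```python
import numpy as np

# Hypothetical helper name, for illustration only.
def scatter_matrices(X, labels):
    """X: d x n data (one sample per column); labels: length-n integer array.
    Returns (S_t, S_w, S_b) built from the per-class sum definitions."""
    mu = X.mean(axis=1, keepdims=True)            # global mean (0 if X is centered)
    S_t = (X - mu) @ (X - mu).T                   # total scatter
    d = X.shape[0]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for j in np.unique(labels):
        Xj = X[:, labels == j]                    # samples of class j
        mu_j = Xj.mean(axis=1, keepdims=True)     # class mean
        S_w += (Xj - mu_j) @ (Xj - mu_j).T        # spread around the class mean
        S_b += Xj.shape[1] * (mu_j - mu) @ (mu_j - mu).T   # class mean around global mean
    return S_t, S_w, S_b

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 30))
labels = rng.integers(0, 3, size=30)
S_t, S_w, S_b = scatter_matrices(X, labels)
print(np.allclose(S_w + S_b, S_t))                # True
```

The identity holds because, within each class, the deviations \(x_i-\mu_j\) sum to zero, so the cross terms vanish.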

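Here, \({S}_t\) is the total scatter matrix. The starting point of this article is to show the connection between the total scatter matrix \({S}_t\) and \(PCA\), and the connection between the within-class scatter matrix \({S}_w\) and \(k\)-means.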

(2) Why \(LDA\) is supervised

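LDA is supervised because, in Eq. (1), computing the within-class scatter matrix \({S}_w\) and the between-class scatter matrix \({S}_b\) requires the label matrix \(Y\), an \(n\times c\) indicator matrix whose entry \(Y_{ij}\) is 1 if sample \(x_i\) belongs to class \(j\) and 0 otherwise.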
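A minimal NumPy sketch of building the indicator matrix \(Y\) from integer class labels, assuming labels run from 0 to \(c-1\); the helper name `label_matrix` is hypothetical. It also shows that \(Y^TY\) is diagonal and holds the class sizes, a fact the derivation in Section 3 relies on.

```python
import numpy as np

# Hypothetical helper name: one-hot label (indicator) matrix.
def label_matrix(labels, c):
    """Return Y (n x c) with Y[i, j] = 1 if sample i belongs to class j, else 0."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, c))
    Y[np.arange(labels.size), labels] = 1.0      # exactly one 1 per row
    return Y

labels = np.array([0, 2, 1, 0, 2, 2])
Y = label_matrix(labels, c=3)
print(Y.T @ Y)   # diag(2, 1, 3): the class sizes n_j
```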

2. The relationship between the total scatter matrix of \(LDA\) and \(PCA\)

For the detailed derivation of PCA, see “PCA的数学原理” (“The mathematical principles of PCA”).
In LDA, the total scatter matrix \({S}_t\) can be written as:

\[\begin{equation} {S}_{t}={X X}^{T}=\sum_{i=1}^{n} x_{i} x_{i}^{T}. \end{equation} \]

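It is clear from this that the total scatter matrix \({S}_t\) of \(LDA\) is exactly the matrix \(PCA\) works with: for centered data, \({S}_t/n\) is the sample covariance matrix, and \(PCA\) solves \(\max_{W^{T} W=I} \operatorname{Tr}\left(W^{T} S_{t} W\right)\), whose solution is given by the eigenvectors of \({S}_t\) with the largest eigenvalues. Maximizing the total scatter in \(LDA\)'s trace form is therefore equivalent to \(PCA\).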
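A small numerical illustration of this equivalence, assuming NumPy and synthetic centered data with one sample per column: the top-\(k\) eigenvectors of \(S_t\) attain the maximum of \(\operatorname{Tr}(W^T S_t W)\), and they coincide (up to sign) with the leading left singular vectors of \(X\), which is how PCA is often computed in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic d x n data (d = 5, n = 200) with clearly different variances per axis.
X = rng.normal(size=(5, 200)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
X = X - X.mean(axis=1, keepdims=True)     # centered, one sample per column

S_t = X @ X.T                             # total scatter = sum_i x_i x_i^T
evals, evecs = np.linalg.eigh(S_t)        # ascending eigenvalues
k = 2
W = evecs[:, ::-1][:, :k]                 # top-k eigenvectors: the PCA projection

# PCA via SVD of X: left singular vectors; squared singular values = eigenvalues of S_t.
U, s, _ = np.linalg.svd(X, full_matrices=False)
print(np.allclose(np.trace(W.T @ S_t @ W), np.sum(s[:k] ** 2)))   # True
print(np.allclose(np.abs(W.T @ U[:, :k]), np.eye(k), atol=1e-6))  # same axes up to sign
```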

3. The connection between \(LDA\) and \(k\)-means

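First, we make an assumption: in the unsupervised setting, the label matrix \(Y\) changes from a known quantity into an unknown variable to be solved for. The within-class scatter matrix \({S}_w\) and the between-class scatter matrix \({S}_b\) can then be written as follows: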

\[\begin{equation} \left\{\begin{array}{l} {S}_{t}={X} {X}^{T} \\ {S}_{b}={X} {Y}\left({Y}^{T} {Y}\right)^{-1} {Y}^{T} {X}^{T} \\ {S}_{w}={S}_{t}-{S}_{b}={X} \left({I}-{Y}\left({Y}^{T} {Y}\right)^{-1} {Y}^{T}\right) {X}^{T} \end{array} \right. \end{equation} \]
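To check that the matrix forms in Eq. (4) agree with the per-class definitions, the sketch below (NumPy, synthetic centered data, every class non-empty so that \(Y^TY\) is invertible) builds \(S_b\) and \(S_w\) both ways.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, c = 3, 40, 4
labels = np.arange(n) % c                       # every class is non-empty
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)           # centered, one sample per column

Y = np.zeros((n, c))
Y[np.arange(n), labels] = 1.0                   # n x c indicator matrix
P = Y @ np.linalg.inv(Y.T @ Y) @ Y.T            # n x n matrix Y (Y^T Y)^{-1} Y^T

# Matrix forms from Eq. (4).
S_t = X @ X.T
S_b = X @ P @ X.T
S_w = X @ (np.eye(n) - P) @ X.T

# Same quantities from per-class sums (the global mean is 0 after centering).
S_w_ref = np.zeros((d, d))
S_b_ref = np.zeros((d, d))
for j in range(c):
    Xj = X[:, labels == j]
    mu_j = Xj.mean(axis=1, keepdims=True)
    S_w_ref += (Xj - mu_j) @ (Xj - mu_j).T
    S_b_ref += Xj.shape[1] * mu_j @ mu_j.T
print(np.allclose(S_b, S_b_ref), np.allclose(S_w, S_w_ref))   # True True
```

The match follows because column \(j\) of \(XY\) is \(n_j\mu_j\) and \((Y^TY)^{-1}=\operatorname{diag}(1/n_j)\), so \(XY(Y^TY)^{-1}Y^TX^T=\sum_j n_j\mu_j\mu_j^T\).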

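In Eq. (4), \({I}\) is the \(n\times n\) identity matrix, where \(n\) is the number of samples. Next, we expand the within-class scatter matrix \(\mathbf{S}_w\):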

\[\begin{equation} \begin{aligned} \mathbf{S}_{w} &=\mathbf{X} \left(\mathbf{I}-\mathbf{Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T}\right) \mathbf{X}^{T}\\ &={X X}^{T}-\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \mathbf{X}^{T}\\ \end{aligned} \end{equation} \]

Rewriting the right-hand side as the product of a matrix and its transpose:

\[\begin{equation} \begin{aligned} &\mathbf{X X}^{T}-\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \mathbf{X}^{T}\\ =&\mathbf{X X}^{T}-2 \mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \mathbf{X}^{T}+\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \mathbf{Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \mathbf{X}^{T} \\ =&\left(\mathbf{X}-\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T}\right)\left(\mathbf{X}-\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T}\right)^{T}. \end{aligned} \end{equation} \]

Taking the trace of both sides turns the within-class scatter into a squared Frobenius norm:

\[\begin{equation} \operatorname{Tr}\left(\mathbf{S}_{w}\right)=\left\|\mathbf{X}-\mathbf{X Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T}\right\|_{F}^{2}. \end{equation} \]

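One small trick used above: \(\mathbf{Y}^{T}\mathbf{Y}\) is a diagonal matrix whose \(j\)-th diagonal entry is \(n_j\), the number of samples in class \(j\), so \(\left(\mathbf{Y}^{T}\mathbf{Y}\right)^{-1}\) is diagonal with entries \(\frac{1}{n_j}\). As a result, \(\mathbf{P}=\mathbf{Y}\left(\mathbf{Y}^{T}\mathbf{Y}\right)^{-1}\mathbf{Y}^{T}\) is symmetric and idempotent (\(\mathbf{P}^{2}=\mathbf{P}\)), which is what allows \(-\mathbf{X P X}^{T}\) to be written as \(-2\mathbf{X P X}^{T}+\mathbf{X P P X}^{T}\) and then factored.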
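These steps can be verified numerically. The NumPy sketch below (synthetic data with class sizes 3, 4 and 5, chosen only for illustration) checks that \((Y^TY)^{-1}\) holds \(1/n_j\) on its diagonal, that \(P\) is symmetric and idempotent, that \(S_w\) factors as in the derivation, and that its trace equals the squared Frobenius norm of the residual \(X-XP\).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, c = 2, 12, 3
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])   # class sizes 3, 4, 5
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)                    # centered, one sample per column

Y = np.zeros((n, c))
Y[np.arange(n), labels] = 1.0
G_inv = np.linalg.inv(Y.T @ Y)
print(np.diag(G_inv))                 # approximately [1/3, 1/4, 1/5], i.e. 1 / n_j

P = Y @ G_inv @ Y.T
print(np.allclose(P @ P, P), np.allclose(P, P.T))        # True True: idempotent, symmetric

R = X - X @ P                         # residual X - X Y (Y^T Y)^{-1} Y^T
S_w = X @ (np.eye(n) - P) @ X.T
print(np.allclose(S_w, R @ R.T))                          # factorization from the derivation
print(np.isclose(np.trace(S_w), np.linalg.norm(R, "fro") ** 2))   # trace = squared Frobenius norm
```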
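Also note the following identities. The first two are \(c\times c\) identity matrices, while the \(n\times n\) matrix in the third equals \(I\) only in the degenerate case \(c=n\), where every sample forms its own class: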

\[\begin{equation} \left\{ \begin{aligned} &\mathbf{Y}^{T} \mathbf{Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1}=I\\ &\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1 / 2} \mathbf{Y}^{T} \mathbf{Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1 / 2}=I\\ &\mathbf{Y}\left(\mathbf{Y}^{T} \mathbf{Y}\right)^{-1} \mathbf{Y}^{T} \neq I \end{aligned} \right.. \end{equation} \]

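Therefore, in the unsupervised setting, the within-class scatter matrix of \(LDA\) is equivalent to \(k\)-means. The \(i\)-th column of \(\mathbf{X Y}\left(\mathbf{Y}^{T}\mathbf{Y}\right)^{-1}\mathbf{Y}^{T}\) is the mean \(\mu_{c(i)}\) of the class containing \(x_i\), so \(\operatorname{Tr}(\mathbf{S}_w)=\left\|\mathbf{X}-\mathbf{X Y}\left(\mathbf{Y}^{T}\mathbf{Y}\right)^{-1}\mathbf{Y}^{T}\right\|_F^2=\sum_{i=1}^{n}\left\|x_i-\mu_{c(i)}\right\|^2\), which is exactly the \(k\)-means objective. Minimizing \(\operatorname{Tr}(\mathbf{S}_w)\) over the indicator matrix \(\mathbf{Y}\) is therefore \(k\)-means clustering, written in trace form (equivalently, as a squared Frobenius norm); with a projection \(W\), \(\operatorname{Tr}(W^{T}\mathbf{S}_w W)\) is the same objective evaluated on the projected data \(W^{T}\mathbf{X}\).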
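To make the equivalence concrete, here is a minimal sketch of Lloyd's algorithm for \(k\)-means (a standard alternating heuristic, not something the text prescribes) written in terms of the indicator matrix \(Y\): the assignment step updates \(Y\), the update step computes the cluster means as \(XY(Y^TY)^{-1}\), and the tracked objective is \(\operatorname{Tr}(S_w)\). The helper name `kmeans_via_scatter`, the random initialization and the synthetic blobs are assumptions for illustration; the sketch simply stops if a cluster becomes empty.

```python
import numpy as np

# Hypothetical helper name: Lloyd's k-means phrased with the indicator matrix Y.
def kmeans_via_scatter(X, c, n_iter=50, seed=0):
    """X: d x n data (one sample per column). Returns (labels, history), where
    history[t] = Tr(S_w) = ||X - X Y (Y^T Y)^{-1} Y^T||_F^2 after update t."""
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    centers = X[:, rng.choice(n, size=c, replace=False)]   # init: c distinct samples
    labels, history = None, []
    for _ in range(n_iter):
        # Assignment step: send each sample to its nearest center (squared distance).
        dist2 = ((X[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)   # n x c
        new_labels = dist2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # converged: Y no longer changes
        Y = np.zeros((n, c))
        Y[np.arange(n), new_labels] = 1.0
        counts = Y.sum(axis=0)                      # diagonal of Y^T Y
        if np.any(counts == 0):
            break                                   # empty cluster: keep the previous Y
        labels = new_labels
        # Update step: the cluster means are the columns of X Y (Y^T Y)^{-1}.
        centers = (X @ Y) / counts
        # Column i of centers @ Y.T is the mean of x_i's cluster, i.e. X P.
        R = X - centers @ Y.T
        history.append(np.linalg.norm(R, "fro") ** 2)
    return labels, history

# Usage on synthetic 2-D data with three well-separated blobs.
rng = np.random.default_rng(4)
blob_means = np.array([[0.0, 6.0, 0.0], [0.0, 0.0, 6.0]])
X = np.repeat(blob_means, 30, axis=1) + rng.normal(size=(2, 90))
X = X - X.mean(axis=1, keepdims=True)               # centered, as in the text
labels, history = kmeans_via_scatter(X, c=3)
print(history)                                      # non-increasing values of Tr(S_w)
```

Because the assignment step and the mean update can each only lower \(\operatorname{Tr}(S_w)\), the recorded history is non-increasing; like any \(k\)-means run, it may still stop at a local minimum.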

posted @ 2020-05-07 23:24 派大星1号


zyx423