Binary Classification

Given \(m\) examples, each described by \(n\) features, the task is to assign each example to a class.

The most basic classification problem is binary classification, where the output is simply Yes or No.

Logistic regression

Whereas linear regression assumes the dependent variable \(y\) follows a Gaussian distribution, logistic regression assumes \(y\) follows a Bernoulli distribution.

\[z=\mathbf{w}^T \mathbf{x} \in (-\infty,+\infty) \]

Sigmoid/Logistic function:

\[g(z)=\frac{1}{1+e^{-z}} \in(0,1) ,\, z\in(-\infty,+\infty) \]

Use the sigmoid function to map the linear output into \((0,1)\):

\[h(\mathbf{x};\mathbf{w}) = g(\mathbf{w}^T \mathbf{x}) = \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}} \]
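
Computationally this is one line. A minimal Octave sketch (assuming a weight vector `w` and a feature vector `x`, both \((n+1)\)-dimensional column vectors with \(x_0 = 1\)):

```octave
% Sigmoid/logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothesis h(x; w) = g(w' * x), a value in (0, 1) read as P(y = 1 | x; w).
h = sigmoid(w' * x);
```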

Hypothesis:

\[\begin{aligned} p &= P(y=1|\mathbf{x};\mathbf{w}) = \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}} \\ 1-p &= P(y=0|\mathbf{x};\mathbf{w}) = \frac{1}{1+e^{\mathbf{w}^T \mathbf{x}}}\end{aligned} \]

It follows that:

\[\text{odds} = \frac{p}{1-p} \in (0,+\infty) \]

\[\ln(\text{odds}) = \ln\frac{p}{1-p}=\mathbf{w}^T \mathbf{x} \]
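
The second identity follows by substituting the expressions for \(p\) and \(1-p\) from the hypothesis:

\[\frac{p}{1-p} = \frac{1/(1+e^{-\mathbf{w}^T \mathbf{x}})}{1/(1+e^{\mathbf{w}^T \mathbf{x}})} = \frac{1+e^{\mathbf{w}^T \mathbf{x}}}{1+e^{-\mathbf{w}^T \mathbf{x}}} = e^{\mathbf{w}^T \mathbf{x}} \]

so the log-odds (logit) is linear in \(\mathbf{x}\).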

The two probabilities sum to one:

\[P(y=1|\mathbf{x};\mathbf{w}) + P(y=0|\mathbf{x};\mathbf{w}) = 1 \]

Linear decision boundary

\[\mathbf{w}^T\mathbf{x} = \sum_{j=0}^{n} w_j x_j = 0 \]
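
Predicting \(y=1\) when \(h(\mathbf{x};\mathbf{w}) \ge 0.5\) is equivalent to predicting \(y=1\) when \(\mathbf{w}^T\mathbf{x} \ge 0\). A minimal Octave sketch (assuming a design matrix `X` of size \(m \times (n+1)\) with a leading column of ones, and weights `w`):

```octave
% Predict y = 1 exactly when w' * x >= 0, i.e. when sigmoid(w' * x) >= 0.5.
predictions = double(X * w >= 0);   % m x 1 vector of 0/1 labels
```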

Non-linear decision boundary: add polynomial feature terms, so that \(\mathbf{w}^T\mathbf{x} = 0\) describes a curve in the original feature space (see the example below).
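
For example, with two raw features \(x_1, x_2\) and added quadratic terms:

\[h(\mathbf{x};\mathbf{w}) = g(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2) \]

With \(\mathbf{w} = (-1, 0, 0, 1, 1)^T\) the decision boundary is \(x_1^2 + x_2^2 = 1\), a circle; higher-order polynomial features give more complex boundaries.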

Likelihood:

\[\mathcal{L}(\mathbf{w} \,|\, \mathbf{x}, y=1) = P(y=1|\mathbf{x};\mathbf{w}) \]

Maximum likelihood estimation:

\[\begin{aligned} P(y|\mathbf{x};\mathbf{w}) &= P(y=1|\mathbf{x};\mathbf{w})^y P(y=0|\mathbf{x};\mathbf{w})^{1-y} \\ &= (\frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}})^y (1- \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}})^{1-y} \end{aligned} \]

Maximize:

\[L(\mathbf{w}) = \prod_{i=1}^{m} P(y_i|\mathbf{x}_i;\mathbf{w}) \]

Take the logarithm:

\[\begin{aligned} \ln L(\mathbf{w}) &= \sum_{i=1}^{m} \left( y_i \ln P(y=1|\mathbf{x}_i;\mathbf{w}) + (1-y_i) \ln (1- P(y=1|\mathbf{x}_i;\mathbf{w})) \right) \\ &= \sum_{i=1}^{m} \left( y_i \ln \frac{P(y=1|\mathbf{x}_i;\mathbf{w})}{1- P(y=1|\mathbf{x}_i;\mathbf{w})} + \ln (1- P(y=1|\mathbf{x}_i;\mathbf{w})) \right) \\ &= \sum_{i=1}^{m} \left( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) \right) \end{aligned}\]

Training set:

\[\{(\mathbf{x}_1,y_1), (\mathbf{x}_2,y_2), \cdots, (\mathbf{x}_m,y_m)\} \]

For each example,

\[\mathbf{x}_i = \begin{bmatrix} x_{i,0} \\ x_{i,1} \\ \vdots \\ x_{i,n} \end{bmatrix} ,\, x_{i,0}=1 ,\, y_i \in\{0,1\} \]
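
A minimal Octave sketch of assembling this training set into matrices (assuming `data` is an \(m \times n\) matrix of raw features and `y` an \(m \times 1\) vector of 0/1 labels; the names are illustrative):

```octave
[m, n] = size(data);        % m examples, n features
X = [ones(m, 1), data];     % prepend the intercept feature x0 = 1: X is m x (n+1)
y = y(:);                   % ensure y is an m x 1 column vector of 0/1 labels
w = zeros(n + 1, 1);        % initial weights
```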

Plugging the sigmoid hypothesis into the squared-error cost of linear regression would make \(J(\mathbf{w})\) non-convex, so gradient descent could get stuck in local minima. Instead, use the log loss:

\[\text{Loss}(h(\mathbf{x}),y) = \begin{cases} -\ln(h(\mathbf{x})) ,&\quad y=1 \\ -\ln(1-h(\mathbf{x})) ,&\quad y=0 \end{cases} \]

The two cases can be combined into a single expression:

\[\begin{aligned} \text{Loss} (h(\mathbf{x}),y) &= -y\ln(h(\mathbf{x})) - (1-y)\ln(1-h(\mathbf{x})) \\ &= - (y\ln(h(\mathbf{x})) + (1-y)\ln(1-h(\mathbf{x})))\end{aligned}\]

Logistic regression cost function:

\[\begin{aligned} J(\mathbf{w}) &= \frac{1}{m} \sum_{i=1}^{m} \text{Loss}(h(\mathbf{x}_i),y_i) \\ &= - \frac{1}{m} \sum_{i=1}^{m} (y_i\ln(h(\mathbf{x}_i)) + (1-y_i)\ln(1-h(\mathbf{x}_i))) \\ &= - \frac{1}{m} \ln L(\mathbf{w}) \\ &= - \frac{1}{m} \sum_{i=1}^{m} ( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) ) \end{aligned}\]
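
A vectorized Octave sketch of this cost, reusing the `X`, `y`, `w`, and `sigmoid` from the sketches above:

```octave
h = sigmoid(X * w);                                     % m x 1 vector of h(x_i)
J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));   % cost J(w)
```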

Repeat (simultaneously update all \(w_j\)):

\[w_j := w_j -\alpha \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i) x_{i,j} \]
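
A corresponding Octave sketch of the loop (the learning rate `alpha` and iteration count `num_iters` are illustrative values, not prescribed by the text):

```octave
alpha = 0.1;          % learning rate (illustrative)
num_iters = 1000;     % number of iterations (illustrative)
for iter = 1:num_iters
  h = sigmoid(X * w);               % current predictions
  grad = (1 / m) * X' * (h - y);    % gradient of J(w), one entry per w_j
  w = w - alpha * grad;             % simultaneous update of all w_j
end
```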

Cross Entropy: the loss above is the cross entropy between the true label distribution and the predicted Bernoulli distribution \((h(\mathbf{x}),\, 1-h(\mathbf{x}))\).

More advanced optimization algorithms can also minimize \(J(\mathbf{w})\); they do not require manually choosing the learning rate \(\alpha\) and are often faster than gradient descent:

Conjugate gradient

BFGS

L-BFGS

In Octave, supply a function with the signature `[jVal, gradient] = costFunction(theta)` that returns the cost and its gradient, configure the optimizer with `optimset`, and minimize with `fminunc`, as in the sketch below.
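
A hedged sketch of that interface for the unregularized cost above (`X` and `y` are the matrices from the earlier sketches; the extra arguments are passed through an anonymous function because `fminunc` calls the cost with `theta` only):

```octave
% costFunction.m: cost J(theta) and its gradient for logistic regression.
function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                          % predictions h(x_i)
  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));   % cost J(theta)
  gradient = (1 / m) * X' * (h - y);                         % gradient vector
end

% In the calling script: tell the optimizer a gradient is supplied, then minimize.
options = optimset('GradObj', 'on', 'MaxIter', 400);
initial_theta = zeros(size(X, 2), 1);
[optTheta, minCost, exitFlag] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);
```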

Cost function of Regularized Logistic Regression:

\[J(\mathbf{w}) = - \frac{1}{m} \sum_{i=1}^{m} ( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) ) + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 \]

Gradient Descent

Repeat (simultaneously update all \(w_j\); \(w_0\) is not regularized):

\[w_0 := w_0 -\alpha \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i)x_{i,0} \]

\[\begin{aligned} w_j &:= w_j - \alpha \frac{\partial}{\partial w_j} J(\mathbf{w}) \\ &:= w_j -\alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i) x_{i,j} + \frac{\lambda}{m} w_j \right] \end{aligned}\]
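
A vectorized Octave sketch of one regularized update (`lambda` and `alpha` are illustrative hyperparameters; note \(w_0\), stored as `w(1)`, is excluded from the penalty):

```octave
h = sigmoid(X * w);
grad = (1 / m) * X' * (h - y);                          % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * w(2:end);    % add penalty, skipping w_0
w = w - alpha * grad;                                   % simultaneous update
```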
