CS229: Kernels

Kernels

review

  • A decision boundary with the greatest possible geometric margin.

  • functional margin/geometric margin
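
    For reference, recalling the definitions from the previous lecture: the functional margin and the geometric margin of (w, b) with respect to a training example (x^{(i)}, y^{(i)}) are

    \hat{\gamma}^{(i)}=y^{(i)}\left(w^{T}x^{(i)}+b\right), \qquad \gamma^{(i)}=y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^{T}x^{(i)}+\frac{b}{\|w\|}\right)

    and the margin with respect to the whole training set is the minimum over all i.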

maximize the margin as an optimization problem

  • maximize the geometric margin \gamma in the worst case, i.e., the minimum geometric margin over all training examples

  • choosing proper w and b defines a hyperplane

  • after scaling (w, b) so that the functional margin is 1, maximizing \gamma = 1/\|w\| is equivalent to:

    • \min_{\gamma, w, b}\; \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y^{(i)}\left(w^{T}x^{(i)}+b\right) \geq 1,\; i=1,\ldots,m

Infinite dimensional features

  • Suppose w can be represented as a linear combination of the training examples

    • each coefficient actually contains y^{(i)}: w=\sum_{i=1}^{m}\alpha_{i}y^{(i)}x^{(i)}

    • this looks like an extra constraint on w

    • representer theorem: the optimal w can always be written this way, so no generality is lost

    • actually, this representation can also be derived from the Lagrangian of our optimization problem

  • Intuition:

    • gradient descent: if \theta is initialized to zero, it remains a linear combination of the training examples after every update

    • w fixes the direction of the decision boundary, while b sets its position

  • using the linear representation turns the problem into an optimization over inner products of feature vectors

  • a more compact form: the dual optimization problem

    • using Lagrangian

    • \max_{\alpha}\; W(\alpha)=\sum_{i=1}^{m}\alpha_{i}-\frac{1}{2}\sum_{i,j=1}^{m}y^{(i)}y^{(j)}\alpha_{i}\alpha_{j}\left\langle x^{(i)}, x^{(j)}\right\rangle \quad \text{s.t.} \quad \alpha_{i} \geq 0,\; i=1,\ldots,m, \quad \sum_{i=1}^{m}\alpha_{i}y^{(i)}=0

    • the KKT conditions are indeed satisfied at the optimum

  • Apply kernel trick

    • write the algorithm in terms of inner products

    • define a mapping from x to \phi(x), the high-dimensional feature vector

    • find a way to compute K(x, z)=\phi(x)^{T}\phi(z) efficiently

    • replace \langle x, z\rangle with K(x, z) (so you never need to compute \phi(x), which may be expensive)

    • computing K(x, z) needs only O(n) time, while computing \phi(x) may need O(n^{2})

  • other tricks

    • adding a fixed constant c, as in K(x, z)=(x^{T}z+c)^{2}, controls the relative weighting between the first-order and the second-order terms

    • changing the exponent 2 to d includes all monomials up to order d; see the sketch below
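
A minimal NumPy sketch (not from the lecture; function names are illustrative) of why the kernel trick pays off: for the polynomial kernel K(x, z)=(x^{T}z+c)^{d} with d=2, the explicit feature map \phi(x) has O(n^{2}) entries, while the kernel value itself is a single O(n) dot product.

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel K(x, z) = (x^T z + c)^d, an O(n) computation."""
    return (x @ z + c) ** d

def phi_quadratic(x, c=1.0):
    """Explicit feature map for d = 2: all n^2 second-order monomials
    x_i * x_j, the sqrt(2c)-scaled first-order terms, and the constant c."""
    return np.concatenate([np.outer(x, x).ravel(),  # O(n^2) entries
                           np.sqrt(2.0 * c) * x,
                           [c]])

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)

# Both paths give the same value; only their costs differ.
print(poly_kernel(x, z))                     # O(n)
print(phi_quadratic(x) @ phi_quadratic(z))   # O(n^2), same number
```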

how to generate a valid kernel

  • Intuition: think of the kernel function as a measure of similarity between two vectors

  • K(x, z)=\exp \left(-\frac{\|x-z\|^{2}}{2 \sigma^{2}}\right)

    • Gaussian kernel

    • corresponds to an infinite-dimensional feature mapping

  • judging whether a kernel is valid

    • Kernel (Gram) Matrix, K_{ij}=K(x^{(i)}, x^{(j)})

      • symmetric

      • positive semi-definite

      • by Mercer's theorem, this condition (holding for every finite set of points) is both necessary and sufficient; see the numerical check after this list

  • linear kernel: \phi(x) = x
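
A quick numerical illustration (a sketch, assuming NumPy; checking one finite sample can only falsify validity, not prove it): build the Gram matrix of the Gaussian kernel and confirm it is symmetric and positive semi-definite up to round-off.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))  # 20 random points in R^3

# Gram matrix K_ij = K(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetric: True
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # PSD up to round-off: True
```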

soft margin SVM

  • when the data is a little noisy (not linearly separable) and you don't want to force a separation of all the points

  • \min_{\gamma, w, b}\; \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\xi_{i} \quad \text{s.t.} \quad y^{(i)}\left(w^{T}x^{(i)}+b\right) \geq 1-\xi_{i},\; i=1,\ldots,m, \quad \xi_{i} \geq 0,\; i=1,\ldots,m

  • ensuring that most examples have functional margin at least 1.

  • makes the decision boundary less sensitive to any single data point
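
A usage sketch (assuming scikit-learn, which the notes do not mention): the C passed to SVC plays the role of the C in the objective above, so a small C tolerates more margin violations (larger \xi_{i}) while a large C approaches hard-margin behavior.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: noisy, not linearly separable.
X = np.vstack([rng.normal(-1.0, 1.2, (50, 2)),
               rng.normal(1.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Small C: wide margin, tolerant of outliers.
# Large C: heavy slack penalty, boundary bends to chase individual points.
for C in (0.1, 100.0):
    clf = SVC(kernel="rbf", C=C, gamma=0.5).fit(X, y)
    print(f"C={C}: support vectors per class = {clf.n_support_}, "
          f"train accuracy = {clf.score(X, y):.2f}")
```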

Summary

At the beginning, we want to separate data with two labels by finding the best decision boundary. Assuming the dataset is linearly separable, we define the functional and geometric margins and derive an algorithm called the optimal margin classifier. To solve the resulting optimization problem, we learn Lagrange duality and the KKT conditions as the necessary mathematical tools. When the original dataset is not linearly separable, we choose a proper kernel function and apply the kernel trick to map the original features into a high-dimensional space and find a separating hyperplane there, which yields a non-linear decision boundary in the original feature space. Kernels may be linear, Gaussian, and so on, and may correspond to an infinite-dimensional feature mapping. Though computing this mapping explicitly can be expensive, we can still evaluate the kernel function at a cost of O(n). To judge whether a kernel is valid, we compute the kernel matrix and check that it is symmetric and positive semi-definite. When we don't want the algorithm to classify too strictly, we apply the soft margin SVM, which stays stable on data with noise or outliers.
