Lecture 15 Anomaly detection

Problem motivation

Gaussian distribution

Algorithm

Density estimation

Training set : \(\{x^{(1)},x^{(2)},\dots,x^{(m)}\}\)

Each example is \(x \in \mathbb{R}^n\)

\[x_1 \sim N(\mu_1,\sigma_1^2)\\ x_2 \sim N(\mu_2,\sigma_2^2)\\ \vdots\\ x_n \sim N(\mu_n,\sigma_n^2)\\ p(x) = p(x_1;\mu_1,\sigma_1^2)p(x_2;\mu_2,\sigma_2^2)\dots p(x_n;\mu_n,\sigma_n^2) =\prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2) \]

Anomaly detection algorithm

  1. Choose features \(x_i\) that you think might be indicative of anomalous examples

  2. Fit parameters \(\mu_1,\mu_2,\dots,\mu_n,\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2\)

    \(\mu_j=\frac{1}{m}\sum_{i=1}^m x_j^{(i)}\)

    \(\sigma_j^2=\frac{1}{m}\sum_{i=1}^m(x_j^{(i)}-\mu_j)^2\)

  3. Given a new example \(x\), compute \(p(x)\):

    \[p(x)=\prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2)=\prod_{j=1}^n\frac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma^2_j}\right) \]

    Anomaly if \(p(x)<\varepsilon\)
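
A minimal NumPy sketch of steps 2 and 3; the function and variable names (`fit_gaussian_params`, `density`, `X_train`, `epsilon`) are illustrative, not from the lecture:

```python
import numpy as np

def fit_gaussian_params(X):
    """Step 2: fit per-feature parameters mu_j and sigma_j^2 (using 1/m, as in the lecture)."""
    mu = X.mean(axis=0)                    # shape (n,)
    sigma2 = ((X - mu) ** 2).mean(axis=0)  # shape (n,)
    return mu, sigma2

def density(X, mu, sigma2):
    """Step 3: p(x) = prod_j p(x_j; mu_j, sigma_j^2) for each row of X."""
    per_feature = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return per_feature.prod(axis=1)

# Usage (X_train holds the normal training examples, one row per example):
# mu, sigma2 = fit_gaussian_params(X_train)
# is_anomaly = density(X_new, mu, sigma2) < epsilon
```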

Anomaly detection example

(Figure: \(x_1\sim N(5,4)\) and \(x_2\sim N(3,1)\), with test examples \(x_{test}^{(1)}\) and \(x_{test}^{(2)}\) in the \(x_1\)–\(x_2\) plane.)

\(x_1\sim N(5,4)\), \(x_2\sim N(3,1)\)

For a test example \(x_{test}^{(1)}\) with coordinates \((a,b)\), compute the density of its first coordinate, \(p(a;\mu_1,\sigma_1^2)\), and of its second coordinate, \(p(b;\mu_2,\sigma_2^2)\). The product \(p(x) = p(a;\mu_1,\sigma_1^2)\,p(b;\mu_2,\sigma_2^2)\) is then the density of \(x_{test}^{(1)}\) at the point \((a,b)\) in the plane. Since \(p(x_{test}^{(1)})=0.0426 \geq \varepsilon\), it is judged to be a normal example.

Similarly, \(p(x_{test}^{(2)})=0.0021 < \varepsilon\), so it is judged to be anomalous.

Developing and evaluating an anomaly detection system

The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.

Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).

Training set: \(x^{(1)},x^{(2)},\dots,x^{(m)}\) (assume normal examples / not anomalous). Treat the training set as a collection of normal examples, even if a few anomalous examples slip in.

Cross validation set: \(\{(x_{cv}^{(1)},y_{cv}^{(1)}),\dots,(x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})\}\)

Test set : \(\{(x_{test}^{(1)},y_{test}^{(1)}),\dots,(x_{test}^{(m_{test})},y_{test}^{(m_{test})})\}\)

Both the cross validation set and the test set contain some anomalous examples.

Aircraft engines motivating example

10000 good (normal) engines

20 flawed engines (anomalous)

Training set : 6000 good engines (y = 0)

CV : 2000 good engines (y = 0), 10 anomalous (y = 1)

Test : 2000 good engines (y = 0), 10 anomalous (y = 1)

Alternative (not recommended):

Training set : 6000 good engines (y = 0)

CV : 4000 good engines (y = 0), 10 anomalous (y = 1)

Test : 4000 good engines (y = 0), 10 anomalous (y = 1)

The cross validation set and the test set should be two completely different data sets; with only 10000 good engines, the alternative has to reuse the same examples for both.
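
A sketch of the recommended 60/20/20 split for the engine example; the arrays below are synthetic stand-ins for real engine measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data standing in for real engine measurements (4 features each).
X_good = rng.normal(size=(10000, 4))    # normal engines (y = 0)
X_anom = rng.normal(size=(20, 4)) + 5   # flawed engines (y = 1)

X_good = rng.permutation(X_good)        # shuffle before splitting

# Training set: 6000 good engines only (no labels needed).
X_train = X_good[:6000]

# CV and test: 2000 good + 10 anomalous each, with labels, and no overlap.
X_cv   = np.vstack([X_good[6000:8000], X_anom[:10]])
y_cv   = np.concatenate([np.zeros(2000), np.ones(10)])

X_test = np.vstack([X_good[8000:10000], X_anom[10:]])
y_test = np.concatenate([np.zeros(2000), np.ones(10)])
```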

Algorithm evaluation

Fit model \(p(x)\) on training set \(\{x^{(1)},\dots,x^{(m)}\}\)

On a cross validation/test example \(x\), predict

\[y = \begin{cases} 1 & \text{if } p(x)<\varepsilon\ \text{(anomaly)}\\ 0 & \text{if } p(x)\geq\varepsilon\ \text{(normal)} \end{cases} \]

Possible evaluation metrics:

  • True positive, false positive, false negative, true negative
  • Precision/Recall
  • \(F_1\)-score

Can also use the cross validation set to choose the parameter \(\varepsilon\):

Try many different values of \(\varepsilon\) and pick the one that maximizes the \(F_1\)-score.
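
A sketch of choosing \(\varepsilon\) on the cross validation set by maximizing the \(F_1\)-score; `p_cv` is assumed to hold the densities \(p(x)\) already computed for the CV examples, and the names are illustrative:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true/false positives and negatives (y = 1 means anomaly)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv):
    """Try many thresholds and keep the one with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        y_pred = (p_cv < eps).astype(int)   # anomaly if p(x) < epsilon
        f1 = f1_score(y_cv, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```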

Anomaly detection vs. supervised learning

| Anomaly detection | Supervised learning |
| --- | --- |
| Very small number of positive examples (y = 1) (0–20 is common). Large number of negative (y = 0) examples. | Large number of positive and negative examples. |
| Many different "types" of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we've seen so far. | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to the ones in the training set. |

Anomaly detection suits settings with very few positive examples and many negative examples; supervised learning suits settings with plenty of both.

There are many kinds of anomalies and only a few anomalous examples, so it is hard to learn from the positive examples what anomalies look like, and future anomalies may look completely different from the ones seen so far. This setting calls for anomaly detection.

Supervised learning has enough positive examples to learn from, and future anomalies are expected to resemble the positive examples already in the training set.

The key factor is the number of positive examples.

| Anomaly detection | Supervised learning |
| --- | --- |
| Fraud detection | Email spam classification |
| Manufacturing (e.g. aircraft engines) | Weather prediction |
| Monitoring machines in a data center | Cancer classification |

Choosing what features to use

Non-Gaussian features

(Figure: histogram of a non-Gaussian feature before and after a transformation.)

For data that does not look Gaussian, apply a transformation so that it looks closer to Gaussian. The algorithm still runs without the transformation, but it usually works better with it.

Common transformations include:

\[\begin{align} & x_1 \leftarrow \log(x_1)\\ & x_2 \leftarrow \log(x_2 + c)\\ & x_3 \leftarrow \sqrt{x_3}\\ & x_4 \leftarrow x_4^{\frac{1}{3}} \end{align} \]
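
A sketch of applying these transforms in NumPy; the feature arrays here are synthetic, skewed stand-ins, and the constant \(c\) is chosen by eye (e.g. by inspecting a histogram):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed, non-negative features (exponentially distributed as a stand-in).
x1, x2, x3, x4 = rng.exponential(scale=2.0, size=(4, 1000))

x1_t = np.log(x1)        # assumes x1 > 0
x2_t = np.log(x2 + 1)    # the constant c (here 1) is tuned until the histogram looks roughly Gaussian
x3_t = np.sqrt(x3)
x4_t = x4 ** (1 / 3)

# Inspect the result with a histogram, e.g. plt.hist(x2_t, bins=50),
# and adjust c or the transform by eye.
```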

Error analysis for anomaly detection

Want \(p(x)\) large for normal examples \(x\)

\(p(x)\) small for anomalous examples \(x\)

Most common problem: \(p(x)\) is comparable (say, both large) for normal and anomalous examples.

In that case, look for a new feature \(x_2\) that separates the anomalous examples from the normal ones.

(Figure: adding a new feature \(x_2\) separates the anomalous example from the normal examples.)

Monitoring computers in a data center

Choose features that might take on unusually large or small values in the event of an anomaly

\(x_1\)=memory use of computer

\(x_2\)=number of disk accesses/sec

\(x_3\)=CPU load

\(x_4\)=network traffic

\(x_5\)=CPU load / network traffic

\(x_6\)=(CPU load)\(^2\) / network traffic
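
A sketch of building these derived features from raw metrics; all values below are synthetic stand-ins for real monitoring data:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
# Hypothetical raw monitoring metrics for m machines.
memory          = rng.normal(4.0, 0.5, m)    # x1: memory use
disk_accesses   = rng.normal(50, 10, m)      # x2: disk accesses / sec
cpu_load        = rng.normal(0.6, 0.1, m)    # x3: CPU load
network_traffic = rng.normal(100, 20, m)     # x4: network traffic

x5 = cpu_load / network_traffic              # catches machines spinning on CPU with little traffic
x6 = cpu_load ** 2 / network_traffic

X = np.column_stack([memory, disk_accesses, cpu_load, network_traffic, x5, x6])
```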

Multivariate Gaussian distribution

Motivating example: Monitoring machines in a data center

(Figure: CPU load vs. memory use, with the anomalous example \(x^{(1)}\) and the contours of the per-feature model.)

Consider the green example \(x^{(1)}=(0.5,1.5)\): its CPU load is low but its memory use is high, so it is anomalous. If the two features are treated as uncorrelated, the computed value \(p(x^{(1)})=p(x_1^{(1)};\mu_1,\sigma_1^2)\,p(x_2^{(1)};\mu_2,\sigma_2^2)\) is not particularly low, so the example may not be flagged as an anomaly. The magenta contours show how that model's probability is distributed over the plane. But CPU load and memory use are in fact (roughly linearly) correlated, so the true distribution has elliptical contours.

Multivariate Gaussian (Normal) distribution

\(x\in\mathbb{R}^n\). Don't model \(p(x_1),p(x_2),\dots\), etc. separately.

Model \(p(x)\) all in one go.

Parameters : \(\mu \in \mathbb{R}^n\), \(\Sigma \in \mathbb{R}^{n\times n}\) (covariance matrix)

\[p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \]

Parameter fitting:

Given training set \(\{x^{(1)},x^{(2)},\dots,x^{(m)}\}\)

\[\mu=\frac{1}{m}\sum_{i=1}^m x^{(i)} \qquad \Sigma=\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)(x^{(i)}-\mu)^T \]
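
A NumPy sketch of the parameter fit and the density formula above (`scipy.stats.multivariate_normal` offers an equivalent `pdf`; the function names here are illustrative):

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate mu and Sigma from the training set (maximum likelihood, 1/m)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]
    return mu, Sigma

def multivariate_p(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X."""
    n = mu.shape[0]
    Xc = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad = np.einsum('ij,jk,ik->i', Xc, inv, Xc)   # (x-mu)^T Sigma^{-1} (x-mu) per row
    return norm * np.exp(-0.5 * quad)
```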

Multivariate Gaussian (Normal) examples

(Figures: contours of the multivariate Gaussian for different choices of \(\mu\) and \(\Sigma\).)

Anomaly detection using the multivariate Gaussian distribution

Relationship to the original model

Original model : \(p(x) = p(x_1;\mu_1,\sigma_1^2)p(x_2;\mu_2,\sigma_2^2)\dots p(x_n;\mu_n,\sigma_n^2)\)

(Figure: axis-aligned contours of the original model.)

Corresponds to multivariate Gaussian

\[p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \]

Where \(\Sigma = \left[\begin{matrix}\sigma^2_1&0&\cdots&0\\0&\sigma^2_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\sigma^2_n\end{matrix}\right]\)

The contours of the original model are always axis-aligned; the features are modeled as uncorrelated.
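
A quick numerical check of this correspondence: with a diagonal \(\Sigma\), the multivariate density equals the product of the per-feature univariate densities. The test point below is arbitrary, and the variances reuse the earlier example \(x_1\sim N(5,4)\), \(x_2\sim N(3,1)\):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([5.0, 3.0])
sigma2 = np.array([4.0, 1.0])          # variances from the earlier example
x = np.array([6.0, 2.5])               # an arbitrary test point

joint = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(x)
per_feature = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

assert np.isclose(joint, per_feature)
```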

| Original model | Multivariate Gaussian |
| --- | --- |
| Manually create features to capture anomalies where \(x_1,x_2\) take unusual combinations of values, e.g. \(x_3=\frac{x_1}{x_2}=\frac{\text{CPU load}}{\text{memory}}\) | Automatically captures correlations between features |
| Computationally cheaper (alternatively, scales better to large \(n\)) | Computationally more expensive |
| OK even if \(m\) (training set size) is small | Must have \(m>n\), or else \(\Sigma\) is non-invertible (rule of thumb: \(m \geq 10n\)) |