Time Series Anomaly Detection

 

Here is a 2015 overview article that does a good job of summarizing which scenarios each technique suits: https://iwringer.wordpress.com/2015/11/17/anomaly-detection-concepts-and-techniques/

 

Among those techniques, clustering can be done with K-Means or a Gaussian Mixture Model (GMM). For GMM, see this excellent article: https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.12-Gaussian-Mixtures.ipynb#scrollTo=2l9rOarpNSi0
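To make the clustering idea concrete, here is a minimal sketch of K-Means used as an anomaly scorer (K-Means only, not GMM). It assumes the training data is mostly normal, uses the distance to the nearest centroid as the anomaly score, and takes the largest training distance as the threshold. The data is synthetic, and the helper names (`farthest_first`, `kmeans`, `nearest_centroid_distance`) and the farthest-first initialization are illustrative choices of mine, not something taken from the linked articles.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def nearest_centroid_distance(point, centroids):
    """Anomaly score: distance from the point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


# Synthetic "normal" data: two well-separated 2-D clusters.
rng = random.Random(42)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
train += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]

centroids = kmeans(train, k=2)
# Assumed threshold: the largest score seen on the training data.
threshold = max(nearest_centroid_distance(p, centroids) for p in train)

for candidate in [(0.2, -0.1), (30.0, -20.0)]:
    score = nearest_centroid_distance(candidate, centroids)
    label = "anomaly" if score > threshold else "normal"
    print(candidate, round(score, 2), label)
```

A GMM would replace the hard distance with a likelihood under the fitted mixture, which handles clusters of different shapes and sizes better; the thresholding idea stays the same.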

There is also a more recent paper from 2019, DEEP LEARNING FOR ANOMALY DETECTION: A SURVEY https://arxiv.org/pdf/1901.03407.pdf. It surveys anomaly detection across all fields.

 

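Types of anomalies:

  Point Anomalies: a single point differs from the other points, e.g. a sudden large expense.

  Contextual Anomalies: a value that is anomalous only in a specific context, e.g. a sudden spike in traffic in the middle of the night, when things should be quiet.

  Collective Anomalies: a small group of data points that is anomalous as a whole. Each point looks normal on its own, but the group does not. For example:

    1. Events in unexpected order (ordered, e.g. breaking rhythm in ECG)
    2. Unexpected value combinations (unordered, e.g. buying a large number of expensive items)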
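The difference between point and contextual anomalies is easier to see in code. The sketch below is only an illustration under my own assumptions: synthetic hourly traffic for 28 days, a plain z-score with a threshold of 3, and "hour of day" as the context. The helper names (`zscore_flags`, `contextual_flags`) are hypothetical. A global z-score catches the huge spike but misses the night-time burst, while a per-hour z-score catches both.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_flags(values, threshold=3.0):
    """Return indices whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def contextual_flags(hours, values, threshold=3.0):
    """Z-score each value only against values from the same hour of day."""
    groups = defaultdict(list)
    for i, h in enumerate(hours):
        groups[h].append(i)
    flagged = []
    for idxs in groups.values():
        local = zscore_flags([values[i] for i in idxs], threshold)
        flagged.extend(idxs[j] for j in local)
    return sorted(flagged)


# Synthetic traffic: busy from 08:00 to 20:00, quiet at night,
# plus a small deterministic jitter in [-10, 10].
hours, values = [], []
for day in range(28):
    for hour in range(24):
        base = 1000 if 8 <= hour <= 20 else 50
        jitter = (day * 37 + hour * 11) % 21 - 10
        hours.append(hour)
        values.append(base + jitter)

POINT_IDX = 10 * 24 + 14    # day 10, 14:00: far above anything ever seen
CONTEXT_IDX = 5 * 24 + 3    # day 5, 03:00: a daytime-sized value at night
values[POINT_IDX] = 5000
values[CONTEXT_IDX] = 900

global_hits = zscore_flags(values)
context_hits = contextual_flags(hours, values)
print("global z-score flags:    ", global_hits)
print("per-hour z-score flags:  ", context_hits)
```

Collective anomalies need a model of sequences or of value combinations, so a per-point score like this one would not catch them.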

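unsupervised:

  1. Isolation Forest (IF) Algorithm
    1. Judging from this notebook, it seems to work better than K-means? https://www.kaggle.com/rgaddati/unsupervised-fraud-detection-isolation-forest
    2. It is essentially statistics-based as well and does not take the time series into account. From [4], IF seems to perform even slightly better than AutoEncoder. The test results in [7] also show that IF is very strong.
    3. For example, KNN is an option for small-to-medium, low-dimensional datasets, and Isolation Forest for large, high-dimensional datasets. See [5].
    4. EIF (Extended Isolation Forest), an upgraded version of IF: https://towardsdatascience.com/outlier-detection-with-extended-isolation-forest-1e248a3fe97b
    5. https://towardsdatascience.com/anomaly-detection-with-isolation-forest-visualization-23cd75c281e2
  2. Local Outlier Factor (LOF) Algorithm
  3. Clustering: K-means
  4. Clustering: GMM. It ignores time ordering and is purely statistics-based; somewhat more advanced than K-means.
  5. Boxplot. Very simple: essentially draw a boxplot, and anything outside a certain range counts as an anomaly.
  6. AutoEncoder. It must be trained on normal data only, so you need to know which data is normal; it is not purely unsupervised learning.
    1. http://sofasofa.io/tutorials/anomaly_detection/
    2. https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd
  7. I keep feeling that a graph-based approach would work better; I plan to look into it.
  8. Anomaly detection for big data
    • https://medium.com/rahasak/anomaly-detection-with-isolation-forest-spark-scala-8d8b5f36c47c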
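Since Isolation Forest comes up most often in the list above, here is a rough standard-library sketch of the idea: build random trees that split on a random feature at a random value, and treat points that get isolated after only a few splits as anomalous. This is a simplified illustration on synthetic data, not the code from any of the linked articles; the function names and the parameters (`n_trees`, `sample_size`, `seed`) are my own, and the normalization follows the usual average-path-length formula with a logarithmic approximation of the harmonic number.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalize path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for
    the points that were still together in that leaf."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[side], depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample (needs >= 2 rows)."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [
        build_tree(rng.sample(rows, size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, size


def isolation_score(row, trees, size):
    """Score in (0, 1]; values closer to 1 mean easier to isolate."""
    avg = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-avg / c_factor(size))


# Synthetic data: one dense 2-D cluster plus a single far-away point.
data_rng = random.Random(7)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
rows.append((8.0, 8.0))

trees, size = fit_forest(rows)
print("center :", round(isolation_score((0.0, 0.0), trees, size), 3))
print("outlier:", round(isolation_score((8.0, 8.0), trees, size), 3))
```

Note that nothing here looks at the order of the rows, which matches the remark above that IF does not take the time series into account.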
  ref:
  1. https://www.kaggle.com/pavansanagapati/anomaly-detection-credit-card-fraud-analysis
  2. https://www.experoinc.com/post/fraud-detection-using-deep-learning-on-graph-embeddings-and-topology-metrics
  3. https://www.knime.com/blog/four-techniques-for-outlier-detection This one covers and compares four anomaly detection algorithms (Numeric Outlier, Z-Score, DBSCAN, Isolation Forest)
  4. https://www.infoq.com/articles/fraud-detection-random-forest/ (covers Random Forest, AutoEncoder, Isolation Forest)
  5. What are the common "anomaly detection" algorithms in data mining? (数据挖掘中常见的「异常检测」算法有哪些?)
  6. Anomaly Detection Techniques in Python
  7. A comparative evaluation of outlier detection algorithms: experiments and analyses
  8. yzhao062/anomaly-detection-resources, a GitHub repo by an expert from CMU
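The boxplot method from the list of unsupervised methods is simple enough to sketch in a few lines. The version below assumes the common convention of flagging anything more than 1.5 times the interquartile range (IQR) beyond the quartiles; the multiplier `k`, the function name `iqr_outliers`, and the sample spending data are my own choices for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Daily spending with one unusually large purchase.
spend = [12, 15, 14, 13, 16, 15, 14, 13, 15, 250]
print(iqr_outliers(spend))
```

Like IF and GMM, this only looks at the distribution of values and ignores the order in which they occur.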

 

system log anomaly detection:

 

  1. https://www.researchgate.net/publication/220925081_Anomaly_Detection_in_Computer_Security_and_an_Application_to_File_System_Accesses

  2. Insider Threat Detection Based on User Behavior Modeling and Anomaly Detection Algorithms, 2019
  3. Data-Driven Model-Based Detection of Malicious Insiders via Physical Access Logs, 2019
  4. Detecting insider information theft using features from file access logs, 2014. This is exactly my scenario.
  5. Detection of Anomalous Insiders in Collaborative Environments via Relational Analysis of Access Logs, 2011. This one feels even closer to my scenario.
  6. A Review of Insider Threat Detection: Classification, Machine Learning Techniques, Datasets, Open Challenges, and Recommendations, 2020. A review article.
  7. Data Stream Clustering for Real-time Anomaly Detection: An Application to Insider Threats, 2018
  8. AN ABNORMAL FILE ACCESS BEHAVIOR DETECTION APPROACH BASED ON FILE PATH DIVERSITY, 2014. From the National University of Defense Technology (NUDT); it proposes the FPD algorithm and also mentions the PAD algorithm, which I have not looked at yet.
  9. Ghostbuster: A Fine-grained Approach for Anomaly Detection in File System Accesses, 2017. It works at the file block level and needs kernel support, so it does not fit my scenario.

 

Applications of RNN

 https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection

 https://towardsdatascience.com/time-series-of-price-anomaly-detection-13586cd5ff46 some commonly used clustering methods

posted @ 2019-10-30 13:32  mashuai_191