UEBA: Peer Group Clustering
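Network attacks against computing infrastructure are increasing, so more advanced defenses such as intrusion detection systems (IDS) are needed to counter them. IDSs that use network behavior anomaly detection (NBAD) have become an effective complement to traditional network security systems based on deep packet inspection (DPI). An NBAD system detects threats by tracking various network features and building a baseline of normal behavior, so its detection results depend heavily on the quality of that baseline. There are currently two ways to generate a baseline: build a model for each network host separately (a per-host model), or build a single baseline for the whole network (a per-network model). Per-host models are sensitive to noise and produce many false positives, while per-network models produce few false positives overall but have low recall.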
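Against this background, an approach emerged that takes groups of hosts within the network as the unit of analysis: peer groups. Peer groups are groups of network hosts created on the basis of host behavior. Peer group analysis lets an NBAD system detect individual hosts that begin to behave differently from the group of hosts they previously resembled. A host whose behavior differs from the aggregate behavior of its peer group is flagged as anomalous and needs further investigation. Detecting anomalies through peer group clustering is a form of contextual anomaly detection (CAD).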
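The flagging step described above can be sketched as follows. This is a minimal illustration and not the method of any paper cited here: it assumes peer groups are already known, uses a single hypothetical feature (bytes sent per hour), and scores each host with a robust median/MAD score, a technique chosen here only for the example. The function name `peer_group_outliers`, the threshold of 3.5, and all host names and values are assumptions.

```python
from statistics import median

def peer_group_outliers(feature_by_host, peer_groups, threshold=3.5):
    """Flag hosts whose feature value deviates strongly from their peer group.

    Uses a modified z-score based on the group's median and median absolute
    deviation (MAD). This scoring rule is an assumption made for illustration.
    """
    flagged = []
    for group in peer_groups:
        values = [feature_by_host[h] for h in group]
        center = median(values)
        mad = median([abs(v - center) for v in values])
        # Guard against a zero MAD (all peers nearly identical).
        scale = max(mad, 1e-9)
        for host in group:
            score = 0.6745 * abs(feature_by_host[host] - center) / scale
            if score > threshold:
                flagged.append((host, round(score, 1)))
    return flagged

# Hypothetical data: bytes sent per hour, in KB.
feature_by_host = {
    "ws1": 100, "ws2": 110, "ws3": 95, "ws4": 105, "ws5": 900,
    "srv1": 5000, "srv2": 5100, "srv3": 4900, "srv4": 5050,
}
peer_groups = [
    ["ws1", "ws2", "ws3", "ws4", "ws5"],   # workstations
    ["srv1", "srv2", "srv3", "srv4"],      # servers
]

flagged = peer_group_outliers(feature_by_host, peer_groups)
print(flagged)
```

In this toy data, `ws5` sends far less traffic than any server, so a per-network baseline dominated by the servers could miss it, yet it stands out clearly against the other workstations. That contrast is the motivation for comparing a host with its peers instead of with the whole network.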
Academic research on peer groups:
Paper | Data source | Method |
(1) Community-based anomaly detection | HTTP/HTTPS requests and DNS for IP | A method for creating groups of internal network hosts based on their behavior on the network. Two network hosts are considered similar if they regularly visit a similar set of internal and/or external HTTP/HTTPS servers. Each network host is represented by a tuple containing the set of all visited servers and ports and a vector of the frequencies of those visits. Cosine similarity together with approximate clustering is used to create peer groups. |
(2) Network-Aware Behavior Clustering of Internet End Hosts | IP packets | They use bipartite graphs to model the communications of Internet end hosts. They then derive one-mode projection graphs that capture the behavior similarity of host communications through edges between source (or destination) hosts that talk to the same destinations (or sources). Next, they apply a simple spectral clustering algorithm to discover the inherent behavior clusters within the same network prefixes. For each traffic cluster of a network prefix, they use relative uncertainty concepts from information theory to characterize and interpret its behavior patterns based on traffic features such as source ports, destination ports, and the IP addresses of packets. |
(3) Profiling IP Hosts Based on Traffic Behavior | NetFlow | They study IP network nodes on the Internet (IP hosts) from the perspective of their communication behavior patterns. They set up behavior profiles for the observed IP nodes by clustering hosts into groups with similar communication behavior over 1-hour windows. Communication behavior over each hour is summarized in a set of features computed from NetFlow, and the feature sets are then clustered using DBSCAN. The goal is an understanding of a device's normal behavior in relation to its past, to its peer group, and to the wider organization. |
(4) Profiling and Clustering Internet Hosts | Packet headers | From packet headers, they obtain direct and indirect features for each host on a one-day or two-day basis. 5 features are used for clustering, including daily distinct destinations, daily bytes sent, the list of open ports, and communication similarity, computed as the average of the Dice similarity values of all communication pairs for the host (different TCP and UDP communication info). The Dice coefficient is also used as the distance metric, with agglomerative clustering as the clustering algorithm. |
(5) Role Classification of Hosts within Enterprise Networks Based on Connection Patterns | NetFlow of packets | Uses two algorithms that, together, partition the hosts on an enterprise network into groups in a way that exposes the logical structure of the network. The grouping algorithm classifies hosts into groups, or “roles,” based on their connection habits. Similarity between hosts is defined as a function of the number of common hosts with which they communicate. |
(6) Behavior-Profile Clustering for False Alert Reduction in Anomaly Detection Sensors | IP packets | Whenever a host sends or receives traffic, its sensor verifies whether the traffic is within one standard deviation of the host's own behavior profile and of all the other profiles in the cluster the host belongs to. The behavior profile of a user is defined as a set of hourly histograms, one per feature, where each feature measures a network-related statistic such as the average number of unique users contacted per hour, the average number of packets exchanged per hour, or the average length of the packets exchanged per hour. Each histogram, computed per port (service) and direction (input or output), represents the hourly average and standard deviation of a feature. They cluster the behavior profiles to produce peer groups. |
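To make the similarity-based grouping in row (1) more concrete, here is a rough sketch. It represents each host as a mapping from (server, port) to visit frequency and compares hosts with cosine similarity. The table does not specify the paper's approximate clustering step, so a simple greedy threshold grouping stands in for it here. The function names, the `min_similarity` value of 0.8, and all host and server names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def greedy_peer_groups(profiles, min_similarity=0.8):
    """Assign each host to the first group whose representative is similar enough.

    A stand-in for the unspecified approximate clustering step; the first
    host placed in a group serves as that group's representative.
    """
    groups = []  # list of (representative_profile, member_hosts)
    for host in sorted(profiles):
        profile = profiles[host]
        for representative, members in groups:
            if cosine_similarity(profile, representative) >= min_similarity:
                members.append(host)
                break
        else:
            groups.append((profile, [host]))
    return [members for _, members in groups]

# Hypothetical visit counts keyed by (server, port).
profiles = {
    "alice-pc": {("mail.example.com", 443): 40, ("wiki.example.com", 443): 10},
    "bob-pc": {("mail.example.com", 443): 35, ("wiki.example.com", 443): 12},
    "build-01": {("git.example.com", 443): 200, ("repo.example.com", 80): 50},
    "build-02": {("git.example.com", 443): 180, ("repo.example.com", 80): 70,
                 ("mail.example.com", 443): 1},
}

peer_groups = greedy_peer_groups(profiles)
print(peer_groups)
```

The greedy pass depends on host ordering and on the chosen threshold, so it is only suitable as an illustration. The algorithms named in the other rows of the table (spectral clustering, DBSCAN, agglomerative clustering) could be applied to the same similarity measure.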
(1) M. Kopp, M. Grill and J. Kohout, "Community-based anomaly detection," 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, 2018, pp. 1-6.
(2) K. Xu, F. Wang and L. Gu, "Network-aware behavior clustering of Internet end hosts," 2011 Proceedings IEEE INFOCOM, Shanghai, 2011, pp. 2078-2086.
(3) A. Jakalan, J. Gong and S. Liu, "Profiling IP hosts based on traffic behavior," 2015 IEEE International Conference on Communication Software and Networks (ICCSN), Chengdu, 2015, pp. 105-111.
(4) S. Wei, J. Mirkovic and E. Kissel, "Profiling and Clustering Internet Hosts," 2006.
(5) G. Tan, M. Poletto, J. Guttag and F. Kaashoek, "Role classification of hosts within enterprise networks based on connection patterns," Proceedings of the USENIX Annual Technical Conference (ATEC '03), USENIX Association, USA, 2003, 2.
(6) V. Frias-Martinez, S. J. Stolfo and A. D. Keromytis, "Behavior-Profile Clustering for False Alert Reduction in Anomaly Detection Sensors," 2008 Annual Computer Security Applications Conference (ACSAC), Anaheim, CA, 2008, pp. 367-376. doi: 10.1109/ACSAC.2008.30