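Darktrace's main selling point is that it uses unsupervised learning (Bayesian networks, clustering, recursive Bayesian estimation) to discover unknown threats. Because the system does not depend on an imperfect training dataset, it can surface rare and previously unseen threats. In short: learn what normal data looks like, then flag the anomalies!
First, their product: the Enterprise Immune System (it identifies threats through anomaly detection).
As you can see, it is aimed at security inside the enterprise network!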
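Advantages
3D visualization of the entire network topology
Real-time global overview of the enterprise threat level
Intelligent clustering of anomalies
Pan-spectrum viewing: higher-order network topology; specific clusters, subnets, and host events
Searchable logs and events
Replay of historical data
Concise summary of the overall behavior of devices and external IPs
Designed for both business executives and security analysts
100% visibility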
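The Enterprise Immune System is described as the world's most advanced machine learning technology for cyber defense. Inspired by the self-learning intelligence of the human immune system, it fundamentally changes how organizations protect themselves in a new era of sophisticated and pervasive cyber threats.
The human immune system is highly complex and continually adapts to new forms of threat, such as constantly mutating viral DNA. It works by learning what is normal for the body, then identifying and neutralizing outliers that do not fit that normal pattern.
Darktrace applies the same logic to enterprise and industrial environments. Powered by machine learning and AI algorithms, the Enterprise Immune System iteratively learns a unique 'pattern of life' ('self') for every device and user on the network, and correlates these insights to spot emerging threats that would otherwise go unnoticed.
Like the human immune system, the Enterprise Immune System does not need prior experience of a threat or activity pattern to recognize that it may be threatening. It works automatically, without prior knowledge or signatures, detecting and fighting back against subtle, stealthy attacks inside the network in real time.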
https://www.engerati.com/system/files/7.18.18_machine_learning_in_the_era_of_cyber_ai.pdf
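Key excerpts:
From the outset, Darktrace rejected the assumption that data about historical attacks can predict future ones. Instead, Darktrace's cyber AI platform uses unsupervised machine learning to analyze network data at scale and makes billions of probability-based calculations based on the evidence it sees. Rather than relying on knowledge of past threats, it independently classifies data and detects compelling patterns.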
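Using unsupervised machine learning instead allows the system to discover rare and previously unseen threats without depending on imperfect training datasets. Data about historical attacks does not necessarily protect against future ones.
Rather than relying on knowledge of past threats, it works from the evidence it sees. From this it forms an understanding of 'normal' behavior across the whole network, for devices, users, or groups of either entity, and detects deviations from it. A deviation from this evolving 'pattern of life' may point to a developing threat.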
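Core principles of Darktrace machine learning
It learns what is normal within a network 'on the job'; it does not depend on knowledge of previous attacks.
It thrives on the scale, complexity, and diversity of modern enterprises, where every device and person is unique.
It turns the innovation of attackers against them: any unusual activity is visible.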
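As for specific techniques, besides unsupervised anomaly detection and clustering, there is also deep learning used for classification. The key points are as follows:
(1) Bayesian networks are used. Darktrace uses Bayesian probability as part of its distinctive unsupervised machine learning approach.
Details below: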
Technical Overview
Darktrace’s transformative approach to cyber defense
relies on probabilistic methods developed by Cambridge
mathematicians. Employing multiple unsupervised, supervised,
and deep learning techniques in a Bayesian framework, the
Enterprise Immune System can integrate a vast number
of weak indicators of anomalous behavior to produce a single
clear measure of threat probabilities.
For each unique environment, Darktrace generates millions
of interrelated mathematical models which are correlated to
ensure that only truly anomalous behavior is detected without
a profusion of false positives. Unlike rules-based computation,
the results that probabilistic mathematics generate cannot
simply be categorized as ‘yes’ or ‘no’ but instead indicate
degrees of certainty, reflecting the ambiguities that
inevitably exist in dynamic data environments.
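
To make the idea of integrating many weak indicators into one threat probability concrete, here is a minimal sketch. It is not Darktrace's actual model: it assumes the indicators are conditionally independent (a naive Bayes assumption), and the function name `combine_indicators`, the prior, and the likelihood ratios are all made up for illustration.

```python
import math

def combine_indicators(prior, likelihood_ratios):
    """Fuse independent weak indicators into one posterior threat probability.

    prior: P(threat) before looking at any evidence.
    likelihood_ratios: for each indicator,
        P(evidence | threat) / P(evidence | benign).
    Assumes the indicators are conditionally independent (naive Bayes).
    """
    # Work in log-odds so that each indicator simply adds its contribution.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical numbers: each indicator alone is only weak evidence (ratio 3).
prior = 0.001
one = combine_indicators(prior, [3.0])
five = combine_indicators(prior, [3.0] * 5)
print(f"one weak indicator:   {one:.4f}")
print(f"five weak indicators: {five:.4f}")
```

The output is a degree of certainty rather than a yes/no verdict, which matches the excerpt's point: a single weak signal barely moves the probability, while several of them together move it a lot.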
Ranking threat
The Enterprise Immune System accounts for ambiguities by
distinguishing between the subtly differing levels of evidence
that characterize network data. Instead of generating the
simple binary outputs ‘malicious’ or ‘benign’, Darktrace’s
mathematical algorithms produce outputs marked with
differing degrees of potential threat. This enables users of
the system to rank alerts in a rigorous manner, and prioritize
those which most urgently require action, while removing
the problem of numerous false positives associated with a
rule-based approach.
At its core, Darktrace mathematically characterizes what
constitutes ‘normal’ behavior, based on the analysis of a
large number of different measures of a device’s network
behavior (in other words, threats are detected from behavioral anomalies), including:
Server access
Data volumes
Timings of events
Credential use
Connection type, volume, and directionality
Directionality of uploads/downloads
File type
Admin activity
Resource and information requests
In other words, the items listed above are the data dimensions (features) that the behavioral model is built on.
This feels like it is aimed at enterprise data-protection scenarios...
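
A minimal sketch of characterizing 'normal' for one of these measures, assuming the simplest possible model: a device's own history summarized by mean and standard deviation, with new observations scored by their distance from it. The whitepaper does not say Darktrace uses z-scores; the function name `anomaly_score` and the upload volumes are hypothetical.

```python
from statistics import mean, stdev

def anomaly_score(history, observed):
    """Return how many standard deviations `observed` lies from the
    device's own history (a plain z-score, used here only as a stand-in
    for a real behavioral model)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any change at all is maximally odd.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma

# Hypothetical daily upload volumes (MB) for a single device.
uploads_mb = [120, 135, 110, 128, 140, 125, 132]
typical = anomaly_score(uploads_mb, 130)   # an ordinary day
unusual = anomaly_score(uploads_mb, 900)   # a sudden large upload
print(f"typical day: {typical:.2f} sigma, unusual day: {unusual:.2f} sigma")
```

The same scoring could be repeated per measure (timings, credential use, connection counts), and the per-measure scores then serve as the weak indicators that get combined.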
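(2) Clustering is used to identify normal device behavior.
Darktrace employs a number of different clustering methods, including matrix-based clustering, density-based clustering, and hierarchical clustering techniques. The resulting clusters are then used to model the normative behavior of individual devices.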
Clustering devices
In order to model what should be considered as normal for a
device, its behavior is analyzed in the context of other similar
devices on the network. Darktrace leverages the power of
unsupervised machine learning to algorithmically identify
significant groupings of devices, a task which is impossible
to do manually on even modestly-sized networks.
To create a holistic image of the relationships within the
network, Darktrace employs a number of different clustering
methods, including matrix-based clustering, density-based
clustering, and hierarchical clustering techniques. The
resulting clusters are then used to inform the modeling of
the normative behaviors of individual devices.
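
As an illustration of grouping devices into peer groups, here is a small hierarchical (single-linkage agglomerative) clustering sketch in plain Python. It shows only one of the three families the excerpt names, and everything concrete in it is an assumption: the device names, the two features, and the `max_distance` cut-off are invented, and real features would need to be normalized first.

```python
import math

def single_linkage(points, max_distance):
    """Agglomerative single-linkage clustering. Merging stops once the
    closest pair of clusters is farther apart than `max_distance`."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = gap(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_distance:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical features: (connections per hour, MB transferred per hour).
devices = {
    "laptop-1": (20, 5), "laptop-2": (22, 6), "laptop-3": (19, 4),
    "server-1": (400, 900), "server-2": (405, 905),
    "printer-1": (2, 0.1),
}
names = list(devices)
groups = single_linkage([devices[n] for n in names], max_distance=10)
peer_groups = [[names[i] for i in g] for g in groups]
print(peer_groups)
```

Once peer groups exist, a device can be compared against its group rather than against the whole network, so a printer behaving like a server stands out even if the traffic would be normal for a server.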
(3) Identifying changes in the network topology
Network topology
A network is far more than the sum of its individual parts,
with much of its meaning contained in the relationships
among its different entities. Darktrace employs many
mathematical methods to model the multiple facets of a
network’s topology, allowing it to track subtle changes in
structure that are indicative of threats. (That is, it picks up slight changes in the network topology.)
One approach is based on iterative matrix methods that
reveal important connectivity structures within the network,
in a similar way to advanced page-ranking algorithms.
In tandem with these, Darktrace has developed innovative
applications of models from the field of statistical physics,
which allows the modeling of a network’s ‘energy landscape’
to reveal anomalous substructures that could represent
the first symptoms of compromise. (That is, it finds anomalous substructures.)
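
The excerpt only says the iterative matrix methods work "in a similar way to advanced page-ranking algorithms", so the sketch below is ordinary PageRank by power iteration, not Darktrace's method. The connection graph and host names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a directed graph given as {node: [targets]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# Hypothetical graph: which host initiates connections to which.
connections = {
    "pc-1": ["file-server"],
    "pc-2": ["file-server"],
    "pc-3": ["file-server", "pc-1"],
    "file-server": ["backup"],
    "backup": [],
}
scores = pagerank(connections)
for host in sorted(scores, key=scores.get, reverse=True):
    print(f"{host:12s} {scores[host]:.3f}")
```

Tracking such scores over time is one way a structural change could show up: if a workstation's score suddenly approaches that of a server, the connectivity pattern around it has shifted.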
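(4) Identifying anomalous behavior in the network, presumably by flagging anomalous traffic based on network protocols, IP addresses, and so on.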
Network structure
A further important challenge in modeling the behaviors of a
dynamically evolving network is the huge number of potential
predictor variables. For the observation of packet traffic and
host activity within an enterprise LAN or WAN, where both
input and output can contain many inter-related features
(protocols, source and destination machines, log changes,
and rule triggers etc.), learning a sparse and consistent
structured predictive function is crucial. (Is this about predicting network traffic?)
In this context, Darktrace employs a cutting-edge large-scale
computational approach to understand sparse structure
in models of network connectivity based on applying
L1-regularization techniques (the lasso method). This allows
the Enterprise Immune System to discover true associations
between different elements of a network (that is, it discovers the relationships between network elements) which can be cast
as efficiently solvable convex optimization problems and
yield parsimonious models.
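
To show what L1 regularization does in practice, here is a small lasso solver using coordinate descent with soft-thresholding, minimizing (1/(2n))·||y − Xw||² + alpha·||w||₁. This is the textbook algorithm, not a description of Darktrace's implementation. The data is synthetic: five orthogonal ±1 "features" and a target that truly depends on two of them plus one very weak effect.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso(X, y, alpha, sweeps=50):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction using every coefficient except w[j].
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w

# Synthetic design: columns 1..5 of an 8x8 Hadamard matrix (orthogonal +/-1).
X = [[(-1) ** bin(i & j).count("1") for j in range(1, 6)] for i in range(8)]
# Target depends strongly on features 0 and 2, and only faintly on feature 3.
y = [2.0 * row[0] - 1.5 * row[2] + 0.05 * row[3] for row in X]

weights = lasso(X, y, alpha=0.1)
print([round(v, 3) for v in weights])
```

The two strong associations survive (slightly shrunk), while the faint one and the irrelevant features are set exactly to zero. That is the "parsimonious model" property: only relationships with enough evidence behind them remain.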
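(5) Recursive Bayesian estimation is used to track how the state and behavior of network devices evolve over time
(https://blog.csdn.net/Young_Gy/article/details/78642271 My impression is that RBE simply estimates the state x_t from the previous state x_{t-1} and the newest observation, using Bayesian probability to do the update.)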
Recursive Bayesian Estimation
To combine these multiple analyses of network behavior,
generating a single comprehensive picture of the state of the
devices that comprise a network, Darktrace leverages the
power of Recursive Bayesian Estimation (RBE). Using RBE,
Darktrace’s mathematical models are able to constantly
adapt to new information as it becomes available to the
system. Continually recalculating threat levels in the light
of new data, the Enterprise Immune System can discern
significant patterns in data flows indicative of attacks, where
conventional signature-based methods see only chaos.
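
A minimal sketch of recursive Bayesian estimation as a discrete Bayes filter over two hidden states. It only illustrates the predict/update recursion from x_{t-1} to x_t; the states, observation labels, and every probability below are made-up values, not anything taken from Darktrace.

```python
def bayes_filter_step(belief, observation, transition, likelihood):
    """One predict/update cycle of a discrete Bayes filter.

    belief:     {state: P(state at t-1 | observations so far)}
    transition: {prev_state: {state: P(state | prev_state)}}
    likelihood: {state: {observation: P(observation | state)}}
    """
    # Predict: push the previous belief through the transition model.
    predicted = {
        s: sum(belief[prev] * transition[prev][s] for prev in belief)
        for s in belief
    }
    # Update: weight each state by how well it explains the observation.
    unnormalized = {
        s: predicted[s] * likelihood[s][observation] for s in predicted
    }
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# All probabilities are illustration values only.
transition = {
    "normal": {"normal": 0.99, "compromised": 0.01},
    "compromised": {"normal": 0.0, "compromised": 1.0},
}
likelihood = {
    "normal": {"quiet": 0.9, "odd": 0.1},
    "compromised": {"quiet": 0.4, "odd": 0.6},
}
belief = {"normal": 0.999, "compromised": 0.001}
history = []
for obs in ["quiet", "odd", "odd", "odd", "quiet", "odd"]:
    belief = bayes_filter_step(belief, obs, transition, likelihood)
    history.append(belief["compromised"])
    print(f"{obs:5s} -> P(compromised) = {belief['compromised']:.3f}")
```

Each new observation revises the threat level instead of triggering a fixed rule: a run of odd events pushes the probability up, a quiet interval pulls it back down, and the estimate always carries the full history forward through the previous belief.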
(6) They also use deep learning for classification
Darktrace & Deep Learning
Darktrace also uses deep learning to enhance modeling
processes. Deep learning is a subset of machine learning
that uses the cascading interactions of layered mathematical
processes – known as neural nets – to give intelligent
systems a higher degree of insight. Multi-layered neural
nets can improve the detection and remediation of certain
threats, for example, in the identification of DNS anomalies,
which are less effectively tracked by other machine learning
methods. Darktrace’s deep learning system assigns a score
to all DNS data from a device, with the purpose of identifying
suspicious activity even faster.
Darktrace also clusters devices into peer groups, based on
its own understanding of how those devices behave, and
uses supervised learning to uncover sequences of breaches,
unusual patterns, or to detect aberrant activity at a higher,
more holistic level. For example, the WannaCry ransomware
was easily detected by Darktrace as it breaches a number of
different ‘pattern of life’ models. Using supervised learning,
Darktrace can replicate the process of a human interpreting
various sets of breaches for a device or network over time
and so present correlated alerts instead of a multitude.
Supervised learning is also used by Darktrace to understand
more about the environment, without a human having to label
it. By observing millions of different smartphones, for example,
Darktrace gets faster and faster at identifying a new device as a
‘smartphone’, and even what type of smartphone it is.
Using deep and supervised techniques to complement its core
unsupervised machine learning algorithms, Darktrace builds
up unique, contextual knowledge about network activity and
integrates the insights of our global deployments to improve
threat detection.
Finally, Darktrace also uses deep learning techniques to
automate repetitive and time-consuming tasks carried out
during investigation workflows. By analyzing how seasoned
cyber analysts interact with the Threat Visualizer, triage
alerts, and leverage third-party sources, Darktrace is able
to replicate those expert behaviors and automate certain
analyst functions. (So Darktrace also uses deep learning to automate repetitive, time-consuming tasks. What exactly is this for? I did not quite follow.)
Darktrace’s technology has become a vital tool for security
teams attempting to understand the scale of their network,
observe levels of activity, and detect areas of potential
weakness.