Obfuscated Command Line Detection Using Machine Learning
[Translation] 2019-1-13 10:26
Translator: 玉林小学生 Proofreader: Daemond
This blog post presents a machine learning (ML) approach to solving an emerging security problem: detecting obfuscated Windows command line invocations on endpoints. We start out with an introduction to this relatively new threat capability, and then discuss how such problems have traditionally been handled. We then describe a machine learning approach to solving this problem and point out how ML vastly simplifies development and maintenance of a robust obfuscation detector. Finally, we present the results obtained using two different ML techniques and compare the benefits of each.
Introduction
Malicious actors are increasingly “living off the land,” using built-in utilities such as PowerShell and the Windows Command Processor (cmd.exe) as part of their infection workflow in an effort to minimize the chance of detection and bypass whitelisting defense strategies. The release of new obfuscation tools makes detection of these threats even more difficult by adding a layer of indirection between the visible syntax and the final behavior of the command. For example, Invoke-Obfuscation and Invoke-DOSfuscation are two recently released tools that automate the obfuscation of PowerShell and Windows command lines, respectively.
The traditional pattern matching and rule-based approaches for detecting obfuscation are difficult to develop and generalize, and can pose a huge maintenance headache for defenders. We will show how using ML techniques can address this problem.
Detecting obfuscated command lines is a very useful technique because it allows defenders to reduce the data they must review by providing a strong filter for possibly malicious activity. While there are some examples of “legitimate” obfuscation in the wild, in the overwhelming majority of cases the presence of obfuscation serves as a signal of malicious intent.
Background
There has been a long history of obfuscation being employed to hide the presence of malware, ranging from encryption of malicious payloads (starting with the Cascade virus) and obfuscation of strings, to JavaScript obfuscation. The purpose of obfuscation is two-fold:
- Make it harder to find patterns in executable code, strings or scripts that can easily be detected by defensive software.
- Make it harder for reverse engineers and analysts to decipher and fully understand what the malware is doing.
In that sense, command line obfuscation is not a new problem – it is just that the target of obfuscation (the Windows Command Processor) is relatively new. The recent release of tools such as Invoke-Obfuscation (for PowerShell) and Invoke-DOSfuscation (for cmd.exe) has demonstrated just how flexible these commands are, and how even incredibly complex obfuscation will still run commands effectively.
There are two categorical axes in the space of obfuscated vs. non-obfuscated command lines: simple/complex and clear/obfuscated (see Figure 1 and Figure 2). For this discussion “simple” means generally short and relatively uncomplicated, but can still contain obfuscation, while “complex” means long, complicated strings that may or may not be obfuscated. Thus, the simple/complex axis is orthogonal to obfuscated/unobfuscated. The interplay of these two axes produces many boundary cases where simple heuristics to detect if a script is obfuscated (e.g. length of a command) will produce false positives on unobfuscated samples. The flexibility of the command line processor makes classification a difficult task from an ML perspective.
Traditional Obfuscation Detection
Traditional obfuscation detection can be split into three approaches. One approach is to write a large number of complex regular expressions to match the most commonly abused syntax of the Windows command line. Figure 3 shows one such regular expression that attempts to match ampersand chaining with a call command, a common pattern seen in obfuscation. Figure 4 shows an example command sequence this regex is designed to detect.
There are two problems with this approach. First, it is virtually impossible to develop regular expressions to cover every possible abuse of the command line. The flexibility of the command line results in a non-regular language, which is impractical, if feasible at all, to express using regular expressions. A second issue with this approach is that even if a regular expression exists for the technique a malicious sample is using, a determined attacker can make minor modifications to avoid the regular expression. Figure 5 shows a minor modification to the sequence in Figure 4, which avoids the regex detection.
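The brittleness described above can be illustrated with a small sketch. The pattern below is a hypothetical stand-in, not the regular expression from Figure 3, and both sample commands are invented for illustration. It flags ampersand chaining followed by `call`, then misses a variant in which a single caret (the cmd.exe escape character) is inserted into the keyword.

```python
import re

# Hypothetical pattern (NOT the regex from Figure 3): an ampersand chain
# followed by the `call` keyword, matched case-insensitively.
AMP_CALL = re.compile(r"&+\s*call\s", re.IGNORECASE)


def flags_amp_call(cmdline: str) -> bool:
    """Return True when the pattern finds '& call ' style chaining."""
    return AMP_CALL.search(cmdline) is not None


# Invented samples for illustration only.
original = 'cmd /c "set x=whoami && call %x%"'
evasive = 'cmd /c "set x=whoami && c^all %x%"'  # one caret inserted into "call"

print(flags_amp_call(original))  # True
print(flags_amp_call(evasive))   # False: the literal keyword no longer appears
```

Patching the pattern to tolerate carets would handle this one variant, but each such patch invites the next small modification.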
The second approach, which is closer to an ML approach, involves writing complex if-then rules. However, these rules are hard to derive, are complex to verify, and pose a significant maintenance burden as malware authors evolve to escape detection by such rules. Figure 6 shows one such if-then rule.
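As a rough illustration of what such a rule might look like in code, here is a minimal sketch. It is not the rule from Figure 6; the conditions and every threshold are hypothetical values chosen by hand, which is exactly what makes rules of this kind hard to justify and maintain.

```python
def rule_based_is_obfuscated(cmdline: str) -> bool:
    """Hypothetical hand-written rule; all thresholds are illustrative guesses."""
    carets = cmdline.count("^")
    percents = cmdline.count("%")
    ampersands = cmdline.count("&")

    # Rule 1: many escape characters.
    if carets >= 4:
        return True
    # Rule 2: heavy variable expansion combined with a `set` command.
    if percents >= 6 and "set " in cmdline.lower():
        return True
    # Rule 3: a very long command with several chained sub-commands.
    if len(cmdline) > 200 and ampersands >= 3:
        return True
    return False


print(rule_based_is_obfuscated("c^m^d /c e^c^h^o hi"))  # True (five carets)
print(rule_based_is_obfuscated("ipconfig /all"))        # False
```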
A third approach is to combine regular expressions and if-then rules. This greatly complicates the development and maintenance burden, and still suffers from the same weaknesses that make the first two approaches fragile. Figure 7 shows an example of an if-then rule with regular expressions. Clearly, it is easy to appreciate how burdensome it is to generate, test, maintain and determine the efficacy of such rules.
The ML Approach – Moving Beyond Pattern Matching and Rules
Using ML simplifies the solution to these problems. We will illustrate two ML approaches: a feature-based approach and a feature-less end-to-end approach.
There are some ML techniques that can work with any kind of raw data (provided it is numeric), and neural networks are a prime example. Most other ML algorithms require the modeler to extract pertinent information, called features, from raw data before they are fed into the algorithm. Some examples of this latter type are tree-based algorithms, which we will also look at in this blog (we described the structure and uses of Tree-Based algorithms in a previous blog post, where we used a Gradient-Boosted Tree-Based Model).
ML Basics – Neural Networks
Neural networks are a type of ML algorithm that has recently become very popular; they consist of a series of elements called neurons. A neuron is essentially an element that takes a set of inputs, computes a weighted sum of these inputs, and then feeds the sum into a non-linear function. It has been shown that a relatively shallow network of neurons can approximate any continuous mapping between input and output. The specific type of neural network we used for this research is what is called a Convolutional Neural Network (CNN), which was developed primarily for computer vision applications, but has also found success in other domains including natural language processing. One of the main benefits of a neural network is that it can be trained without having to manually engineer features.
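The single-neuron computation described above fits in a few lines. This is a minimal sketch: the sigmoid is just one common choice of non-linear function, and the bias term is a standard addition that the text does not mention, so both are assumptions here.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through a sigmoid non-linearity."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# The weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) = 0, and sigmoid(0) = 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25]))  # 0.5
```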
Featureless ML
While neural networks can be used with feature data, one of the attractions of this approach is that it can work with raw data (converted into numeric form) without doing any feature design or extraction. The first step in the model is converting text data into numeric form. We used a character-based encoding where each character type was encoded by a real-valued number. The value was automatically derived during training and conveys semantic information about the relationships between characters as they apply to cmd.exe syntax.
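The text-to-numbers step can be sketched as follows, assuming characters are first mapped to integer code points and then looked up in a table of vectors. The table below is filled with random values purely for illustration; in a real model these values would be learned during training. `EMBED_DIM`, the padding scheme, and the `EmbeddingTable` class are hypothetical and not taken from the post.

```python
import random

EMBED_DIM = 4  # assumed width; the post does not state the embedding size


def encode(cmdline: str, max_len: int = 16) -> list[int]:
    """Turn text into a fixed-length list of Unicode code points (0 = padding)."""
    points = [ord(ch) for ch in cmdline[:max_len]]
    return points + [0] * (max_len - len(points))


class EmbeddingTable:
    """Maps a code point to a vector; random here, learned in a real model."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[int, list[float]] = {}

    def lookup(self, code_point: int) -> list[float]:
        if code_point not in self.table:
            self.table[code_point] = [
                self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)
            ]
        return self.table[code_point]


table = EmbeddingTable(EMBED_DIM)
ids = encode("cmd /c echo hi")
vectors = [table.lookup(i) for i in ids]

print(ids[:3])                        # [99, 109, 100]
print(len(vectors), len(vectors[0]))  # 16 4
```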
Feature-Based ML
We also experimented with hand-engineered features and a Gradient Boosted Decision Tree algorithm. The features developed for this model were largely statistical in nature – derived from the presence and frequency of character sets and keywords. For example, the presence of dozens of ‘%’ characters or long, contiguous strings might contribute to detecting potential obfuscation. While any single feature will not perfectly separate the two classes, a combination of features as present in a tree-based model can learn flexible patterns in the data. The expectation is that those patterns are robust and can generalize to future obfuscation variants.
Data and Experiments
To develop our models, we collected non-obfuscated data from tens of thousands of endpoint events and generated obfuscated data using a variety of methods in Invoke-DOSfuscation. We developed our models using roughly 80 percent of the data as training data, and tested them on the remaining 20 percent. We ensured that our train-test split was stratified. For featureless ML (i.e. neural networks), we simply input Unicode code points into the first layer of the CNN model. The first layer converts the code points into semantically meaningful numerical representations (called embeddings) before feeding them into the rest of the neural network.
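A stratified 80/20 split keeps the ratio of obfuscated to non-obfuscated samples roughly the same in both partitions. The sketch below shows the idea in plain Python. The function name, the seed, and the toy data are assumptions, and the post does not say which tooling was actually used.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split so that each label keeps roughly the same share in train and test."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test += [(s, label) for s in group[:n_test]]
        train += [(s, label) for s in group[n_test:]]
    return train, test


# Toy data: 30 "obfuscated" (label 1) and 70 "clean" (label 0) placeholders.
samples = [f"cmd{i}" for i in range(100)]
labels = [1] * 30 + [0] * 70
train, test = stratified_split(samples, labels)

print(len(train), len(test))              # 80 20
print(sum(label for _, label in test))    # 6 obfuscated samples in the test set
```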
For the Gradient Boosted Tree method, we generated a number of features from the raw command lines. The following are some of them:
- Length of the command line
- The number of carets in the command line
- The count of pipe symbols
- The fraction of white space in the command line
- The fraction of special characters
- The frequency of the strings “cmd” and “power” in the command line
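The features listed above could be computed along these lines. This is a sketch under stated assumptions: "special characters" is taken to mean ASCII punctuation, and "frequency" is implemented as a raw, case-insensitive occurrence count. The post does not give either definition, and the feature names are hypothetical.

```python
import string

SPECIAL = set(string.punctuation)  # assumed definition of "special characters"


def extract_features(cmdline: str) -> dict[str, float]:
    """Compute simple statistical features from one raw command line."""
    n = len(cmdline)
    safe_n = max(n, 1)  # avoid division by zero on an empty string
    lowered = cmdline.lower()
    return {
        "length": n,
        "caret_count": cmdline.count("^"),
        "pipe_count": cmdline.count("|"),
        "whitespace_fraction": sum(ch.isspace() for ch in cmdline) / safe_n,
        "special_fraction": sum(ch in SPECIAL for ch in cmdline) / safe_n,
        "cmd_count": lowered.count("cmd"),
        "power_count": lowered.count("power"),
    }


features = extract_features("c^m^d /c dir | findstr cmd")
print(features["length"], features["caret_count"], features["pipe_count"])  # 26 2 1
print(features["cmd_count"])  # 1: the caret-split "c^m^d" is not counted
```

Each command line becomes one such feature vector, and the vectors, together with their obfuscated/non-obfuscated labels, form the training input for a tree-based classifier.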
While each of these features individually is a weak signal and could not possibly be a good discriminator on its own, a flexible classifier such as a Gradient Boosted Tree – trained on sufficient data with these features – is able to classify obfuscated and non-obfuscated command lines in spite of the aforementioned difficulties.
Results
Evaluated against our test set, we were able to get nearly identical results from our Gradient Boosted Tree and neural network models.
The results for the GBT model were near perfect, with metrics such as F1-score, precision, and recall all close to 1.0. The CNN model was slightly less accurate.
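For reference, the three metrics named above are computed from true positives, false positives, and false negatives. The sketch below treats "obfuscated" as the positive class (label 1). The label lists are made-up toy values, not the results reported in the post.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```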
While we certainly do not expect perfect results in a real-world scenario, these lab results were nonetheless encouraging. Recall that all of our obfuscated examples were generated by one source, namely the Invoke-DOSfuscation tool. While Invoke-DOSfuscation generates a wide variety of obfuscated samples, in the real world we expect to see at least some samples that are quite dissimilar from any that Invoke-DOSfuscation generates. We are currently collecting real world obfuscated command lines to get a more accurate picture of the generalizability of this model on obfuscated samples from actual malicious actors. We expect that command obfuscation, similar to PowerShell obfuscation before it, will continue to emerge in new malware families.
As an additional test we asked Daniel Bohannon (author of Invoke-DOSfuscation, the Windows command line obfuscation tool) to come up with obfuscated samples that in his experience would be difficult for a traditional obfuscation detector. In every case, our ML detector was still able to detect obfuscation. Some examples are shown in Figure 8.
We also created very cryptic looking texts that, although valid Windows command lines and non-obfuscated, appear slightly obfuscated to a human observer. This was done to test efficacy of the detector with boundary examples. The detector was correctly able to classify the text as non-obfuscated in this case as well. Figure 9 shows one such example.
Finally, Figure 10 shows a complicated yet non-obfuscated command line that is correctly classified by our obfuscation detector, but would likely fool a non-ML detector based on statistical features (for example a rule-based detector with a hand-crafted weighting scheme and a threshold, using features such as the proportion of special characters, length of the command line or entropy of the command line).
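The kind of non-ML statistical detector mentioned above can be sketched as a weighted sum of a few features compared against a threshold. The weights and the threshold below are hypothetical hand-picked values, and entropy is computed as Shannon entropy over characters, which is an assumption about what "entropy of the command line" means here.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical hand-crafted weights and threshold.
WEIGHTS = {"length": 0.002, "special_fraction": 2.0, "entropy": 0.1}
THRESHOLD = 1.0


def heuristic_flags(cmdline: str) -> bool:
    """Flag a command line when the weighted feature score exceeds the threshold."""
    n = max(len(cmdline), 1)
    special = sum(1 for ch in cmdline if not ch.isalnum() and not ch.isspace())
    score = (
        WEIGHTS["length"] * len(cmdline)
        + WEIGHTS["special_fraction"] * special / n
        + WEIGHTS["entropy"] * shannon_entropy(cmdline)
    )
    return score > THRESHOLD


print(shannon_entropy("aabb"))  # 1.0
print(heuristic_flags("dir"))   # False
```

Because every term grows with length, punctuation, or character variety, a long legitimate command full of paths and switches can cross the threshold just as an obfuscated one would.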
CNN vs. GBT Results
We compared the results of a heavily tuned GBT classifier built using carefully selected features to those of a CNN trained with raw data (featureless ML). While the CNN architecture was not heavily tuned, it is interesting to note that with samples such as those in Figure 10, the GBT classifier confidently predicted non-obfuscated with a score of 19.7 percent (the complement of the measure of the classifier’s confidence in non-obfuscation). Meanwhile, the CNN classifier predicted non-obfuscated with a confidence probability of 50 percent – right at the boundary between obfuscated and non-obfuscated. The number of misclassifications of the CNN model was also more than that of the Gradient Boosted Tree model. Both of these are most likely the result of inadequate tuning of the CNN, and not a fundamental shortcoming of the featureless approach.
Conclusion
In this blog post we described an ML approach to detecting obfuscated Windows command lines, which can be used as a signal to help identify malicious command line usage. Using ML techniques, we demonstrated a highly accurate mechanism for detecting such command lines without resorting to the often inadequate and costly technique of maintaining complex if-then rules and regular expressions. The more comprehensive ML approach is flexible enough to catch new variations in obfuscation, and when gaps are detected, they can usually be handled by adding some well-chosen evader samples to the training set and retraining the model.
This successful application of ML is yet another demonstration of the usefulness of ML in replacing complex manual or programmatic approaches to problems in computer security. In the years to come, we anticipate ML to take an increasingly important role both at FireEye and in the rest of the cyber security industry.