Proj THUDBFuzz Paper Reading: Building Fast Fuzzers

Abstract

Background: even the fastest grammar fuzzer, dharma, is still a thousand times slower than a simple random fuzzer.
This paper: the tool F1
Goal: speed up the rate at which a grammar fuzzer produces new test inputs

  1. We describe how to build a fast grammar fuzzer from the ground up, approaching fuzzing from the perspective of programming-language implementation. (Q1: So this is not about optimizing the grammar fuzzer's algorithm itself, but about optimizing how the fuzzer is built? Q2: What exactly does "from the perspective of programming-language implementation" mean?)
  2. Starting from a Python textbook approach, we adopt and adapt, step by step, optimization techniques from functional programming and virtual-machine implementation, along with other novel domain-specific optimizations. (Q3: What exactly are these functional-programming and VM-implementation techniques, and the other novel domain-specific optimizations?)
    Experiments:
  3. 100-300x faster than the fastest grammar fuzzer, dharma.
  4. Even 5-8x faster than a lexical random fuzzer.

Intro

Interesting related works

  1. Competing tools: Today, a large number of tools exist [26, 58, 59, 2, 24, 16, 54] that all provide grammar-based fuzzing.
  2. Automatic grammar inference: Recently, however, a number of approaches have been proposed to infer both regular languages [56] as well as context-free grammars, either by grammar induction from samples [3] or by dynamic grammar inference from program code [28]. While these approaches require valid sample inputs to learn from, recent work by Mathis et al. [37] and Blazytko et al. [6] suggests that it is possible to automatically generate inputs that cover all language features and thus make good samples for grammar induction.

The input model may be hard-coded, as in the case of Csmith [63] (which generates valid C programs) and JSFunFuzz [46] (which targets JavaScript). Fuzzers such as Gramfuzz [24], Grammarinator [26], Dharma [38], Domato [20], and CSS Fuzz [46] let the user specify the input format as a context-free grammar. For cases where a finite-state automaton suffices, some fuzzers [13, 60] accept an FSM as the input model. Fuzzers that allow context-sensitive constraints on inputs are also available [17].

F1

Using the simple Expr grammar from the Fuzzing Book textbook chapter on grammars [65], the textbook fuzzer yields a throughput of 103.82 KB per second.
If one wants very long inputs (say, 10 megabytes) to stress-test a program for buffer and stack overflows, one has to wait about a minute for a single input to be produced.
Now compare this with a purely random fuzzer, e.g. piping dd if=/dev/urandom directly into the program (we call this the dev-random fuzzer). dev-random achieves 23 MiB per second, which is more than a hundred times faster than even the fastest grammar-based fuzzer, dharma, which produces 174.12 KiB/s on Expr.
On the CSS grammar, our F1 prototype achieves a final throughput of 80,722 KB of valid inputs per second on a single core. This is 333 times faster than dharma, the fastest grammar fuzzer to date (which produces only 242 KiB/s of CSS), and even three times faster than the dev-random fuzzer.
These results make our F1 prototype the world's fastest producer of valid inputs. A fast fuzzer like F1 is useful not only for saving CPU cycles spent on fuzzing in general, but also for feeding large volumes of inputs to targets that demand high throughput, including CPUs [36], FPGAs [32], emulators [18], and IoT devices [68], all of which allow high-speed interaction. It also enables stress-testing hardware encoders and decoders (e.g. video encoders), which require syntactically valid but rare inputs and can thus profit from a fast grammar fuzzer. Side-channel attacks on hardware implementing encryption and decryption may require certificates in an envelope with valid values, where the encrypted value is one of the contained values; here, too, the F1 fuzzer can help. Finally, many machines use hardware-implemented TCP/IP stacks, which require structured inputs. Given that a network stack can accept high-throughput traffic, such devices can be fuzzed with F1.

3 Method of Evaluation

  1. To ensure each fuzzer has enough time to get its execution paths cached in memory, we start with a small warm-up loop of ten iterations.
  2. To avoid skew due to differing seeds, we pick random seeds from zero to 92 and compute the mean over ten runs with these seeds. We need to ensure a level playing field for all grammar fuzzers.
  3. Metrics such as inputs per second (as used with mutational fuzzers) unfairly penalize grammar fuzzers that produce structurally complex inputs. Hence, we measure fuzzer performance solely by throughput (kilobytes of output produced per second), rather than by the number of inputs produced.
  4. We observed that the maximum recursion depth affects throughput. Therefore, we evaluate all fuzzers that allow setting a maximum depth at similar recursion depths, ranging from 8 to 256, with the timeout set to one hour (3,600 seconds).

Our experiments were conducted on a machine running macOS 10.14.5. The hardware was a MacBookPro14,2 with an Intel Core i5 two-core processor at 3.1 GHz, with an L2 cache of 256 KB, an L3 cache of 4 MB, and 16 GB of memory. All tools, including our own, are single-threaded. The times measured and reported are the sum of user time and system time spent in the respective process.

4 Controlling Free Expansion

  1. Let us define the minimum depth of expansion of a key as the minimum number of levels of expansion (stack depth) needed to produce a completely expanded string corresponding to that key.
  2. When the number of stack frames goes beyond the budget, choose only those expansions that have the minimum cost of expansion.
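The budget rule above can be sketched in C for a toy two-rule nonterminal; the rule table, its precomputed minimum depths, and the function names are illustrative, not F1's actual code:

```c
#include <stdlib.h>
#include <limits.h>

/* Hypothetical rule table for one nonterminal: each alternative carries
 * its precomputed minimum expansion depth (its "cost"). */
typedef struct { const char *rhs; int min_depth; } Rule;

static const Rule expr_rules[] = {
    { "<term> + <expr>", 3 },   /* recursive alternative: deeper minimum */
    { "<term>", 2 },
};
enum { N_RULES = sizeof expr_rules / sizeof expr_rules[0] };

/* Pick a rule index: free choice while within the depth budget,
 * otherwise choose only among the minimum-cost alternatives. */
int choose_rule(int depth, int max_depth) {
    if (depth < max_depth)
        return rand() % N_RULES;              /* any alternative */
    int cheapest = INT_MAX;
    for (int i = 0; i < N_RULES; i++)
        if (expr_rules[i].min_depth < cheapest)
            cheapest = expr_rules[i].min_depth;
    /* collect indices of minimum-cost alternatives, pick one at random */
    int cands[N_RULES], n = 0;
    for (int i = 0; i < N_RULES; i++)
        if (expr_rules[i].min_depth == cheapest)
            cands[n++] = i;
    return cands[rand() % n];
}
```

Once the budget is exhausted, only the cheapest (here, non-recursive) alternative remains selectable, so the expansion is guaranteed to terminate.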

5 Compiling the Grammar

The idea is essentially to transform the grammar such that each nonterminal becomes a function, and each terminal becomes a call to an output function (e.g. print) with the terminal symbol as the argument. The functions corresponding to nonterminal symbols contain conditional branches, one per expansion rule for that symbol in the grammar. The branch taken is chosen randomly. Each branch is a sequence of calls corresponding to the terminal and nonterminal symbols in the rule.
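As a sketch of this transformation, assume the toy grammar <expr> ::= <digit> "+" <digit> | <digit>, with <digit> ::= "0" | "1" (the grammar and all names here are illustrative, not taken from the paper). Compiled to C, each nonterminal becomes a function and each terminal a call to the output function:

```c
#include <stdlib.h>

/* Output function: appends each terminal to a buffer (a stand-in for print). */
static char out[64];
static int  pos;
static void emit(char c) { out[pos++] = c; }

/* One function per nonterminal; one switch branch per expansion rule,
 * chosen randomly. */
static void gen_digit(void) {
    switch (rand() % 2) {
    case 0: emit('0'); break;
    case 1: emit('1'); break;
    }
}

static void gen_expr(void) {
    switch (rand() % 2) {
    case 0: gen_digit(); emit('+'); gen_digit(); break;  /* <digit> "+" <digit> */
    case 1: gen_digit(); break;                          /* <digit> */
    }
}
```

Calling gen_expr() then produces either a single digit or a digit-plus-digit string, depending on the random branch taken.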

6 Compiling to a Faster Language

The grammar parser is written in Python, but the generated fuzzer (mutator) is emitted as C code.

7 The Grammar as a Functional DSL

7.1 Partial Evaluation

  1. embed the expansions directly into parent expansions, eliminating subroutine calls.
  2. pre-computing the minimum-depth expansions
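Assuming a toy grammar <expr> ::= <digit> "+" <digit> | <digit> with <digit> ::= "0" | "1", the two steps can be sketched as follows (names and constants are illustrative, not F1's output):

```c
#include <stdlib.h>

/* After partial evaluation: the one-level <digit> expansions are
 * embedded directly into the parent <expr> function (no subroutine
 * calls), and the minimum expansion depth of each alternative is a
 * precomputed constant instead of being recomputed at runtime. */
enum { MIN_DEPTH_PLUS = 2, MIN_DEPTH_SINGLE = 2 };  /* precomputed */

static char out[64];
static int  pos;

static void gen_expr_inlined(void) {
    switch (rand() % 2) {
    case 0:                              /* <digit> "+" <digit> */
        out[pos++] = "01"[rand() % 2];   /* inlined <digit> */
        out[pos++] = '+';
        out[pos++] = "01"[rand() % 2];   /* inlined <digit> */
        break;
    case 1:                              /* <digit> */
        out[pos++] = "01"[rand() % 2];
        break;
    }
}
```

Compared with the compiled-but-uninlined version, every call/return pair for <digit> disappears; only the random branch choices remain.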

7.2 Supercompilation

  1. During this abstract interpretation, whenever the interpretation encounters functions or control structures that do not depend on the input, the execution is inlined.
  2. If you find that you are working on a similar expression (in our case, a nonterminal expansion you have seen before), terminate exploring that thread of continuation, produce a function corresponding to the expression seen, and replace it with a call to that function.

8 System-Level Optimization

8.1 Effective Randomness

  1. We can replace the default PRNG with a faster PRNG.
  2. Instead of using mod to bound a random number, multiply it by the bound and take the upper half of the bits.
  3. Generate all the needed pseudo-random numbers up front in one place, and consume them one byte at a time when needed.
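The three points above can be sketched as follows. xorshift64 and the multiply-shift bound are standard techniques assumed here for illustration; the names and the pool size are not F1's actual code:

```c
#include <stdint.h>
#include <string.h>

/* 1. A fast PRNG (xorshift64) instead of the libc default. */
static uint64_t rng_state = 88172645463325252ULL;
static uint64_t xorshift64(void) {
    uint64_t x = rng_state;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return rng_state = x;
}

/* 2. Range reduction without the modulus: multiply by the bound and
 *    take the upper half of the product's bits, avoiding a slow div. */
static uint32_t bounded(uint32_t bound) {
    return (uint32_t)(((uint64_t)(uint32_t)xorshift64() * bound) >> 32);
}

/* 3. Pre-generate random bytes in one place, consume one at a time. */
static uint8_t pool[4096];
static size_t  pool_pos = sizeof pool;   /* force a refill on first use */
static void refill_pool(void) {
    for (size_t i = 0; i < sizeof pool; i += 8) {
        uint64_t r = xorshift64();
        memcpy(pool + i, &r, 8);
    }
    pool_pos = 0;
}
static uint8_t next_byte(void) {
    if (pool_pos == sizeof pool) refill_pool();
    return pool[pool_pos++];
}
```

The multiply-shift in bounded() always yields a value strictly below the bound, since a 32-bit value times bound, shifted right by 32, is less than bound.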

8.2 Effective Output

  1. Optimize output by writing it in large chunks.
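A minimal sketch of chunked output, assuming an emit_str helper (names and buffer size illustrative) that appends to a 1 MiB buffer and flushes it with a single fwrite:

```c
#include <stdio.h>
#include <string.h>

/* Accumulate generated terminals in a large in-memory buffer and flush
 * it with one fwrite() call, instead of one write per terminal. */
enum { BUF_SIZE = 1 << 20 };            /* 1 MiB chunk */
static char   buf[BUF_SIZE];
static size_t buf_len;

static void flush_buf(void) {
    fwrite(buf, 1, buf_len, stdout);    /* single large write */
    buf_len = 0;
}

static void emit_str(const char *s, size_t len) {
    if (buf_len + len > BUF_SIZE)       /* flush only when the chunk is full */
        flush_buf();
    memcpy(buf + buf_len, s, len);
    buf_len += len;
}
```

This amortizes the per-write system-call cost over an entire megabyte of generated output.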

9 Production Machines

We observe that the random bits used to choose between the different expansion rules can be viewed as a byte stream to be interpreted by a virtual machine.

9.1 Directed Threaded VM

One of the most efficient ways to implement a virtual machine, however, is so-called threaded code [4, 19]. A plain bytecode interpreter using switch dispatch must fetch the next bytecode to execute and look up its definition in a table; it must then transfer control to that definition. The idea of direct threading, in contrast, is to replace each bytecode with the starting address of its definition. In addition, each opcode implementation ends with an epilogue that dispatches the next opcode. The practicality of this technique lies in eliminating the need for a dispatch table and a central dispatch loop. Using this technique lets us reach 17,848.876 KB per second at depth 8 and 53,593.384 KB per second at depth 32.
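The dispatch scheme described above can be sketched in GNU C, which supports the labels-as-values extension (&&label plus computed goto, a GCC/Clang extension, not standard C); this is a minimal illustrative interpreter, not the paper's generated code:

```c
/* Direct-threaded interpreter: the "bytecode" array holds the addresses
 * of the opcode implementations, and each implementation ends with an
 * epilogue that dispatches the next opcode. No dispatch table, no
 * central dispatch loop. */
long run(void) {
    static void *prog[4];          /* threaded code: label addresses */
    long acc = 0;
    int  ip  = 0;

    prog[0] = &&op_inc;
    prog[1] = &&op_inc;
    prog[2] = &&op_double;
    prog[3] = &&op_halt;

    goto *prog[ip++];              /* start: jump straight to opcode 0 */

op_inc:
    acc += 1;
    goto *prog[ip++];              /* dispatch epilogue */
op_double:
    acc *= 2;
    goto *prog[ip++];
op_halt:
    return acc;                    /* (0 + 1 + 1) * 2 = 4 */
}
```

Each opcode body jumps directly to the next opcode's implementation, which is exactly what removes the fetch-lookup-branch overhead of switch dispatch.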

9.2 Context Threaded VM

One of the problems with direct threading [19] is that it tends to increase branch mispredictions. The main issue is that the targets of computed gotos are hard to predict. An alternative is context threading [5]. The idea is to use the IP register consistently and, where possible, to use calls and returns based on the value of the IP register. This differs from simply using subroutine calls, because no arguments are passed and no subroutine prologue or epilogue is generated. Doing this requires generating assembly, since C does not let us emit bare call and return instructions directly. We therefore generate the x86-64 assembly corresponding to the context-threaded VM and compile it. Figure 9 shows a fragment of this VM in pseudo-assembly.

Model based Fuzzers

Generating inputs using grammars was initially explored by Burkhardt [8], and later by Hanford [25] and Purdom [45]. Modern fuzzing tools that use some input model include CSmith [63], LangFuzz [27], Grammarinator [26] (Python), Domato [20] (Python), Skyfire [58] (Python), and Superion [59], which extends AFL.

Grammar Learning

There is a large amount of research on inferring regular grammars from blackbox systems [15, 56, 57]. The notable algorithms include L* [1] and RPNI [42]. Blackbox approaches can also be used to learn context-free grammars. Notable approaches include version spaces [53] and GRIDS [33]. GLADE [3] and later REINAM [61] derive the context-free input grammar focusing on blackbox programs. Other notable works include the approach by Lin et al. [34], which extracts the AST from programs that parse their input; AUTOGRAM by Höschele et al. [29, 28], which learns the input grammar through active learning using the source code of the program; Tupni [14] by Cui et al., which reverse engineers input formats using taint tracking; Prospex [12] from Comparetti et al., which is able to reverse engineer network protocols; and Polyglot [9] by Caballero et al. Another approach to model learning is through machine-learning techniques where the model is not formally represented as a grammar. Pulsar [21] infers a Markov model for representing protocols. Godefroid et al. [23] use a learned language model to fuzz. IUST-DeepFuzz from Nasrabadi et al. [40] infers a neural language model using RNNs from a given corpus of data, which is then used for fuzzing.

Faster execution

One of the major concerns of fuzzers is the speed of execution: to be effective, a fuzzer needs to generate a plausible input and execute the program under fuzzing. Given that programs often have instrumentation enabled for tracing coverage, it becomes useful to reduce the overhead due to instrumentation. The untracer from Nagy et al. [39] can remove the overhead of tracing from parts of the program already covered, and hence make the overall program execution faster. Another approach by Hsu et al. [30] shows that it is possible to reduce the overhead of instrumentation even further by instrumenting only the parts that are required to differentiate paths.

Grammar fuzzers

A number of grammar-based fuzzers exist that take in some form of a grammar. Fuzzers such as Gramfuzz [24], Grammarinator [26], Dharma [38], Domato [20], and CSS Fuzz [46] allow context-free grammars to be specified externally. Other fuzzers [13, 60] allow specifying a regular grammar or equivalent as the input grammar. Some [17] also allow constraint languages to specify context-sensitive features. Other notable research on grammar-based fuzzers includes Nautilus [2], Blendfuzz [62], and Godefroid's grammar-based whitebox fuzzing [22].

Optimizations in Functional Languages

We have discussed how the fuzzer can be seen as a limited functional domain-specific language (DSL) for interpreting context-free grammars. Supercompilation is not the only method for optimizing functional programs. Other methods include deforestation [55] and partial evaluation [31]. Further details on how partial evaluation, deforestation, and supercompilation fit together can be found in Sorensen et al. [50].

Optimizations in Virtual Machine Interpreters

A number of dispatch techniques exist for virtual-machine interpreters. The most basic one is called switch dispatch, in which an interpreter fetches and executes an instruction in a loop [7]. In direct threading, addresses are arranged in an execution thread, and the interpreter follows the execution thread by using computed jump instructions to jump directly to the subroutines rather than iterating over a loop. A problem with the direct-threading approach is that it is penalized by the CPU branch predictor, as the CPU is unable to predict where a computed jump will transfer control. An alternative is context threading [5], where simple call and return instructions are used for transferring control back and forth from subroutines. Since the CPU can predict where a return will transfer control after a call, the penalty of branch misprediction is lessened.

posted @ 2021-04-15 16:13 雪溯