Proj. THUIoTFuzz Paper Reading: One Engine to Fuzz 'em All: Generic Language Processor Testing with Semantic Validation

Abstract

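Background: Language processors such as compilers and interpreters are critical, but most current fuzzers either cannot generate test cases that are semantically correct enough, or target only one or a few languages.
This paper:
Proposes the tool PolyGlot
Features: generic fuzzing framework + test cases with high semantic correctness
Method: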

Intermediate representation
constrained mutation
semantic validation

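Experiments:
Targets: 21 language processors across 9 programming languages
Results: found 173 bugs, of which 113 were fixed and 18 received CVEs; the authors conclude that PolyGlot can handle a wide range of languages
Up to 30x more code coverage than existing code fuzzers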

1. Introduction

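Covers why language processors matter, how serious their bugs can be, and what makes fuzzing them challenging.

Experiments:
Targets: 21 popular language processors across 9 programming languages
Results:

Found 173 new bugs (113 fixed, 18 CVEs)
100x improvement in language validity
30x more new paths discovered
8x more unique bugs found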

2. Problem

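First describes what a language processor is and how it responds when a syntax or semantic error occurs.

Existing work

A: Unstructured mutation. Result: almost never generates syntactically correct test cases
Going further: mutation on an AST or IR, or generation-based approaches. Problem: hard to achieve both generality and effectiveness; the paper cites CSmith's size in lines of code as an example
This paper: PolyGlot
A fuzzing framework that can generate semantically valid test cases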

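uniform IR
Given a BNF grammar, PolyGlot converts the source program into IR
Users can provide semantic annotations describing the language's scopes and types; these annotations supply the semantic information used during translation.
constrained mutation: preserves the syntactic structure of the mutated test case, which helps keep it syntactically correct.
- The code can be split into a mutated part and an unmutated part; the unmutated part stays unchanged
- The algorithm checks the mutated part for invalid variables introduced by mutation, then uses type and scope information to replace them with valid ones. For example, in a program that contains only the two variables num and arr, PolyGlot rewrites undefinedVar in undefinedVar += 1 to num
- The algorithm records each variable's type, scope, and name in a symbol table. The symbol table yields a 6.4x improvement in semantic correctness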
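The `undefinedVar += 1` repair can be sketched as follows. This is only a minimal illustration of a scope-aware symbol table, not PolyGlot's actual implementation: the `Symbol`, `Scope`, and `repair_variable` names, the type labels `"number"` and `"array"`, and the "pick the first matching candidate" policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: str


@dataclass
class Scope:
    """One lexical scope; `parent` links to the enclosing scope (hypothetical design)."""
    parent: Optional["Scope"] = None
    symbols: dict = field(default_factory=dict)

    def define(self, name: str, type_: str) -> None:
        self.symbols[name] = Symbol(name, type_)

    def visible(self) -> list:
        """All symbols visible from this scope, innermost scope first."""
        found, seen, scope = [], set(), self
        while scope is not None:
            for sym in scope.symbols.values():
                if sym.name not in seen:  # inner definitions shadow outer ones
                    seen.add(sym.name)
                    found.append(sym)
            scope = scope.parent
        return found


def repair_variable(name: str, wanted_type: str, scope: Scope) -> Optional[str]:
    """Keep `name` if it is visible with the wanted type, else pick a valid substitute."""
    candidates = [s for s in scope.visible() if s.type == wanted_type]
    if any(s.name == name for s in candidates):
        return name
    # Assumption: take the first candidate; a real fuzzer would likely choose randomly.
    return candidates[0].name if candidates else None


globals_ = Scope()
globals_.define("num", "number")
globals_.define("arr", "array")

# `undefinedVar += 1` needs a numeric variable on the left-hand side.
fixed = repair_variable("undefinedVar", "number", globals_)
print(f"{fixed} += 1")  # prints "num += 1"
```

Keeping the type next to the name is what rules out `arr` here: both variables are in scope, but only `num` fits a numeric `+=`.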

B: Language-specific fuzzers
Language-specific fuzzers produce more valid test cases; examples: CSmith, JavaScript fuzzers, Squirrel
Problem: hard to port to another language

C: language-based fuzzers:
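LangFuzz: randomly replaces variables
Nautilus: uses a small predefined set of variable names, then generates test cases guided by coverage
Drawback: limited support for explicit type conversion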

Common semantic errors

From the invalid test cases generated by existing fuzzers, the paper summarizes the four most common semantic errors:

Undefined Variables or Functions
Out-of-scope Variables or Functions
Undefined Types
Unmatched types
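To make the four categories concrete, here is a toy checker over a flat list of operations. The program encoding (`decl`, `assign`, `enter`, `exit`), the `ErrorKind` enum, and the `check` function are invented for this sketch and are not taken from the paper.

```python
from enum import Enum


class ErrorKind(Enum):
    UNDEFINED_NAME = "undefined variable or function"
    OUT_OF_SCOPE = "out-of-scope variable or function"
    UNDEFINED_TYPE = "undefined type"
    UNMATCHED_TYPE = "unmatched types"


def check(program, known_types=frozenset({"int", "str"})):
    """Classify semantic errors in a toy program (assumes balanced enter/exit)."""
    scopes = [{}]   # stack of {name: type} for scopes that are still open
    dead = set()    # names whose defining scope has already been closed
    errors = []
    for op, *args in program:
        if op == "enter":
            scopes.append({})
        elif op == "exit":
            dead.update(scopes.pop())
        elif op == "decl":
            name, type_ = args
            if type_ not in known_types:
                errors.append((ErrorKind.UNDEFINED_TYPE, name))
            scopes[-1][name] = type_
            dead.discard(name)
        elif op == "assign":
            name, value_type = args
            declared = next((s[name] for s in reversed(scopes) if name in s), None)
            if declared is None:
                kind = ErrorKind.OUT_OF_SCOPE if name in dead else ErrorKind.UNDEFINED_NAME
                errors.append((kind, name))
            elif declared != value_type:
                errors.append((ErrorKind.UNMATCHED_TYPE, name))
    return errors


program = [
    ("decl", "a", "int"),
    ("enter",),
    ("decl", "b", "int"),
    ("exit",),
    ("assign", "b", "int"),   # b's scope is already closed
    ("assign", "c", "int"),   # c was never declared
    ("decl", "d", "Foo"),     # Foo is not a known type
    ("assign", "a", "str"),   # a was declared as int
]

errors = check(program)
for kind, name in errors:
    print(f"{name}: {kind.value}")
```

Each of the four flagged lines triggers exactly one category, in the order the list above gives them.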

This paper's approach

Convert to a uniform IR

Why this helps
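One way to picture a uniform IR is a node shape that stays the same across languages while only the surface tokens differ. The sketch below is an assumption-laden illustration, not the paper's data structure: the field names (`ir_type`, `operator`, `left`, `right`, `value`), the (prefix, middle, suffix) operator triple, and the `to_source` helper are chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IR:
    ir_type: str                                     # grammar symbol, e.g. "AssignStmt"
    operator: Tuple[str, str, str] = ("", "", "")    # tokens before, between, after operands
    left: Optional["IR"] = None
    right: Optional["IR"] = None
    value: Optional[str] = None                      # leaf text: identifier or literal


def to_source(node: Optional[IR]) -> str:
    """Turn an IR tree back into source text."""
    if node is None:
        return ""
    if node.value is not None:
        return node.value
    prefix, middle, suffix = node.operator
    return f"{prefix}{to_source(node.left)}{middle}{to_source(node.right)}{suffix}"


def assignment(middle: str, suffix: str) -> IR:
    """Build `num <op> 1` with language-specific tokens but an identical tree shape."""
    return IR(
        "AssignStmt",
        ("", middle, suffix),
        left=IR("Identifier", value="num"),
        right=IR("IntLiteral", value="1"),
    )


js_like = assignment(" = ", ";")
pascal_like = assignment(" := ", "")

print(to_source(js_like))      # num = 1;
print(to_source(pascal_like))  # num := 1
```

Because both trees share the `AssignStmt` / `Identifier` / `IntLiteral` shape, a mutator or validator written against `IR` does not need to know which language produced the tree.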

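First generate code that may contain errors, then repair it

Replace based on IR type
Only mutate IRs with local effects, i.e. those that neither contain definitions nor create new local scopes. The paper gives examples of the selection strategy and of the SymbolTable.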
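The type-based replacement and the local-effects filter can be sketched as below. The node type names in `DEFINES` and `OPENS_SCOPE`, the `Node` class, and the donor `pool` are hypothetical; the sketch only shows the rule described in these notes, namely that a node is eligible when neither it nor its descendants define a name or open a scope.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    ir_type: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical classification of IR types; a real grammar would supply these.
DEFINES = {"VarDecl", "FuncDecl"}
OPENS_SCOPE = {"Block", "FuncDecl"}


def has_local_effect(node: Node) -> bool:
    """True if neither the node nor any descendant defines a name or opens a scope."""
    if node.ir_type in DEFINES | OPENS_SCOPE:
        return False
    return all(has_local_effect(child) for child in node.children)


def mutable_nodes(root: Node) -> list:
    found = [root] if has_local_effect(root) else []
    for child in root.children:
        found.extend(mutable_nodes(child))
    return found


def mutate(root: Node, pool: dict, rng: random.Random) -> bool:
    """Replace one eligible node, in place, with a donor of the same IR type."""
    candidates = [n for n in mutable_nodes(root) if pool.get(n.ir_type)]
    if not candidates:
        return False
    target = rng.choice(candidates)
    donor = rng.choice(pool[target.ir_type])
    target.text = donor.text
    target.children = copy.deepcopy(donor.children)
    return True


program = Node("Program", children=[
    Node("VarDecl", "var num = 0;"),   # a definition: never mutated
    Node("ExprStmt", "num + 1;"),      # local effect only: eligible
])
pool = {"ExprStmt": [Node("ExprStmt", "undefinedVar += 1;")]}

mutate(program, pool, random.Random(0))
print("\n".join(child.text for child in program.children))
```

The mutated statement now refers to `undefinedVar`, which is exactly the kind of possibly-invalid result that the later repair step is meant to fix, while the declaration of `num` is left intact.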

3. Overview of PolyGlot

4. Frontend Generation

5. Constrained mutation

6. Semantic Validation

7. Implementation

8. Evaluation

9. Discussion

Appendix:

Bug details
PoC generation details for Case Study 1
Execution speed of competing tools on different languages

Posted 2021-01-08 16:34 by 雪溯
