Proj THUDBFuzz Paper Reading: FAUSTA: Scaling Dynamic Analysis with Traffic Generation at WhatsApp
Abstract
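Task: an algorithmic traffic generation platform that enables analysis and testing at scale.
This paper: FAUSTA
Features:
- Supports multiple use cases, such as reliability testing, privacy analysis, and performance regression.
- Currently supports three different algorithmic input generation strategies, none of which replays any real user data.
Experiments: We report on the development and deployment of FAUSTA’s reliability use case between September 2020 and August 2021.
- FAUSTA has found 1,876 unique reliability issues, with a fix rate of 74%.
- The authors conclude that higher coverage does lead to more defects being found.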
1. Intro
FAUSTA (Fully-AUtomated Server Testing and Analysis) is a platform that allows backend service owners to onboard their own products and use cases.
It uses three complementary algorithmic strategies: biased random generation, Markov-based generation [8], and evolutionary search techniques [9].
2. Background
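A. Overview of Traffic Generation
FAUSTA needs to generate various types of traffic depending on the target use case. For example: realistic client traffic (similar to production), unrealistic traffic (for robustness testing), and synthetic traffic carrying artificial Personally Identifiable Information (PII).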
B. Traffic Specification
Traffic between WhatsApp clients and WhatsApp servers follows a variant of the XMPP [13] protocol.
XMPP calls these messages stanzas.
3. METHODOLOGY OVERVIEW
A. Design
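The input consists of three types of configuration: 1) traffic specifications (specs); 2) use-case-specific knowledge, such as PII configs (which denote sensitive fields of the specs) and stanza distributions; 3) policies that encode the oracle and indicate which violations FAUSTA should capture.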
Such instrumented services run in a nonprod, controlled environment to prevent the instrumentation from introducing any side effects to production.
- FAUSTA takes these configurations as inputs.
- The spec parser parses traffic specs and sends them to data generators to initialize traffic.
- The strategies module optimizes the initialized traffic and sends it to a replayer which converts the generated traffic into a client readable format.
- The replayer sends traffic to instrumented services, which were compiled from automatically instrumented sources for coverage and stack trace profiling.
- The oracle detector observes services under test to gather program behaviors including call stack traces for reporting errors.
- The coverage collector gathers line level coverage and uses evaluated traffic properties as fitness for feedback into strategies.
- A report generator collects the output and raises diff or task signals, with a detailed bug report that includes traces and reproduction steps to help developers debug.
B. Traffic Generator
C. Stochastic Model-based Generation
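Markov strategy: In this strategy, we collect aggregated, sampled stanza data into a frequency distribution over the specs and build a Markov chain from it. The frequency distribution is obtained by matching each incoming stanza against the specs and incrementing the count of the matching spec.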
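The Markov strategy can be illustrated with a minimal sketch. It assumes stanzas have already been matched to spec names and that the chain is built from counts of one spec following another; the spec names, the sample sessions, and the helper names `build_chain` and `generate` are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

def build_chain(sessions):
    """Count how often each spec is directly followed by each other spec."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions

def generate(transitions, start, length, rng):
    """Walk the chain, picking each next spec in proportion to its count."""
    sequence = [start]
    while len(sequence) < length:
        counts = transitions.get(sequence[-1])
        if not counts:  # no observed successor: stop early
            break
        specs = list(counts)
        weights = [counts[s] for s in specs]
        sequence.append(rng.choices(specs, weights=weights, k=1)[0])
    return sequence

# Hypothetical spec names standing in for matched stanza specs.
sampled_sessions = [
    ["presence", "message", "receipt"],
    ["presence", "message", "message", "receipt"],
    ["presence", "iq", "message", "receipt"],
]
chain = build_chain(sampled_sessions)
traffic = generate(chain, "presence", 6, random.Random(0))
```

Because only aggregated counts feed the chain, the generated sequence follows observed ordering patterns without replaying any individual sampled session verbatim.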
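Evolutionary strategy: FAUSTA also supports evolutionary search, which can optimize multiple conflicting objectives [17]. We explored traffic minimization and coverage maximization through guided evolutionary search. At the time of writing, this strategy was still under development and not yet fully deployed.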
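To make the two conflicting objectives concrete, here is a minimal sketch of Pareto-front selection, the step a generic multi-objective evolutionary search uses to decide which candidates survive. The notes do not say which algorithm FAUSTA uses, so the `Candidate` type, the objective fields, and the sample population are all assumptions for illustration.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    traffic_size: int   # number of stanzas sent (lower is better)
    coverage: float     # fraction of lines covered (higher is better)

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.traffic_size <= b.traffic_size and a.coverage >= b.coverage
    better = a.traffic_size < b.traffic_size or a.coverage > b.coverage
    return no_worse and better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical population of generated traffic sets.
population = [
    Candidate("a", 100, 0.40),
    Candidate("b", 60, 0.40),
    Candidate("c", 200, 0.55),
    Candidate("d", 220, 0.50),
]
front = pareto_front(population)
```

Here `b` and `c` survive: `b` reaches the same coverage as `a` with less traffic, and `c` beats `d` on both objectives, while neither `b` nor `c` is better than the other on both.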
D. Guided Flows
4. FAUSTA SYSTEM DEPLOYMENT
FAUSTA adopts a ‘shift-left’ software testing philosophy, which seeks to move testing effort early within the software development life-cycle.
A. CI Integration
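FAUSTA has been integrated into the WhatsApp CI pipeline for dynamic analysis and testing in multiple scenarios.
At Meta, the code review system is Phabricator, which provides a suite of web-based collaborative software development tools, such as code review and a repository browser (Diffusion). A diff needs to be reviewed and stamped by at least one other engineer. When a diff is submitted, a target determinator performs change impact analysis: it analyzes the diff's changes and decides which CI jobs need to be scheduled. If the diff only touches comments/documentation and has no effect on production, only a minimal CI is scheduled.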
Once FAUSTA detects an issue, it performs fault localization based on the collected stack traces to pinpoint the line that introduced the issue. This signal may be further boosted by signals from other testing and verification jobs and prioritized accordingly.
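Based on FAUSTA and other signals, the diff author may repeat the above process several times until the testing and verification signals turn green. With a stamp from the diff reviewer, the developer can then merge the diff into trunk. To verify the health of trunk, CI regularly schedules continuous testing and verification jobs, including FAUSTA's dynamic analysis. Healthy trunk revisions trigger further continuous delivery jobs that release the changes to users.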
B. Continuous and Diff Testing
CI jobs may run on various schedules, including continuous and diff.
- A scheduler triggers recurring dynamic analysis jobs that run FAUSTA hourly on the master branch of the backend repository.
- FAUSTA synthesizes, taints (for taint analysis that tracks how PII flows through the SUT), and tracks the newly generated traffic based on configs (e.g., error patterns for reliability and PII annotations for privacy). FAUSTA records its synthesized traffic to allow reproduction.
- The dynamic analysis monitors code behaviors, where monitors differ per use case.
- A FAUSTA categorizer classifies unique issues and removes duplication. The categorization is based on stack trace analysis.
- A FAUSTA configurable signal filter automatically reduces signals based on configs (e.g., it excludes known false positive cases).
- The FAUSTA bot files tasks on newly detected issues. The report includes detailed stack traces, reproduction steps and subject metadata (e.g., repository commit hash) to help re-run and debug.
- We use a ‘human in the loop’ approach to determine when to escalate risky issues on developer communication channels, notifying service owners and stakeholders.
- Service owners check or triage the reported issues and fix accordingly.
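The categorization and filtering steps in the list above can be sketched as follows. This is a minimal illustration, assuming each report carries a list of `function:line` frame strings and that two reports describe the same issue when their top frames name the same functions; the frame format, the report fields, and the helpers `signature` and `categorize` are hypothetical, not FAUSTA's actual scheme.

```python
import hashlib

def signature(stack_trace, top_frames=3):
    """Hash the top frames' function names, ignoring line numbers."""
    frames = [frame.split(":")[0] for frame in stack_trace[:top_frames]]
    return hashlib.sha256("|".join(frames).encode()).hexdigest()[:12]

def categorize(reports, known_signatures):
    """Group reports by signature and drop already-known issues."""
    unique = {}
    for report in reports:
        sig = signature(report["stack"])
        if sig in known_signatures:
            continue  # pre-existing or known false positive: no new task
        unique.setdefault(sig, []).append(report["id"])
    return unique

# Hypothetical reports; frame names are made up for illustration.
reports = [
    {"id": "r1", "stack": ["msg_router.route:88", "session.dispatch:17"]},
    {"id": "r2", "stack": ["msg_router.route:91", "session.dispatch:17"]},
    {"id": "r3", "stack": ["group_store.fetch:12", "session.dispatch:17"]},
]
known = {signature(["group_store.fetch:12", "session.dispatch:17"])}
issues = categorize(reports, known)
```

Ignoring line numbers in the signature keeps `r1` and `r2` in one bucket even though the failing line shifted slightly, at the cost of possibly merging distinct bugs in the same function.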
C. Reporting Pipeline
Fault Categorization: identify unique issues, deduplicate, and flag pre-existing ones.
Fault Localization: the localizer extracts stack traces from dynamic analysis logs, parses the stack trace frames, and locates a line that relates to the diff changes; the result is reported as inline comments in Phabricator.
Signal Boosting: prioritize signals and filter out false positives (this process can be configured or learned automatically).
Fix Detection: a fix detector is deployed to track which diff reports were fixed and which were not.
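The fault localization step can be sketched as a lookup of stack frames against the lines a diff changed. This is a minimal illustration, assuming frames are already parsed into `(path, line)` pairs ordered innermost first and that the diff is available as a map from path to changed line numbers; the paths and the `localize` helper are hypothetical.

```python
def localize(stack_frames, changed_lines):
    """Return the first (innermost) frame that lands on a changed line."""
    for path, line in stack_frames:
        if line in changed_lines.get(path, set()):
            return path, line
    return None  # no frame touches the diff: likely a pre-existing issue

# Hypothetical parsed stack trace, innermost frame first.
stack_frames = [
    ("server/codec", 120),
    ("server/router", 45),
    ("server/session", 10),
]
# Hypothetical diff: lines 44-46 of server/router were modified.
changed_lines = {"server/router": {44, 45, 46}}

location = localize(stack_frames, changed_lines)
```

A `None` result is a useful signal in its own right, since a failure whose frames never touch the diff is a candidate for the pre-existing bucket rather than for an inline comment on the diff.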
5. RELIABILITY TESTING
Q1: Does FAUSTA find real world reliability errors with its generated traffic? Do developers fix them?
Q2: What are the most common error types revealed by FAUSTA?
Q3: Does coverage improvement lead to more unique errors detected?