Abstract
Background: most defenses against adversarial attacks are quickly shown to be ineffective
This paper: a review of defenses against adversarial attacks
Tasks: 1. methodology 2. commonly adopted best practices 3. common pitfalls
1. Intro
2. Principles of Rigorous Evaluations
2.1 Defense Research Motivation
2.2 Threat Models
2.2.1 Adversary Goals
2.2.2 Adversarial Capabilities
2.2.3 Adversary Knowledge
2.3 Restrict Attacks to the Defense's Threat Model
2.4 Skepticism of Results
2.5 Adaptive Adversaries
2.6 Reproducible Research: Code and Pre-trained Models
3. Specific Recommendations: Evaluation Checklist
3.1 Common Severe Flaws
3.2 Common Pitfalls
3.3 Special-Case Pitfalls
4. Evaluation Recommendations
4.1 Investigate Provable Approaches
4.2 Report Clean Model Accuracy
4.3 Focus on the Strongest Attacks Possible
4.4 Apply Gradient-Free Attacks
4.6 Properly Ensemble over Randomness
4.7 Approximate Non-Differentiable Layers
4.8 Verify Attack Convergence
4.9 Carefully Investigate Attack Hyperparameters
4.10 Test General Robustness For General-Purpose Defenses
4.11 Try Brute-Force (Random) Search Attacks
4.12 Targeted And Untargeted Attacks
4.13 Attack Similar-But-Undefended Models
4.14 Validate Any New Attack Algorithms Introduced
5. Analysis Recommendations
5.1 Compare Against Prior Work
5.3 Generate An Accuracy Versus Perturbation Curve
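A minimal sketch of what such a curve looks like, assuming a hypothetical 1-D threshold classifier for which the worst-case perturbation within a budget eps can be computed exactly. The model, the data, and the names `predict`, `worst_case`, and `robust_accuracy` are illustrative assumptions, not taken from the paper. For a real defense the worst case is unknown and would be approximated with the strongest available attack.

```python
def predict(x):
    """Toy 1-D classifier (hypothetical): label 1 if x > 0, else 0."""
    return 1 if x > 0 else 0


def worst_case(x, y, eps):
    """Exact worst-case perturbation for this toy model only:
    shift x by eps toward the side of the wrong label."""
    return x - eps if y == 1 else x + eps


def robust_accuracy(data, eps):
    """Fraction of examples still classified correctly under budget eps."""
    correct = sum(predict(worst_case(x, y, eps)) == y for x, y in data)
    return correct / len(data)


# Hypothetical labelled data: (input, true label).
data = [(-2.0, 0), (-0.5, 0), (0.3, 1), (1.5, 1), (0.2, 0)]
epsilons = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]

# The curve: one (eps, accuracy) point per perturbation budget.
curve = [(eps, robust_accuracy(data, eps)) for eps in epsilons]
for eps, acc in curve:
    print(f"eps={eps:<5} accuracy={acc:.2f}")
```

Two sanity checks can be read off the curve: accuracy should not increase as eps grows, and with a large enough budget it should fall to zero (or chance level). A curve that violates either suggests the attack, not the defense, is the weak point.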
5.5 Investigate Domains Other Than Images
5.6 Report The Per-Example Attack Success Rate
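A small sketch of why the per-example number matters when several attacks are run. The outcome table below is made-up toy data and the function names are hypothetical. The idea is that an example counts as robust only if it survives every attack, which can give a lower accuracy than the best single attack reports.

```python
# results[attack_name][i] is True if the defended model still classifies
# example i correctly after that attack (hypothetical toy outcomes).
results = {
    "attack_a": [True, False, True, True],
    "attack_b": [False, True, True, True],
}


def per_attack_accuracy(results):
    """Accuracy under each attack taken separately."""
    return {name: sum(ok) / len(ok) for name, ok in results.items()}


def per_example_accuracy(results):
    """Accuracy when an example must survive every attack to count as robust."""
    survived_all = [all(outcomes) for outcomes in zip(*results.values())]
    return sum(survived_all) / len(survived_all)


print(per_attack_accuracy(results))   # accuracy of each attack on its own
print(per_example_accuracy(results))  # accuracy against the per-example worst case
```

Here each attack alone leaves 75% accuracy, but they break different examples, so the per-example worst case leaves only 50%.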
5.7 Report Attacks Applied
6. Conclusion