Adversarially Robust Generalization Requires More Data
@inproceedings{schmidt2018adversarially,
  title={Adversarially Robust Generalization Requires More Data},
  author={Schmidt, Ludwig and Santurkar, Shibani and Tsipras, Dimitris and Talwar, Kunal and Madry, Aleksander},
  booktitle={Advances in Neural Information Processing Systems},
  pages={5014--5026},
  year={2018}}
Overview
This paper analyzes adversarially robust generalization in two binary-classification settings, a Gaussian model and a Bernoulli model, and shows that adversarially robust models require substantially more training data than models that only need small standard error.
Main results
Gaussian model definition: Let \(\theta^* \in \mathbb{R}^d\) be the mean vector and \(\sigma > 0\). The \((\theta^*, \sigma)\)-Gaussian model is defined as follows: first draw a label \(y \in \{\pm 1\}\) uniformly at random, then draw \(x \in \mathbb{R}^d\) from \(\mathcal{N}(y \cdot \theta^*, \sigma^2 I)\).
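As a concrete reference, here is a minimal sampler for this model; the function name and the use of numpy are my own choices, not from the paper:

```python
import numpy as np

def sample_gaussian_model(theta_star, sigma, n, rng):
    """Draw n i.i.d. samples (x_i, y_i) from the (theta_star, sigma)-Gaussian model."""
    d = theta_star.shape[0]
    y = rng.choice([-1, 1], size=n)          # labels, uniform over {+1, -1}
    x = y[:, None] * theta_star + sigma * rng.standard_normal((n, d))
    return x, y

rng = np.random.default_rng(0)
theta_star = np.ones(100)                    # ||theta*||_2 = sqrt(d), matching the theorems below
x, y = sample_gaussian_model(theta_star, sigma=1.0, n=50, rng=rng)
```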
Bernoulli model definition: Let \(\theta^* \in \{\pm1\}^d\) be the mean vector and \(\tau > 0\). The \((\theta^*, \tau)\)-Bernoulli model is defined as follows: first draw a label \(y \in \{\pm 1\}\) uniformly at random, then draw \(x \in \{\pm 1\}^d\) with each coordinate sampled independently as
\[
x_i = \begin{cases} y \cdot \theta^*_i & \text{with probability } 1/2 + \tau, \\ -y \cdot \theta^*_i & \text{with probability } 1/2 - \tau. \end{cases}
\]
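A matching sampler sketch for this model, under the coordinate-wise definition above (names again mine):

```python
import numpy as np

def sample_bernoulli_model(theta_star, tau, n, rng):
    """Draw n i.i.d. samples (x_i, y_i) from the (theta_star, tau)-Bernoulli model."""
    d = theta_star.shape[0]
    y = rng.choice([-1, 1], size=n)                        # labels, uniform over {+1, -1}
    # each coordinate agrees with y * theta*_i w.p. 1/2 + tau and flips otherwise
    agree = np.where(rng.random((n, d)) < 0.5 + tau, 1, -1)
    return agree * (y[:, None] * theta_star), y
```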
Classification error definition: Let \(\mathcal{P}\) be a distribution over \(\mathbb{R}^d \times \{\pm 1\}\). The classification error \(\beta\) of a classifier \(f: \mathbb{R}^d \rightarrow \{\pm1\}\) is defined as \(\beta = \mathbb{P}_{(x, y) \sim \mathcal{P}} [f(x) \neq y]\).
Robust classification error definition: Let \(\mathcal{P}\) be a distribution over \(\mathbb{R}^d \times \{\pm 1\}\) and let \(\mathcal{B}: \mathbb{R}^d \rightarrow \mathscr{P}(\mathbb{R}^d)\) be a perturbation set. The \(\mathcal{B}\)-robust classification error \(\beta\) of a classifier \(f: \mathbb{R}^d \rightarrow \{\pm1\}\) is defined as \(\beta = \mathbb{P}_{(x, y) \sim \mathcal{P}} [\exists\, x' \in \mathcal{B}(x): f(x') \neq y]\).
Note: we write \(\mathcal{B}_p^{\epsilon}(x)\) for the ball \(\{x' \in \mathbb{R}^d \mid \|x'-x\|_p \le \epsilon\}\), and \(\ell_p^{\epsilon}\)-robust classification error for the corresponding \(\mathcal{B}_p^{\epsilon}\)-robust error.
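For a linear classifier \(f_w(x) = \mathrm{sgn}(\langle w, x \rangle)\), the worst-case \(\ell_\infty\) perturbation has a closed form: the attack \(x' = x - \epsilon y \,\mathrm{sign}(w)\) lowers the margin \(y\langle w, x\rangle\) by exactly \(\epsilon \|w\|_1\), so a robust mistake occurs precisely when \(y\langle w, x\rangle \le \epsilon\|w\|_1\). A sketch of an empirical robust-error estimate built on this fact (function name mine):

```python
import numpy as np

def linf_robust_error(w, x, y, eps):
    """Empirical l_inf^eps-robust error of f_w(x) = sign(<w, x>) on samples (x, y).

    The worst-case perturbation x' = x - eps * y * sign(w) shifts the margin
    y * <w, x> down by exactly eps * ||w||_1, so a robust mistake occurs iff
    y * <w, x> - eps * ||w||_1 <= 0 (a zero margin is counted as an error).
    """
    margins = y * (x @ w) - eps * np.abs(w).sum()
    return float(np.mean(margins <= 0))
```

With `eps=0` this reduces to the standard classification error.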
Gaussian model
upper bound
Theorem 18: Let \((x_1,y_1),\ldots, (x_n,y_n) \in \mathbb{R}^d \times \{\pm 1\}\) be drawn i.i.d. from a \((\theta^*, \sigma)\)-Gaussian model with \(\|\theta^*\|_2=\sqrt{d}\). Let \(\hat{w}:=\bar{z}/\|\bar{z}\|_2 \in \mathbb{R}^d\), where \(\bar{z}=\frac{1}{n} \sum_{i=1}^n y_ix_i\). Then with probability at least \(1-2\exp(-\frac{d}{8(\sigma^2+1)})\), the linear classifier \(f_{\hat{w}}\) has classification error at most:
Theorem 21: Let \((x_1,y_1),\ldots, (x_n,y_n) \in \mathbb{R}^d \times \{\pm 1\}\) be drawn i.i.d. from a \((\theta^*, \sigma)\)-Gaussian model with \(\|\theta^*\|_2=\sqrt{d}\). Let \(\hat{w}:=\bar{z}/\|\bar{z}\|_2 \in \mathbb{R}^d\), where \(\bar{z}=\frac{1}{n} \sum_{i=1}^n y_ix_i\). If
then with probability at least \(1-2\exp(-\frac{d}{8(\sigma^2+1)})\), the linear classifier \(f_{\hat{w}}\) has \(\ell_{\infty}^{\epsilon}\)-robust classification error at most \(\beta\).
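Both theorems analyze the same estimator \(\hat{w}\). Since \(y\langle \hat{w}, x\rangle \sim \mathcal{N}(\langle\hat{w},\theta^*\rangle, \sigma^2)\) for any unit vector \(\hat{w}\), both error rates of \(f_{\hat{w}}\) have exact closed forms: the standard error is \(\Phi(-\langle\hat{w},\theta^*\rangle/\sigma)\) and the \(\ell_\infty^\epsilon\)-robust error is \(\Phi(-(\langle\hat{w},\theta^*\rangle - \epsilon\|\hat{w}\|_1)/\sigma)\). A sketch that fits \(\hat{w}\) and evaluates both (function names mine; sampler as above):

```python
import numpy as np
from scipy.stats import norm

def fit_w_hat(x, y):
    """Estimator from Theorems 18/21: normalized mean of y_i * x_i."""
    z_bar = (y[:, None] * x).mean(axis=0)
    return z_bar / np.linalg.norm(z_bar)

def gaussian_errors(w_hat, theta_star, sigma, eps):
    """Exact standard and l_inf^eps-robust error of f_{w_hat} (||w_hat||_2 = 1)."""
    m = w_hat @ theta_star                         # mean of the margin y * <w_hat, x>
    return norm.cdf(-m / sigma), norm.cdf(-(m - eps * np.abs(w_hat).sum()) / sigma)
```

Sweeping \(n\) at a fixed \(\epsilon\) should illustrate the separation the paper formalizes: the standard error is already small for small \(n\), while driving the robust error down requires far more samples.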
lower bound
Theorem 11: Let \(g_n\) be any learning algorithm. Moreover, let \(\sigma > 0\) and \(\epsilon \ge 0\), and suppose \(\theta \in \mathbb{R}^d\) is drawn from \(\mathcal{N}(0,I)\). Draw \(n\) samples from the \((\theta,\sigma)\)-Gaussian model and let \(f_n: \mathbb{R}^d \rightarrow \{\pm 1\}\) be the classifier returned by \(g_n\). Then the \(\ell_{\infty}^{\epsilon}\)-robust classification error of \(f_n\), in expectation over \(\theta\), \((y_1,\ldots, y_n)\), and \((x_1,\ldots, x_n)\), is at least:
Bernoulli model
upper bound
Let \((x, y) \in \mathbb{R}^d \times \{\pm1\}\) be drawn from a \((\theta^*, \tau)\)-Bernoulli model. Let \(\hat{w}=z / \|z\|_2\), where \(z=y \cdot x\). Then with probability at least \(1- \exp (-\frac{\tau^2d}{2})\), the linear classifier \(f_{\hat{w}}\) has classification error at most \(\exp (-2\tau^4d)\).
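A self-contained numerical check of this single-sample guarantee (a sketch; the constants `d` and `tau` are arbitrary choices of mine, picked so the bound \(\exp(-2\tau^4 d)\) is small):

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 1_000, 0.2
theta_star = rng.choice([-1, 1], size=d)

# a single training sample from the (theta*, tau)-Bernoulli model
y0 = rng.choice([-1, 1])
x0 = np.where(rng.random(d) < 0.5 + tau, 1, -1) * (y0 * theta_star)
w_hat = (y0 * x0) / np.sqrt(d)                 # z = y*x has ||z||_2 = sqrt(d)

# Monte Carlo estimate of the standard classification error of f_{w_hat}
n_test = 10_000
y_t = rng.choice([-1, 1], size=n_test)
x_t = np.where(rng.random((n_test, d)) < 0.5 + tau, 1, -1) * (y_t[:, None] * theta_star)
print(np.mean(y_t * (x_t @ w_hat) <= 0))       # bounded by exp(-2 * tau^4 * d) ~ 0.04 here
```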
lower bound
Lemma 30: Let \(\theta^* \in \{\pm1\}^d\) and consider the linear classifier \(f_{\theta^*}\) on the \((\theta^*, \tau)\)-Bernoulli model. Then (a numerical check follows the list):
\(\ell_{\infty}^{\tau}\)-robustness: the \(\ell_{\infty}^{\tau}\)-robust classification error of \(f_{\theta^*}\) is at most \(2\exp (-\tau^2d/2)\).
\(\ell_{\infty}^{3\tau}\)-nonrobustness: the \(\ell_{\infty}^{3\tau}\)-robust classification error of \(f_{\theta^*}\) is at least \(1-2\exp (-\tau^2d/2)\).
Near-optimality of \(\theta^*\): every linear classifier has \(\ell_{\infty}^{3\tau}\)-robust classification error at least \(\frac{1}{6}\).
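The first two items can be probed with the closed-form \(\ell_\infty\) attack on linear classifiers noted earlier: the margin \(y\langle\theta^*, x\rangle\) concentrates around \(2\tau d\), which sits between the attack thresholds \(\tau\|\theta^*\|_1 = \tau d\) and \(3\tau d\). A small Monte Carlo sketch (constants arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau, n = 1_000, 0.1, 10_000
theta_star = rng.choice([-1, 1], size=d)

y = rng.choice([-1, 1], size=n)
x = np.where(rng.random((n, d)) < 0.5 + tau, 1, -1) * (y[:, None] * theta_star)

margins = y * (x @ theta_star)                   # concentrates around 2 * tau * d
for eps in (tau, 3 * tau):                       # note ||theta*||_1 = d
    print(eps, np.mean(margins <= eps * d))      # ~0 for eps = tau, ~1 for eps = 3*tau
```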
Theorem 31: Let \(g_n\) be any learning algorithm for linear classifiers. Suppose \(\theta^*\) is drawn uniformly from \(\{\pm1\}^d\), that \(n\) samples are drawn from the \((\theta^*, \tau)\)-Bernoulli model with \(\tau \le 1/4\), and that \(g_n\) returns the linear classifier \(f_{w}\). Let \(\epsilon < 3\tau\) and \(0 < \gamma < 1/2\). Then, if
the \(\ell_{\infty}^{\epsilon}\)-robust classification error of \(f_w\), in expectation over \(\theta^*\), \((y_1,\ldots, y_n)\), and \((x_1,\ldots, x_n)\), is at least \(\frac{1}{2}-\gamma\).