Randomized Algorithms: Doomed at the Midterm

Review notes covering the material up to the midterm.

Theorems:

Schwartz-Zippel Algorithm

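Claim: let $P$ be a nonzero multivariate polynomial over a field $\mathscr{F}$ and let $S\subseteq\mathscr{F}$ be a finite set. If $x_1,x_2,\dots,x_n$ are drawn independently and uniformly at random from $S$, then $\textrm{Pr}[P(x_1,x_2,\dots,x_n)=0]\le \frac{d}{|S|}$, where $d$ is the total degree of $P$.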

Markov's Inequality: for a nonnegative random variable $Z$ and $\alpha>0$, $\textrm{Pr}[Z\ge \alpha \mathbb{E}[Z]]\le \frac{1}{\alpha}$ (an averaging argument)

Chebyshev's Inequality $\textrm{Pr}[|X-EX|\ge \alpha]\le \frac{D(X)}{\alpha^2}$

Union Bound $\textrm{Pr}[A_1\cup A_2...\cup A_n]\le \sum \textrm{Pr}[A_i]$

Chernoff Bound: $\textrm{Pr}[X\le \mu- \lambda]\le \exp(-\frac{2\lambda^2}{n})$ and $\textrm{Pr}[X\ge \mu+\lambda]\le \exp(-\frac{2\lambda^2}{n})$ (the form used most often)

L2

Schwartz-Zippel Algorithm

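Proof: by induction on $n$. The case $n=1$ is immediate, since a nonzero univariate polynomial of degree $d$ has at most $d$ roots.

For $n\ge 2$, write $P(x_1,x_2,\dots,x_n)=Q(x_1,\dots,x_{n-1})x_{n}^{k}+P'(x_1,\dots,x_{n})$, where $k$ is the highest power of $x_n$ appearing in $P$ and every term of $P'$ has degree less than $k$ in $x_n$. Then $Q$ is a nonzero polynomial of degree at most $d-k$. Suppose $x_1,\dots,x_{n-1}$ are drawn first, uniformly at random. There are two cases:

$Q=0$: by the induction hypothesis this happens with probability at most $\frac{d-k}{|S|}$
$Q\neq 0$: then $P=ax_n^k+f(x_{n})$ with $a\neq 0$ is a nonzero univariate polynomial of degree $k$ in $x_n$, so it equals $0$ with probability at most $\frac{k}{|S|}$

Summing the two cases, $\textrm{Pr}[P(x_1,x_2,\dots,x_n)=0]\le \frac{d}{|S|}$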

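Bipartite perfect matching

This is equivalent to checking whether $\det(A)\neq 0$ as a polynomial, where $a_{i,j}=x_{i,j}$ if the edge $(i,j)$ exists and $a_{i,j}=0$ otherwise. Assign random values to the $x_{i,j}$ and check whether the resulting determinant is nonzero.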
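The determinant test can be sketched in Python as follows. This is only an illustration under assumptions not in the notes: the names `det_mod_p` and `has_perfect_matching`, the choice of the prime modulus `P`, and the number of `trials` are all hypothetical. Arithmetic is done in the field $\mathbb{Z}_P$ and values are drawn from $S=\{1,\dots,P-1\}$, so by Schwartz-Zippel a single trial misses an existing matching with probability at most $\frac{n}{P-1}$.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; an assumed choice for this sketch


def det_mod_p(mat, p=P):
    """Determinant of a square integer matrix modulo a prime p (Gaussian elimination)."""
    n = len(mat)
    a = [row[:] for row in mat]
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] % p != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)  # modular inverse (Python 3.8+)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def has_perfect_matching(n, edges, trials=5, rng=random):
    """edges: pairs (i, j), left vertex i and right vertex j, both in range(n).

    One-sided error: True is always correct, False may be wrong with small probability.
    """
    for _ in range(trials):
        mat = [[0] * n for _ in range(n)]
        for i, j in edges:
            mat[i][j] = rng.randrange(1, P)  # random value for x_{i,j}
        if det_mod_p(mat) != 0:
            return True
    return False
```

A graph without a perfect matching has an identically zero determinant, so the function can never answer True for it.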

Fast Parallel Algorithms for Finding a Perfect Matching

Skipped.

L3

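Given two large $n$-bit numbers $a,b$, check whether $a=b$.

Pick a random prime $p\in \{2,3,\dots,T\}$ and check whether $a\equiv b\pmod p$.

Error rate: the test errs only when $a\neq b$ but $p\mid(a-b)$. Let $m=|a-b|$; since $m<2^n$, it has at most $n$ distinct prime factors. The error rate is therefore at most $\frac{n}{\pi(T)}$, where $\pi(x)\sim \frac{x}{\ln x}$ counts the primes up to $x$.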
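A minimal sketch of this fingerprint test, assuming a simple sieve is good enough to enumerate the primes up to $T$. The helper names `primes_up_to` and `probably_equal` are made up for this example.

```python
import random


def primes_up_to(limit):
    """All primes in {2, ..., limit}, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def probably_equal(a, b, T, rng=random):
    """Compare a and b modulo one random prime p <= T.

    If a == b the answer is always True; if a != b it is wrongly True
    exactly when p divides a - b.
    """
    p = rng.choice(primes_up_to(T))
    return a % p == b % p
```

For instance, with $a=0$, $b=30$ and $T=100$ there are 25 candidate primes and only $2,3,5$ divide $a-b$, so a single run errs with probability $\frac{3}{25}$.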

Primality testing

Fermat’s Little Theorem: if $p$ is a prime, $a^{p-1}\equiv 1\bmod p$ for all $a\in \{1,2...p-1\}$

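Theorem: if $n$ is composite and not a Carmichael number, then $\textrm{Pr[Error in the Fermat Test]}\le \frac{1}{2}$

Let $S_n=\{a\in\mathbb{Z}_n^*: a^{n-1}\equiv 1\bmod n\}$ be the set of bad numbers. Since $n$ is not a Carmichael number, $S_n$ is a proper subgroup of $\mathbb{Z}_n^*$, so by Lagrange's theorem $|S_n|\le \frac{|\mathbb{Z}_n^*|}{2}\le \frac{n}{2}$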

L4

Definition: if $x^2\equiv a\pmod p$, we say $x$ is a square root of $a$ modulo $p$

Claim: for a prime $p>2$, if $a\not\equiv 0$ has a square root then $a$ has exactly two square roots $(g^{j},g^{j+\frac{p-1}{2}})$, where $g$ is a generator of $\mathbb{Z}_p^*$ and $a=g^{2j}$.

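For a prime, both Fermat's little theorem and the fact that $1$ has only the two trivial square roots $\pm 1$ hold. This gives the following primality test (MR, i.e. Miller-Rabin):

Pick a random $x$.
Examine the sequence $x^{\frac{p-1}{2^k}},x^{\frac{p-1}{2^{k-1}}},\dots,x^{p-1}$, where $2^k$ is the largest power of $2$ dividing $p-1$. The last term must be $1$; walking backward, any term $c$ with $c^2=1$ must itself be $\pm 1$, so check whether $c=\pm 1$.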
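The test above can be sketched as follows. The function names `is_witness` and `miller_rabin` and the default of 20 rounds are assumptions made for this example. The sketch walks the sequence forward by repeated squaring instead of backward from $x^{n-1}$, which checks the same condition.

```python
import random


def is_witness(x, n):
    """True if x proves that the odd number n > 2 is composite."""
    k, d = 0, n - 1
    while d % 2 == 0:  # write n - 1 = 2^k * d with d odd
        d //= 2
        k += 1
    c = pow(x, d, n)  # first term of the sequence
    if c == 1 or c == n - 1:
        return False
    for _ in range(k - 1):
        c = c * c % n
        if c == n - 1:  # reached -1, so later terms are all 1
            return False
    # Either x^(n-1) != 1 (Fermat fails) or 1 has a nontrivial square root.
    return True


def miller_rabin(n, rounds=20, rng=random):
    """Probabilistic primality test; primes are always accepted."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return not any(is_witness(rng.randrange(2, n - 1), n) for _ in range(rounds))
```

Note that $561$ is a Carmichael number, so the plain Fermat test is fooled by every base coprime to it, yet $x=2$ is already a witness here.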

Claim: If $n$ is odd, composite, and not a prime power, then $\textrm{Pr}[x\textrm{ is a witness}]\ge \frac{1}{2}$

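Proof: the core idea is to split $n$ as $p\times q$ with coprime $p,q>1$ and argue modulo each factor. If the residues modulo $p$ and modulo $q$ are $1$ and $-1$ (in either order), then $x$ is a witness, while the number of $x$ giving the pairs $(1,1)$ or $(-1,-1)$ is at most $\frac{n}{2}$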

L5 The probabilistic method

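Given any graph $G=(V,E)$, define $c(G)$ as the minimum number of edge crossings over all drawings of $G$ in the plane.

$c(G)\ge |E|-3|V|+6$

How to improve this bound: keep each vertex independently with probability $p$ and apply the inequality above to the induced random subgraph $G'=(V',E')$:

$\begin{aligned} &\mathbb{E}(c(G'))\ge \mathbb{E}(|E'|)-3\mathbb{E}(|V'|)+6 \\&c(G)p^4\ge \mathbb{E}(c(G'))\ge|E|p^2-3|V|p+6 \\&c(G)\ge \frac{|E|p^2-3|V|p+6}{p^4} \end{aligned}$

Choosing $p$ to maximize the right-hand side gives $c(G)\ge \frac{|E|^3}{64|V|^2}$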
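As a worked check of this last step, here is a sketch that drops the $+6$ term and plugs in $p=\frac{4|V|}{|E|}$. The specific value of $p$ is an assumption here, since the notes do not state it, and it is a valid probability only when $|E|\ge 4|V|$.

\[\begin{aligned} c(G)&\ge \frac{|E|}{p^2}-\frac{3|V|}{p^3} \\&=\frac{|E|^3}{16|V|^2}-\frac{3|E|^3}{64|V|^2} \\&=\frac{|E|^3}{64|V|^2} \end{aligned}\]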

Unbalancing lights

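Given an $n\times n$ $0/1$ matrix $\{a_{i,j}\}$ and $2n$ switches $x_i,y_j\in\{0,1\}$, define $b_{i,j}=a_{i,j}\oplus x_i\oplus y_j$. The goal is to maximize $D=\sum b_{i,j}$, the number of lights that are on.

Claim: for any given $\{a\}$ there is a setting of the switches with $D\ge\frac{n^2}{2}+\big(\sqrt{\frac{1}{2\pi}}+o(1)\big)\cdot n^{3/2}$

First set the row switches uniformly at random, then consider each column separately. Let $X_{i,j}=\pm 1$ indicate whether the $j$-th light of column $i$ is on or off.

Let $Z_i=\sum_j X_{i,j}$. Each $Z_i$ is a sum of $n$ independent uniform $\pm 1$ variables, so $\mathbb{E}[|Z_i|]\sim \sqrt{\frac{2}{\pi}}\cdot \sqrt{n}$.

Finally, for each column, flip its switch exactly when doing so increases the number of lights on in that column. This yields $Z_i'=|Z_i|$, so the expected excess of on-lights over off-lights is $n\cdot \mathbb{E}[|Z_i|]=\sqrt{\frac{2}{\pi}}\cdot n^{3/2}$, and half of this excess is the gain over $\frac{n^2}{2}$.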
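The two-stage strategy (random rows, then greedy columns) can be sketched like this. The function name `unbalance` and the list-of-lists matrix representation are assumptions for illustration only.

```python
import random


def unbalance(a, rng=random):
    """a: n x n matrix of 0/1 values. Returns (x, y, lights_on)."""
    n = len(a)
    # Stage 1: every row switch is set uniformly at random.
    x = [rng.randrange(2) for _ in range(n)]
    # Stage 2: each column switch is chosen greedily.
    y = []
    for j in range(n):
        on_without_flip = sum(a[i][j] ^ x[i] for i in range(n))
        # Flip the column only if fewer than half of its lights are on.
        y.append(0 if 2 * on_without_flip >= n else 1)
    lights_on = sum(a[i][j] ^ x[i] ^ y[j] for i in range(n) for j in range(n))
    return x, y, lights_on
```

Whatever the random row choice, the greedy column step guarantees at least $\frac{n^2}{2}$ lights on; the $n^{3/2}$ gain from the claim only holds in expectation over the row switches.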

Graphs with large girth and chromatic number

L6

Markov's Inequality: for a nonnegative random variable $Z$ and $\alpha>0$, $\textrm{Pr}[Z\ge \alpha \mathbb{E}[Z]]\le \frac{1}{\alpha}$ (an averaging argument)

3-CNF

\[(x_1\vee x_2\vee x_3)\wedge(x_4\vee x_5\vee x_3)\wedge(\neg x_4\vee\neg x_5\vee x_3) \]

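An expression like the one above is called a 3CNF $\phi$. A natural question is: what is the maximum number of clauses that can be satisfied simultaneously?

This is known as the MAX-3SAT problem.

Theorem: for any 3CNF $\phi$ there is an assignment satisfying at least $\frac{7}{8}$ of the clauses.

Proof: consider a uniformly random assignment. Each clause (on three distinct variables) is satisfied with probability $\frac{7}{8}$, so the expected number of satisfied clauses is $\frac{7}{8}I$, where $I$ is the number of clauses; some assignment must do at least as well.

How to construct such an assignment:

Decide the variables in the order $i=1,2,\dots,n$, fixing $x_i=T/F$ at step $i$. The decision process forms a tree in which the value at each node (the conditional expectation) is the average of its two children, so one child is always at least as large as its parent. Always moving to the larger child (note that the conditional expectation can always be computed) constructs a solution in poly$(n)$ time.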
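A sketch of this method of conditional expectations. The encoding is an assumption of this example, not something from the notes: a clause is a list of nonzero integers, where `+i` stands for $x_i$ and `-i` for $\neg x_i$, and each clause mentions distinct variables. The function names are made up as well.

```python
from fractions import Fraction


def expected_satisfied(clauses, assignment):
    """Expected number of satisfied clauses when the variables missing from
    `assignment` are set uniformly at random."""
    total = Fraction(0)
    for clause in clauses:
        unset = 0
        satisfied = False
        for lit in clause:
            value = assignment.get(abs(lit))
            if value is None:
                unset += 1
            elif value == (lit > 0):
                satisfied = True
        # An unsatisfied clause fails only if all its unset literals come out false.
        total += 1 if satisfied else 1 - Fraction(1, 2 ** unset)
    return total


def derandomized_max_sat(clauses, n):
    """Fix x_1, ..., x_n one at a time, always moving to the better child."""
    assignment = {}
    for var in range(1, n + 1):
        scores = {}
        for value in (True, False):
            assignment[var] = value
            scores[value] = expected_satisfied(clauses, assignment)
        assignment[var] = scores[True] >= scores[False]
    return assignment
```

Because the conditional expectation never decreases along the chosen path, the final assignment satisfies at least $\frac{7}{8}I$ clauses. Exact fractions are used so that the comparison of the two children is not affected by rounding.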

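Monotone majority circuits

Theorem: there exists a monotone circuit for majority of size $poly(n)$ and depth $\mathcal O(\log n)$.
- Majority: $Maj(x_1,\dots,x_n)=1$ if and only if more than half of the inputs are $1$
- Monotone circuit: built only from monotone gates (and, or).

Proof by the probabilistic method:

Proving existence is equivalent to showing that a random circuit answers correctly on all $2^n$ possible inputs with nonzero probability.

Equivalently, the probability that it errs on at least one of the $2^n$ possible inputs is less than $1$. By the union bound, it suffices that the error probability on each single input is less than $2^{-n}$; here we aim for $2^{-n-1}$.

We build a complete ternary tree $C$ with $D=k\log n$ levels in which every node is a $Maj_3$ gate (itself a small monotone circuit). Its size is $poly(n)$ and its depth is $O(\log n)$.

Let $p_t$ be the probability that a gate at level $t$ outputs $1$. Then:

\[p_{t+1}=f(p_t)=p_t^3+3p_t^2(1-p_t) \]

The bottom level is wired to input variables chosen at random (though I am still puzzled about what is returned when $p=\frac{1}{2}$), so $p_0\le\frac{1}{2}-\frac{1}{2n}$ or $p_0\ge\frac{1}{2}+\frac{1}{2n}$. A calculation shows how $p_t$ is amplified away from $\frac{1}{2}$, which determines $k$.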
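To get a feel for the amplification, the recurrence $p_{t+1}=f(p_t)$ can simply be iterated numerically. The helper names `amplify` and `levels_needed` are made up for this sketch, and floating point is used, so the results are only indicative.

```python
def amplify(p0, levels):
    """Apply f(p) = p^3 + 3 p^2 (1 - p) `levels` times, starting from p0."""
    p = p0
    for _ in range(levels):
        p = p**3 + 3 * p**2 * (1 - p)
    return p


def levels_needed(n):
    """Number of Maj_3 levels until the probability of outputting 1 drops
    below 2^-(n+1), starting from p_0 = 1/2 - 1/(2n)."""
    p = 0.5 - 1 / (2 * n)
    target = 2.0 ** -(n + 1)
    t = 0
    while p >= target:
        p = p**3 + 3 * p**2 * (1 - p)
        t += 1
    return t
```

Two properties are easy to see from the formula: $f(\frac{1}{2})=\frac{1}{2}$ is a fixed point, and $f(1-p)=1-f(p)$, so the case $p_0=\frac{1}{2}+\frac{1}{2n}$ mirrors the case handled by `levels_needed`.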

L7

Variance and the second moment method

Variance: $D(X)=E(X^2)-E(X)^2$

Chebyshev's Inequality $\textrm{Pr}[|X-EX|\ge \alpha]\le \frac{D(X)}{\alpha^2}$

Proof: let $Y=(X-EX)^2$. By Markov's inequality, $\textrm{Pr}[|X-EX|\ge \alpha]=\textrm{Pr}[Y\ge \alpha^2]\le \frac{E[Y]}{\alpha^2}=\frac{D(X)}{\alpha^2}$

Thresholds in random graphs

$\mathscr{G}_{n,p}$ denotes the random graph $G$ on $n$ vertices in which each edge is present independently with probability $p$. $\mathbb{E}(\textrm{number of edges})=p\binom{n}{2}$ and $\mathbb{E}(\textrm{deg of vertex})=p(n-1)$

A sample question:

Does $G$ contain a $4$-clique?

Let $X$ be the number of $4$-cliques in $G$. Then:

$E[X]=\binom{n}{4}p^6=\Theta(n^4p^6)$

It follows that:

if $p\ll n^{-2/3}$ then $E[X]\to 0$
if $p\gg n^{-2/3}$ then $E[X]\to \infty$

We say $p(n)$ is a threshold for a property $Q$ (in this example, the graph contains a 4-clique) if:

\[p\ll p(n)\Rightarrow \textrm{Pr}[\mathscr{G}_{n,p} \textrm{ has } Q]\to 0,n\to \infty \]

\[p\gg p(n)\Rightarrow \textrm{Pr}[\mathscr{G}_{n,p} \textrm{ has } Q]\to 1,n\to \infty \]

Claim: $p(n)=n^{-2/3}$ is a threshold for $Q=$ "$G$ contains at least one 4-clique"

Proof:

If $p\ll p(n)$ then $E[X]\to 0$. Note that $\textrm{Pr}[X>0]=\textrm{Pr}[X\ge 1]\le \mathbb{E}[X]\to 0$ by Markov's inequality.

For $p\gg p(n)$ we have $\textrm{Pr}[X=0]\le\textrm{Pr}[|X-E[X]|\ge E[X]]\le \frac{D(X)}{E[X]^2}=\frac{(1+o(1))E(X)^2-E(X)^2}{E(X)^2}=o(1)\to 0$ by Chebyshev's inequality.

The main tool for computing $D(X)$ is:

\[D(\sum X_i)=\sum D(X_i)+\sum_{i\neq j}\textrm{Cov}(X_i,X_j) \]

Discussion: behavior at the threshold

For $p=cn^{-2/3}$ let $X$ be the number of 4-cliques. Then $X$ is asymptotically Poisson with parameter $c^{6}/24$ (as $n\to\infty$, its distribution converges to the Poisson distribution with parameter $\frac{c^6}{24}$).

For a Poisson variable, $\textrm{Pr}[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}$, which implies $\textrm{Pr}[X>0]\to 1-e^{-c^6/24}$
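The parameter $\frac{c^6}{24}$ is just the limit of $E[X]=\binom{n}{4}p^6$ at $p=cn^{-2/3}$. A small numeric sketch (the function names are made up for this example) makes the convergence visible:

```python
from math import comb, exp


def expected_four_cliques(n, c):
    """E[X] = C(n, 4) * p^6 at the threshold scaling p = c * n^(-2/3)."""
    p = c * n ** (-2 / 3)
    return comb(n, 4) * p ** 6


def limiting_prob_some_clique(c):
    """The Poisson limit 1 - exp(-c^6 / 24) of Pr[X > 0]."""
    return 1 - exp(-(c ** 6) / 24)
```

For large `n`, `expected_four_cliques(n, c)` approaches `c**6 / 24`, since $\binom{n}{4}\approx\frac{n^4}{24}$ and $p^6=c^6n^{-4}$.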

Largest connected component

Result: let $M(\mathscr{G}_{n,p})$ denote the size of the largest connected component of $G$ and set $p=\frac{c}{n}$. Then $M(\mathscr{G}_{n,p})$ is almost surely:

\[\begin{cases} \Theta(\log n)&c<1 \\\Theta(n^{2/3})&c=1 \\\Theta(n)&c>1 \end{cases}\]

L8

？

L9

？

L10

Pairwise independent random variables

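$n$ variables $S=\{X_1,X_2,\dots,X_n\}$ are called $k$-wise independent if:

$\forall I\subseteq S$ with $|I|\le k$ and all values $a_i$, $\textrm{Pr}[\bigwedge_{i\in I}X_{i}=a_i]=\prod_{i\in I} \textrm{Pr}[X_i=a_i]$

Now let $L\subseteq \{0,1\}^n$ be a language and let $A$ be an algorithm that decides whether an input $x$ belongs to $L$.

Assume $A$ has one-sided error with $P(A\textrm{ is correct}\mid x\in L)\ge \frac{1}{2}$. To get an algorithm with error rate at most $2^{-t}$ it suffices to run $A$ $t$ times, which requires fresh random bits for each of the $t$ runs. Treating random bits as an expensive resource, the following gives a more economical way to generate them.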

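For $r\le 2^{m}$ we can obtain an algorithm with $P[\textrm{error}]\le \frac{1}{r}$ using only $2m$ random bits.
Proof:
- Let $X_i\in\{0,1\}^m$ be the random string used in the $i$-th run of $A$, and view the results of the runs as a coin-flip sequence. The proof goes through Markov's inequality (applied to the squared deviation, i.e. Chebyshev) and uses the one-sided error: answer yes as soon as any run returns 1.
Generate $2^m$ pairwise independent random variables from $2m$ random bits:
- Pick a prime $q\in [2^m,2^{m+1}]$ and, for $a,b\in Z_q$, let $f_{a,b}(x)=(ax+b)\bmod q$.
- Then $\{f_{a,b}(x)|x\in Z_q\}$ are pairwise independent random variables.
- For random $a,b$ and fixed $x,z$, $\textrm{Pr}[f_{a,b}(x)=z]=\frac{1}{q}$; moreover, for $x\neq y$, $\textrm{Pr}[f_{a,b}(x)=z_1\wedge f_{a,b}(y)=z_2]=\frac{1}{q^2}$
- In this way, $2m$ random bits yield $2^m$ pairwise independent $m$-bit random values.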
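Pairwise independence of the linear family $(ax+b)\bmod q$ can be verified exhaustively for a small prime. The sketch below enumerates all $q^2$ seeds; the function names and the value-table representation are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def pairwise_family(q):
    """Value tables of all functions x -> (a*x + b) mod q, keyed by (a, b); q prime."""
    return {
        (a, b): [(a * x + b) % q for x in range(q)]
        for a, b in product(range(q), repeat=2)
    }


def joint_counts(q, x, y):
    """How often each pair (f(x), f(y)) occurs over all q^2 seeds (a, b)."""
    return Counter((table[x], table[y]) for table in pairwise_family(q).values())
```

For $x\neq y$ every one of the $q^2$ value pairs shows up exactly once, which is precisely the statement $\textrm{Pr}[f(x)=z_1\wedge f(y)=z_2]=\frac{1}{q^2}$. For $x=y$ only the $q$ diagonal pairs occur.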
A stronger version (generating $2^m$ $m$-bit $d$-wise independent variables from $dm$ random bits):
Since $\mathbb{F}_{2^m}\cong (\mathbb{F}_{2})^m$ as vector spaces, we only need to deal with $m\ge 2$. Set $\mathbb{F}=G\bmod{g(x)}$, where $G$ is the ring of polynomials $\sum a_ix^i$ with $a_i\in \{0,1\}$ (polynomials over $\mathbb{F}_2$) and $g$ is an irreducible polynomial of degree $m$ over $\mathbb{F}_2$ (for example $x^m+x+1$ for those $m$ where it is irreducible). Since $g$ is irreducible, $\mathbb{F}$ is a field.
It is not hard to check that $\mathbb{F}\cong (\mathbb{F}_2)^m$. Define $\varphi:[0,2^m)\to \mathbb{F}$ by $\varphi(u)=\sum_{i=0}^{m-1} \bigg(\lfloor\frac{u}{2^i}\rfloor\bmod 2\bigg)x^i$, with $\varphi^{-1}(\sum a_ix^i)=\sum a_i2^i$. Now choose $d_j\in [0,2^m)$ uniformly at random for $j=0,1,\dots,d-1$, using $dm$ random bits. Let $f(i)=\varphi^{-1}(\sum_{j=0}^{d-1}\varphi(d_j)\cdot\varphi(i)^j)$. We claim that $\{f(i)|i\in [0,2^m)\}$ are $d$-wise independent random variables.
To prove the claim, we first show that Pr$[f(u)=y]=\frac{1}{2^m}$ for any $u,y$: let $u'=\sum_{j=1}^{d-1} \varphi(d_j)\cdot \varphi(u)^j$; then $f(u)=y$ iff $\varphi(d_0)+u'=\varphi(y)$, which has exactly one solution $d_0$. So Pr$[f(u)=y]=\frac{1}{2^m}$.
Next we show that for any distinct $\{x_1,x_2,\dots,x_k\}$ and any $\{y_1,y_2,\dots,y_k\}$ $(k\le d)$ we have $\textrm{Pr}\bigg(f(x_1)=y_1,f(x_2)=y_2,\dots,f(x_k)=y_k\bigg)=\frac{1}{2^{km}}$, which proves the claim. Let $X=\begin{bmatrix}\varphi(d_0)\\\varphi(d_1)\\\vdots\\\varphi(d_{d-1})\end{bmatrix},Y=\begin{bmatrix}\varphi(y_1)\\\varphi(y_2)\\\vdots\\\varphi(y_k)\end{bmatrix},A=\begin{bmatrix}\varphi(x_1)^0&\varphi(x_1)^1&\cdots&\varphi(x_1)^{d-1}\\\varphi(x_2)^0&\varphi(x_2)^1&\cdots&\varphi(x_2)^{d-1}\\\vdots&&&\vdots\\\varphi(x_k)^0&\varphi(x_k)^1&\cdots&\varphi(x_k)^{d-1}\end{bmatrix}$; then the event is exactly $AX=Y$. If the rows of $A$ are linearly independent, the system consists of $k$ independent equations in $\varphi(d_0),\dots,\varphi(d_{d-1})$, leaving $d-k$ variables free, so Pr$(\dots)=\frac{2^{(d-k)m}}{2^{dm}}=\frac{1}{2^{km}}$. To prove that the rows of $A$ are independent it suffices to prove $\det A\neq 0$ when $k=d$ for any distinct $\{x_1,\dots,x_d\}$. Since $A$ is then a Vandermonde matrix, $\det A=\prod_{i<j} (\varphi(x_j)-\varphi(x_i))$, and the $x_i$ are distinct, we have $\det A\neq 0$. Q.E.D.

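Derandomization

For a complete graph on $n\le 2^{k/2}$ vertices, find a $2$-coloring of the edges such that no complete subgraph on $k$ vertices is monochromatic.

Existence proof: let $X$ be the number of monochromatic $k$-cliques under a uniformly random coloring. Then, for $k\ge 3$, $\mathbb{E}[X]=\binom{n}{k}(2\times\frac{1}{2^{\binom{k}{2}}})\le\frac{n^{k}}{k!}2^{-k(k-1)/2+1}\le \frac{2^{k/2+1}}{k!}<1$
Constructive proof: fix an order on the edges and color them one at a time, each time choosing the color that makes the conditional expectation of $X$ smaller.

Note that the existence proof only needs the edge colors to be $d$-wise independent (with $d=\binom{k}{2}$, the number of edges in a $k$-clique), so they can be generated from just $d$ random values of $\log q$ bits each, where $q=\binom{n}{2}$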

Random hashing

Skipped.

L11

Unbiased Estimators

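Given planar point sets $A,R$ with $R\subseteq A$, we want to compute the area of $R$, or, when $A,R$ are finite (and $|A|$ is known), the size of $R$.

A Monte Carlo algorithm: repeat $k$ times, each time picking a point $x$ uniformly at random from $A$ and checking whether $x\in R$.

Let $p_1,p_2,\dots,p_k$ be the independent random points, $X_i=[p_i\in R]$ and $X=\frac{1}{k}\sum X_i$. Then $E[X]=E[X_i]=\frac{|R|}{|A|}$

We call $X_i$ an unbiased estimator of $\mu=\frac{|R|}{|A|}$; by definition, so is $X$.

As for the variance of $X$: since the $X_i$ are independent, $Cov(X_i,X_j)=0$, which gives $D(X)=\frac{1}{k}D(X_i)$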

Unbiased estimator theorem: given $E[X_i]=\mu,D(X_i)=\sigma^2$, from $t=\mathcal O((\frac{\sigma}{\mu})^2\frac{1}{\epsilon^2}\log (\frac{1}{\delta}))$ samples $X_1,X_2,\dots,X_t$ one can construct an estimate $X$ of $\mu$ with $\textrm{Pr}[|X-\mu|\ge \epsilon \mu]\le \delta$
Weak lemma (Lemma 11.2): for the sample mean $X=\frac{1}{t}\sum X_i$, if $t\ge \frac{4}{\epsilon^2}\cdot \frac{\sigma^2}{\mu^2}$ then $\textrm{Pr}[|X-\mu|\ge\epsilon\mu]\le \frac{1}{4}$
Proof by the second moment method: $$\textrm{Pr}[|X-\mu|\ge \epsilon \mu]\le \frac{D(X)}{(\epsilon\mu)^{2}}=\frac{\frac{1}{t}D(X_i)}{(\epsilon\mu)^2}\le \frac{\epsilon^2\mu^2\sigma^2}{4\sigma^2(\epsilon\mu)^2}=\frac{1}{4}$$
When $X_i$ is a binary variable, $\sigma^2=E[X_i^2]-E[X_i]^2=\mu-\mu^2\le \mu$, so $\frac{\sigma^2}{\mu^2}\le\frac{1}{\mu}$ and only $O(\frac{1}{\epsilon^2\mu})$ repetitions are needed.

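Median trick: run the weak lemma's estimator $t'$ times, each run using $t$ samples to produce $X_i'$, and let $X'$ be the median of $(X_1',X_2',\dots,X_{t'}')$. Then for $t'>2\log_{(4/3)}\delta^{-1}$ we have $\textrm{Pr}[|X'-\mu|>\epsilon\mu]\le \delta$

If $X_i'\in [\mu(1-\epsilon),\mu(1+\epsilon)]$, regard it as a coin that landed heads, otherwise tails; by the weak lemma each coin lands tails with probability at most $\frac{1}{4}$. The median is heads whenever more than half of the coins are heads.
Lemma 11.2: if each of $2s+1$ independent biased coins lands heads with probability at least $\frac{3}{4}$, then the probability that at most $s$ coins land heads is less than $(\frac{3}{4})^s$; see Homework 3.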
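The median trick is short to write down in code. In this sketch `sample` is a hypothetical callable that takes a random generator and returns one draw $X_i$; the names `mean_estimate` and `median_of_means` are made up as well.

```python
import random
from statistics import median


def mean_estimate(sample, t, rng):
    """Average of t independent draws: the estimator from the weak lemma."""
    return sum(sample(rng) for _ in range(t)) / t


def median_of_means(sample, t, repeats, rng=random):
    """Median of `repeats` independent mean estimates (use an odd `repeats`)."""
    return median(mean_estimate(sample, t, rng) for _ in range(repeats))
```

A usage example would be `median_of_means(lambda r: 1 if r.random() < 0.3 else 0, 2000, 15)`, which estimates $\mu=0.3$. The point of the construction is that `t` only has to make each mean good with probability $\frac{3}{4}$, while `repeats` grows like $\log\frac{1}{\delta}$.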

DNF Problem

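DNF: an expression $\varphi$ over $n$ 0/1 variables $x_i$ of the form $\bigvee_i(\bigwedge_j x_{f_{i,j}})$, i.e. a disjunction of conjunctions of literals
Counting the number of assignments $(x_1,x_2,\dots,x_n)$ such that $\varphi=1$ is called the DNF problem
An FPRAS (fully polynomial randomized approximation scheme) is an algorithm that on input $(n,\varphi)$ outputs $Z$ such that $\textrm{Pr}[(1-\epsilon)f(x)\le Z\le(1+\epsilon)f(x)]\ge \frac{3}{4}$, where $f(x)$ is the true count, and runs in time poly $(n,\frac{1}{\epsilon})$
Theorem: the DNF problem has an FPRAS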

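Naive Monte Carlo: assign the $x_i$ at random and measure the fraction of satisfying assignments. The success probability is $\mu=\frac{R}{A}$ with $R=f(x),A=2^n$, so by Lemma 11.2 we need $O(\frac{1}{\epsilon^2\mu})$ checks, which is exponential when $\mu$ is exponentially small.

Equivalently: for the $i$-th term of $\varphi$ let $S_i$ be the set of assignments satisfying it; the task is to compute $|\bigcup S_i|$.

Improvement: pick a set $S_i$ with probability $\frac{|S_i|}{\sum |S_i|}$, then pick a uniformly random member $a\in S_i$. If $i$ is the first term that $a$ satisfies (i.e. there is no smaller $j$ such that $a$ satisfies term $j$), set $U\gets U+1$. After $t$ rounds, output $X'=\frac{U}{t}\times \sum|S_i|$

Note that each round of this Monte Carlo procedure succeeds with probability at least $\frac{1}{m}$ (where $m$ is the number of terms), so by the unbiased-estimator theory it suffices to repeat $\frac{4m}{\epsilon^2}$ times.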
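A sketch of the improved estimator (usually attributed to Karp and Luby). The encoding is an assumption of this example: each term is a dict mapping a variable index in `range(n)` to the truth value it requires, which also rules out contradictory terms. The brute-force counter is only there to compare against on tiny inputs.

```python
import random
from itertools import product


def satisfies(assignment, term):
    return all(assignment[v] == val for v, val in term.items())


def karp_luby(terms, n, trials, rng=random):
    """Estimate the number of assignments satisfying at least one term."""
    sizes = [2 ** (n - len(t)) for t in terms]  # |S_i|: free variables are unconstrained
    total = sum(sizes)
    hits = 0
    for _ in range(trials):
        # Pick term i with probability |S_i| / sum |S_i|.
        i = rng.choices(range(len(terms)), weights=sizes)[0]
        # Pick a uniform member of S_i: random bits, then force the term's literals.
        a = [rng.random() < 0.5 for _ in range(n)]
        for v, val in terms[i].items():
            a[v] = val
        # Count the sample only if i is the first term that a satisfies.
        if not any(satisfies(a, terms[j]) for j in range(i)):
            hits += 1
    return hits / trials * total


def brute_force(terms, n):
    """Exact count by enumeration (exponential; for checking small cases)."""
    return sum(
        any(satisfies(a, t) for t in terms)
        for a in product([False, True], repeat=n)
    )
```

Each satisfying assignment is counted for exactly one pair (term, assignment), so the hit probability is $\frac{|\bigcup S_i|}{\sum|S_i|}\ge\frac{1}{m}$, which is why far fewer trials are needed than in the naive approach.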

L12

Network Reliability

Problem $F$: given a graph $G$ in which each edge fails independently with probability $p$, compute $\textrm{Pr}[G$ is disconnected$]$

An FPRAS is an algorithm that on input $(G,\{p_e\},\epsilon)$ outputs $Z$ such that $\textrm{Pr}[(1-\epsilon)p_{\rm{fail}}\le Z\le(1+\epsilon)p_{\rm{fail}}]\ge \frac{3}{4}$ and runs in time poly $(n,\frac{1}{\epsilon})$
Theorem: $F$ has an FPRAS

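Approach:

First simplify the problem heavily: set $p_e=p$ for all $e\in E$, i.e. all edges fail with the same probability. Let $c$ be the size of a minimum cut; then $p_{\rm fail}\ge p^c$

If $p^c\ge \frac{1}{n^4}$, Monte Carlo simulation already gives a good approximation: in each round remove every edge independently with probability $p$, and set $X_i=1$ if the graph is disconnected in round $i$ and $X_i=0$ otherwise. Then $X_i$ is an unbiased estimator of $p_{\rm fail}$, and by the weak lemma for unbiased estimators only $\mathcal O(\frac{1}{\epsilon^2\mu})=\mathcal O(n^4\epsilon^{-2})$ experiments are needed.
Otherwise $p^c<\frac{1}{n^4}$, and for $\alpha=2+\frac{1}{2}\log_n(2/\epsilon)$ one can show:
1. $\textrm{Pr[some cut of size}\ge\alpha c \textrm{ fails}]\le \epsilon p^c\le \epsilon p_{\textrm{fail}}$ (so the large cuts of the graph can be ignored)
A cut is called an $\alpha$-minimum cut if and only if its size is at most $\alpha c$
2. Claim 12.3: the graph has at most $n^{2\alpha}=\frac{2n^4}{\epsilon}$ $\alpha$-minimum cuts
3. $\textrm{Pr[some }\alpha\textrm{-minimum cut fails}]=\textrm{Pr}[\bigvee_{i=1}^t(\wedge_j x_{e_{i,j}})]$, which is an instance of the DNF problem with $t\le \frac{2n^4}{\epsilon}$ terms; by the FPRAS for the DNF problem, $O(\frac{t}{\epsilon^2})$ checks suffice.
Proof skipped; the main idea is to use the following contraction algorithm in the proof: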

while |V | > 2 do
  Choose an edge {u,v} ∈ E uniformly at random.
  Merge u and v, maintaining all edges from either of them to other vertices.
Return the remaining cut.
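A runnable Python sketch of the contraction procedure above. The representation is an assumption of this example: vertices are `0..n-1`, the graph is connected, and super-vertices are tracked with a simple label array, which is slow but easy to follow. The names `karger_cut` and `min_cut` are made up.

```python
import random


def karger_cut(n, edges, rng=random):
    """One run of random contraction; returns the size of the cut it ends with."""
    label = list(range(n))  # label[v] = super-vertex that currently contains v
    alive = list(edges)     # edges whose endpoints are in different super-vertices
    remaining = n
    while remaining > 2:
        u, v = rng.choice(alive)  # uniformly random remaining edge
        lu, lv = label[u], label[v]
        # Merge super-vertex lv into lu.
        label = [lu if l == lv else l for l in label]
        # Edges inside the merged super-vertex become self-loops and are dropped.
        alive = [(a, b) for a, b in alive if label[a] != label[b]]
        remaining -= 1
    return len(alive)


def min_cut(n, edges, runs, rng=random):
    """Smallest cut seen over several independent runs."""
    return min(karger_cut(n, edges, rng) for _ in range(runs))
```

A single run returns some cut, not necessarily a minimum one; a fixed minimum cut survives one run with probability at least $\frac{2}{n(n-1)}$, so the run is repeated and the smallest result kept.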

L13

Apparently this lecture was not covered?

L14

Chernoff/Hoeffding Bounds

Raw version

Chernoff bound:

Theorem: let $X_1,\dots,X_n$ be independent $0$-$1$ random variables with $E[X_i]=p_i$
- $X=\sum_i X_i,\mu = E[X]=\sum p_i,p=\frac{\mu}{n}$
- $D_{KL}(P||Q)=\sum_i p_i\log (\frac{p_i}{q_i})$
- $Pr[X\ge \mu+\lambda]\le \exp(-nD_{KL}(p+\frac{\lambda}{n}||p))$
- $Pr[X\le \mu-\lambda]\le \exp(-nD_{KL}(p-\frac{\lambda}{n}||p))$
- Proof:
  - $Pr[X\ge m]=Pr[e^{Xt}\ge e^{mt}]$

\[\begin{aligned} &Pr[e^{Xt}\ge e^{mt}]\le \frac{E(e^{Xt})}{e^{mt}}=\frac{E(e^{\sum X_it})}{e^{mt}}=\frac{\prod E(e^{X_it})}{e^{mt}} \\&\le \frac{\prod ((1-p_i)+p_ie^t)}{e^{mt}} \\&\le \frac{(1-p+pe^t)^n}{e^{mt}} \end{aligned}\]

Differentiate and solve for the zero: $t=\ln(\frac{m(1-p)}{(n-m)p})$

Substituting back:

\[\begin{aligned} &\frac{(1-p+p\frac{m(1-p)}{(n-m)p})^n}{\Big(\frac{m(1-p)}{(n-m)p}\Big)^m} \end{aligned}\]

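Corollary: $Pr[X\ge \mu+ \lambda]\le \exp(-\frac{2\lambda^2}{n})$ and $Pr[X\le \mu- \lambda]\le \exp(-\frac{2\lambda^2}{n})$

Corollary: for $2m+1$ independent biased coins with $Pr[\textrm{heads}]\ge \frac{3}{4}$, $Pr[\le m \textrm{ heads}]\le (\frac{3}{4})^m$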

Worked version:

Let $X_1,\dots,X_n$ be independent $0$-$1$ random variables with $E[X_i]=\mu_i=p_i,D[X_i]=\sigma_i^2$

Lemma:

$\textrm{Pr}[X\ge a]=\textrm{Pr}[e^{sX}\ge e^{sa}],s> 0$
$\textrm{Pr}[X\le a]=\textrm{Pr}[e^{sX}\ge e^{sa}],s< 0$

For $\exp(sX_i)$: it equals $\exp(s)$ with probability $p_i$ and $1$ with probability $1-p_i$, so $M_i(s)=E[e^{sX_i}]=p_i\exp(s)+(1-p_i)$

Chernoff bound：

$X=\sum_i X_i,\mu = E[X]=\sum p_i,p=\frac{\mu}{n}$
- $D_{KL}(P||Q)=\sum_i p_i\log (\frac{p_i}{q_i})$
- $Pr[X\ge \mu+\lambda]\le \exp(-nD_{KL}(p+\frac{\lambda}{n}||p))$
- $Pr[X\le \mu-\lambda]\le \exp(-nD_{KL}(p-\frac{\lambda}{n}||p))$

We now prove the Chernoff bound:

\[\begin{aligned} &\textrm{Pr}[X\ge \mu+\lambda]=\textrm{Pr}[e^{Xs}\ge e^{s(\mu+\lambda)}] \\&\le \frac{\mathbb{E}[e^{Xs}]}{e^{s(\mu+\lambda)}} \\&= e^{-s(\mu+\lambda)}\prod(p_i\exp(s)+(1-p_i)) \\&\le e^{-s(\mu+\lambda)}\bigg(\frac{\mu\exp(s)+n-\mu}{n}\bigg)^n \\&=e^{-s(\mu+\lambda)}\cdot (1-p+pe^{s})^n \end{aligned}\]

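Differentiating and substituting back gives the optimal $s$, for which the expression above attains its minimum $\exp(-nH_p(p+\frac{\lambda}{n}))$, where $H_p(q)=D_{KL}(q||p)$.

Further simplification

$\textrm{Pr}[X\le \mu- \lambda]\le \exp(-\frac{2\lambda^2}{n})$ and $\textrm{Pr}[X\ge \mu+\lambda]\le \exp(-\frac{2\lambda^2}{n})$

It suffices to show: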

\[-nH_p(p+\frac{\lambda}{n})\le -\frac{2\lambda^2}{n} \]

\[n(p+\frac{\lambda}{n})\ln(\frac{p+\lambda/n}{p})+n(1-p-\frac{\lambda}{n})\ln(\frac{1-p-\lambda/n}{1-p})-\frac{2\lambda^2}{n}\ge 0 \]

Letting $z=\frac{\lambda}{n}\in [0,1-p]$, this becomes:

\[(p+z)\ln(\frac{p+z}{p})+(1-p-z)\ln(\frac{1-p-z}{1-p})-2 z^2\ge 0 \]

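This always holds: writing $f(z)$ for the left-hand side, we have $f(0)=0$, $f'(0)=0$, and $f'$ is nondecreasing because $f''(z)=\frac{1}{(p+z)(1-p-z)}-4\ge 0$.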
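The simplified bound is easy to compare against exact binomial tails in the special case where all $p_i$ are equal. This sketch (with made-up function names) computes $\textrm{Pr}[X\ge m]$ exactly and the bound $\exp(-\frac{2\lambda^2}{n})$.

```python
from math import comb, exp


def upper_tail(n, p, m):
    """Exact Pr[X >= m] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def hoeffding_bound(n, lam):
    """The simplified Chernoff bound exp(-2 * lam^2 / n)."""
    return exp(-2 * lam**2 / n)
```

For example, with $n=100$, $p=\frac{1}{2}$ and $\lambda=10$ one compares `upper_tail(100, 0.5, 60)` with `hoeffding_bound(100, 10)`; the exact tail is noticeably smaller, as expected from a bound that ignores $p$.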

Simple Examples

Application: coin-flip estimation

Randomized Routing

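Consider the network defined by the $n$-dimensional hypercube, with vertices $v_i\in\{0,1\}^n$ and $N=2^n$. Every edge is bidirectional, and each direction of an edge can carry only one packet at a time.
Let $\pi$ be a permutation of the $N$ vertices; vertex $i$ has to send a packet to $\pi(i)$.
Design routes that minimize the time needed to deliver all packets.
- Simple routing: each route is chosen independently and depends only on its source $i$ and destination $\pi(i)$

$i\oplus \pi(i)$

Random intermediate stop: $i\to \sigma(i)\to \pi(i)$
Bit-fixing path: look at $i\oplus \sigma(i)$ and fix the differing bits one by one from left to right.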
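A sketch of the bit-fixing path and of the two-phase route through a random intermediate vertex. Two assumptions are made here: vertices are encoded as integers in $[0,2^n)$, and "left" means the most significant bit. The function names are hypothetical.

```python
import random


def bit_fixing_path(src, dst, n):
    """Vertices visited when the differing bits are corrected from the most
    significant (leftmost) bit down to the least significant one."""
    path, cur = [src], src
    for bit in reversed(range(n)):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit  # cross the hypercube edge in this dimension
            path.append(cur)
    return path


def two_phase_route(src, dst, n, rng=random):
    """Route src -> random intermediate -> dst, bit-fixing in both phases."""
    mid = rng.randrange(2 ** n)
    return bit_fixing_path(src, mid, n) + bit_fixing_path(mid, dst, n)[1:]
```

For instance, `bit_fixing_path(0b000, 0b101, 3)` visits `000`, `100`, `101`. Every step changes exactly one bit, so consecutive vertices are always neighbors in the hypercube.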

posted @ 2024-04-05 17:08 Soulist
