Transcription: Richard P. Stanley, Enumerative Combinatorics, Chapter 2: Sieve Methods
2.1 Inclusion-Exclusion
Roughly speaking, a "sieve method" in enumerative combinatorics is a method for determining the cardinality of a set \(S\) that begins with a larger set and somehow subtracts off or cancels out unwanted elements. Sieve methods have two basic variations: (1) We can first approximate our answer with an overcount, and then subtract off an overcounted approximation of our original error, and so on, until after finitely many steps we have "converged" to the correct answer. This method is the combinatorial essence of the Principle of Inclusion-Exclusion, to which this section and the next four are devoted. (2) The elements of the larger set can be weighted in a natural combinatorial way so that the unwanted elements cancel out, leaving only the original set \(S\). We discuss this technique in Sections 2.6 and 2.7.
The Principle of Inclusion-Exclusion is one of the fundamental tools of enumerative combinatorics. Abstractly, the Principle of Inclusion-Exclusion amounts to nothing more than computing the inverse of a certain matrix. As such, it is simply a minor result in linear algebra. The beauty of the principle lies not in the result itself, but rather in its wide applicability. We will give several examples of problems that can be solved by Inclusion-Exclusion, some in a rather subtle way. First, we state the principle in its purest form.
2.1.1 Theorem. Let \(S\) be an \(n\)-set. Let \(V\) be the \(2^n\)-dimensional vector space (over some field \(K\)) of all functions \(f\colon 2^S\to K\). Let \(\phi\colon V\to V\) be the linear transformation defined by
\begin{equation}
\phi f(T) = \sum_{Y\supseteq T} f(Y), \text{ for all \(T\subseteq S\).}
\end{equation}
Then \(\phi^{-1}\) exists and is given by
\begin{equation}
\phi^{-1}f(T) = \sum_{Y\supseteq T} (-1)^{\#(Y-T)}f(Y), \text{ for all \(T\subseteq S\).}
\end{equation}
Proof. Define \(\psi\colon V\to V\) by \(\psi f(T) = \sum_{Y\supseteq T}(-1)^{\\#(Y-T)}f(Y)\). Then (composing functions right to left)
\begin{align*}
\phi\psi f(T) & = \phi(\psi f)(T) \\
&= \sum_{Y\supseteq T} (\psi f)(Y) \\
&= \sum_{Y\supseteq T} \sum_{Z\supseteq Y} (-1)^{\#(Z- Y)} f(Z) \\
&= \sum_{Z \supseteq T} \left( \sum_{Z \supseteq Y \supseteq T} (-1)^{\#(Z- Y)} \right) f(Z).
\end{align*}
Setting \(m = \\# (Z- T)\), we have
\begin{equation*}
\sum_{\substack{Z\supseteq Y \supseteq T\\ (Z,T \text{ fixed})}} (-1)^{\#(Z-Y)} = \sum_{i = 0}^{m} (-1)^{i}\binom{m}{i} = \delta_{0m},
\end{equation*}
so \(\phi\psi f(T) = f(T)\). Hence, \(\phi\psi f = f\), so \(\psi = \phi^{-1}\).
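Remark (an illustration added here, not part of the original text). When \(n = 1\) the only subsets of \(S\) are \(\emptyset\) and \(S\) itself, and the theorem can be checked by hand. The definition of \(\phi\) gives
\begin{equation*}
\phi f(\emptyset) = f(\emptyset) + f(S), \qquad \phi f(S) = f(S),
\end{equation*}
while the claimed inverse, applied to an arbitrary \(g \in V\), gives
\begin{equation*}
\phi^{-1} g(\emptyset) = g(\emptyset) - g(S), \qquad \phi^{-1} g(S) = g(S).
\end{equation*}
Substituting \(g = \phi f\) returns
\begin{equation*}
\phi^{-1}\phi f(\emptyset) = \bigl(f(\emptyset) + f(S)\bigr) - f(S) = f(\emptyset), \qquad \phi^{-1}\phi f(S) = f(S),
\end{equation*}
as expected. In matrix terms, with respect to the ordering \(\emptyset, S\), this is the statement that \(\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\) has inverse \(\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\).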
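Remark: The Principle of Inclusion-Exclusion stated above can be viewed as a special case of the Möbius inversion formula on a locally finite poset. Here the poset is \((2^{S}, \subseteq)\), with Möbius function \(\mu(X, Y) = (-1)^{\\#(Y - X)}\). For details see 李文威 (Wen-Wei Li), 《代数学方法》, Volume 1, \(\S 5.4\), especially Proposition 5.4.3.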
The following is the usual combinatorial situation involving Theorem 2.1.1. We think of \(S\) as being a set of properties that the elements of some given set \(A\) of objects may or may not have. For any subset \(T\) of \(S\), let \(f_=(T)\) be the number of objects in \(A\) that have exactly the properties in \(T\) (so they fail to have the properties in \(\overline T = S - T\)). More generally, if \(w\colon A\to K\) is any weight function on \(A\) with values in a field (or abelian group) \(K\), then one could set \(f_=(T) = \sum_x w(x)\), where \(x\) ranges over all objects in \(A\) having exactly the properties in \(T\). Let \(f_\ge(T)\) be the number of objects in \(A\) that have at least the properties in \(T\). Clearly then,
\begin{equation}
f_\ge(T) = \sum_{Y\supseteq T} f_=(Y). \label{E:f_\ge(T)}
\end{equation}
Hence by Theorem 2.1.1,
\begin{equation}
f_=(T) = \sum_{Y\supseteq T}(-1)^{\#(Y-T)}f_\ge(Y). \label{E:4}
\end{equation}
In particular, the number of objects having none of the properties in \(S\) is given by
\begin{equation}
f_=(\emptyset) = \sum_{Y}(-1)^{\#Y}f_\ge(Y), \label{E:5}
\end{equation}
where \(Y\) ranges over all subsets of \(S\). In typical applications of the Principle of Inclusion-Exclusion, it will be relatively easy to compute \(f_\ge(Y)\) for \(Y\subseteq S\), so equation \eqref{E:4} will yield a formula for \(f_=(T)\).
In equation \eqref{E:4} one thinks of \(f_\ge(T)\) (the term indexed by \(Y = T\)) as being a first approximation to \(f_=(T)\). We then subtract
\begin{equation*}
\sum_{\substack{Y\supseteq T\\ \#(Y-T) = 1}} f_\ge(Y),
\end{equation*}
to get a better approximation. Next we add back in
\begin{equation*}
\sum_{\substack{Y\supseteq T\\ \#(Y-T) = 2}} f_{\ge}(Y),
\end{equation*}
and so on, until finally reaching the explicit formula \eqref{E:4}. This reasoning explains the terminology "Inclusion-Exclusion."
Perhaps the most standard formulation of the Principle of Inclusion-Exclusion is one that dispenses with the set \(S\) of properties per se, and just considers subsets of \(A\). Thus, let \(A_1, \dots, A_n\) be subsets of a finite set \(A\). For each subset \(T\) of \([n]\), let
\begin{equation*}
A_T = \bigcap_{i\in T} A_i
\end{equation*}
(with \(A_\emptyset = A\)), and for \(0\le k\le n\) set
\begin{equation}
S_k = \sum_{\#T = k} \# A_T, \label{E:S_k}
\end{equation}
the sum of the cardinalities, or more generally the weighted cardinalities
\begin{equation*}
w(A_T) = \sum_{x\in A_T} w(x),
\end{equation*}
of all \(k\)-tuple intersections of the \(A_i\)'s. Think of \(A_i\) as defining a property \(P_i\) by the condition that \(x\in A\) satisfies \(P_i\) if and only if \(x\in A_i\). Then \(A_T\) is just the set of objects in \(A\) that have at least the properties in \(T\), so by \eqref{E:5} the number \(\\#(\overline{A_1} \cap \dots \cap\overline{A_n})\) of elements of \(A\) lying in none of the \(A_i\)'s is given by
\begin{equation}
\#(\overline{A_1} \cap \dots \cap\overline{A_n}) = S_0 - S_1 + S_2 - \dots + (-1)^{n}S_n, \label{E:7}
\end{equation}
where \(S_0 = \\#A_{\emptyset} = \\#A\).
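Remark (an illustrative example, not from the original text). Take \(A = [30]\) and let \(A_1\), \(A_2\), \(A_3\) be the sets of multiples of 2, 3, and 5 in \(A\), respectively. Then
\begin{align*}
S_0 &= 30, \\
S_1 &= 15 + 10 + 6 = 31, \\
S_2 &= 5 + 3 + 2 = 10, \\
S_3 &= 1,
\end{align*}
where the three terms of \(S_2\) count the multiples of 6, 10, and 15, and \(S_3\) counts the multiples of 30. Equation \eqref{E:7} then gives \(30 - 31 + 10 - 1 = 8\) elements of \([30]\) divisible by none of 2, 3, 5, namely 1, 7, 11, 13, 17, 19, 23, 29.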
The Principle of Inclusion-Exclusion and its various reformulations can be dualized by interchanging \(\cap\) and \(\cup\), \(\subseteq\) and \(\supseteq\), and so on, throughout. The dual form of Theorem 2.1.1 states that if
\[
\widetilde{\phi} f(T) = \sum_{Y\subseteq T} f(Y), \quad \text{for all \(T\subseteq S\)},
\]
then \(\widetilde{\phi}^{-1}\) exists and is given by
\begin{equation*}
\widetilde{\phi}^{-1} f(T) = \sum_{Y\subseteq T} (-1)^{\#(T-Y)} f(Y), \quad\text{for all \(T \subseteq S\)}.
\end{equation*}
Similarly, if we let \(f_\le(T)\) be the (weighted) number of objects of \(A\) having at most the properties in \(T\), then
\begin{equation}
\begin{aligned}
f_\le(T) &= \sum_{Y\subseteq T}f_=(Y), \\
f_=(T) &= \sum_{Y\subseteq T} (-1)^{\#(T-Y)} f_\le(Y).
\end{aligned}\label{E:8}
\end{equation}
A common special case of the Principle of Inclusion-Exclusion occurs when the function \(f_=\) satisfies \(f_=(T) = f_=(T')\) whenever \(\\#T = \\#T'\). Thus also \(f_\ge(T)\) depends only on \(\\#T\), and we set \(a(n-i) = f_=(T)\) and \(b(n-i) = f_\ge(T)\) whenever \(\\#T= i\). (Caveat. In many problems the set \(A\) of objects and the set \(S\) of properties will depend on a parameter \(p\), and the functions \(a(i)\) and \(b(i)\) may depend on \(p\). Thus, for example, \(a(0)\) and \(b(0)\) are the number of objects having all the properties, and this number may certainly depend on \(p\). Proposition 2.2.2 is devoted to the situation when \(a(i)\) and \(b(i)\) are independent of \(p\).) We thus obtain from equations \eqref{E:f_\ge(T)} and \eqref{E:4} the equivalence of the formulas
\begin{align}
b(m) &= \sum_{i= 0}^m \binom{m}{i} a(i), \quad 0\le m\le n, \label{E:9} \\
a(m) &= \sum_{i=0}^m \binom{m}{i} (-1)^{m-i} b(i), \quad 0\le m \le n. \label{E:10}
\end{align}
In other words, the inverse of the \((n+1)\times(n+1)\) matrix whose \((i,j)\)-entry \((0\le i, j\le n)\) is \(\binom{j}{i}\) has \((i,j)\)-entry \((-1)^{j-i}\binom{j}{i}\). For instance,
\[
\begin{bmatrix}
1 & 1 & 1 & 1\\
0 & 1 & 2 & 3\\
0 & 0 & 1 & 3 \\
0 & 0 & 0 & 1
\end{bmatrix}^{-1} = \begin{bmatrix}
1 & -1 & 1 & -1 \\
0 & 1 & -2 & 3 \\
0 & 0 & 1 & -3 \\
0 & 0 & 0 & 1
\end{bmatrix} .
\]
Of course, we may let \(n\) approach \(\infty\) so that \eqref{E:9} and \eqref{E:10} are equivalent for \(n = \infty\).
Note that in the language of the calculus of finite differences, \eqref{E:10} can be rewritten as
\[
a(m) = \Delta^m b(0), \quad 0 \le m \le n.
\]
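Remark (a worked case, added for illustration). Here \(\Delta\) is assumed to denote the usual forward difference operator, \(\Delta b(i) = b(i+1) - b(i)\). For \(m = 2\),
\begin{equation*}
\Delta^2 b(0) = \Delta b(1) - \Delta b(0) = b(2) - 2b(1) + b(0),
\end{equation*}
which agrees with the right-hand side of \eqref{E:10} for \(m = 2\):
\begin{equation*}
\binom{2}{0} b(0) - \binom{2}{1} b(1) + \binom{2}{2} b(2) = b(0) - 2b(1) + b(2).
\end{equation*}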
2.2 Examples and Special Cases
The canonical example of the use of the Principle of Inclusion-Exclusion is the following.
2.2.1 Example (the "derangement problem"). How many permutations \(w \in \mathfrak{S}\_n\) have no fixed points, that is, \(w(i) \ne i\) for all \(i \in [n]\)? Such a permutation is called a derangement. Call this number \(D(n)\). Thus, \(D(0) = 1, D(1) = 0, D(2) = 1, D(3) = 2\). Think of the condition \(w(i) = i\) as the \(i\)th property of \(w\). Now the number of permutations with at least the set \(T \subseteq [n]\) of points fixed is \(f\_{\ge}(T) = b(n - i) = (n - i)!\), where \(\\#T = i\) (since we fix the elements of \(T\) and permute the remaining \(n - i\) elements arbitrarily). Hence by \eqref{E:10}, the number \(f\_{=}(\emptyset) = a(n) = D(n)\) of permutations with no fixed points is
\begin{equation}
D(n) = \sum_{i = 0}^{n} \binom{n}{i} (-1)^{n-i}i!. \label{E:11}
\end{equation}
The last expression may be rewritten
\begin{equation}
D(n) = n! \left(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots + (-1)^{n} \frac{1}{n!} \right). \label{E:12}
\end{equation}
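Remark (a numerical check, added for illustration). For \(n = 4\), equation \eqref{E:11} gives
\begin{equation*}
D(4) = \binom{4}{0}0! - \binom{4}{1}1! + \binom{4}{2}2! - \binom{4}{3}3! + \binom{4}{4}4! = 1 - 4 + 12 - 24 + 24 = 9,
\end{equation*}
and equation \eqref{E:12} gives the same value:
\begin{equation*}
D(4) = 24\left(1 - 1 + \frac{1}{2} - \frac{1}{6} + \frac{1}{24}\right) = 24 - 24 + 12 - 4 + 1 = 9.
\end{equation*}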
Since \(0.36787944\dots = e^{-1} = \sum\_{j\ge 0}(-1)^j / j!\), it is clear from \eqref{E:12} that \(n!/e\) is a good approximation to \(D(n)\), and indeed it is not difficult to show that \(D(n)\) is the nearest integer to \(n!/e\). It also follows immediately from \eqref{E:12} that for \(n \ge 1\),
\begin{align}
D(n) &= n D(n - 1) + (-1)^{n}, \label{E:13} \\
D(n) &= (n-1) (D(n-1) + D(n-2)). \label{E:14}
\end{align}
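Remark (a numerical check, added for illustration). Starting from \(D(2) = 1\) and \(D(3) = 2\), both recurrences produce the same values:
\begin{align*}
D(4) &= 4\cdot 2 + 1 = 9, & D(4) &= 3\,(2 + 1) = 9, \\
D(5) &= 5\cdot 9 - 1 = 44, & D(5) &= 4\,(9 + 2) = 44.
\end{align*}
These are also the nearest integers to \(4!/e \approx 8.83\) and \(5!/e \approx 44.15\).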
While it is easy to give a direct combinatorial proof of equation \eqref{E:14}, considerably more work is necessary to prove \eqref{E:13} combinatorially. In terms of generating functions, we have that
\begin{equation*}
\sum_{n \ge 0} D(n) \frac{x^n}{n!} = \frac{e^{-x}}{1 - x}.
\end{equation*}
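Remark (a short derivation, added here and relying only on \eqref{E:12}). Dividing \eqref{E:12} by \(n!\) gives \(D(n)/n! = \sum_{j=0}^{n} (-1)^j/j!\), which is exactly the coefficient of \(x^n\) in the product
\begin{equation*}
\left(\sum_{j\ge 0} (-1)^j \frac{x^j}{j!}\right)\left(\sum_{k\ge 0} x^k\right) = \frac{e^{-x}}{1 - x}.
\end{equation*}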
The function \(b(i) = i!\) has a very special property: it depends only on \(i\), not on \(n\). Equivalently, the number of permutations \(w \in \mathfrak{S}\_n\) that have at most the set \(T \subseteq [n]\) of points unfixed depends only on \(\\#T\), not on \(n\). This means that equation \eqref{E:11} can be rewritten in the language of the calculus of finite differences as
\begin{equation*}
D(n) = \Delta^{n} x! |_{x = 0},
\end{equation*}
which is abbreviated \(\Delta^{n} 0!\). Since the number \(b(i)\) of permutations in \(\mathfrak{S}\_n\) that have at most some specified \(i\)-set of points unfixed depends only on \(i\), the same is true of the number \(a(i)\) of permutations in \(\mathfrak{S}\_n\) that have exactly some specified \(i\)-set of points unfixed. It is clear combinatorially that \(a(i) = D(i)\), and this fact is also evident from equations \eqref{E:10} and \eqref{E:11}.
Let us state formally the general result that follows from the preceding considerations.
2.2.2 Proposition. For each \(n \in \mathbb{N}\), let \(B_n\) be a (finite) set, and let \(S_n\) be a set of \(n\) properties that elements of \(B_n\) may or may not have. Suppose that for every \(T \subseteq S_n\), the number of \(x\in B_n\) that lack at most the properties in \(T\) (i.e., that have at least the properties in \(S_n - T\)) depends only on \(\\#T\), not on \(n\). Let \(b(n) = \\#B_n\), and let \(a(n)\) be the number of objects \(x \in B_n\) that have none of the properties in \(S_n\). Then \(a(n) = \Delta^{n}b(0)\).