Great Ideas in Algorithm Analysis: Midterm Review
Distance Algorithms
Basic sampling lemma: Let \(S_1,\dots,S_n \subset [n]\) be sets of size at least \(D\). Then choosing \(c (n/D) \log n\) elements uniformly at random ensures, with high probability, that each \(S_i\) contains at least one chosen element.
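A quick empirical check of the lemma (a minimal sketch; the constant \(c = 3\) and the retry-on-failure convention are my choices, not part of the lemma):

```python
import math
import random

def sample_hitting_set(sets, n, D, c=3):
    """Sample c * (n/D) * log n elements of [n]; w.h.p. every set of
    size >= D contains at least one sampled element."""
    k = math.ceil(c * (n / D) * math.log(n))
    sample = set(random.choices(range(n), k=k))   # sampled with replacement
    assert all(sample & s for s in sets), "unlucky sample; resample"
    return sample

# Example: n sets of size exactly D = n / 4 each.
n, D = 1024, 256
sets = [set(random.sample(range(n), D)) for _ in range(n)]
print(len(sample_hitting_set(sets, n, D)), "samples hit all", n, "sets")
```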
2-Additive Approximation of APSP
Sample a set \(D_1\) of \(\tilde O(n^{1/2})\) vertices such that every vertex of degree \(\geq n^{1/2}\) (call these \(V_1\)) is adjacent to at least one of them. First compute the distances from each vertex of \(D_1\). Then run APSP over \(E_2 \cup \delta(V \times D_1)\), where \(E_2\) is the set of edges with \(u \notin V_1\) or \(v \notin V_1\). Since \(|E_2| \ll n^{3/2}\), the time complexity is \(\tilde O(n^{5/2})\).
Correctness comes from the fact that if there is a \(V_1\) vertex on the shortest path, then it has a neighbor in \(D_1\), and the detour through that neighbor costs at most \(2\) extra. Thus we have a \(2\)-additive approximation.
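A minimal Python sketch of the whole scheme (function names and the sampling constant are mine; the dominating set is random, so the guarantee is w.h.p.):

```python
import heapq
import math
import random
from collections import deque

def bfs(adj, src):
    dist = [math.inf] * len(adj)
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def apsp_2additive(n, edges, c=3):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    t = max(1, math.isqrt(n))                  # degree threshold n^{1/2}
    in_V1 = [len(adj[v]) >= t for v in range(n)]
    # Sampling lemma: c*(n/t)*log n random vertices dominate V1 w.h.p.
    k = math.ceil(c * (n / t) * math.log(n + 2))
    D1 = sorted(set(random.choices(range(n), k=k)))
    delta = {d: bfs(adj, d) for d in D1}       # BFS from every hitter
    # Sparse graph: E2 (edges missing a V1 endpoint) + weighted stars V x D1.
    sparse = [[] for _ in range(n)]
    for u, v in edges:
        if not (in_V1[u] and in_V1[v]):
            sparse[u].append((v, 1))
            sparse[v].append((u, 1))
    for d in D1:
        for v in range(n):
            if delta[d][v] < math.inf:
                sparse[v].append((d, delta[d][v]))
                sparse[d].append((v, delta[d][v]))

    def dijkstra(s):                           # over ~n^{3/2} edges
        dist = [math.inf] * n
        dist[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > dist[u]:
                continue
            for v, w in sparse[u]:
                if du + w < dist[v]:
                    dist[v] = du + w
                    heapq.heappush(pq, (dist[v], v))
        return dist

    return [dijkstra(u) for u in range(n)]     # d <= estimate <= d + 2 (w.h.p.)
```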
A further improvement: let \(V_1\) be the vertices of degree \(\geq n^{2/3}\) and \(V_2\) those of degree \(\geq n^{1/3}\), with dominating sets \(D_1, D_2\) respectively. Let \(E_2\) be the edges with \(u \notin V_1\) or \(v \notin V_1\), and \(E_3\) the edges with \(u \notin V_2\) or \(v \notin V_2\).
Let \(\delta\) be the distances from \(D_1\) (over the full graph), and \(\delta'\) the distances from \(D_2\) over \(E_2\). For each \(u\), compute the answers over \(E_3 \cup E^* \cup \delta \cup \delta'(u \times D_2)\), where \(E^*\) consists of edges connecting each \(V_2\) vertex to an adjacent \(D_2\) vertex. Clearly this has time complexity \(\tilde O(n^{7/3})\).
If a path has no \(V_2\) vertex, all its edges lie in \(E_3\), so it is clearly covered. If it has a \(V_1\) vertex, it is covered by \(\delta\). In the remaining case, let \(w\) be the last \(V_2\) vertex on the path and \(w' \in D_2\) a neighbor of \(w\); then \((u, w')\) is covered by \(\delta'\), and \((w', v)\) is covered by the \(E^*\) edge \((w', w)\) together with the remaining edges, which lie in \(E_3\).
3-Multiplicative Approximation of APSP
After some preprocessing, for each vertex we can compute its \(b\) nearest vertices (its "ball") in \(\tilde O(b^2)\) time.
For each vertex, compute the ball with \(b = n^{2/3}\), and sample a hitting set \(D\) of size \(\tilde O(n^{1/3})\) that meets every ball; compute the distances from each vertex of \(D\). This takes \(\tilde O(n^{7/3})\) time in total. For a pair \(u, v\) such that \(v\) does not lie in the ball of \(u\), route through a \(D\) vertex \(p\) inside the ball: since \(v\) is outside the ball, \(d(u, p) \leq d(u, v)\), and \(d(p, v) \leq d(p, u) + d(u, v) \leq 2d(u, v)\), so the estimate is at most \(3d(u, v)\). This gives a \(3\)-multiplicative approximation.
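A sketch of the two query-time ingredients, assuming the hitting set \(D\) and the distances from \(D\) have already been computed (all names are mine):

```python
import heapq
import math

def ball_of_nearest(adj, src, b):
    """Truncated Dijkstra: pop at most b vertices, returning exact
    distances to the b nearest vertices (the 'ball' of src).
    adj[u] is a list of (v, weight) pairs."""
    ball, pq = {}, [(0, src)]
    while pq and len(ball) < b:
        d, u = heapq.heappop(pq)
        if u in ball:
            continue
        ball[u] = d
        for v, w in adj[u]:
            if v not in ball:
                heapq.heappush(pq, (d + w, v))
    return ball

def query(u, v, balls, dist_from_D):
    """balls[u]: output of ball_of_nearest; dist_from_D[p]: distances
    from hitter p in D.  If v is in ball(u) the answer is exact.
    Otherwise take a hitter p inside ball(u): d(u,p) <= d(u,v) because
    v lies outside the ball, and d(p,v) <= d(p,u) + d(u,v), so the
    returned estimate is at most 3 * d(u,v)."""
    if v in balls[u]:
        return balls[u][v]
    p, dup = min(((p, d) for p, d in balls[u].items() if p in dist_from_D),
                 key=lambda t: t[1])           # nearest hitter in the ball
    return dup + dist_from_D[p][v]
```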
Fibonacci Heap and its Applications
We first consider how to maintain a heap supporting only merge (which immediately gives insert) and extract-min.
We maintain a list of binomial trees together with a pointer to the minimum root. To merge two heaps, we simply concatenate the lists and update the minimum pointer. To extract-min, we remove the minimum root, append its children to the list, and find the new minimum; along the way we consolidate, merging trees of the same rank until all ranks are distinct. Taking the potential \(\Phi\) to be the length of the list, extract-min runs in amortized \(O(\log n)\) time, while merge and insert take \(O(1)\).
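A minimal sketch of this lazy heap (consolidation by rank as described; a production version would also keep an explicit minimum pointer so find-min is \(O(1)\)):

```python
class Node:
    __slots__ = ("key", "rank", "children")
    def __init__(self, key):
        self.key, self.rank, self.children = key, 0, []

class LazyHeap:
    """Root list of heap-ordered trees.  insert/merge: O(1).
    extract_min consolidates equal-rank trees; with potential
    Phi = len(roots) its amortized cost is O(log n)."""
    def __init__(self):
        self.roots = []

    def insert(self, key):                 # O(1): append a singleton tree
        self.roots.append(Node(key))

    def merge(self, other):                # O(1): concatenate root lists
        self.roots += other.roots

    def extract_min(self):
        node = min(self.roots, key=lambda t: t.key)   # scan paid by potential
        self.roots.remove(node)
        self.roots += node.children       # expose the children
        by_rank = {}                      # consolidate: one tree per rank
        for t in self.roots:
            while t.rank in by_rank:
                o = by_rank.pop(t.rank)
                if o.key < t.key:
                    t, o = o, t
                t.children.append(o)      # link: larger key below smaller
                t.rank += 1
            by_rank[t.rank] = t
        self.roots = list(by_rank.values())
        return node.key
```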
Now we consider how to support decrease-key. When a node's key is decreased, we cut it from its parent and insert it into the root list. But unrestricted cuts could make the rank of a tree far larger than the logarithm of its size.
We give each node a mark bit. When we cut a node, we mark its parent; if the parent is already marked, we cut it as well and recursively check its parent.
This guarantees that each non-root node has lost at most one child, so a node of rank \(r\) (here we use degree as rank) has subtree size \(\gg \phi^r\), where \(\phi\) is the golden ratio. This gives the \(O(\log n)\) upper bound on the rank of any tree.
Minimum Spanning Tree
Recall that we now have two different approaches.
- Prim's algorithm, using Fibonacci heap, achieves \(O(m + n \log n)\) time.
- Bor\r{u}vka's algorithm, achieves \(O(m\log n)\) time.
Combine them: let \(k = m / n\). In each phase we run Prim until the current component reaches size \(2^k\), then start again from an unvisited vertex; at the end of the phase, we contract the components in the Bor\r{u}vka sense. This gives \(O(m + n \log(2^k)) = O(m)\) time to contract the graph to size \(n' \leq n / 2^k\), so \(k' = m'/n' \geq 2^k\). The total time complexity is \(O(m \beta(m, n))\), where \(\beta(m, n) = \min\{ i : \log^{(i)}n \leq m/n \}\).
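A small sketch computing \(\beta(m, n)\) directly from the definition (assuming \(m \geq n\), so the iteration terminates):

```python
import math

def beta(m, n):
    """beta(m, n) = min { i : log^(i) n <= m / n }; assumes m >= n so
    the iterated logarithm eventually drops below m / n >= 1."""
    i, x = 0, float(n)
    while x > m / n:
        x = math.log2(x)
        i += 1
    return i

print(beta(10**6, 10**6))   # m/n = 1: pure iterated log, prints 5
print(beta(10**7, 10**5))   # m/n = 100 >= log n already, prints 1
```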
Union-Find and Inverse Ackermann function
The first optimization is the union-by-rank heuristic: it keeps the depth \(\ll \log n\), and thus gives time complexity \(O(\log n)\) per operation.
The second optimization is path compression.
The first analysis: in a single find, all but \(O(\log \log n)\) of the nodes on the path have their gap \(r(p(x)) - r(x)\) at least doubled by the compression (the suffix sums of the gaps can halve at most \(\log\log n\) times). Since each gap is at most \(\log n\), taking the potential \(\Phi = \sum_x \big(\log\log n - \log(r(p(x)) - r(x))\big)\) makes each doubling pay for itself, giving amortized time complexity \(O(\log \log n)\).
A better analysis: let \(T(m, r)\) be the total time of \(m\) operations on nodes of rank \(\leq r\). There are at most \(n/2^s\) nodes of rank \(\geq s\), and such a node's parent strictly increases in rank each time a compression moves it, so all compressions among the high-rank vertices take total time \(O(n/2^s \cdot r + m)\). This gives the bound \(T(m, r) \leq T(m, s) + O(n/2^s \cdot r + m)\). Taking \(s = \log r\) and iterating, \(T(m, r) \leq O(m\log^* n)\) (for \(m \geq n\)).
A further analysis will give the bound \(O(m\alpha(m))\).
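A minimal sketch combining both optimizations (union by rank plus two-pass path compression):

```python
def make_dsu(n):
    """Union-find with both heuristics: union by rank keeps depths
    <= log n; path compression points every visited node at the root."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:          # second pass: compress the path
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        x, y = find(x), find(y)
        if x == y:
            return
        if rank[x] < rank[y]:             # attach lower rank under higher
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    return find, union

find, union = make_dsu(5)
union(0, 1); union(1, 2)
print(find(0) == find(2), find(3) == find(4))   # True False
```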
Semigroup Partial Sum
Well known in OI.
Splay Tree, Link/Cut Tree and ET-Tree
Well known in OI.
Dynamic Connectivity
Holm–de Lichtenberg–Thorup Algorithm
Intuition: we maintain a spanning forest of the graph; the problem is that when we delete a tree edge, we must find an edge that can replace it. We brute-force over the smaller component, enumerating its non-tree edges. We also need to prevent each edge from being enumerated too many times; the key point is to push probed edges into a smaller tree.
Assign each edge a level, and let \(E^{(i)}\) be the level-\(i\) edges, so \(\bigsqcup_i E^{(i)} = E\). Let \(F^{(i)}\) be a spanning forest of \(E^{(i)} \sqcup E^{(i+1)} \sqcup \cdots\) (the edges of level \(\geq i\)), chosen so that \(F^{(i)} \subset F^{(i-1)}\). When we delete a tree edge, each probed edge that fails to reconnect the components is pushed one level higher.
By induction, we can prove that each connected component of \(F^{(i)}\) has size at most \(n / 2^i\), so an edge's level can be raised at most \(O(\log n)\) times; since each probe either yields a replacement edge or raises a level, each edge is probed at most \(O(\log n)\) times. The amortized time complexity is \(O(\log^2 n)\).
Frederickson's Worst-Case Algorithm
First split the graph into one of bounded degree; then each spanning tree can be partitioned into \(\Theta(m/B)\) blocks of size \(\Theta(B)\).
Use a worst-case dynamic tree to maintain the extra edges between blocks. Taking \(B = \sqrt m\), we can find the replacement edge in \(\tilde O(\sqrt m)\) worst-case time.
Comment: this approach is also capable of maintaining an MST!
Eppstein–Galil–Italiano–Nissenzweig's Sparsification Trick
Suppose we maintain spanning forests \(F_1, F_2\) of two halves of the edge set; each update causes only \(O(1)\) changes to \(F_1\) or \(F_2\), so the above algorithm maintains the spanning forest of \(F_1 \cup F_2\), a graph with \(O(n)\) edges, in \(\tilde O(\sqrt n)\) time.
So, arranging the edge set at the leaves of a binary tree and maintaining such a forest at each internal node, we can maintain the whole structure in \(\tilde O(\sqrt n)\) time per update.
Some Maximum Flow Algorithms
Capacity Scaling
When the \(\Delta\)-phase begins, no augmenting path of bottleneck \(\geq 2\Delta\) remains, so the residual flow is at most \(2m\Delta\) and the phase performs \(O(m)\) augmentations. Over the \(O(\log U)\) phases, the time is \(O(m^2 \log U)\).
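A minimal sketch of capacity scaling (recursive DFS augmentation; the names are mine): each phase only uses residual arcs of capacity at least \(\Delta\).

```python
def max_flow_scaling(n, s, t, edge_list):
    """Capacity scaling sketch.  edge_list holds (u, v, capacity);
    each Delta-phase augments only along paths whose residual arcs
    all carry at least Delta."""
    cap = {}                                   # residual capacities
    adj = [set() for _ in range(n)]
    for u, v, c in edge_list:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)              # reverse residual arc
        adj[u].add(v)
        adj[v].add(u)

    def augment(u, delta, seen):               # DFS for a Delta-wide path
        if u == t:
            return True
        seen.add(u)
        for v in adj[u]:
            if v not in seen and cap[(u, v)] >= delta and augment(v, delta, seen):
                cap[(u, v)] -= delta
                cap[(v, u)] += delta
                return True
        return False

    delta, flow = 1, 0
    while delta * 2 <= max((c for _, _, c in edge_list), default=0):
        delta *= 2
    while delta >= 1:                          # O(log U) phases
        while augment(s, delta, set()):
            flow += delta
        delta //= 2
    return flow

# Two disjoint s-t paths of capacities 2 and 1: max flow 3.
print(max_flow_scaling(4, 0, 3, [(0, 1, 2), (1, 3, 2), (0, 2, 1), (2, 3, 1)]))
```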
Edmonds–Karp Algorithm
Always augment along a shortest path. Each augmentation saturates at least one edge of the shortest-path DAG, so after at most \(m\) augmentations the shortest-path length increases by at least \(1\). Hence there are at most \(mn\) augmentations, and the time complexity is \(O(m^2n)\).
Dinic's Algorithm
Within each phase, use a dynamic tree to find augmenting paths, so each augmentation costs only \(O(\log n)\) amortized; with \(O(n)\) phases the total time complexity is \(O(mn\log n)\).
A naive implementation without dynamic trees computes each blocking flow in \(O(mn)\) time, for \(O(mn^2)\) in total.
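A sketch of the naive \(O(mn^2)\) variant (the dynamic-tree speedup is not shown); the current-arc pointer `it` is what keeps each blocking-flow phase at \(O(mn)\):

```python
from collections import deque

class Dinic:
    """Naive Dinic: BFS builds the level graph, DFS with a
    current-arc pointer per vertex finds a blocking flow."""
    def __init__(self, n):
        self.n = n
        self.g = [[] for _ in range(n)]   # entries: [to, cap, rev-index]

    def add_edge(self, u, v, c):
        self.g[u].append([v, c, len(self.g[v])])
        self.g[v].append([u, 0, len(self.g[u]) - 1])

    def bfs(self, s, t):
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c, _ in self.g[u]:
                if c > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] >= 0

    def dfs(self, u, t, f):
        if u == t:
            return f
        while self.it[u] < len(self.g[u]):
            e = self.g[u][self.it[u]]
            v, c, rev = e
            if c > 0 and self.level[v] == self.level[u] + 1:
                d = self.dfs(v, t, min(f, c))
                if d > 0:
                    e[1] -= d
                    self.g[v][rev][1] += d
                    return d
            self.it[u] += 1               # exhausted arc: never revisit
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self.bfs(s, t):             # one phase per BFS
            self.it = [0] * self.n
            while True:
                f = self.dfs(s, t, float("inf"))
                if f == 0:
                    break
                flow += f
        return flow

# Same example as above: expects 3.
d = Dinic(4)
for u, v, c in [(0, 1, 2), (1, 3, 2), (0, 2, 1), (2, 3, 1)]:
    d.add_edge(u, v, c)
print(d.max_flow(0, 3))
```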
Analysis on Unit Capacity Graph
Firstly, the blocking flow takes only \(O(m)\) time, since every augmentation saturates all the edges on its path, so each edge is used \(O(1)\) times.
After \(n^{2/3}\) phases, the distance from \(s\) to \(t\) is at least \(n^{2/3}\). By the pigeonhole principle, there are two adjacent levels with total size \(O(n^{1/3})\); the remaining flow is bounded by the number of edges between them, which is at most \(O(n^{2/3})\).
Another analysis: after \(m^{1/2}\) phases, the distance from \(s\) to \(t\) is at least \(m^{1/2}\). By the pigeonhole principle, there is a pair of adjacent levels with at most \(m^{1/2}\) edges between them, so the remaining flow is at most \(m^{1/2}\).
This gives an upper bound \(O(m \min \{ n^{2/3}, m^{1/2} \})\).
Karger–Levine's Algorithm over Undirected Unit Capacity Graph
We first show that in a unit-capacity graph, an acyclic flow of value \(v\) has \(O(n\sqrt v)\) edges (hence at most \(O(n^{3/2})\)). If the shortest \(s\)-\(t\) path has length \(d\), the preceding argument shows that the maximum flow is at most \(O((n/d)^2)\); thus a flow of value \(v\) forces the shortest path to have length \(O(n / v^{1/2})\). Decomposing the flow into paths and applying this bound at each intermediate flow value, the total number of edges is \(\ll n \cdot (1 / 1^{1/2} + \cdots + 1/v^{1/2}) \ll n v^{1/2}\).
Using fast dynamic connectivity, we can maintain the directed residual network, which has only \(O(n v^{1/2})\) edges, and find each augmenting path within it. The total time complexity is \(\tilde O(n v^{3/2})\).
Min-Cut and Gomory–Hu Tree
Karger's Algorithm
Consider the following randomized strategy: repeatedly contract a uniformly random edge until two vertices remain. A fixed minimum cut survives the first \(i\) contractions with probability at least \(\sim (n-i)^2 / n^2\). So each time we contract down to \(n - i = n/\sqrt 2\) vertices (the cut survives with probability about \(1/2\)) and then recurse twice.
The time is \(T(n) = 2T(n/\sqrt 2) + O(n^2)\), the master theorem gives \(T(n) = O(n^2\log n)\).
The success probability satisfies \(P(n) \geq 1 - (1 - P(n / \sqrt 2) / 2)^2\), and one can prove that \(P(n) \gg 1/\log n\). So repeating \(O(\log n)\) times, i.e., in \(O(n^2\log^2 n)\) total time, finds a minimum cut with constant probability.
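A sketch of the recursive contraction on an unweighted multigraph (the base-case size and retry counts below are arbitrary choices of mine):

```python
import math
import random

def contract(vertices, edges, target):
    """Contract uniformly random crossing edges of the multigraph
    until `target` super-vertices remain."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    alive = len(vertices)
    while alive > target:
        u, v = random.choice(edges)        # parallel edges weight the choice
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru
            alive -= 1
    kept = [(find(u), find(v)) for u, v in edges]
    kept = [(a, b) for a, b in kept if a != b]   # drop self-loops
    return {find(v) for v in vertices}, kept

def karger_stein(vertices, edges):
    """Contract to ~n/sqrt(2) vertices, recurse twice, keep the better
    cut; success probability >> 1/log n per call."""
    n = len(vertices)
    if n <= 6:                             # base case: repeated contraction
        return min(len(contract(vertices, edges, 2)[1]) for _ in range(20))
    target = max(2, math.ceil(1 + n / math.sqrt(2)))
    return min(karger_stein(*contract(vertices, edges, target))
               for _ in range(2))          # two independent trials

# A 6-cycle has minimum cut 2.
vs, es = set(range(6)), [(i, (i + 1) % 6) for i in range(6)]
print(karger_stein(vs, es))   # 2 (with high probability)
```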
Gomory–Hu Tree
Let \(f(s, t)\) be the minimum cut between \(s\) and \(t\). Let \(\delta(S)\) be the cut value between \(S\) and \(V\smallsetminus S\). It's not hard to see that \(\delta\) is symmetric and submodular, i.e., \(\delta(S) + \delta(T) \geq \delta(S\cap T) + \delta(S\cup T)\).
The construction of the Gomory–Hu tree is based on the following invariant. We maintain a tree over a partition \(\bigsqcup V_i = V\) such that for each tree edge \((V_i, V_j)\) there are representatives \(s_i \in V_i\) and \(s_j \in V_j\) for which the bipartition given by removing the edge \((V_i, V_j)\) is a minimum cut of \((s_i, s_j)\).
Now suppose we want to refine \(V_i\): select another representative \(s' \in V_i\) and compute a minimum cut \((S, V \smallsetminus S)\) between \(s_i\) and \(s'\). By submodularity, it is not hard to prove that the cut can be chosen so that each adjacent block lies completely inside \(S\) or completely outside it. This splits \(V_i\) into two blocks and partitions the adjacent tree edges between them; the invariant for the two new blocks can be verified separately.
Once the Gomory–Hu tree is complete, for any two vertices \(s, t\) we can prove that \(f(s, t) \geq \min \{ f(x_0, x_1), \dots, f(x_{\ell-1}, x_\ell) \}\), where \(x_0 = s, x_1, \dots, x_\ell = t\) is the tree path from \(s\) to \(t\). Since each tree edge on the path also induces an \((s, t)\) cut of that value, equality is achieved.
So the Gomory–Hu tree gives a cut-equivalent graph: for every pair, the minimum cut on the tree is exactly the same as in the original graph.
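Given a Gomory–Hu tree, answering a min-cut query is just a bottleneck-path computation; a minimal sketch:

```python
import math
from collections import defaultdict

def gomory_hu_query(tree_edges, s, t):
    """Minimum s-t cut value from a Gomory-Hu tree, given as a list of
    (u, v, weight) triples: the bottleneck weight on the s-t tree path."""
    adj = defaultdict(list)
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    stack, seen = [(s, math.inf)], {s}
    while stack:
        u, bottleneck = stack.pop()
        if u == t:
            return bottleneck
        for v, w in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append((v, min(bottleneck, w)))
    return math.inf    # s and t are disconnected

# Path tree 0 -(5)- 1 -(3)- 2: f(0, 2) = 3.
print(gomory_hu_query([(0, 1, 5), (1, 2, 3)], 0, 2))
```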