AtCoder keyence2019 E Connecting Cities
keyence2019_e
\(N\) 个节点的无向图 \(G\),节点 \(i,j\) 之间的边权值为 \(|i - j| \times D + A_i + A_j\) 。
求最小生成树(Minimum Spanning Tree, MST)的权值。
数据范围
- \(1 \leq N \leq 2 \times 10^5\)
- \(1 \leq D \leq 10^9\)
- \(1 \le A_i \le 10^9\)
- \(A_i\) and \(D\) are integers.
From the editorial:
We want to compute the MST of the graph, but a straightforward algorithm doesn't work because there are \(\mathcal{O}(N^2)\) edges. We first enumerate candidates of MST edges and then compute the MST of the graph with only those edges.
考虑连通的有向图 \(G=(V,E)\) 。设 \(T=(V, E_T)\) 是 \(G\) 生成树。显然边集 \(E_T\) 完全描述了 \(T\),我们径用 \(E_T\) 来表示 \(T\) 。
From CLRS3 pp 625
Becuase a tree is a type of graph, in order to be precise we must define a tree in terms of not just its edges, but its vertices as well. Although this chapter focuses on trees in terms of their edges, we shall operate with the understanding that vertices of a tree \(T\) are those that some edge of \(T\) is incident on.
从 Kruskal 算法的过程出发,很容易得到下述结论:
边 \((u,v)\in E\) 不可能出现在任一 MST 中的充要条件是
\(E\) 中存在路径 \(u=v_0, v_1, \dots, v_{k-1}, v_{k} = v\) 满足此路径上每条边的权值都小于边 \((u,v)\) 的权值。
设从 \(E\) 中去掉边 \((u,v)\) 后图仍连通。从 \(E\) 中去掉边 \((u,v)\) 后 MST 的权值不变的充要条件是
\(E\setminus (u, v)\) 中存在路径 \(u=v_0, v_1, \dots, v_{k-1}, v_{k} = v\) 满足此路径上每条边的权值都不大于边 \((u,v)\) 的权值。
From the editorial:
... There are two ways to find candidates:
Divide and conquer.
Let's divide the array into two halves. Only consider edges between the two halves. When are interested in the edge between \(i\) and \(j\) such that \(1 \le i \le N/2\) and \(N/2 < i \le N\).
The cost of the edges can be written as \(f(i) + g(i)\), where \(f(i) = A_i - D_i\) and \(g(j) = A_j + D_j\). Let \(i_0, j_0\) be the indices that minimize the values of \(f(i), g(j)\). We claim that the edge between \(i\) and \(j\) can be a candidate only when \(i = i_0\) or \(j = j_0\). Otherwise, the three edges \((i, j_0), (i_0, j), (i_0, j_0)\) are cheaper than the edge \((i, j)\), so this edge can't be included in the MST. Thus we limit the number of edges between the two halves to \(\mathcal{O}(N)\).
If we apply divide-and-conquer with the observation above, the total number of candidate edges will be \(\mathcal{O}(N\log N)\), and this solution works in \(\mathcal{O}(N \log^2 N)\). ...
题解中的上述论述实际上假设了所有 \(f(i)\) 都不相同,所有 \(g(j)\) 都不相同;虽有瑕疵但无伤大雅。根据上述第二个结论,对于边 \((i,j)\),若 \(i \ne i_0\) 且 \(j \ne j_0\) 则路径 \(i \to j_0 \to i_0 \to j\) 上的每条边的权值都不大于(not more expensive than)边 \((i,j)\) 的权值。因此若选了 \((i,j_0), (j_0, i_0), (i_0, j)\),就不必选 \((i,j)\) 了。
关于两种常见的 MST 算法,Kruskal 和 Prim,的复杂度,CLRS3 pp 624:
... We can easily make each of them run in time \(\mathcal{O}(E \log V)\)...
题解上又提供另一种「to find candidates」的方法
Sort by \(A_i\).
For simplicity, assume that the values of \(A_i\) are pairwise distinct.
Consider a particular city (call it \(x\)). We can prove the following about the edges that connect this city and smaller cities (cities that satisfy \(A_i < A_x\)):
- Among edges that connects \(x\) and all smaller cities to the left of \(x\), we should only consider the cheapest edge.
- Among edges that connects \(x\) and all smaller cities to the right of \(x\), we should only consider the cheapest edge.
Let's prove the first claim. Suppose that among edges that connects \(x\) and all smaller cities to the left of \(x\), the cheapest one is \((x, y)\). Then, for each other \(z\) that satisfies \(z < x\) and \(A_z < A_x\), both edges \((x, y)\) and \((y, z)\) are cheaper than \((x,y)\). Thus, \((x, z)\) never becomes the MST edge. The second claim is similar.
This way, the candidates will be \(\mathcal{O}(N)\), and this solution works in \(\mathcal{O}(N\log N)\).
此段论述中仍有瑕疵,下面做一点解释和补充。
「cities to the left of \(x\)」意谓编号小于 \(x\) 的城市(即节点)。
「the values of \(A_i\) are pairwise distinct」这个 assumption 是不必要的。
把 smaller cities 理解为把所有城市按 \(A_i\) 从小到大排序后,排在 \(x\) 左边的城市,不要与此“左”与 “to the left to \(x\)” 之左相混淆。
为了便于描述,借用 CLRS3 pp 626 中 light edge 这个术语
... More generally, we say that an edge is a light edge satisfying a given property if its weight is the minimum of any edge satisfying the property.
令 \(L_x\) 表示 smaller cities to the left of \(x\) 之集合,\(R_x\) 表示 smaller cities to the right of \(x\) 之集合。
以 \(w(u,v)\) 表示边 \((u,v)\) 的权值。
设 \((y,x)\) 是连接 \(x\) 与 \(L_x\) 之间的 light edge。则对于任意 \(z \ne y\) 且 \(z \in L_x\),有 \(w(x,y) \le w(x, z)\) 且 \(w(y,z) \le w(x,z)\)。
证明:\(w(x,y) \le w(x, z)\) 由定义自明。
若 \(y > z\) 则 \(w(x, z) > w(y, z)\) 是显然的。
\(w (x, y) \le w(x, z) \implies A_y - Dy \le A_z - Dz \implies D(y - z) \ge A_y - A_z\) 。
若 \(y < z\) 则有
\begin{aligned}
w(x, z) - w(y, z) &= A_x - A_y + D(x - z) + D(y - z) \\
&\ge A_x - A_y + D(x - z) + A_y - A_ z \\
&= A_x - A_z + D(x - z) \\
&> 0
\end{aligned}
至此可得出结论:对于 \(x\) 与 \(L_x\) 之间的边,只保留 \((x,y)\)(即只保留一条 light edge),最小生成树的权值不变。
按右到左的顺序删边,即可安全地(safely)将此类冗余的边全部删掉。
注意:按右到左的顺序删边,蕴含着一个 loop invariant 。
依对称性顷见 \(x\) 与 \(R_x\) 之间亦有类似结论。