强连通分量算法

Strongly connected component

主要说下强连通分量,一般简称为scc

A directed graph is called strongly connected if there is a path from each vertex in the graph to every other vertex. In particular, this means paths in each direction; a path from a to b and also a path from b to a.

如果一个有向图的每个顶点到其余的顶点都有一条路径,那么这个有向图就叫强连通的。尤其注意的是,每个从a到b的路径都有一个从b到a的路径.

The strongly connected components of a directed graph G are its maximal strongly connected subgraphs. If each strongly connected component is contracted to a single vertex, the resulting graph is a directed acyclic graph, the condensation of G. A directed graph is acyclic if and only if it has no (nontrivial) strongly connected subgraphs (because a cycle is strongly connected, and every strongly connected graph contains at least one cycle).

一个有向图的强连通分量是它的最大强连通子图。如果每个强连通分量压缩为一个顶点,结果图是一个DAG有向分循环图,图G的压缩形式。有向图是非循环的,当且仅当它不包含有强连通子图(因为一个循环就是强连通的,而且每个强连通图都包含了至少一个循环)

Kosaraju's algorithm, Tarjan's algorithm and Gabow's algorithm all efficiently compute the strongly connected components of a directed graph, but Tarjan's and Gabow's are favoured in practice since they require only one depth-first search rather than two.

Kosaraju算法,tarjan算法,gobow算法都可以高效的计算出有向图的强连通分量,但是tarjan算法和gobow算法在实际应用中更受喜爱,因为它只需要一次DFS.

Algorithms for finding strongly connected components may be used to solve 2-satisfiability problems (systems of Boolean variables with constraints on the values of pairs of variables): as Aspvall, Plass & Tarjan (1979) showed, a 2-satisfiability instance is unsatisfiable if and only if there is a variable v such that v and its complement are both contained in the same strongly connected component of the implication graph of the instance.

找强连通分量的算法可以用来解决2SAT问题(变量对的值受限的一种布尔变量系统)。Aspvall,plass and tarjan (1979) 指出,一个2SAT实例是不满足的,当且仅当有一个变量v使得v和它的对立变量都包含在一个强连通分量中。

样例

一个有向图强连通分量的示例。

Kosaraju's algorithm

下面从kosaraju算法解说。

In computer science, Kosaraju's algorithm is an algorithm to find the strongly connected components of a directed graph. Aho, Hopcroft and Ullman credit it to an unpublished paper from 1978 by S. Rao Kosaraju. It makes use of the fact that the transpose graph (the same graph with the direction of every edge reversed) has exactly the same strongly connected components as the original graph.

在计算机科学中,kosaraju算法是一种寻找有向图强连通分量的算法。它利用了一个事实,置换图(将图中的边的方向换向)和原始图有相同的强连通分量。

 

Kosaraju's algorithm is simple and works as follows:

  • Let G be a directed graph and S be an empty stack.
  • While S does not contain all vertices:
    • Choose an arbitrary vertex v not in S. Perform a depth-first search starting at v. Each time that depth-first search finishes expanding a vertex u, push u onto S.
  • Reverse the directions of all arcs to obtain the transpose graph.
  • While S is nonempty:
    • Pop the top vertex v from S. Perform a depth-first search starting at v. The set of visited vertices will give the strongly connected component containing v; record this and remove all these vertices from the graph G and the stack S. Equivalently, breadth-first search (BFS) can be used instead of depth-first search.

Kodaraju算法描述如下:

1.         首先使G为一个有向图,并且S是一个空栈

2.         while S非空:

3.                任意选择一个不在栈S中的顶点。从这个顶点v出发运行一个DFS(深搜). 每次dfs扩展到一个顶点u,就将u压入栈S

4.         反向图G中所有的边获取一个置换图

5.         while S非空:

6.                取出栈顶元素v. 从v开始运行一个DFS. 访问的结点组成了一个包含v在内的强连通分量,记录下,然后从图G和栈S中去掉这些顶点。同样的,BFS也可以来做。

Mark C. Chu-Carroll(Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer.) 给出来更为详细的描述:

As promised, today I'm going to talk about how to compute the strongly connected components of a directed graph. I'm going to go through one method, called Kosaraju's algorithm, which is the easiest to understand. It's possible to do better that Kosaraju's by a factor of 2, using an algorithm called Tarjan's algorithm, but Tarjan's is really just a variation on the theme of Kosaraju's.

今天,我将要讨论下如果计算有向图的强连通分量。我会描述一种算法叫kosaraju算法,最容易理解的一种算法。使用tarjan算法确实比kosaraju算法快乐2倍,但是tarjan算法也只是kosaraju算法一个变种。

Kosaraju's algorithm is amazingly simple. It uses a very clever trick based on the fact that if you reverse all of the edges in a graph, the resulting graph has the same strongly connected components as the original. So using that, we can get the SCCs by doing a forward traversal to find an ordering of vertices, then doing a traversal of the reverse of the graph in the order generated by the first traversal.

Kosaraju算法相当简单。它使用一个非常聪明的技巧,如果你反转了图的所有的边,结果图和原图有相同的强连通分量。因此,使用这个,我们可以做一个前向遍历找到一个顶点序列,然后在对反转图按照第一次遍历生成的序列在做一次遍历,就能得到scc了

That may sound a bit mysterious, but it's really very simple. Take the graph G, and do a recursive depth-first traversal of the graph, along with an initially empty stack of vertices. As the recursion started at a vertex V finishes, push V onto the stack. At the end of the traversal, you'll have all of the vertices of G in the stack. The order of the reverse traversal will be starting with the vertices on the top of that stack.

这听起来有点神秘,但是它确实很简单。那图G来说,有个初试的空栈,然后做个递归的深度遍历。当从顶点v开始的递归结束的时候,把V压入栈中。当遍历结束的时候,所有的顶点都在栈中了。反转遍历的顺序是从栈的顶部顶点开始的。

So you reverse all of the edges in the graph, creating the reverse graph, G'. Start with the vertex on top of the stack, and to a traversal from that vertex. All of the nodes reachable from that vertex form one strongly connected component. Remove everything in that SCC from the stack, and then repeat the process with the new top of the stack. When the stack is empty, you'll have accumulated all of the SCCs.

然后,反转图所有的边,创建一个反转图,G’. 从栈顶的顶点开始,从此顶点开始以个遍历,所有能够到达的顶点都属于一个强连通分量中。从栈中取出掉这些顶点,然后重新取出栈顶结点开始下个强连通分量的寻找。当栈空的时候,所有的强连通分量就找到了

As usual, things are a whole lot clearer with an example. Let's do that using the graph to the right as an example.

通常,有一个例子事情才能更加清晰。我们使用右图做个例子。

样例

First, we do a depth-first traversal of the graph, putting vertices on a stack as we finish the recursive step started with that node. We'll start the traversal with "A". A,F,E,C,G,H - H is finished, so it goes on the stack. Then G goes on the stack; then C, then E, then F, and we're back to A. So at this point, the stack is [H, G, C, E, F]. Then we keep going from A: A, B, D. D is done, so it goes on the stack; then B, then A. So the final stack is [H, G, C, E, F, D, B, A]

首先,我们做个深度遍历,当遍历结束的时候把结点放进栈中。我们从顶点A开始遍历。A F E C G H。当H遍历完时,被放入栈中,接着G被放入栈中,然后C,然后E,然后F,然后回到A. 此时,栈是[H,G,C,E,F]. 然后继续从A遍历另外一个路线,A B D. 当D完时,放入栈,然后是B,最后是A.此时的最终栈是[H,G,C,E,F,D,B,A].

Then, we reverse the graph, and start traversing from whatever is on top of the stack. So first, we start from A in the reversed graph. That gives us A, D, B. So {A, D, B} form one strongly connected component. Now, the top of the stack that hasn't been traversed yet is F. So we do F, E. {F,E} is another SCC. The remaining stack is now [H, G, C], which traverses straightforwardly as C, H, G. So we end up with {A, B, D}, {E, F}, and {C, G, H} as our SCCs.

样例

然后,我们反转图。从栈顶的元素开始,不管它是哪个。首先,我们从A开始,遍历完后事,A D B. 那么形成了一个scc. 现在栈顶没被遍历过的元素是F。因此从F遍历得到F,E.又形成了一个scc. 剩下的栈是[H G C]. 从C开始遍历得到 C H G.又形成了一个scc. 最终得到三个scc. (A,B D) (E,F) (C G H)

What's the time complexity? We need to do two traversals, and one graph reversal. If we're using an adjacency list representation, we can create the reverse graph while we do the first traversal, so it's really just two depth-first traversals. So two times the complexity of a traversal; in adjacency list representation, traversal is O(V+E), where E is the number of edges - so Kosaraju's SCC algorithm is also O(V+E). (If you use an adjacency matrix, then it's O(N2).)

时间复杂度呢。我们需要做两次遍历和一次反转。如果使用临界表表示,我们能够在第一次遍历的时候创建反转图。因此只需两次遍历即可。遍历的复杂度是O(V+E),E是边数。如果使用邻接矩阵,复杂度是O(N^2)

Can we do better than that? Order of complexity, no. You can't do better than the cost of a single traversal of the graph. You can eliminate the second traversal. Instead of doing that second traversal, you can do the forward traversal using the stack, and then pull things off the stack checking if they are the root of a strongly connected component, and if so, removing all members of that component. Tarjan's algorithm works this way, and ends up doing one traversal, but considering each edge in the graph twice. So it's slightly, but not dramatically faster. It's generally preferred just because it avoids the computation of the reverse graph.

能够在好点吗?不能了。不能好于单一的图遍历。可以消除二次遍历。代替二次遍历的方法是,可以使用栈来时前向遍历,然后取出顶点并且判断是否为 scc的根,如果是的,移除所有的相关结点。 Tarjan算法就是这样来工作的,但是要考虑边两次,因此也并不是非常快,只是稍微快点。之所以都选择tarjan是因为避免了反转操作的时间。

In terms of what we were talking about in the last post, this is good news. The computation of the SCCs is quite fast - quite a lot better than NP-complete. So we can, feasibly, use the division into SCCs as the basis of parallelization of graph algorithms.

Scc的计算非常快。要比NP完全问题好多了。因此可以吧scc的划分作为图算法并行化的基础。

未完待续。。。

posted @ 2011-02-24 13:58  AC2012  阅读(3906)  评论(0编辑  收藏  举报