【Python排序搜索基本算法】之深度优先搜索、广度优先搜索、拓扑排序、强联通&Kosaraju算法
Graph Search and Connectivity
Generic Graph Search
Goals 1. find everything findable
2. don't explore anything twice
Generic Algorithm (given graph G, vertex S)
--- initialize S explored (all others unexplored)
--- while possible:
--- choose an edge(u, v) with u explored and v unexplored
--- mark v explored
1. Breadth-First Search (BFS) O(m+n) time using a queue
--- explore nodes in 'layers'
--- can compute shortest paths
--- can compute connected components of an undirected graph
The basics:pseudocode
BFS(Graph G, start vertex s)
(all nodes initially unexplored)
mark s as explored
let Q = queue data structure(FIFO), initialized with s
while Q != 0:
remove the first node of Q, call it v
for each edge(v, w):
if w unexplored
mark w as explored
add into Q (at the end)
Shortest Paths:
Goal: compute dist(v), the fewest # of edges on a path from s to v
Extra code:
initialize dist(v) = 0 if v == s
when considering edge(v, w):
if w unexplored then set dist(w) = dist(v) + 1
claim: at termination, dist(v) = i <=> v in ith layer
Undirected Connectivity
let G = (V, E) be an undirected graph
Connected components == the 'pieces' of G
Goal: compute all connected components(why? check if network is disconnected, graph visualization, clustering, similarity)
all nodes unexplored
(assume labelled 1 to n)
for i = 1 to n
if i not yet explored
BFS(G, i) //discovers precisely i's connected components
2. Depth-First Search (DFS) O(m+n) time using a stack
--- explore aggressively like a maze, backtrack only when necessary
--- compute topological ordering of directed acycle graph(DAG)
--- compute connected components in directed graphs
pseudocode:
use a stack instead of a queue
recursive version:
DFS(Graph G, start vertex s)
mark s as explored
for every edge(s,v)
if v unexplored
DFS(G,v)
Application: Topological Sort (DAG)
Definition: A topological ordering of a directed graph G is a labelling f of G's node's such that:
1. the f(v)'s are the set{1,2,...,n}
2. (u,v) => f(u) < f(v)
note that if G has directed cycle => no topological ordering
Straightforward solution to Topological Sort
note: every directed acyclic graph has a sink vertex(入度为0的node,无前驱)
To compute topological ordering:
let v be a sink vertex of G
set f(v) = n
recurse on G - {v}
(1) 从有向图中选一个没有前驱的顶点
(2) 从图中删去该点,并删去从该点出发的所有边
(3) 重复上两步,直到图中再没有有前驱的点为止
Topological Sort via DFS
DFS(G, s)
mark s explored
for every edge(s, v)
if v not yet explored
DFS(G, v)
set f(s) = current_label
current_label --
DFS-loop(Graph G)
mark all node unexplored
current_label = n
for each vertex v:
if v unexplored
DFS(G, v)
3. Computing Strong Components: The Algorithm
Strongly connected Components
Formal Definition: the strongly connected Components(SCCs) of a directed graph G are the equivalance classes of the relation:
u~v <=> u ->v and v -> u in G
Kosaraju's Two-Pass Algorithm 2*DFS = O(m+n)
1. let Gr = G with all arcs reversed
2. run DFS-loop on Gr <---------- Goal: compute 'magical ordering' of nodes
let f(v) = 'finishing time' of each v
3. run DFS-loop on G <---------- Goal: discover the SCCs one-by-one
processing nodes in decreasing order of finishing times
SCCs = nodes with the same 'leader'
pseudocode:
DFS(G, i)
make i as explored
set leader(i) = node s
for each arc(i, j):
if j not yet explored:
DFS(G, j)
t++
set f(i) = t // i's finishing time
DFS-loop(Graph G)
global variable t = 0 // # of nodes pressed so far (for finishing times in 1st pass)
global variable s = Null // current source vertex (for leaders in 2nd pass)
Assume nodes labelled 1 to n
for i = n down to 1
if i not yet explored
s = i
DFS(G, i)
Python Code:
import sys import threading import copy threading.stack_size(67108864) sys.setrecursionlimit(300000) def DFS(edges, i, index): global t, vertices, new_vertices, s, compare if index == 1: # 1st pass vertices[i-1][1] = True # mark it explored if index == 2: # 2nd pass vertices[compare[i]-1][1] = True vertices[compare[i]-1].append(s) # set leader(i) = node s if i in edges: for v in edges[i]: if index == 1: if vertices[v-1][1] == False: DFS(edges, vertices[v-1][0], index) if index == 2: if vertices[compare[v]-1][1] == False: DFS(edges, vertices[compare[v]-1][0], index) if index == 1: t = t + 1 # i's finishing time vertices[i-1].append(t) temp = vertices[i-1].copy() temp[1] = False new_vertices.append(temp) compare[vertices[i-1][0]] = t def DFS_loop(edges, index): global t, vertices, new_vertices, s t = 0 #for finishing times in 1st pass n = len(vertices) for i in range(1, n+1): v = vertices[n-i] if v[1] == False: s = v[0] DFS(edges, v[0], index) def main(): global vertices, new_vertices, compare f = open('SCC.txt') _f = list(f) vertices = list() #[number, False] false indicates unexplored new_vertices = list() #[number, False, t, s] edges = dict() # {1:[2,5,6...]...} edges_rev = dict() # {2:[8,9,5...]...} compare = dict() for i in range(0, 875714): #875714 initialize V vertices.append([i+1, False]) for edge in _f: # initialize E temp = edge.split() edge_temp = [int(temp[0]), int(temp[1])] edge_rev_temp = [edge_temp[1], edge_temp[0]] if edge_temp[0] not in edges: edges[edge_temp[0]] = [edge_temp[1]] else: edges[edge_temp[0]].append(edge_temp[1]) if edge_rev_temp[0] not in edges_rev: edges_rev[edge_rev_temp[0]] = [edge_rev_temp[1]] else: edges_rev[edge_rev_temp[0]].append(edge_rev_temp[1]) DFS_loop(edges_rev, 1) vertices = copy.deepcopy(new_vertices) DFS_loop(edges, 2) result = dict() for item in vertices: # nodes with the same 'leader' if item[3] not in result: result[item[3]] = 1 else: result[item[3]] = result[item[3]] + 1 r = list() #output the sizes of the 10 largest SCCs for key in result: r.append(result[key]) r = sorted(r, reverse = True) print(r[0:9]) if __name__ == '__main__': thread = threading.Thread(target = main) thread. start()