幂迭代法应用:pivotMDS--大图数据的稀疏化布局

论文解读

首先pivotMDS是通过p个锚点,做p次单源最短路,得到一个n*p\(d_{ij}\)矩阵。通过这样一个n*p\(d_{ij}\)计算得到一个C矩阵,\(C \in R^{n\times k}\)
C矩阵计算方式:

\[\begin{equation} C_{ij} = -\frac{1}{2} (\delta_{ij}^2 - \frac{1}{n} \sum_{r=1}^{n}\delta_{rj}^2 - \frac{1}{k} \sum_{s=1}^{k}\delta_{is}^2 + \frac{1}{nk} \sum_{r=1}^{n}\sum_{s=1}^{k}\delta_{rs}^2) \end{equation} \]

\[B = C^TC \]

\[v_1 = poweriteration(B)_{max} \]

\[v_2 = poweriteration(B)_{sec} \]

\[x = Cv_1, y=Cv_2 \]

其中poweriteration由上一节幂迭代法求特征值的方法求得最大和次大特征向量。

代码

读取图数据

#
def readGraph(graph_name,fmt):
    if 'mat' in fmt:
        mat_data = io.loadmat(graph_name + '.mat')
        graph = mat_data['Problem']['A'][0][0]
        return graph
    elif 'txt' in fmt or 'csv' in fmt:
        df = pd.read_csv(graph_name + '.txt',sep=' ',names=["src","tgt"])
#         df["data"] = np.ones_like(df["src"].values)
        row = df["src"].values
        col = df["tgt"].values
        data = np.ones_like(row)
        graph = csc_matrix((data, (row, col)), shape=(max(row.max(),col.max()) + 1, max(row.max(),col.max()) + 1)) 
        return graph
    else:
        print("error:not supported format.")
        return None
graph_name = '../data/can_96'
graph = readGraph(graph_name,'txt')

数据可以从tamu稀疏矩阵数据库,其中mat文件为matlab格式文件,txt为空格分隔的文本文件,共两列,分别表示连接一条边的两个点。

计算C矩阵和B矩阵(C^TC)

pivot_Point = np.random.randint(graph.shape[0],size = NP)
d = csgraph.shortest_path(graph, directed=False, unweighted=True,indices=pivot_Point)
d[np.isinf(d)] = 0
(k,n) = d.shape
d2  = d**2
deltaIS = d2.sum(axis=0)/k
deltaRJ = d2.sum(axis=1)/n
sumALL = d2.sum()/(n*k)
C = np.zeros((n,NP),dtype=np.float64)

@jit(nopython=True) 
def getC(C):
    for i in range(n):
        for j in range(k):
            C[i][j] = -1.0/2 * (d[j][i]**2 - deltaRJ[j] - deltaIS[i] + sumALL)
    return C
C = getC(C)
B = np.dot(C.T,C)

幂迭代法

def power_iteration(A, num_simulations: int):
    b_k = np.random.rand(A.shape[1])
    
    for _ in range(num_simulations):
        b_k1 = np.dot(A, b_k)
        b_k1_norm = np.linalg.norm(b_k1)
        b_k = b_k1 / b_k1_norm
 
    return b_k

V_1 = power_iteration(B, 100).reshape(1,-1)
lbd  = np.dot(V_1,np.dot(B,V_1.T)) 
B_2 = B - lbd / np.linalg.norm(V_1)**2 * np.dot(V_1.T,V_1)
V_2 = power_iteration(B_2, 100)

结果

pos = np.zeros((n,2))
pos[:,0] = np.dot(C,V_1.reshape(-1,1)).reshape(-1)
pos[:,1] = np.dot(C,V_2.reshape(-1,1)).reshape(-1)


X = pos.copy()
plt.figure(figsize=(32,32))
plt.axis('equal')
ax = plt.axes()
ax.set_xlim(min(X[:,0])-10, max(X[:,0])+10)
ax.set_ylim(min(X[:,1])-10, max(X[:,1])+10)
# ax.plot(X[:,0],X[:,1],'*',color='b',markersize=0.4)
lines = []
for i,j in zip(*graph.nonzero()):
    lines.append([X[i], X[j]])

lc = mc.LineCollection(lines, linewidths=.3, colors='#0000007f')
ax.add_collection(lc)
plt.savefig('../result/' + graph_name + '.pdf', dpi=1000)
plt.show()

最终效果如下:

can_96 pivotMDS 布局效果
dwt_1005  pivotMDS 布局效果
欢迎访问个人博客。转载请注明出处。

posted @ 2020-02-02 22:33  fahaizhong  阅读(567)  评论(0编辑  收藏  举报