幂迭代法应用:pivotMDS--大图数据的稀疏化布局
论文解读
首先pivotMDS
是通过p个锚点,做p次单源最短路,得到一个n*p
的\(d_{ij}\)矩阵。通过这样一个n*p
\(d_{ij}\)计算得到一个C矩阵,\(C \in R^{n\times k}\)。
C矩阵计算方式:
\[\begin{equation}
C_{ij} = -\frac{1}{2} (\delta_{ij}^2 - \frac{1}{n} \sum_{r=1}^{n}\delta_{rj}^2 - \frac{1}{k} \sum_{s=1}^{k}\delta_{is}^2 + \frac{1}{nk} \sum_{r=1}^{n}\sum_{s=1}^{k}\delta_{rs}^2)
\end{equation}
\]
\[B = C^TC
\]
\[v_1 = poweriteration(B)_{max}
\]
\[v_2 = poweriteration(B)_{sec}
\]
\[x = Cv_1, y=Cv_2
\]
其中poweriteration由上一节幂迭代法求特征值的方法求得最大和次大特征向量。
代码
读取图数据
#
def readGraph(graph_name,fmt):
if 'mat' in fmt:
mat_data = io.loadmat(graph_name + '.mat')
graph = mat_data['Problem']['A'][0][0]
return graph
elif 'txt' in fmt or 'csv' in fmt:
df = pd.read_csv(graph_name + '.txt',sep=' ',names=["src","tgt"])
# df["data"] = np.ones_like(df["src"].values)
row = df["src"].values
col = df["tgt"].values
data = np.ones_like(row)
graph = csc_matrix((data, (row, col)), shape=(max(row.max(),col.max()) + 1, max(row.max(),col.max()) + 1))
return graph
else:
print("error:not supported format.")
return None
graph_name = '../data/can_96'
graph = readGraph(graph_name,'txt')
数据可以从tamu稀疏矩阵数据库,其中mat文件为matlab格式文件,txt为空格分隔的文本文件,共两列,分别表示连接一条边的两个点。
计算C矩阵和B矩阵(C^TC)
pivot_Point = np.random.randint(graph.shape[0],size = NP)
d = csgraph.shortest_path(graph, directed=False, unweighted=True,indices=pivot_Point)
d[np.isinf(d)] = 0
(k,n) = d.shape
d2 = d**2
deltaIS = d2.sum(axis=0)/k
deltaRJ = d2.sum(axis=1)/n
sumALL = d2.sum()/(n*k)
C = np.zeros((n,NP),dtype=np.float64)
@jit(nopython=True)
def getC(C):
for i in range(n):
for j in range(k):
C[i][j] = -1.0/2 * (d[j][i]**2 - deltaRJ[j] - deltaIS[i] + sumALL)
return C
C = getC(C)
B = np.dot(C.T,C)
幂迭代法
def power_iteration(A, num_simulations: int):
b_k = np.random.rand(A.shape[1])
for _ in range(num_simulations):
b_k1 = np.dot(A, b_k)
b_k1_norm = np.linalg.norm(b_k1)
b_k = b_k1 / b_k1_norm
return b_k
V_1 = power_iteration(B, 100).reshape(1,-1)
lbd = np.dot(V_1,np.dot(B,V_1.T))
B_2 = B - lbd / np.linalg.norm(V_1)**2 * np.dot(V_1.T,V_1)
V_2 = power_iteration(B_2, 100)
结果
pos = np.zeros((n,2))
pos[:,0] = np.dot(C,V_1.reshape(-1,1)).reshape(-1)
pos[:,1] = np.dot(C,V_2.reshape(-1,1)).reshape(-1)
X = pos.copy()
plt.figure(figsize=(32,32))
plt.axis('equal')
ax = plt.axes()
ax.set_xlim(min(X[:,0])-10, max(X[:,0])+10)
ax.set_ylim(min(X[:,1])-10, max(X[:,1])+10)
# ax.plot(X[:,0],X[:,1],'*',color='b',markersize=0.4)
lines = []
for i,j in zip(*graph.nonzero()):
lines.append([X[i], X[j]])
lc = mc.LineCollection(lines, linewidths=.3, colors='#0000007f')
ax.add_collection(lc)
plt.savefig('../result/' + graph_name + '.pdf', dpi=1000)
plt.show()
最终效果如下:
欢迎访问个人博客。转载请注明出处。