算法的输入是所有数据在高维情况下两两之间的距离(记i与j的距离为Dij)。
首先我们把所有数据点随机绘制在一张二维图像上,然后计算它们两两之间的距离dij,然后我们计算出它与高维距离Dij的误差,根据这些误差,我们将每对数据点按比例移近或移远,然后重新计算所有dij,不断重复到我们没法减少误差为止。
python代码实现
def scaledown(data,distance=pearson,rate=0.01): n=len(data) # The real distances between every pair of items realdist=[[distance(data[i],data[j]) for j in range(n)] for i in range(0,n)] # Randomly initialize the starting points of the locations in 2D loc=[[random.random(),random.random()] for i in range(n)] fakedist=[[0.0 for j in range(n)] for i in range(n)] lasterror=None for m in range(0,1000): # Find projected distances for i in range(n): for j in range(n): fakedist[i][j]=sqrt(sum([pow(loc[i][x]-loc[j][x],2) for x in range(len(loc[i]))])) # Move points grad=[[0.0,0.0] for i in range(n)] totalerror=0 for k in range(n): for j in range(n): if j==k: continue # The error is percent difference between the distances errorterm=(fakedist[j][k]-realdist[j][k])/realdist[j][k] # Each point needs to be moved away from or towards the other # point in proportion to how much error it has grad[k][0]+=((loc[k][0]-loc[j][0])/fakedist[j][k])*errorterm grad[k][1]+=((loc[k][1]-loc[j][1])/fakedist[j][k])*errorterm # Keep track of the total error totalerror+=abs(errorterm) print totalerror # If the answer got worse by moving the points, we are done if lasterror and lasterror<totalerror: break lasterror=totalerror # Move each of the points by the learning rate times the gradient for k in range(n): loc[k][0]-=rate*grad[k][0] loc[k][1]-=rate*grad[k][1] return loc