Cosine Similarity

What

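Cosine similarity measures how similar two vectors are by the cosine of the angle between them. It depends only on the vectors' directions, not on their magnitudes.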

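\[ \cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} \]

  • \(A \cdot B\): the inner (dot) product of the two vectors
  • \(\|A\|\), \(\|B\|\): the norms (Euclidean lengths) of the vectors
  • \(\cos\theta\): ranges over \([-1, 1]\); 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions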
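A quick worked example of the formula, using two small vectors \(A = (1, 2, 3)\) and \(B = (4, 5, 6)\) chosen here only for illustration:

\[
\begin{aligned}
A \cdot B &= 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32 \\
\|A\| &= \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} \\
\|B\| &= \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77} \\
\cos\theta &= \frac{32}{\sqrt{14}\,\sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746
\end{aligned}
\]

The value is close to 1 because the two vectors point in nearly the same direction.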

Why

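Cosine similarity is cheap to compute: a single pass over the components yields the dot product and both norms, so the cost is O(n) for n-dimensional vectors. For sparse vectors it is cheaper still, because only the non-zero components need to be considered.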
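A minimal sketch of the sparse case, assuming each sparse vector is stored as a Python dict that maps an index to its non-zero value. This representation and the function name `sparse_cosine_similarity` are illustrative assumptions, not part of the original post. The loops touch only stored entries, so the cost scales with the number of non-zeros rather than with the full dimension.

```python
import math

def sparse_cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of sparse vectors stored as {index: value} dicts (hypothetical helper).

    Only non-zero entries are stored, so the work is proportional to the
    number of non-zeros, not to the full dimension.
    """
    # Iterate over the smaller dict; only indices present in both contribute to A·B
    if len(a) > len(b):
        a, b = b, a
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    # Each norm needs only that vector's own non-zero entries
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Two vectors of dimension 1,000,000 with only three non-zero components each
doc1 = {0: 1.0, 42: 2.0, 999_999: 3.0}
doc2 = {42: 4.0, 500: 1.0, 999_999: 1.0}
print(sparse_cosine_similarity(doc1, doc2))
```

Here only indices 42 and 999,999 overlap, so the dot product is 2·4 + 3·1 = 11, and the result is 11 / (√14 · √18).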

How

Pure Python implementation (math module)

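import math

def cosine_similarity(vec1, vec2) -> float:
    # zip() silently truncates, so reject vectors of different lengths
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    norm_vec1, norm_vec2 = 0, 0
    dot_product = 0
    # Accumulate A·B, |A|^2 and |B|^2 in a single pass
    for v1, v2 in zip(vec1, vec2):
        dot_product += v1 * v2
        norm_vec1 += v1 * v1
        norm_vec2 += v2 * v2
    norm_vec1 = math.sqrt(norm_vec1)
    norm_vec2 = math.sqrt(norm_vec2)
    # Undefined for a zero vector: this line raises ZeroDivisionError in that case
    return dot_product / (norm_vec1 * norm_vec2)

if __name__ == '__main__':
    # Opposite vectors: prints a value of roughly -1
    print(cosine_similarity([1, 2, 3], [-1, -2, -3]))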
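Because \(\cos\theta\) comes out of floating-point arithmetic, it can land a hair outside \([-1, 1]\), and `math.acos` raises `ValueError` for such inputs. If you need the angle \(\theta\) itself, a minimal sketch is to clamp before calling `math.acos`. The function name `angle_degrees` is hypothetical, and the sketch assumes equal-length, non-zero vectors.

```python
import math

def angle_degrees(vec1, vec2) -> float:
    """Angle between two equal-length, non-zero vectors, in degrees (hypothetical helper)."""
    dot_product = sum(v1 * v2 for v1, v2 in zip(vec1, vec2))
    # math.hypot with several arguments (Python 3.8+) returns the Euclidean norm
    cos_theta = dot_product / (math.hypot(*vec1) * math.hypot(*vec2))
    # Rounding can push cos_theta slightly outside [-1, 1]; clamp before acos
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

print(angle_degrees([1, 0], [0, 1]))           # orthogonal vectors
print(angle_degrees([1, 2, 3], [2, 4, 6]))     # same direction
print(angle_degrees([1, 2, 3], [-1, -2, -3]))  # opposite directions
```

Without the clamp, a result such as 1.0000000000000002 would make `math.acos` fail even though the vectors are simply parallel.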

NumPy implementation

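import numpy as np

def cosine_similarity(vec1, vec2) -> float:
    # np.linalg.norm returns the Euclidean (L2) norm by default
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    # np.dot gives the inner product A·B
    return np.dot(vec1, vec2) / (norm_vec1 * norm_vec2)

if __name__ == '__main__':
    # Identical vectors: prints a value of roughly 1
    print(cosine_similarity([1, 2, 3], [1, 2, 3]))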
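In practice you often need the similarity between every pair of vectors in two collections. A common NumPy approach, sketched here with the hypothetical function name `pairwise_cosine_similarity`, is to normalize each row to unit length and then take a single matrix product. It assumes no row is the zero vector; a zero row would produce `nan` entries.

```python
import numpy as np

def pairwise_cosine_similarity(X, Y) -> np.ndarray:
    """Return an (m, n) matrix whose entry [i, j] is cos(X[i], Y[j]) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Normalize each row to unit length; keepdims=True keeps shape (rows, 1) for broadcasting
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Dot products of unit vectors are exactly the cosine similarities
    return X_unit @ Y_unit.T

X = [[1, 2, 3], [0, 1, 0]]
Y = [[1, 2, 3], [-1, -2, -3], [1, 0, 0]]
print(pairwise_cosine_similarity(X, Y))
```

Normalizing once per row avoids recomputing norms for every pair, which matters when m and n are large.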

PyTorch implementation

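import torch
import torch.nn.functional as F

# torch.FloatTensor is a legacy constructor; torch.tensor with an explicit dtype is preferred
vec1 = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
vec2 = torch.tensor([5, 6, 7, 8], dtype=torch.float32)

# For 1-D tensors the vector dimension is dim=0 (the default, dim=1, expects batched 2-D inputs)
cos_sim = F.cosine_similarity(vec1, vec2, dim=0)
print(cos_sim)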
posted @ 2024-12-15 22:38  cxy8