SciTech-Mathematics-Probability+Statistics: Original design and implementation of "mathematical models" and "algorithms": rethinking "dot products, vectors and cosine similarity": "Dot products, cosine similarity, text vectors"

Consider the "Dot products, cosine similarity, text vectors" article quoted further down.
We find that:
Cosine Similarity = (A . B) / (||A||.||B||)
This formula is symmetric:
C.S.(A, B) = C.S.(B, A) = (A . B) / (||A||.||B||)
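
The symmetry can be checked numerically. The following is a minimal sketch using NumPy; the helper name cosine_similarity is only illustrative and is not taken from the quoted article.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two real vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = [1, 2, 3]
B = [4, 5, 6]

# Swapping the arguments does not change the result,
# because both the dot product and the product of norms are symmetric.
cs_ab = cosine_similarity(A, B)
cs_ba = cosine_similarity(B, A)
print(cs_ab, cs_ba)
```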

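This means:
for a given vector A, there are many solution vectors B whose cosine similarity with A equals a given value,
because the value depends only on the angle between A and B, not on the length of B or on which side of A it lies.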
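
As a small illustration of this non-uniqueness, the sketch below builds three different vectors B in two dimensions that all share the same cosine similarity with A. The specific vectors and the 60 degree angle are assumptions chosen for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two real vectors of equal length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([1.0, 0.0])
theta = np.pi / 3  # 60 degrees, so the target cosine similarity is cos(theta)

B_up = np.array([np.cos(theta), np.sin(theta)])     # rotated counter-clockwise from A
B_down = np.array([np.cos(theta), -np.sin(theta)])  # rotated clockwise from A
B_long = 5.0 * B_up                                 # same direction as B_up, five times longer

# Three distinct vectors, one cosine similarity.
similarities = [cosine_similarity(A, B) for B in (B_up, B_down, B_long)]
print(similarities)
```

B_up and B_down are mirror images of each other across A, which is exactly the distinction that plain cosine similarity cannot see.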

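So how can we construct an N-dimensional vector space over the complex numbers?

And, on each dimension of that N-dimensional complex vector space,
how do we define the field of operations over the complex numbers and its rules?
And how do we design the operations between multiple (complex) dimensions, with their definitions and properties?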
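
One standard candidate answer, offered here only as a sketch and not as this post's own construction, is the Hermitian inner product on complex vectors. NumPy's np.vdot conjugates its first argument, so it can serve as that inner product; the helper name complex_cosine and the sample vectors are assumptions for illustration.

```python
import numpy as np

# Two vectors in a 3-dimensional complex vector space.
A = np.array([1 + 1j, 2 - 1j, 0 + 3j])
B = np.array([2 + 0j, 1 + 1j, 1 + 1j])

# Hermitian inner product: sum of conj(A_i) * B_i.
inner_ab = np.vdot(A, B)
inner_ba = np.vdot(B, A)

def complex_cosine(a, b):
    """Complex-valued analogue of cosine similarity (illustrative definition)."""
    return np.vdot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

cs_ab = complex_cosine(A, B)
cs_ba = complex_cosine(B, A)

# Swapping the arguments conjugates the result, so the order is no longer invisible.
print(inner_ab, inner_ba)
print(cs_ab, cs_ba)
```

Under this definition the magnitude of the result is the same in both orders, while the phase changes sign when the arguments are swapped.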

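These two questions have real practical significance.
For example, in a complex coordinate system:
∠(A, B) and ∠(B, A) are different (one is a positive angle, the other a negative angle).
So we can use the angle between two vectors in the complex coordinate system to measure the similarity of the two vectors.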
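
The sign flip can be seen by treating two-dimensional vectors as complex numbers. The sketch below assumes the convention that ∠(A, B) is the argument of B multiplied by the conjugate of A; the function name signed_angle is hypothetical.

```python
import numpy as np

def signed_angle(a, b):
    """Signed angle in radians from complex number a to complex number b, in (-pi, pi]."""
    return float(np.angle(b * np.conj(a)))

A = 1 + 0j  # the vector (1, 0)
B = 1 + 1j  # the vector (1, 1)

angle_ab = signed_angle(A, B)  # counter-clockwise from A to B: positive
angle_ba = signed_angle(B, A)  # clockwise from B to A: negative
print(angle_ab, angle_ba)
```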

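Or we can extract the polarity of the two angles (positive: 1, negative: -1) and combine it with cosine similarity to represent similarity,

which measures both the magnitude and which of the two vectors is the start and which is the end.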
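
A minimal sketch of this combination for two-dimensional vectors written as complex numbers follows. The function name signed_cosine_similarity and the choice to report polarity +1 when the vectors are parallel or anti-parallel are assumptions made for the example.

```python
import numpy as np

def signed_cosine_similarity(a, b):
    """Return (polarity, cosine) for 2-D vectors given as complex numbers a and b."""
    z = b * np.conj(a)
    # Real part of z is the ordinary dot product of the two 2-D vectors.
    cosine = z.real / (abs(a) * abs(b))
    # Imaginary part of z is the 2-D cross product; its sign gives the turning direction.
    polarity = 1 if z.imag >= 0 else -1  # assumption: zero angle counts as positive
    return polarity, float(cosine)

A = 2 + 0j
B = 1 + 1j

pol_ab, cos_ab = signed_cosine_similarity(A, B)
pol_ba, cos_ba = signed_cosine_similarity(B, A)
print((pol_ab, cos_ab), (pol_ba, cos_ba))
```

The pair is returned rather than the product polarity * cosine, since the product alone could not tell a clockwise acute angle apart from a counter-clockwise obtuse one.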


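Every great original work "starts with a question".

For example, Embedding (word-vector embeddings), the Google Transformer's Attention(Q,K,V) and positional encoding (P.E.) all rest on original mathematical models and algorithm designs, and of course they too "started with a question".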

Dot products, cosine similarity, text vectors

https://dev.to/sayemmh/dot-products-cosine-similarity-text-vectors-2lo4

Sayem Hoque, Posted on Oct 20, 2022

Cosine similarity is a measure between two vectors of the same length that gives us a value from -1 to 1 (from 0 to 1 when all components are non-negative, as with word-count vectors) indicating how similar the directions of the vectors are. The formula is below:
Cosine Similarity = (A . B) / (||A||.||B||)
Where (A . B) is the dot product between vector A and B. A dot product is the sum of the element-by-element product between A and B. For example,

A = [1, 2, 3]
B = [4, 5, 6]


sum(a * b for a, b in zip(A, B))
>> 32
# (1 * 4) + (2 * 5) + (3 * 6) = 32

Meanwhile, ||A|| denotes the L2 norm of a vector. The L2 norm measures the length of a vector in Euclidean space. Think of it as the length of the "line" you would get if a vector with N elements were drawn on an N-dimensional graph. You sum the squares of the values in each dimension and take the square root of the sum.

A = [1, 2, 3]

sum(x ** 2 for x in A) ** 0.5

>> 3.7416573
# (1^2 + 2^2 + 3^2)^0.5 = 3.7416573

Numpy has a bunch of helpers so we don't need to run all of these calculations manually:

import numpy as np
from numpy.linalg import norm

# define two arrays
A = np.array([1,2,3,4])
B = np.array([1,2,3,5])

# cosine similarity
cosine = np.dot(A, B) / (norm(A) * norm(B))
print("cosine similarity:", cosine)

>> 0.9939990885479664
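
To see the full range of values the score can take, here is a short sketch with three hand-picked cases. The vectors are assumptions chosen so that the pairs are parallel, perpendicular and opposite.

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    """Cosine similarity of two real vectors of equal length."""
    return float(np.dot(a, b) / (norm(a) * norm(b)))

A = np.array([1.0, 2.0])

same_direction = cosine_similarity(A, 3 * A)               # parallel: close to 1
orthogonal = cosine_similarity(A, np.array([-2.0, 1.0]))   # perpendicular: close to 0
opposite = cosine_similarity(A, -A)                        # anti-parallel: close to -1

print(same_direction, orthogonal, opposite)
```

Negative scores only appear when some components are negative, so vectors of word counts stay between 0 and 1.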

A cosine similarity score near 1 means the vectors are very close to one another if they were projected

posted @ 2024-07-19 21:09 abaelhe
