Vectorizing Image Data
PyTorch
pip install -i
transformers
The transformers package was previously named pytorch-transformers and, before that, pytorch-pretrained-bert.
It provides implementations of a range of SOTA models, including BERT, XLNet, and RoBERTa.
pytorch_model.bin
pip install sentence-transformers
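This framework is built on PyTorch and Transformers.
There are two ways to use it: one is to use a pretrained model directly,
the other is to fine-tune a model on your own dataset.
Creating a SentenceTransformers model from scratch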
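File description
01. Download the model to a local directory
The first is the configuration file, config.json.
The second is the vocabulary file, vocab.json.
The third is the pretrained weights file: save pytorch_model.bin if you use PyTorch, or tf_model.h5 if you use TensorFlow 2.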
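Before pointing the loader at a local directory, it can help to confirm that the three files listed above are actually present. A minimal sketch, assuming the file names given in these notes; the helper name missing_model_files is hypothetical, and some tokenizers ship vocab.txt or tokenizer.json instead of vocab.json, so the list may need adjusting for a given model.

```python
from pathlib import Path

# Weight file names taken from the notes above; either one is acceptable.
WEIGHT_FILES = ("pytorch_model.bin", "tf_model.h5")


def missing_model_files(model_dir):
    """Return the names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    # Config and vocabulary files are both required.
    missing = [name for name in ("config.json", "vocab.json")
               if not (root / name).is_file()]
    # Only one of the weight files has to exist.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

An empty list means the directory looks complete; anything else names what still has to be downloaded.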
02. Modify the loading code
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('/data/pretrained_model/all-MiniLM-L6-v2')
CLIP
The CLIP (Contrastive Language-Image Pre-Training) model
Zero-shot image classification (ZS-IMGC) means classifying images whose categories did not appear in the training set.
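With a model like CLIP, zero-shot classification is usually done by embedding each candidate label as text and picking the label closest to the image embedding. A minimal sketch of that comparison step in plain Python; the 3-dimensional vectors and label prompts are made-up stand-ins for real CLIP embeddings, and zero_shot_classify is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(image_emb, label_embs):
    """Pick the label whose text embedding is most similar to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors standing in for text embeddings of the label prompts (hypothetical values).
label_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
# Toy vector standing in for the embedding of one image.
image_emb = [0.8, 0.2, 0.1]

best_label, scores = zero_shot_classify(image_emb, label_embs)
print(best_label)
```

No label was seen during training of the classifier itself; adding a new class only means adding a new text prompt.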
Embedding
An embedding is the feature extracted from the raw data, that is, the low-dimensional vector obtained after mapping the data through a neural network.
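Principal Component Analysis (PCA) and MDS (Multi-Dimensional Scaling)
PCA seeks reduced dimensions that are mutually uncorrelated and have maximal variance, so that as much information as possible is retained.
MDS seeks to preserve the relative positions (pairwise distances) of the original data after reduction.
Isomap seeks to preserve the geodesic distances between the original data points.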
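To make the PCA objective concrete, the direction of maximal variance is the top eigenvector of the covariance matrix. A minimal sketch in plain Python using power iteration; it covers only the first principal component, assumes the points are not all identical, and is no substitute for a numerical library. The function names are hypothetical. MDS and Isomap are not shown here.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (unit direction of maximal variance, mean) for equal-length vectors."""
    n, dim = len(points), len(points[0])
    # Center the data: PCA works on deviations from the mean.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, mean


def project(points, direction, mean):
    """Reduce each point to a single coordinate along the given direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for p in points]


# Points lying on the line y = 2x, so all variance is along (1, 2).
points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, mean = first_principal_component(points)
coords = project(points, direction, mean)
```

For this toy data the 2-d points collapse to one coordinate each without losing their ordering along the line.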
In deep learning, the objective is expressed through the loss function.
Extracting Embeddings from CNNs
Image Embedding: vector representation of an image
The extraction method depends on the model architecture.
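import numpy as np
import pandas as pd
from tensorflow.keras.utils import load_img, img_to_array

def get_image_embedding(model, img_path, target_size=(224, 224)):
    # Load the image and add a batch dimension.
    # target_size is an assumption; model-specific preprocessing is not applied here.
    x = np.expand_dims(img_to_array(load_img(img_path, target_size=target_size)), axis=0)
    preds = model.predict(x)
    # One row per image, one column per embedding dimension
    curr_df = pd.DataFrame(preds[0]).T
    return curr_df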
Feature, vector and embedding space
Code example
from sentence_transformers import SentenceTransformer, util
from PIL import Image
import glob
import torch
# We have implemented our own, efficient method to find high density regions in vector space
def community_detection(embeddings, threshold, min_community_size=10, init_max_size=1000):
    """
    Function for Fast Community Detection
    Finds in the embeddings all communities, i.e. embeddings that are close (closer than threshold).
    Returns only communities that are larger than min_community_size. The communities are returned
    in decreasing order. The first element in each list is the central point in the community.
    """
    # Clamp k so that topk never asks for more entries than there are embeddings
    min_community_size = min(min_community_size, len(embeddings))
    init_max_size = min(init_max_size, len(embeddings))

    # Compute cosine similarity scores
    cos_scores = util.cos_sim(embeddings, embeddings)

    # Minimum size for a community
    top_k_values, _ = cos_scores.topk(k=min_community_size, largest=True)

    # Filter for rows >= min_threshold
    extracted_communities = []
    for i in range(len(top_k_values)):
        if top_k_values[i][-1] >= threshold:
            new_cluster = []

            # Only check top k most similar entries
            top_val_large, top_idx_large = cos_scores[i].topk(k=init_max_size, largest=True)
            top_idx_large = top_idx_large.tolist()
            top_val_large = top_val_large.tolist()

            if top_val_large[-1] < threshold:
                for idx, val in zip(top_idx_large, top_val_large):
                    if val < threshold:
                        break
                    new_cluster.append(idx)
            else:
                # Iterate over all entries (slow)
                for idx, val in enumerate(cos_scores[i].tolist()):
                    if val >= threshold:
                        new_cluster.append(idx)

            extracted_communities.append(new_cluster)

    # Largest cluster first
    extracted_communities = sorted(extracted_communities, key=lambda x: len(x), reverse=True)

    # Step 2) Remove overlapping communities
    unique_communities = []
    extracted_ids = set()
    for community in extracted_communities:
        add_cluster = True
        for idx in community:
            if idx in extracted_ids:
                add_cluster = False
                break

        if add_cluster:
            unique_communities.append(community)
            for idx in community:
                extracted_ids.add(idx)

    return unique_communities
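def duplicates(img_emb, threshold=0.9, exact_threshold=0.99):
    # Note: reads the module-level img_names list defined in the __main__ block.
    # Score all image pairs; each entry is (score, idx1, idx2), highest score first
    pairs = util.paraphrase_mining_embeddings(img_emb)

    # The three most similar pairs (most likely exact duplicates)
    for score, idx1, idx2 in pairs[0:3]:
        print("\n duplicate Score: {:.3f}".format(score))
        print(img_names[idx1])
        print(img_names[idx2])

    # Near duplicates: more similar than threshold but below exact_threshold.
    # exact_threshold=0.99 is an assumed cut-off for exact duplicates.
    near_duplicates = [entry for entry in pairs if threshold < entry[0] < exact_threshold]
    for i, (score, idx1, idx2) in enumerate(near_duplicates, start=1):
        print("\n {} near Score: {:.3f}".format(i, score))
        print(img_names[idx1])
        print(img_names[idx2])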
if __name__ == "__main__":
    # First, we load the CLIP model
    model = SentenceTransformer('/data/cloud/sclip-ViT-B-32')
    img_names = list(glob.glob('/data/cloud/front_wide/*.jpg'))
    print("Images:", len(img_names))
    img_emb = model.encode([Image.open(filepath) for filepath in img_names], batch_size=8, convert_to_tensor=True, show_progress_bar=True)

    # Image similarity
    duplicates(img_emb, threshold=0.9)

    # Clustering
    clusters = community_detection(img_emb, threshold=0.9, min_community_size=3, init_max_size=10)

    # Now we output the first 10 (largest) clusters
    for cluster in clusters[0:10]:
        print("\n\nCluster size:", len(cluster))
        # Output 3 images
        for idx in cluster[0:3]:
            print(img_names[idx])
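Usage notes
To run image search, load a model such as CLIP and use its encode method to encode both the images and the text.
Use the cos_sim function in the util module to compute their cosine similarity.
Use the semantic_search function in the util module to run semantic search.
Use the paraphrase_mining function in the util module to mine paraphrases, i.e. texts with very similar meanings.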
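Conceptually, mining similar pairs means scoring every pair of embeddings and sorting by score. A minimal sketch in plain Python that normalizes the vectors and then uses dot products, returning (score, idx1, idx2) entries in the same shape the code example iterates over. This is not the library's implementation, which is more elaborate; mine_similar_pairs and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mine_similar_pairs(embeddings, top_k=3):
    """Score every unordered pair and return the top_k as (score, i, j), best first."""
    unit = [normalize(e) for e in embeddings]
    pairs = [(sum(a * b for a, b in zip(unit[i], unit[j])), i, j)
             for i, j in combinations(range(len(unit)), 2)]
    pairs.sort(key=lambda entry: entry[0], reverse=True)
    return pairs[:top_k]


# Toy 2-d embeddings: items 0 and 1 are close, items 2 and 3 are close.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
top_pairs = mine_similar_pairs(embeddings, top_k=2)
for score, i, j in top_pairs:
    print("{:.3f} {} {}".format(score, i, j))
```

The all-pairs loop is quadratic in the number of items, which is fine for a sketch but is the reason real implementations process the data in chunks.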
dot_score
def pairwise_dot_score(a: Tensor, b: Tensor):
def pairwise_cos_sim(a: Tensor, b: Tensor):
def normalize_embeddings(embeddings: Tensor):
def paraphrase_mining(model,
def paraphrase_mining_embeddings(embeddings:
def semantic_search(query_embeddings: Tensor,
def community_detection(embeddings, threshold=0.75, min_community_size=10, batch_size=1024):
A graph is usually written G(V, E), where G is the graph, V is the set of vertices (nodes), and E is the set of edges connecting pairs of nodes.
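One common way to hold G(V, E) in code is an adjacency list built from the vertex and edge sets. A minimal sketch for an undirected graph; build_adjacency and the toy vertex names are hypothetical.

```python
def build_adjacency(vertices, edges):
    """Build an undirected adjacency list from a vertex list V and an edge list E."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        # Each undirected edge is recorded in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c")]
graph = build_adjacency(V, E)
# The degree of a node is the number of edges touching it.
degrees = {v: len(neighbors) for v, neighbors in graph.items()}
```

Node "d" has no edges, so it appears in the graph with an empty neighbor set; keeping V explicit is what makes such isolated nodes visible.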
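Community detection algorithms are tailored to network analysis, which relies on a single type of attribute: edges.
Clustering algorithms, by contrast, tend to separate individual peripheral nodes from the communities they should belong to.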
Topological Data Analysis (TDA)
Topological Point Cloud Clustering https://arxiv.org/abs/2303.16716
https://git.rwth-aachen.de/netsci
Bringing insights from spectral graph theory into semantic segmentation https://github.com/michaelschaub
https://github.com/michaelschaub/calcium-imaging-analysis/wiki
Treating point cloud generation as a topological representation learning problem
Aesthetic thumbnails
Reliable and Efficient Image Cropping: A Grid Anchor based Approach
https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping
https://github.com/HuiZeng/Image-Adaptive-3DLUT
BERT (Bidirectional Encoder Representations from Transformers) is built on the Transformer architecture.
LSTM-based language modeling architectures
Transformer-based architectures
References
https://www.zhihu.com/people/andrewcheung/posts
https://github.com/UKPLab/sentence-transformers
Image Similarity using CNN feature embeddings https://medium.com/@f.a.reid/image-similarity-using-feature-embeddings-357dc01514f8
https://github.com/totogot/ImageSimilarity/blob/main/src/ImgSim/image_similarity.py
https://github.com/qhduan/notebook_gist/blob/master/opensearch.ipynb
https://huggingface.co/sentence-transformers
https://opensearch.org/docs/latest/
https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/
https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/
https://weaviate.io/blog/how-to-choose-a-sentence-transformer-from-hugging-face
Training a SentenceTransformer model https://zhuanlan.zhihu.com/p/563844192
Workaround for sentence_transformers models that cannot be downloaded directly https://blog.csdn.net/PolarisRisingWar/article/details/126991633
How to download Hugging Face transformers pretrained models locally and use them https://zhuanlan.zhihu.com/p/147144376