Image Vectorization, Vector Storage, and Vector Search and Matching

Vectorizing Image Data

PyTorch
  pip install torch   # add `-i <index-url>` to install from a PyPI mirror

transformers
   The transformers package was previously named pytorch-transformers, and before that pytorch-pretrained-bert.
   It provides implementations of a series of SOTA models, including BERT, XLNet, RoBERTa, and others.
    pytorch_model.bin (the PyTorch weights file of a pretrained model)
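
   A minimal loading sketch with the transformers auto classes (the model id "bert-base-uncased" is illustrative; any Hub id or local directory works):

   from transformers import AutoTokenizer, AutoModel
   tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
   model = AutoModel.from_pretrained("bert-base-uncased")   # reads config.json + pytorch_model.bin
   inputs = tokenizer("a test sentence", return_tensors="pt")
   outputs = model(**inputs)   # outputs.last_hidden_state has shape (1, seq_len, 768)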


pip install sentence-transformers
    The framework is built on PyTorch and Transformers.
	One way to use it is to apply a pretrained model directly,
	another is to fine-tune a model on your own dataset,
	and a third is to create a SentenceTransformer model from scratch.

File Descriptions

01. Download the model to a local directory
 The first file is the configuration file, config.json.
 The second is the vocabulary file, vocab.json (or vocab.txt, depending on the tokenizer).
 The third is the pretrained weights file: pytorch_model.bin if you use PyTorch, or tf_model.h5 if you use TensorFlow 2.
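
 A hedged sketch of the download step using huggingface_hub (the repo id and target directory are illustrative; local_dir requires a recent huggingface_hub):

 from huggingface_hub import snapshot_download
 # fetches config.json, the vocabulary/tokenizer files, and pytorch_model.bin into local_dir
 snapshot_download(repo_id="sentence-transformers/all-MiniLM-L6-v2",
                   local_dir="/data/pretrained_model/all-MiniLM-L6-v2")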
02. Adjust the loading code
  from sentence_transformers import SentenceTransformer
  model = SentenceTransformer('/data/pretrained_model/all-MiniLM-L6-v2')
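
  Once loaded, the model can be used directly; a minimal sketch of encoding two sentences and comparing them:

  from sentence_transformers import util
  emb = model.encode(["a photo of a cat", "a photo of a dog"], convert_to_tensor=True)
  print(util.cos_sim(emb[0], emb[1]))   # cosine similarity of the two sentence embeddings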

CLIP

The CLIP (Contrastive Language-Image Pre-Training; hereafter CLIP) model
Zero-shot image classification (ZS-IMGC) refers to classifying images of categories that never appear in the training set; a sketch follows below.
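
A minimal zero-shot classification sketch using the sentence-transformers CLIP wrapper (the model id, image path, and label prompts are assumptions for illustration):

from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer('clip-ViT-B-32')
labels = ['a photo of a cat', 'a photo of a dog', 'a photo of a car']

img_emb = model.encode(Image.open('example.jpg'), convert_to_tensor=True)
text_emb = model.encode(labels, convert_to_tensor=True)

scores = util.cos_sim(img_emb, text_emb)   # shape (1, num_labels)
print(labels[int(scores.argmax())])        # the class set is specified purely as text prompts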

Embedding

 An embedding is a feature extracted from the raw data: the low-dimensional vector obtained by mapping the input through a neural network.
  Classical dimensionality-reduction methods include Principal Component Analysis (PCA) and Multi-Dimensional Scaling (MDS); see the scikit-learn sketch after this list.
  PCA seeks reduced dimensions that are mutually independent and of maximal variance, so as to retain as much information as possible.
  MDS aims to preserve the relative positions (pairwise distances) of the original data after reduction.
  Isomap aims to preserve the geodesic distances between the original data points.
  In deep learning, by contrast, the objective is expressed through the loss function.
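
  A quick scikit-learn sketch of the three classical reducers on random data (purely illustrative):

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.manifold import MDS, Isomap

  X = np.random.rand(100, 64)                       # 100 points in 64-D
  X_pca = PCA(n_components=2).fit_transform(X)      # independent, max-variance directions
  X_mds = MDS(n_components=2).fit_transform(X)      # preserves pairwise distances
  X_iso = Isomap(n_components=2).fit_transform(X)   # preserves geodesic distances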
  Extracting Embeddings from CNNs
 Image Embedding: vector representation of an image
     Depending on the model architecture, the extraction method differs.
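 Below is a minimal runnable version of the function, assuming a Keras model whose output is already the feature vector (e.g., a headless CNN with global average pooling); the 224x224 input size is an assumption that must match the chosen model: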
 import numpy as np
 import pandas as pd
 from tensorflow.keras.preprocessing import image

 def get_image_embedding(model, img_path):
     # Load and preprocess the image into a (1, 224, 224, 3) batch
     img = image.load_img(img_path, target_size=(224, 224))
     x = np.expand_dims(image.img_to_array(img), axis=0)
     preds = model.predict(x)          # forward pass through the feature extractor
     return pd.DataFrame(preds[0]).T   # one-row DataFrame holding the embedding
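
 Usage under those assumptions, with a hypothetical headless ResNet50 as the extractor:

 from tensorflow.keras.applications import ResNet50
 extractor = ResNet50(weights='imagenet', include_top=False, pooling='avg')
 emb = get_image_embedding(extractor, 'example.jpg')   # a 1 x 2048 row vector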
    
Feature, vector and embedding space    

Code Example

from sentence_transformers import SentenceTransformer, util
from PIL import Image
import glob
import torch

# We have implemented our own, efficient method to find high density regions in vector space
def community_detection(embeddings, threshold, min_community_size=10, init_max_size=1000):
    """
    Function for Fast Community Detection
    Finds in the embeddings all communities, i.e. embeddings that are close (closer than threshold).
    Returns only communities that are larger than min_community_size. The communities are returned
    in decreasing order. The first element in each list is the central point in the community.
    """
    # Clamp k so topk never exceeds the number of embeddings
    init_max_size = min(init_max_size, len(embeddings))

    # Compute cosine similarity scores
    cos_scores = util.cos_sim(embeddings, embeddings)

    # For each embedding, take its top-k similarity scores (k = min_community_size)
    top_k_values, _ = cos_scores.topk(k=min_community_size, largest=True)

    # Keep only rows whose k-th best score clears the threshold
    extracted_communities = []
    for i in range(len(top_k_values)):
        if top_k_values[i][-1] >= threshold:
            new_cluster = []

            # Only check top k most similar entries
            top_val_large, top_idx_large = cos_scores[i].topk(k=init_max_size, largest=True)
            top_idx_large = top_idx_large.tolist()
            top_val_large = top_val_large.tolist()

            if top_val_large[-1] < threshold:
                for idx, val in zip(top_idx_large, top_val_large):
                    if val < threshold:
                        break

                    new_cluster.append(idx)
            else:
                # Iterate over all entries (slow)
                for idx, val in enumerate(cos_scores[i].tolist()):
                    if val >= threshold:
                        new_cluster.append(idx)

            extracted_communities.append(new_cluster)

    # Largest cluster first
    extracted_communities = sorted(extracted_communities, key=lambda x: len(x), reverse=True)

    # Step 2) Remove overlapping communities
    unique_communities = []
    extracted_ids = set()

    for community in extracted_communities:
        add_cluster = True
        for idx in community:
            if idx in extracted_ids:
                add_cluster = False
                break

        if add_cluster:
            unique_communities.append(community)
            for idx in community:
                extracted_ids.add(idx)

    return unique_communities


def duplicates(img_emb, img_names, threshold=0.9):
    # paraphrase_mining_embeddings returns (score, idx1, idx2) triples sorted by score, descending
    pairs = util.paraphrase_mining_embeddings(img_emb)

    # The top-scoring pairs are (near-)exact duplicates
    for score, idx1, idx2 in pairs[0:3]:
        print("\n duplicate Score: {:.3f}".format(score))
        print(img_names[idx1])
        print(img_names[idx2])

    # Near duplicates: at least `threshold` similar, but below 1.0 (not identical)
    near_duplicates = [entry for entry in pairs if threshold <= entry[0] < 1.0]
    for i, (score, idx1, idx2) in enumerate(near_duplicates, start=1):
        print("\n {} near Score: {:.3f}".format(i, score))
        print(img_names[idx1])
        print(img_names[idx2])
        
 
if __name__ == "__main__":
    #First, we load the CLIP model
    model = SentenceTransformer('/data/cloud/sclip-ViT-B-32')
    img_names = list(glob.glob('/data/cloud/front_wide/*.jpg'))
    print("Images:", len(img_names))
    img_emb = model.encode([Image.open(filepath) for filepath in img_names], batch_size=8, convert_to_tensor=True, show_progress_bar=True)
     
    ## Image similarity
    duplicates(img_emb, img_names, threshold=0.9)

    ## Clustering
    clusters = community_detection(img_emb, threshold=0.9, min_community_size=3, init_max_size=10)
    # Now we output the first 10 (largest) clusters
    for cluster in clusters[0:10]:
        print("\n\nCluster size:", len(cluster))

        #Output 3 images
        for idx in cluster[0:3]:
            print(img_names[idx])  

Usage Notes

 To perform image search, load a model such as CLIP and use its encode method to embed both images and text.
 Use the cos_sim function in the util module to compute cosine similarity between embeddings.
 Use the semantic_search function in the util module to perform semantic search (see the sketch further below).
 Use the paraphrase_mining function in the util module to mine paraphrases, i.e., texts with very similar meanings.
    def dot_score(a: Tensor, b: Tensor):
    def pairwise_dot_score(a: Tensor, b: Tensor):
    def pairwise_cos_sim(a: Tensor, b: Tensor):
    def normalize_embeddings(embeddings: Tensor):
    def paraphrase_mining(model, sentences, ...):
    def paraphrase_mining_embeddings(embeddings: Tensor, ...):
    def semantic_search(query_embeddings: Tensor, corpus_embeddings: Tensor, ...):
    def community_detection(embeddings, threshold=0.75, min_community_size=10, batch_size=1024):
      A graph is usually written G(V, E), where G is the graph, V is the set of vertices (nodes), and E is the set of edges connecting pairs of nodes.
      Community detection algorithms are tailor-made for network analysis, which relies on a single attribute type called edges.
      Clustering algorithms, in contrast, tend to split single peripheral nodes away from the community they should belong to.
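
    A minimal text-to-image search sketch using semantic_search, reusing the CLIP model, img_emb, and img_names from the code example above (the query text is illustrative):

    query_emb = model.encode(['a red car on the road'], convert_to_tensor=True)
    hits = util.semantic_search(query_emb, img_emb, top_k=5)[0]   # hits for the first (only) query
    for hit in hits:
        print(img_names[hit['corpus_id']], hit['score'])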
      
    Topological Data Analysis (TDA)
       Topological Point Cloud Clustering   https://arxiv.org/abs/2303.16716
       https://git.rwth-aachen.de/netsci
       Brings insights from spectral graph theory into semantic segmentation   https://github.com/michaelschaub
       https://github.com/michaelschaub/calcium-imaging-analysis/wiki
      Recasts point-cloud generation as a topological representation-learning problem

Aesthetic Thumbnails

  Reliable and Efficient Image Cropping: A Grid Anchor based Approach
     https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping  
  https://github.com/HuiZeng/Image-Adaptive-3DLUT  
   BERT (Bidirectional Encoder Representations from Transformers) is based on the Transformer architecture.
      LSTM-based language-modeling architectures
      Transformer-based architectures

References

https://www.zhihu.com/people/andrewcheung/posts 
https://github.com/UKPLab/sentence-transformers
Image Similarity using CNN feature embeddings  https://medium.com/@f.a.reid/image-similarity-using-feature-embeddings-357dc01514f8  
 https://github.com/totogot/ImageSimilarity/blob/main/src/ImgSim/image_similarity.py    
https://github.com/qhduan/notebook_gist/blob/master/opensearch.ipynb 
https://huggingface.co/sentence-transformers
https://opensearch.org/docs/latest/
https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/
https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/
https://weaviate.io/blog/how-to-choose-a-sentence-transformer-from-hugging-face
 Training a SentenceTransformer model  https://zhuanlan.zhihu.com/p/563844192
 Workarounds when sentence_transformers models cannot be downloaded directly  https://blog.csdn.net/PolarisRisingWar/article/details/126991633
 How to download huggingface transformers pretrained models for local use  https://zhuanlan.zhihu.com/p/147144376