Semantic Kernel Study Notes: A First Look at Generating Embeddings and Running Semantic Search with Semantic Memory

Semantic Kernel has two Memory implementations: Semantic Memory, which is built into Semantic Kernel, and Kernel Memory, which is a standalone project. Kernel Memory evolved from the work done in Semantic Kernel.

An introduction to Semantic Memory (source):

Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory. The core library is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.

Goal: use Semantic Memory to call the OpenAI API, generate text embeddings with the text-embedding-ada-002 model, save them in an in-memory vector store, and then run a semantic search over them.

Study material: the sample program Example14_SemanticMemory.cs in the Semantic Kernel source repository.

Create a .NET console project

dotnet new console
dotnet add package Microsoft.SemanticKernel
dotnet add package --prerelease Microsoft.SemanticKernel.Plugins.Memory

Create an ISemanticTextMemory instance

Use MemoryBuilder with OpenAITextEmbeddingGenerationService to create SemanticTextMemory, an implementation of ISemanticTextMemory.

#pragma warning disable SKEXP0011 
#pragma warning disable SKEXP0003 
#pragma warning disable SKEXP0052 
ISemanticTextMemory memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey)
    .WithMemoryStore(new VolatileMemoryStore())
    .Build();
#pragma warning restore SKEXP0052 
#pragma warning restore SKEXP0003 
#pragma warning restore SKEXP0011 

Note: the warning disable directives in the code above are needed because MemoryBuilder and the two extension methods are experimental features.
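The builder code above passes an apiKey variable that these notes never define. One way to supply it, sketched here under assumptions, is to read it from an environment variable. ReadRequiredVariable is a hypothetical helper (not part of Semantic Kernel), and OPENAI_API_KEY is just an assumed variable name.

```csharp
using System;

// Hypothetical helper, not part of Semantic Kernel: reads a required
// environment variable and fails fast with a clear message if it is missing.
string ReadRequiredVariable(string name)
{
    string? value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException($"Environment variable '{name}' is not set.");
    }
    return value;
}

// OPENAI_API_KEY is an assumed name; use whatever your environment defines.
// string apiKey = ReadRequiredVariable("OPENAI_API_KEY");
```

Uncomment the last line and place it before the MemoryBuilder call. Reading the key from the environment keeps the secret out of source control.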

Prepare the text data used to generate embeddings

var sampleData = new Dictionary<string, string>
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function"
};

Generate embeddings and save them to the in-memory vector store

var i = 0;
foreach (var entry in sampleData)
{
    await memory.SaveReferenceAsync(
        collection: "SKGitHub",
        externalSourceName: "GitHub",
        externalId: entry.Key,
        description: entry.Value,
        text: entry.Value);

    Console.WriteLine($" #{++i} saved.");
}

SaveReferenceAsync calls IEmbeddingGenerationService.GenerateEmbeddingAsync to generate the embedding; see SemanticTextMemory.cs#L60 in the SK source:

var embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, kernel, cancellationToken).ConfigureAwait(false);

Note: the embedding value is of type ReadOnlyMemory<float>.

Since we are using OpenAI here, the embedding is generated by OpenAITextEmbeddingGenerationService.GenerateEmbeddingsAsync (see the SK source), which ultimately calls Azure.AI.OpenAI.OpenAIClient.GetEmbeddingsAsync; see OpenAIClient.cs#L552 in the Azure SDK for .NET source.

Run a semantic search over the embedding data

var query = "How do I get started?";
var memoryResults = memory.SearchAsync("SKGitHub", query, limit: 1, minRelevanceScore: 0.5);

SearchAsync also calls GenerateEmbeddingsAsync to generate an embedding from the query text; see SemanticTextMemory.cs#L108.
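To get a feel for what happens once the query embedding exists, here is a minimal brute-force sketch of the idea: score every stored vector against the query, drop anything below minRelevanceScore, and keep the top limit results. This is not the code of VolatileMemoryStore; it assumes cosine similarity as the relevance measure, and the CosineSimilarity and Search functions, the toy three-dimensional vectors, and the ids are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy data: hand-written vectors standing in for real embeddings.
var toyStore = new Dictionary<string, ReadOnlyMemory<float>>
{
    ["readme"] = new float[] { 0.9f, 0.1f, 0.0f },
    ["notebook"] = new float[] { 0.1f, 0.9f, 0.2f },
};
ReadOnlyMemory<float> toyQuery = new float[] { 1.0f, 0.0f, 0.0f };

var hits = Search(toyStore, toyQuery, limit: 1, minRelevanceScore: 0.5);
foreach (var hit in hits)
{
    Console.WriteLine($"{hit.Id}: {hit.Relevance:F3}");
}

// Cosine similarity between two vectors of equal length.
double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Vectors must have the same length.");
    }

    double dot = 0, normA = 0, normB = 0;
    for (int k = 0; k < a.Length; k++)
    {
        dot += (double)a[k] * b[k];
        normA += (double)a[k] * a[k];
        normB += (double)b[k] * b[k];
    }

    // A zero vector has no direction, so treat it as unrelated to everything.
    if (normA == 0 || normB == 0)
    {
        return 0;
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute-force search: score every entry, filter by threshold, keep the best `limit`.
List<(string Id, double Relevance)> Search(
    IReadOnlyDictionary<string, ReadOnlyMemory<float>> store,
    ReadOnlyMemory<float> query,
    int limit,
    double minRelevanceScore)
{
    return store
        .Select(e => (Id: e.Key, Relevance: CosineSimilarity(e.Value.Span, query.Span)))
        .Where(r => r.Relevance >= minRelevanceScore)
        .OrderByDescending(r => r.Relevance)
        .Take(limit)
        .ToList();
}
```

With these toy vectors only "readme" clears the 0.5 threshold, so it is the single hit. A linear scan like this is fine for a handful of entries; real vector databases use indexes to avoid comparing against every stored vector.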

Print the semantic search results

await foreach (var memoryResult in memoryResults)
{
    Console.WriteLine("Result:");
    Console.WriteLine("  URL:     : " + memoryResult.Metadata.Id);
    Console.WriteLine("  Title    : " + memoryResult.Metadata.Description);
    Console.WriteLine("  Relevance: " + memoryResult.Relevance);
}

Run the console program

Output:

  #1 saved.
  #2 saved.
Result:
  URL:     : https://github.com/microsoft/semantic-kernel/blob/main/README.md
  Title    : README: Installation, getting started, and how to contribute
  Relevance: 0.8224089741706848

The search works, which wraps up this exercise. The complete sample code is at https://www.cnblogs.com/dudu/articles/18037216

posted @ 2024-02-27 16:47 dudu