Semantic Kernel Study Notes: A First Look at Generating Embeddings and Running Semantic Search with Semantic Memory
Semantic Kernel has two memory implementations: Semantic Memory, which is built into Semantic Kernel, and Kernel Memory, a standalone project that evolved from Semantic Kernel.
A description of Semantic Memory (source):
Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory. The core library is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.
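Goal: use Semantic Memory to call the OpenAI API, generate text embeddings with the text-embedding-ada-002 model, store them in an in-memory vector database, and then run a semantic search over them.
Study material: the sample program Example14_SemanticMemory.cs in the Semantic Kernel source repository.
Create a .NET console project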
dotnet new console
dotnet add package Microsoft.SemanticKernel
dotnet add package --prerelease Microsoft.SemanticKernel.Plugins.Memory
Create an ISemanticTextMemory instance
Use MemoryBuilder, with OpenAITextEmbeddingGenerationService as the embedding generator, to create a SemanticTextMemory, which is the implementation of ISemanticTextMemory.
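// Namespaces for MemoryBuilder, VolatileMemoryStore, and the OpenAI builder extension.
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Memory;

// Assumption: the OpenAI API key is read from the OPENAI_API_KEY environment variable.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("OPENAI_API_KEY is not set.");

#pragma warning disable SKEXP0011
#pragma warning disable SKEXP0003
#pragma warning disable SKEXP0052
ISemanticTextMemory memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey)
    .WithMemoryStore(new VolatileMemoryStore())
    .Build();
#pragma warning restore SKEXP0052
#pragma warning restore SKEXP0003
#pragma warning restore SKEXP0011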
Note: the warning disable directives in the code above are needed because MemoryBuilder and the two extension methods are experimental features.
Prepare the text data used to generate embeddings
var sampleData = new Dictionary<string, string>
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function"
};
Generate embeddings and save them to the in-memory vector database
var i = 0;
foreach (var entry in sampleData)
{
    await memory.SaveReferenceAsync(
        collection: "SKGitHub",
        externalSourceName: "GitHub",
        externalId: entry.Key,
        description: entry.Value,
        text: entry.Value);
    Console.Write($" #{++i} saved.");
}
Inside SaveReferenceAsync, the GenerateEmbeddingAsync method of IEmbeddingGenerationService is called to generate the embedding; see SemanticTextMemory.cs#L60 in the SK source:
var embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, kernel, cancellationToken).ConfigureAwait(false);
Note: the embedding value is of type ReadOnlyMemory<float>.
Because OpenAI is used here, the call goes to the GenerateEmbeddingsAsync method of OpenAITextEmbeddingGenerationService (see the SK source), which ultimately calls the GetEmbeddingsAsync method of Azure.AI.OpenAI.OpenAIClient; see OpenAIClient.cs#L552 in the Azure SDK for .NET source.
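Since an embedding arrives as a ReadOnlyMemory<float>, it can help to see how such a value is read and compared. The following is a minimal sketch, not code from Semantic Kernel: the CosineSimilarity helper and the 3-dimensional vectors are made up for illustration (real text-embedding-ada-002 embeddings have 1536 dimensions).

```csharp
using System;

// Hypothetical helper: cosine similarity between two embeddings of equal length.
static double CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Embeddings must have the same number of dimensions.");
    }

    // .Span gives direct, copy-free access to the underlying floats.
    ReadOnlySpan<float> x = a.Span;
    ReadOnlySpan<float> y = b.Span;

    double dot = 0, normX = 0, normY = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }

    return dot / (Math.Sqrt(normX) * Math.Sqrt(normY));
}

// Toy vectors standing in for real embeddings (made-up values).
// float[] converts implicitly to ReadOnlyMemory<float>.
ReadOnlyMemory<float> first = new float[] { 1f, 0f, 0f };
ReadOnlyMemory<float> second = new float[] { 1f, 1f, 0f };

// For these two vectors the result is 1 / sqrt(2), roughly 0.7071.
double similarity = CosineSimilarity(first, second);
Console.WriteLine($"Dimensions: {first.Length}, similarity: {similarity:F4}");
```

A value close to 1 means the two vectors point in nearly the same direction, which is the intuition behind relevance scores in embedding-based search.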
Run a semantic search over the embedding data
var query = "How do I get started?";
var memoryResults = memory.SearchAsync("SKGitHub", query, limit: 1, minRelevanceScore: 0.5);
SearchAsync likewise calls GenerateEmbeddingsAsync to generate an embedding from the query text; see SemanticTextMemory.cs#L108
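To make the roles of limit and minRelevanceScore concrete, here is a rough sketch of what a brute-force semantic search over an in-memory collection might look like. It is an illustration under assumptions, not the VolatileMemoryStore implementation: the Cosine helper, the store dictionary, and all vector values are hypothetical, and cosine similarity is assumed as the relevance measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical relevance measure: cosine similarity of two equal-length vectors.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical in-memory collection: record id -> toy embedding (made-up values).
var store = new Dictionary<string, float[]>
{
    ["README.md"] = new[] { 0.9f, 0.1f, 0.0f },
    ["02-running-prompts-from-file.ipynb"] = new[] { 0.1f, 0.9f, 0.2f },
};

// Stand-in for the embedding generated from the query text.
float[] queryEmbedding = { 1.0f, 0.2f, 0.0f };

const int limit = 1;
const double minRelevanceScore = 0.5;

// Score every record, drop weak matches, keep the best `limit` results.
var results = store
    .Select(entry => (Id: entry.Key, Relevance: Cosine(queryEmbedding, entry.Value)))
    .Where(match => match.Relevance >= minRelevanceScore)
    .OrderByDescending(match => match.Relevance)
    .Take(limit)
    .ToList();

foreach (var match in results)
{
    Console.WriteLine($"{match.Id}: {match.Relevance:F4}");
}
```

With these toy values only the README.md record clears the 0.5 threshold, so it is the single result returned. Scanning every record is fine for a handful of entries; dedicated vector databases typically use an index instead.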
Print the semantic search results
await foreach (var memoryResult in memoryResults)
{
    Console.Write("Result:");
    Console.Write(" URL: : " + memoryResult.Metadata.Id);
    Console.Write(" Title : " + memoryResult.Metadata.Description);
    Console.Write(" Relevance: " + memoryResult.Relevance);
}
Run the console program
Output:
#1 saved.
#2 saved.
Result:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8224089741706848
The search succeeded, which completes this exercise. The full sample code is at https://www.cnblogs.com/dudu/articles/18037216