LlamaIndex: a data framework for your LLM applications, especially for RAG
一、LlamaIndex是什么
LlamaIndex 是一个数据框架,用于基于大型语言模型(LLM)的应用程序来摄取、构建和访问私有或特定领域的数据。
LlamaIndex由以下几个主要能力模块组成:
- 数据连接器(Data connectors):按照原生的来源和格式摄取你的私有数据,这些来源可能包括API、PDF、SQL等等。
- 数据索引(Data indexes):以中间表示(intermediate representations)形式构建和存储你的数据,使其易于LLMs消费且性能高效。
- 引擎(Engines):提供对你数据的自然语言访问接口。例如:
  - 查询引擎(Query Engine)是强大的检索接口,用于知识增强的输出。
  - 聊天引擎(Chat Engine)是对话式接口,用于与你的数据进行多轮“来回”交互。
- 数据代理(Data agents):由LLM驱动的知识工作者,借助从简单辅助函数到API集成等各类工具来完成任务。
- 应用集成(Application integrations):将LlamaIndex重新整合回你的整个生态系统中。这可能是LangChain、Flask、Docker、ChatGPT或者……其他任何东西!
参考链接:
https://github.com/run-llama/llama_index
二、LlamaIndex解决了什么问题
大型语言模型(LLMs)为人类与数据之间提供了一种自然语言交互接口。广泛可用的模型已经在大量公开可用的数据上进行了预训练,例如维基百科、邮件列表、教科书、源代码等等。 然而,尽管LLMs在大量数据上进行了训练,它们并没有针对你的数据进行训练,这些数据可能是私有的或者特定于你试图解决的问题。这些数据可能隐藏在API接口后面,存储在SQL数据库中,或者被困在PDF文档和幻灯片中。
LlamaIndex通过连接到这些数据源并将这些数据添加到LLMs已有的数据中来解决这个问题。这通常被称为检索增强生成(Retrieval-Augmented Generation, RAG)。RAG使你能够使用LLMs查询你的数据、转换它,并产生新的洞见。你可以询问有关你数据的问题,创建聊天机器人,构建半自主代理等等。
三、构建RAG应用的几个关键性环节
RAG的五个关键阶段将成为您构建的任何更大应用程序的一部分。这些阶段包括:
- 加载(Loading):这指的是将您的数据从其所在位置 —— 无论是文本文件、PDF、另一个网站、数据库还是API —— 引入到您的处理流程中。LlamaHub提供了数百种连接器可供选择。
- 索引(Indexing):这意味着创建一个允许查询数据的数据结构。对于LLM来说,这几乎总是意味着创建向量嵌入(即数据语义的向量表示),以及许多其他元数据策略,以便于准确地找到上下文相关的数据。
- 存储(Storing):一旦您的数据被索引,您几乎总是会想要存储您的索引以及其他元数据,以避免必须重新索引。
- 查询(Querying):对于任何给定的索引策略,您都可以使用多种方式利用LLM和LlamaIndex数据结构进行查询,包括子查询、多步骤查询和混合策略。
- 评估(Evaluation):任何处理流程中的一个关键步骤是检查其相对于其他策略的有效性,或者当您进行更改时的有效性。评估提供了客观的衡量指标,可以衡量您对查询的响应的准确性、忠实度和速度。
0x1:Loading stage
1、Nodes and Documents
文档(Document)是任何数据源的容器 —— 例如一个PDF文件、一个API输出或者从数据库检索的数据。
节点(Node)是LlamaIndex中数据的原子单位,代表来源文档的一个“chunk”。节点具有元数据,这些元数据将它们与所在的文档以及其他节点相关联。
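下面给出一个极简的示意(沿用本文示例所用的旧版 llama_index 0.9.x 风格 API,文本内容与 chunk_size 取值仅为演示假设),展示 Document 与 Node 之间的关系:
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser

# 手工构造一个 Document(实际场景中通常由 Reader 载入)
doc = Document(
    text="LlamaIndex 是一个面向 LLM 应用的数据框架……",
    metadata={"source": "example.txt"},
)

# 将 Document 切分为若干 Node(chunk),每个 Node 携带指向来源文档的元数据
parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = parser.get_nodes_from_documents([doc])

for node in nodes:
    print(node.node_id, node.metadata)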
2、Connectors
数据连接器(通常称为Reader)将不同数据源和数据格式的数据摄取到文档和节点中。
0x2:Querying Stage
1、Retrievers
检索器(Retrievers)定义了在给定查询时如何从索引中高效地检索相关上下文。您的检索策略对于检索到的数据的相关性以及其效率至关重要。
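举个简单的示意(假设 index 是按照后文方式构建好的 VectorStoreIndex,similarity_top_k 取值仅为演示):
# 从已有索引构造检索器,取相似度最高的 5 个节点
retriever = index.as_retriever(similarity_top_k=5)
nodes_with_scores = retriever.retrieve("用户的问题")

for n in nodes_with_scores:
    # 每个结果带有相似度得分以及对应的节点内容
    print(n.score, n.node.get_content()[:50])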
2、Routers
路由器(Routers)决定使用哪个检索器从知识库中检索相关上下文。更具体地说,RouterRetriever类负责选择一个或多个候选的检索器来执行查询。它们使用选择器根据每个候选者的元数据和查询来选择最佳选项。
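下面是一个路由检索器的大致示意(基于旧版 0.9.x API 的写法,vector_index、summary_index 及描述文字均为假设,具体类路径请以官方文档为准):
from llama_index.retrievers import RouterRetriever
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import RetrieverTool

# 假设 vector_index 与 summary_index 是两个已构建好的索引
vector_tool = RetrieverTool.from_defaults(
    retriever=vector_index.as_retriever(),
    description="适合回答针对文档细节的具体问题",
)
summary_tool = RetrieverTool.from_defaults(
    retriever=summary_index.as_retriever(),
    description="适合回答需要通读全文的总结类问题",
)

# 路由器根据各候选检索器的描述与查询内容,选择最合适的检索器
router_retriever = RouterRetriever(
    selector=LLMSingleSelector.from_defaults(),
    retriever_tools=[vector_tool, summary_tool],
)
nodes = router_retriever.retrieve("这篇文档的主要内容是什么?")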
3、Node Postprocessors
节点后处理器(Node Postprocessors)接收一组检索到的节点,并对它们应用转换、过滤或重新排名的逻辑。
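例如,下面的示意(基于旧版 0.9.x API,similarity_cutoff 取值仅为演示)用 SimilarityPostprocessor 过滤低相关度节点:
from llama_index.postprocessor import SimilarityPostprocessor

# 先召回 10 个节点,再过滤掉相似度得分低于 0.7 的节点,剩下的交给响应合成器
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = query_engine.query("用户的问题")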
4、Response Synthesizers
响应合成器(Response Synthesizers)使用用户查询和一组给定的检索到的文本块从LLM生成响应。
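一个简单的示意(基于旧版 0.9.x API,response_mode 取值仅为演示)如下:
from llama_index.response_synthesizers import get_response_synthesizer

# tree_summarize 模式会将检索到的文本块分层汇总成最终回答
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")

query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
response = query_engine.query("用户的问题")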
参考链接:
https://llamahub.ai/l/google_drive
https://docs.llamaindex.ai/en/stable/understanding/understanding.html
四、安装和部署
0x1:Installation from Pip
pip install llama-index
0x2:Local Model Setup
1、A full guide to using and configuring LLMs available
选择合适的大型语言模型(LLM)是构建任何基于私有数据的LLM应用程序时需要考虑的首要步骤之一。
LLM是LlamaIndex的核心组成部分。它们可以作为独立模块使用,或者插入到其他核心LlamaIndex模块(索引、检索器、查询引擎)中。它们总是在响应合成步骤中使用(例如,在检索之后)。根据所使用的索引类型,LLM可能也会在索引构建、插入和查询遍历过程中被使用。
LlamaIndex为定义LLM模块提供了统一的接口,无论是来自OpenAI、Hugging Face还是LangChain,这样您就不必自己编写定义LLM接口的样板代码。这个接口包括以下内容:
- 支持 text completion 和 chat 接口
- 支持流式(streaming)和非流式(non-streaming)接口
- 支持同步(synchronous)和异步(asynchronous)接口
下面的代码片段展示了如何在llama-index中使用大型语言模型。
使用 OpenAI 大模型:
from llama_index.llms import OpenAI

# non-streaming
resp = OpenAI().complete("Paul Graham is ")
print(resp)
使用 HuggingFace 托管的大模型:
# -*- coding: utf-8 -*-
from llama_index import ServiceContext
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM
import torch

if __name__ == "__main__":
    system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

    # This will wrap the default prompts that are internal to llama-index
    query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

    llm = HuggingFaceLLM(
        context_window=4096,
        max_new_tokens=256,
        generate_kwargs={"temperature": 0.7, "do_sample": False},
        system_prompt=system_prompt,
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
        model_name="StabilityAI/stablelm-tuned-alpha-3b",
        device_map="auto",
        stopping_ids=[50278, 50279, 50277, 1, 0],
        tokenizer_kwargs={"max_length": 4096},
        # uncomment this if using CUDA to reduce memory usage
        # model_kwargs={"torch_dtype": torch.float16}
    )

    service_context = ServiceContext.from_defaults(
        chunk_size=1024,
        llm=llm,
    )
如果要使用自定义的本地大型语言模型(LLM),您仅需实现 LLM 类(或为了简化接口实现 CustomLLM 类)。您将负责将文本传递给模型并返回新生成的token。 这种实现可以是某个本地模型,甚至是围绕您自己的API的封装。
# -*- coding: utf-8 -*-
from typing import Optional, List, Mapping, Any

from llama_index import ServiceContext, SimpleDirectoryReader, SummaryIndex
from llama_index.callbacks import CallbackManager
from llama_index.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback


class OurLLM(CustomLLM):
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "custom"
    dummy_response: str = "My response"

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=self.dummy_response)

    @llm_completion_callback()
    def stream_complete(
        self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.dummy_response:
            response += token
            yield CompletionResponse(text=response, delta=token)


# define our LLM
llm = OurLLM()

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-base-en-v1.5"
)

# Load your data
documents = SimpleDirectoryReader("./data").load_data()
index = SummaryIndex.from_documents(documents, service_context=service_context)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
print(response)
使用这种方法,您可以使用任何LLM。也许您有在本地运行的,或者在您自己的服务器上运行的LLM。只要类被实现并且返回了生成的token,它就应该可以正常工作。
请注意,我们需要使用prompt helper来定制提示的大小,因为每个模型的上下文长度略有不同。
llm_completion_callback 装饰器是可选的,但它通过对LLM调用的回调提供了可观察性。
请注意,您可能需要调整内部提示(internal prompts)才能获得良好的性能。即便如此,您应该使用足够大的LLM来确保它能够处理LlamaIndex内部使用的复杂查询,所以您的实际效果可能会有所不同。
2、A full guide to using and configuring embedding models is available
在LlamaIndex中,嵌入(Embeddings)用于使用复杂的数值向量表示来表示您的文档。
这些嵌入模型通常已在海量语料上训练过:它们以文本作为输入,返回一长串数字(向量表示),用来捕捉文本的语义。
举个例子,从高层次上讲,如果用户提出有关狗的问题,那么该问题的嵌入将与谈论狗的文本的嵌入高度相似。
在计算嵌入之间的相似性时,有许多方法可以使用(点积、余弦相似度等)。默认情况下,LlamaIndex在比较嵌入时使用余弦相似度。
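用一个很小的数值例子说明余弦相似度的计算(纯 numpy 演示,向量数值为虚构):
import numpy as np

def cosine_similarity(a, b):
    # 余弦相似度 = 点积 / (两个向量的模长之积)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

q = np.array([0.1, 0.9, 0.2])   # “关于狗的问题”的嵌入(示意)
d1 = np.array([0.2, 0.8, 0.1])  # “谈论狗的文本”的嵌入(示意)
d2 = np.array([0.9, 0.1, 0.5])  # 无关文本的嵌入(示意)

print(cosine_similarity(q, d1))  # 约 0.99,语义相近
print(cosine_similarity(q, d2))  # 约 0.29,语义较远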
有许多嵌入模型可以选择。默认情况下,LlamaIndex使用OpenAI的text-embedding-ada-002。llama-index还支持Langchain提供的任何嵌入模型,以及提供一个易于扩展的基类,用于实现您自己的嵌入。
在LlamaIndex中,最常见的做法是在ServiceContext对象中指定嵌入模型,然后在向量索引中使用。在索引构建过程中,嵌入模型会被用来嵌入文档;之后通过查询引擎发起的任何查询,也会用它来嵌入查询文本。
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)
嵌入模型最常见的用途是在服务上下文对象中设置它,然后使用它来构建索引和查询。输入文档将被拆分成节点,嵌入模型将为每个节点生成一个嵌入。
默认情况下,LlamaIndex会使用text-embedding-ada-002:
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# optionally set a global service context to avoid passing it into other objects every time
from llama_index import set_global_service_context

set_global_service_context(service_context)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
然后,在查询时,嵌入模型将再次被用来嵌入查询文本。
query_engine = index.as_query_engine()
response = query_engine.query("query string")
参考链接:
https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b
https://docs.llamaindex.ai/en/stable/api_reference/llms/huggingface.html
https://github.com/run-llama/llama_index/blob/main/llama_index/prompts/default_prompts.py
https://github.com/run-llama/llama_index/blob/main/llama_index/prompts/chat_prompts.py
https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html
https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html
https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html
五、基于 HuggingFace LLM - StableLM 构建一个检索增强生成(Retrieval-Augmented Generation, RAG)应用
0x1:Download Data
mkdir -p 'data/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
0x2:Load documents, build the VectorStoreIndex
将海量语料提取为高维嵌入向量,形成一个向量知识库。
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-large-en"
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
0x3:Query Index
将输入query通过embedding模型生成嵌入向量,然后通过向量相似度搜索,在向量知识库里检索最相近的embedding chunk节点。
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-large-en"
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine()
response = query_engine.query("what is The worst thing about leaving YC?")
print(response)
0x4:Storing your index
默认情况下,您刚刚加载的数据以一系列向量嵌入的形式存储在内存中。您可以通过将嵌入保存到磁盘来节省时间(以及对大模型的请求)。
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)
from llama_index.llms import HuggingFaceLLM

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-large-en"
)

import os.path

# check if storage already exists
if not os.path.exists("./storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("./data/paul_graham").load_data()
    index = VectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
response = query_engine.query("what is The worst thing about leaving YC?")
print(response)
0x5:Chat with the LLM about the response
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-large-en"
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine()
response = query_engine.query("what is The worst thing about leaving YC?")
print(response)

chat_engine = index.as_chat_engine()
response = chat_engine.chat("Oh interesting, tell me more.")
print(response)
参考链接:
https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#modules
https://docs.llamaindex.ai/en/stable/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.html
https://docs.llamaindex.ai/en/stable/examples/vector_stores/SimpleIndexDemoLlama-Local.html
六、构建一个Q&A应用
0x1:基本思路与挑战
LLM 最常见的应用之一是回答有关一组文档内容的问题。 LlamaIndex 对多种形式的问答提供了丰富的支持。
总体来说,构建一个基于私有知识的Q&A应用的步骤如下:
- 对包含私有知识的文档进行切片
- 将切片后的文本块转变为向量形式存储至向量库中
- 用户问题转换为向量
- 匹配用户问题向量和向量库中各文本块向量的相关度
- 将最相关的Top 5文本块和问题拼接起来,形成Prompt输入给大模型
- 将大模型的答案返回给用户(完整流程的最小代码示意见本列表之后)
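用 LlamaIndex 把上述流程串起来,大致就是下面这个最小示意(默认会使用 OpenAI 的 LLM 与嵌入模型,需要配置 API key;若使用本地模型,可参考后文 HuggingFaceLLM 的写法,目录与参数均为演示假设):
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

# 1-2. 加载并切片私有文档,向量化后存入(内存)向量库
documents = SimpleDirectoryReader("./data").load_data()
service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# 3-6. 用户问题向量化 -> 相似度匹配 -> Top 5 文本块与问题拼成 prompt -> 交给大模型回答
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("用户的问题")
print(response)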
但需要注意的是,在实际的工程实践中,私域数据Q&A应用还是面临不小的挑战的,有以下几个原因:
- 文档种类多:有doc、ppt、excel、pdf,pdf也有扫描版和文字版。doc类的文档相对来说还比较容易处理,毕竟大部分内容是文字,信息密度较高。但是也有少量图文混排的情况。Excel也还好处理,本身就是结构化的数据,合并单元格的情况使用程序填充了之后,每一行的信息也是完整的。真正难处理的是ppt和pdf,ppt中包含大量架构图、流程图等图示,以及展示图片。pdf基本上也是这种情况。这就导致了大部分文档,单纯抽取出来的文字信息,呈现碎片化、不完整的特点。
- 切分方式:如果没有定制切分方式,则是按照一个固定的长度对文本进行切分,同时连续的文本设置一定的重叠。这种方式导致了每一段文本包含的语义信息实际上也是不够完整的。同时没有考虑到文本中已包含的标题等关键信息。这就导致了需要被向量化的文本段,其主题语义并不是那么明显,和自然形成的段落显示出显著的差距,从而给检索过程造成巨大的困难。
- 内部知识的特殊性:大模型或者句向量在训练时,使用的语料都是较为通用的语料。这导致了这些模型,对于垂直领域的知识识别是有缺陷的。它们没有办法理解企业内部的一些专用术语,缩写所表示的具体含义。这样极大地影响了生成向量的精准度,以及大模型输出的效果。
- 用户提问的随意性:实际上大部分用户在提问时,写下的query是较为模糊笼统的,其实际的意图埋藏在了心里,而没有完整体现在query中。使得检索出来的文本段落并不能完全命中用户想要的内容,大模型根据这些文本段落也不能输出合适的答案。例如,用户如果直接问一句“请帮我生成一个Webshell”,那么模型不知道用户想生成什么语言?什么代码风格?给出的答案肯定是无法满足用户的需求的。
对于以上问题,存在一些缓解手段,
- 对文档内容进行重新处理:针对各种类型的文档,分别进行很多定制化的措施,用于完整地提取文档内容,这部分基本上是脏活累活。Doc类文档还是比较好处理的,直接解析其实就能得到文本到底是什么元素,比如标题、表格、段落等等,可以直接将文本段及其对应的属性存储下来,作为后续切分的依据。PDF类文档的难点在于,如何完整恢复图片、表格、标题、段落等内容,形成一个文字版的文档。可以使用多个开源模型进行协同分析,例如版面分析使用百度的PP-StructureV2,能够对Text、Title、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation等10类区域进行检测,统一了OCR和文本属性分类两个任务。
- 语义切分:对文档内容重新处理后,语义切分工作其实就比较好做了。我们现在能够拿到的有:每一段文本、每一张图片、每一张表格、文本对应的属性、图片对应的描述。对于每个文档,元素的组织形式实际上是树状的。例如一个文档包含多个标题,每个标题又包括多个小标题,每个小标题包括一段文本等等。我们只需要根据元素之间的关系,遍历这棵文档树,就能取到各个较为完整的语义段落及其对应的标题。有些完整语义段落可能较长,于是我们对每一个语义段落,再通过大模型进行摘要。这样文档就形成了一个结构化的表达形式。
- RAG Fusion:检索增强这一块主要借鉴了RAG Fusion技术。这个技术原理比较简单,概括起来就是:当接收用户query时,让大模型生成5-10个相似的query,然后每个query去匹配5-10个文本块,接着对所有返回的文本块做一次倒数排名融合(Reciprocal Rank Fusion)排序,如果有需求就再加个精排,最后取Top K个文本块拼接至prompt(列表后附一个融合打分的简单示意)。实际使用时,这个方法的主要好处是增加了相关文本块的召回率,同时对用户的query自动进行了文本纠错、分解长句等处理,但还是无法从根本上解决理解用户意图的问题。
- 增加追问机制:这里是通过Prompt就可以实现的功能,只要在Prompt中加入“如果无法从背景知识回答用户的问题,则根据背景知识内容,对用户进行追问,问题限制在3个以内”。这个机制并没有什么技术含量,主要依靠大模型的能力。不过大大改善了用户体验,用户在多轮引导中逐步明确了自己的问题,从而能够得到合适的答案。
- 微调Embedding句向量模型:这部分主要是为了解决垂直领域特殊词汇,在通用句向量中会权重过大的问题。比如有个通用句向量模型,它在训练中很少见到“SAAS”这个词,无论是文本段和用户query,只要提到了这个词,整个句向量都会被带偏。举个例子:假如一个用户问的是:我是一个SAAS用户,我希望订购一个云存储服务。由于SAAS的权重很高,使得检索匹配时候,模型完全忽略了后面的那句话,才是真实的用户需求。返回的内容可能是SAAS的介绍、SAAS的使用手册等等。这里的微调方法使用的数据,是让大模型对语义分割的每一段,形成问答对。用这些问答对构建了数据集进行句向量的训练,使得句向量能够尽量理解垂直领域的场景。
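下面是一个倒数排名融合(Reciprocal Rank Fusion)打分的极简示意(纯 Python,与具体检索框架无关,其中 chunk id、query 改写结果均为虚构):
from collections import defaultdict

def reciprocal_rank_fusion(results_per_query, k=60, top_n=5):
    """results_per_query: 每个改写 query 检索到的文本块 id 列表(按相关度从高到低排序)"""
    scores = defaultdict(float)
    for ranked_chunks in results_per_query:
        for rank, chunk_id in enumerate(ranked_chunks):
            # 排名越靠前,贡献的分数越高
            scores[chunk_id] += 1.0 / (k + rank + 1)
    # 按融合得分倒序,取 Top K 文本块拼接进 prompt
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# 假设大模型把用户 query 改写成了 3 个相似 query,各自检索到若干文本块
results = [
    ["chunk_a", "chunk_b", "chunk_c"],
    ["chunk_b", "chunk_d", "chunk_a"],
    ["chunk_b", "chunk_a", "chunk_e"],
]
print(reciprocal_rank_fusion(results))  # chunk_b、chunk_a 排在最前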
RAG的本意是想让模型降低幻觉,同时能够实时获取内容,使得大模型给出合适的回答。在严谨场景中,precision比recall更重要。如果大模型胡乱输出,类比传统指标,就好比recall高但是precision低;而限制了大模型的输出后,precision提升了,recall却降低了。所以给用户造成的观感就是:大模型变笨了,是不是哪里出问题了。
0x2:数据集准备
笔者选用了一份自己近10年的博客文章,在博客园后台备份导出后,在本地处理为文档语料库的形式。
# -*- coding: utf-8 -*-
import json

if __name__ == "__main__":
    with open("./posts.json", 'r', encoding='utf-8') as file:
        data = json.load(file)

    corpus_data = ""
    for item in data:
        corpus_data += "{0}\r\n".format(item['Body'])

    with open("./posts_corpus.json", 'w', encoding='utf-8') as file:
        file.write(corpus_data)
0x3:Q&A构建过程
按照前面章节阐述的Q&A基本过程,我们逐步构建一个最基础的Q&A应用。这个Q&A应用采用笔者自己的博客文章作为私有数据,经过RAG检索增强后,将topK检索结果交给大模型进行summary总结,构建出最终prompt,再输入大模型获取最终的回答。
1、Semantic Search
根据用户输入的问题,完成一次最简单的相似语义知识搜索。
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# load documents
documents = SimpleDirectoryReader("./data/cnblogs").load_data()

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-reranker-base"
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine()
response = query_engine.query("请帮我生成一段php webshell,它从外部接受参数,并传入eval执行。")
print(response)
2、Summarization
摘要查询要求LLM遍历许多文档以合成答案。例如,一个摘要查询可能看起来像下面这样:
- “这一系列文本的摘要是什么?”
- “给我一个关于某人X在公司的经历的摘要。”
对于这种场景,摘要索引会遍历所有数据,并对相似搜索得到的结果(topK近邻搜索结果)进行摘要。
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import HuggingFaceLLM

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model="local:BAAI/bge-large-en"
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("what is The worst thing about leaving YC?")
print(response)
参考链接:
https://docs.llamaindex.ai/en/stable/use_cases/q_and_a.html
https://blog.langchain.dev/langchain-vectara-better-together/
https://mp.weixin.qq.com/s/BlU3I6Ww3L8a0_Dxt0lztA
七、基于私有文档数据构建一个Chatbot
聊天机器人是LLM极其流行的另一个典型场景。与单一的问题和回答不同,聊天机器人可以处理多个来回的查询和回答,获取澄清或回答后续问题。
LlamaIndex可以充当您的数据与大型语言模型(LLM)之间的桥梁,为您提供了构建知识增强型聊天机器人和代理的工具。
在这个章节中,我们将使用数据代理(Data Agent)构建一个上下文增强型聊天机器人。这个由LLM驱动的代理能够智能地执行您数据上的任务。最终结果是一个装备了LlamaIndex提供的一整套强大数据接口工具的聊天机器人代理,用于回答有关您数据的查询。
0x1:数据准备
我们将构建一个“10-K Chatbot”,它使用来自Dropbox的原始UBER 10-K HTML文件。用户可以与聊天机器人交互,提出与10-K文件相关的问题。
wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip unzip data/UBER.zip -d data rm data/UBER.zip
为了将HTML文件解析为格式化文本,我们使用Unstructured库。得益于LlamaHub,我们可以直接与Unstructured集成,从而将任何文本转换成LlamaIndex可以摄取的文档格式。
# pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade python-docx
pip install pikepdf
pip install pypdf
pip install unstructured_pytesseract
pip install unstructured_inference
pip install opencv-python
pip install opencv-contrib-python
apt install python-opencv

# pip install llama-hub unstructured
然后我们可以使用UnstructuredReader来解析HTML文件,将它们转换成一个文档对象列表。
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)
0x2:将私有文档数据转换为向量索引(Vector Indices)
我们首先为每一个数据文件设置一个向量索引。每个向量索引允许我们针对给定年份的10-K文件提出问题。 我们构建每个索引并将其保存到磁盘上。
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_index.llms import HuggingFaceLLM
from pathlib import Path

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)

# initialize simple vector indices
from llama_index import VectorStoreIndex, ServiceContext, StorageContext

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=512, llm=llm, embed_model="local:BAAI/bge-large-en"
)

import os.path
from llama_index import load_index_from_storage

index_set = {}
for year in years:
    # check if storage already exists
    if not os.path.exists(f"./storage/{year}"):
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")
    else:
        # Load indices from disk
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index
0x3:建立子问题查询引擎,实现跨多个10-K文档文件的综合回答
由于我们可以访问4年的文件,我们可能不仅想要针对给定年份的10-K文件提出问题,而且还想要跨所有10-K文件进行提问。
为了解决这个问题,我们可以使用一个子问题查询引擎,它将一个查询分解成多个子查询,每个子查询由各自的向量索引回答,最终综合所有子查询结果来回答总体查询。
LlamaIndex提供了一些围绕索引(以及查询引擎)的封装,以便它们可以被查询引擎和代理使用。
首先,我们为每个向量索引定义一个QueryEngineTool。每个工具都有一个名称和描述;这些是LLM代理用来决定选择哪个工具的依据。
然后,我们可以创建子问题查询引擎(Sub Question Query Engine),它将允许我们跨10-K文件综合回答。我们传入上面定义的individual_query_engine_tools,以及一个将用于运行子查询的service_context。
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_index.llms import HuggingFaceLLM
from pathlib import Path

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)

# initialize simple vector indices
from llama_index import VectorStoreIndex, ServiceContext, StorageContext

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=512, llm=llm, embed_model="local:BAAI/bge-large-en"
)

import os.path
from llama_index import load_index_from_storage

index_set = {}
for year in years:
    # check if storage already exists
    if not os.path.exists(f"./storage/{year}"):
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")
    else:
        # Load indices from disk
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index

from llama_index.tools import QueryEngineTool, ToolMetadata

individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_set[year].as_query_engine(),
        metadata=ToolMetadata(
            name=f"vector_index_{year}",
            description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
        ),
    )
    for year in years
]
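上面的代码只构建了 individual_query_engine_tools,尚未真正创建子问题查询引擎。下面补充一个创建 SubQuestionQueryEngine 并将其同样包装成工具的示意(沿用旧版 0.9.x API,工具名称与描述为演示假设):
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# 子问题查询引擎:把一个跨年份的问题拆成多个子查询,分别交给对应年份的向量索引回答
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)

# 再把它包装成一个工具,供后续的聊天机器人代理使用
query_engine_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="sub_question_query_engine",
        description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber",
    ),
)

tools = individual_query_engine_tools + [query_engine_tool]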
测试一下单个子查询引擎是否工作正常。
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_index.llms import HuggingFaceLLM
from pathlib import Path

import openai
import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # 请替换为自己的 OpenAI API key
openai.api_key = os.environ["OPENAI_API_KEY"]

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)

# initialize simple vector indices
from llama_index import VectorStoreIndex, ServiceContext, StorageContext

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=512, llm=llm, embed_model="local:BAAI/bge-large-en"
)
# service_context = ServiceContext.from_defaults(chunk_size=512)

import os.path
from llama_index import load_index_from_storage

index_set = {}
for year in years:
    # check if storage already exists
    if not os.path.exists(f"./storage/{year}"):
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")
    else:
        # Load indices from disk
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index

query_engine = index_set[2020].as_query_engine()
response = query_engine.query("What were some of the biggest risk factors in 2020 for Uber?")
print(response)
0x4:建立Chatbot Agent
我们使用LlamaIndex数据代理(这里是OpenAIAgent)来设置外层聊天机器人代理,该代理可以访问一组工具。我们希望使用之前为每个索引(对应于给定年份)定义的单独工具,以及上面定义的子问题查询引擎工具。
在之前的步骤中,我们已经为每一个10-K文档建立了对应的子查询引擎。
我们现在可以创建一个agent,将子查询引擎工具列表传入到agent中,供agent使用。
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_index.llms import HuggingFaceLLM
from pathlib import Path

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)

# initialize simple vector indices
from llama_index import VectorStoreIndex, ServiceContext, StorageContext

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

service_context = ServiceContext.from_defaults(
    chunk_size=512, llm=llm, embed_model="local:BAAI/bge-large-en"
)

import os.path
from llama_index import load_index_from_storage

index_set = {}
for year in years:
    # check if storage already exists
    if not os.path.exists(f"./storage/{year}"):
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")
    else:
        # Load indices from disk
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index

from llama_index.tools import QueryEngineTool, ToolMetadata

individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_set[year].as_query_engine(),
        metadata=ToolMetadata(
            name=f"vector_index_{year}",
            description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
        ),
    )
    for year in years
]

from transformers import AutoModelForCausalLM, AutoTokenizer


class HuggingFaceModelAgent:
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

    def answer(self, prompt, max_length=1024):
        input_ids = self.tokenizer.encode(prompt, return_tensors='pt')
        output = self.model.generate(input_ids, max_length=max_length, num_return_sequences=1)
        response = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return response


agent = HuggingFaceModelAgent("THUDM/chatglm3-6b")
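需要说明的是,上面的 HuggingFaceModelAgent 只是对本地模型的简单封装,并没有真正把 individual_query_engine_tools 接入进来。如果希望本地 LLM 也能调用这些工具,可以参考下面基于 ReActAgent 的示意(沿用前文的 llm 与工具列表,旧版 0.9.x API,仅为一种可行写法):
from llama_index.agent import ReActAgent

# 将各年份的查询引擎工具交给一个 ReAct 风格的代理,由前文构建的本地 llm 驱动
agent = ReActAgent.from_tools(
    individual_query_engine_tools,
    llm=llm,
    verbose=True,
)

response = agent.chat("What were some of the biggest risk factors in 2020 for Uber?")
print(str(response))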
0x5:测试Agent
我们现在可以用各种查询来测试这个Agent。
如果我们用一个简单的“hello”查询来测试它,Agent不会使用任何工具。
如果我们用一个关于给定年份10-K报告的查询来测试它,Agent将会使用相关的向量索引工具。
最后,如果我们使用一个查询来比较/对比多年来的风险因素,Agent将会使用子问题查询引擎工具。
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_index.llms import HuggingFaceLLM
from pathlib import Path

import openai
import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # 请替换为自己的 OpenAI API key
openai.api_key = os.environ["OPENAI_API_KEY"]

years = [2022, 2021, 2020, 2019]

loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each year
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)

# initialize simple vector indices
from llama_index import VectorStoreIndex, ServiceContext, StorageContext

# setup prompts - specific to StableLM
from llama_index.prompts import PromptTemplate

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

# service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm, embed_model="local:BAAI/bge-large-en")
service_context = ServiceContext.from_defaults(chunk_size=512)

import os.path
from llama_index import load_index_from_storage

index_set = {}
for year in years:
    # check if storage already exists
    if not os.path.exists(f"./storage/{year}"):
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")
    else:
        # Load indices from disk
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index

from llama_index.tools import QueryEngineTool, ToolMetadata

individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_set[year].as_query_engine(),
        metadata=ToolMetadata(
            name=f"vector_index_{year}",
            description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
        ),
    )
    for year in years
]

from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(individual_query_engine_tools, verbose=True)

response = agent.chat("hi, i am bob")
print(str(response))

response = agent.chat(
    "What were some of the biggest risk factors in 2020 for Uber?"
)
print(str(response))

response = agent.chat(
    "Compare/contrast the risk factors described in the Uber 10-K across years. Give answer in bullet points."
)
print(str(response))
参考链接:
https://docs.llamaindex.ai/en/stable/use_cases/chatbots.html
https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/chatbots/building_a_chatbot.html
https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d
https://huggingface.co/THUDM/chatglm3-6b
https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/chatbots/building_a_chatbot.html#testing-the-agent
八、智谱AI 和 LlamaIndex 结合进行数据处理
参考链接:
https://mp.weixin.qq.com/s/VJETBqF_3LszQWt-GE7d4w