LLM Agent Development: Document Processing Chains

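Recent versions of langchain have deprecated many of the chain helpers, but LLMChain is still available, so you can build application-specific chains to suit your own needs. This article builds a question-answering chain on top of LLMChain, using a document loader, a text splitter, embedding generation, and a vector store.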

More complex tasks are handled by composing LLMChain, PromptTemplate, RetrievalQA, and similar components into a single pipeline.

Code:

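    # Module-level imports. Paths vary across langchain releases; these match
    # the split langchain / langchain_community packages.
    from langchain.chains import LLMChain, RetrievalQA
    from langchain.prompts import PromptTemplate
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    # LocalEmbedding is not a langchain class; it is assumed to be a
    # project-local embeddings wrapper defined elsewhere.

    # Method of the agent class (the class itself is not shown here).
    def douc_process_chain(self, question):
        # Load the document
        loader = PyPDFLoader("knowledge/economicist2.pdf")
        documents = loader.load()
        # Split the document into overlapping chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        texts = text_splitter.split_documents(documents)

        # Generate embeddings and store them in a vector store
        embeddings = LocalEmbedding('sentence-transformers/all-MiniLM-L6-v2')
        vectorstore = Chroma.from_documents(texts, embeddings)
        # Retrieval question-answering chain
        retriever = vectorstore.as_retriever()
        qa_chain = RetrievalQA.from_chain_type(llm=self.llm, retriever=retriever, return_source_documents=True)
        # Ask the retrieval chain for a summary
        query = "Please summarize the contents of this pdf document"

        # result["result"] holds the answer text and result["source_documents"]
        # the retrieved chunks (not used by the summary steps below).
        result = qa_chain({"query": query})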
        # Summary chain
        summary_prompt_template = """
        You are a professional document analysis assistant. Please summarize the following text:
        {text}
        summary:
        """
        summary_prompt = PromptTemplate(template=summary_prompt_template, input_variables=["text"])
        summary_chain = LLMChain(llm=self.llm, prompt=summary_prompt)
        # Summarize each chunk. The chain fills in the prompt template itself,
        # so pass the raw chunk text rather than a pre-formatted prompt
        # (otherwise the template would be applied twice).
        summaries = []
        for text in texts:
            summary = summary_chain.run({"text": text.page_content})
            summaries.append(summary)
        final_summary = "\n".join(summaries)
        # Answer the question from the combined summaries.
        # The template must be a plain string, not an f-string, so that
        # PromptTemplate fills in {summary} and {question} when the chain runs.
        answer_prompt_template = """
        You have the following summarized information about the document:{summary}

        Use this summary to answer the following question:{question}
        Answer:
        """
        answer_prompt = PromptTemplate(template=answer_prompt_template, input_variables=["summary", "question"])
        answer_chain = LLMChain(llm=self.llm, prompt=answer_prompt)
        final_answer = answer_chain.run({"summary": final_summary, "question": question})
        return final_answer
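
The flow of this method (split the text into overlapping chunks, summarize each chunk, join the summaries, then answer from the joined summary) can be sketched without any langchain dependency. This is only a minimal illustration under stated assumptions: `split_with_overlap` is a hypothetical fixed-window splitter (RecursiveCharacterTextSplitter prefers to break on separators such as newlines), and `fake_llm` is a made-up stand-in for `self.llm`.

```python
from typing import Callable, List

# Hypothetical templates, shortened from the prompts used in the article.
SUMMARY_TEMPLATE = "Please summarize the following text:\n{text}\nsummary:"
ANSWER_TEMPLATE = (
    "Summarized information about the document:\n{summary}\n"
    "Answer the following question: {question}\nAnswer:"
)


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-window splitter: each chunk starts `chunk_size - chunk_overlap`
    characters after the previous one, so neighbours share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def summarize_then_answer(
    text: str,
    question: str,
    llm: Callable[[str], str],
    chunk_size: int = 500,
    chunk_overlap: int = 50,
) -> str:
    """Summarize every chunk, join the summaries, then answer the question."""
    chunks = split_with_overlap(text, chunk_size, chunk_overlap)
    summaries = [llm(SUMMARY_TEMPLATE.format(text=chunk)) for chunk in chunks]
    final_summary = "\n".join(summaries)
    return llm(ANSWER_TEMPLATE.format(summary=final_summary, question=question))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: only reports the prompt length."""
    return f"<reply to {len(prompt)} chars>"


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
    print(summarize_then_answer("x" * 1000, "What is this about?", fake_llm))
```

Note that the model is called once per chunk plus once for the final answer, so the cost of this approach grows linearly with document length.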

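Some of langchain's features also make it possible to build more specialized chains, for example a chain that generates an article. Implementing this chain takes four steps: 1. prompt formatting, 2. text generation, 3. event management, 4. output control.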

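langchain provides CallbackManagerForChainRun, which manages the callback events fired while a chain is running and is useful for debugging, monitoring, and integration with other systems.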
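
As a rough mental model of what a run manager does, the sketch below forwards an `on_text` event to every registered handler. `MiniRunManager`, `CollectingHandler`, and `run_step` are hypothetical names invented for this illustration; langchain's real CallbackManagerForChainRun has a much richer interface.

```python
from typing import List, Optional, Protocol


class TextHandler(Protocol):
    """Anything with an on_text method can receive events."""

    def on_text(self, text: str) -> None: ...


class CollectingHandler:
    """Stores every event it receives, e.g. for debugging or monitoring."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_text(self, text: str) -> None:
        self.events.append(text)


class MiniRunManager:
    """Hypothetical stand-in for a run manager: fans events out to handlers."""

    def __init__(self, handlers: List[TextHandler]) -> None:
        self.handlers = handlers

    def on_text(self, text: str) -> None:
        for handler in self.handlers:
            handler.on_text(text)


def run_step(prompt: str, run_manager: Optional[MiniRunManager] = None) -> str:
    """Do some work, then report an event only if a manager was supplied."""
    result = prompt.upper()  # placeholder for the real work
    if run_manager:
        run_manager.on_text("step finished")
    return result


if __name__ == "__main__":
    handler = CollectingHandler()
    print(run_step("hello", MiniRunManager([handler])))
    print(handler.events)
```

Because the manager is optional, the same step runs unchanged with or without monitoring attached.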

The code is as follows:

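from typing import Any, Dict, List, Optional

from langchain.chains.base import Chain
from langchain_core.callbacks import CallbackManagerForChainRun
from langchain_core.prompts import BasePromptTemplate
# LocalLLM is not a langchain class; it is assumed to be a project-local model wrapper.


class Special_Chains_template(Chain):
    """An article generator chain."""
    prompt: BasePromptTemplate
    llm: LocalLLM
    out_key: str = "economic"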

    @property
    def input_keys(self) -> List[str]:
        """Return every key the prompt requires."""
        return self.prompt.input_variables

    @property
    def output_keys(self) -> List[str]:
        """Always return the single output key (`economic` by default)."""
        return [self.out_key]

    def _call(self, inputs: Dict[str, Any], run_manager: Optional[CallbackManagerForChainRun] = None) -> Dict[str, Any]:
        """Run the chain."""
        # Format the inputs with the prompt
        prompt_value = self.prompt.format(**inputs)

        # Call LocalLLM's invoke method to generate the response text
        response_text = self.llm.invoke(prompt_value)

        # If a run_manager was supplied, record an event
        if run_manager:
            run_manager.on_text("YAli article is written")

        # Return the generated text as a dict keyed by out_key
        return {self.out_key: response_text}

    @property
    def _chain_type(self) -> str:
        """Chain type identifier."""
        return "YAli_article_chain"
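
To show how `input_keys`, `output_keys`, and `_call` fit together from the caller's side, here is a dependency-free sketch. `MiniChain` is a simplified, hypothetical stand-in for a chain base class (it is not langchain's Chain), `ArticleChain` mirrors the class above using a plain format string, and the lambda replaces LocalLLM.

```python
from string import Formatter
from typing import Any, Callable, Dict, List


class MiniChain:
    """Simplified stand-in for a chain base class (not langchain's Chain)."""

    @property
    def input_keys(self) -> List[str]:
        raise NotImplementedError

    @property
    def output_keys(self) -> List[str]:
        raise NotImplementedError

    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError

    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Validate inputs before running, and outputs after running.
        missing = set(self.input_keys) - set(inputs)
        if missing:
            raise ValueError(f"Missing input keys: {sorted(missing)}")
        outputs = self._call(inputs)
        if set(outputs) != set(self.output_keys):
            raise ValueError(f"Unexpected output keys: {sorted(outputs)}")
        return {**inputs, **outputs}


class ArticleChain(MiniChain):
    """Formats a prompt, calls the model, returns the text under out_key."""

    def __init__(self, template: str, llm: Callable[[str], str], out_key: str = "economic") -> None:
        self.template = template
        self.llm = llm
        self.out_key = out_key

    @property
    def input_keys(self) -> List[str]:
        # Collect the placeholder names that appear in the template.
        return [name for _, name, _, _ in Formatter().parse(self.template) if name]

    @property
    def output_keys(self) -> List[str]:
        return [self.out_key]

    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        prompt_text = self.template.format(**inputs)
        return {self.out_key: self.llm(prompt_text)}


if __name__ == "__main__":
    chain = ArticleChain(
        "Write a short article about {topic}.",
        llm=lambda prompt: f"ARTICLE<{prompt}>",  # stand-in for a real model
    )
    print(chain({"topic": "inflation"}))
```

Deriving `input_keys` from the prompt means a missing variable is reported before the model is ever called, which is the main benefit of declaring the keys explicitly.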