Agent - InternLM LLM Bootcamp Study Notes 6 & Large Language Models 10

Large Language Model Study: 10. Agents

InternLM LLM Bootcamp Study Notes 6

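On June 13, two months later, I came back to rewrite this part, because my earlier introduction to agents was too rough. This revision mainly reorganizes the overall logic around Andrew Ng's framework and draws more heavily on the material on deeplearning.ai.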

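Definition

We define an agent as an entity with three capabilities (perception, planning, and action), as shown in the figure below:

Much like the way a human "gets things done", the core function of an agent can be summarized as a loop of three steps: Perception, Planning, and Action. Perception is the agent's ability to collect information from its environment and extract relevant knowledge from it. Planning is the decision-making process the agent goes through in pursuit of a goal. Action is what the agent does based on the environment and its plan. The Policy is the core decision rule by which the agent chooses an Action; the result of each Action then comes back as an Observation, which is the input for the next round of Perception. Together these form an autonomous, closed learning loop.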
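To make the loop concrete, here is a minimal Python sketch of the perceive, plan, act cycle. All names (LoopAgent, run_loop, make_room, thermostat_policy) are hypothetical, and the "policy" is a hand-written rule standing in for what an LLM would decide; the point is only how each Action produces an Observation that feeds the next Perception.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoopAgent:
    # Policy: the core decision rule mapping what has been perceived to the next action
    policy: Callable[[List[str]], str]
    memory: List[str] = field(default_factory=list)

    def perceive(self, observation: str) -> None:
        # Perception: collect information from the environment
        self.memory.append(observation)

    def plan(self) -> str:
        # Planning: let the policy decide the next action toward the goal
        return self.policy(self.memory)


def run_loop(agent: LoopAgent, environment: Callable[[str], str],
             first_observation: str, max_steps: int = 10) -> List[str]:
    actions: List[str] = []
    observation = first_observation
    for _ in range(max_steps):
        agent.perceive(observation)
        action = agent.plan()
        if action == "stop":
            break
        actions.append(action)
        # Action: act on the environment; its result comes back as a new Observation
        observation = environment(action)
    return actions


def make_room(start_temp: int) -> Callable[[str], str]:
    # Toy environment: a room whose temperature rises by 1 degree per "heat" action
    state: Dict[str, int] = {"temp": start_temp}

    def env(action: str) -> str:
        if action == "heat":
            state["temp"] += 1
        return f"temp={state['temp']}"

    return env


def thermostat_policy(memory: List[str]) -> str:
    # Goal: reach 20 degrees. Read the latest observation and decide.
    temp = int(memory[-1].split("=")[1])
    return "heat" if temp < 20 else "stop"


agent = LoopAgent(policy=thermostat_policy)
actions = run_loop(agent, make_room(17), "temp=17")
print(actions)  # ['heat', 'heat', 'heat']
```

In a real agent, thermostat_policy would be replaced by a call to the LLM with the accumulated memory as context.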

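Components

Design Patterns

Design patterns here are the basic building blocks that make up an agent. Combining these patterns in different ways yields various agent paradigms, which we cover in the next section.

According to Andrew Ng's summary, there are four design patterns:

Reflection: the LLM examines its own output and improves it.
Tool Use: the LLM can use tools such as web search and code execution to gather information, take actions, or process data.
Planning: the LLM proposes and executes a multi-step plan to achieve a goal.
Multi-agent collaboration: multiple agents work together, splitting up tasks and discussing ideas, to arrive at better solutions than a single agent could.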

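Reflection

For example, when you ask an LLM to write code and it returns code with a bug, you would typically tell it which line raises an error when run and ask it to fix the problem. The LLM then reflects on its own output based on your hint and returns corrected code.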
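A minimal sketch of this reflect-and-revise loop, assuming the LLM is a plain function from prompt to text. In the example above a human supplies the feedback; here, as one way to automate it, the feedback comes from actually running the generated code against a test. reflect_and_revise, fake_llm, and run_tests are hypothetical names, and fake_llm is scripted to return a buggy draft first.

```python
from typing import Callable, Dict, Optional


def reflect_and_revise(task: str,
                       generate: Callable[[str], str],
                       critique: Callable[[str, str], Optional[str]],
                       max_rounds: int = 3) -> str:
    # critique returns None when it finds no problem with the draft
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:
            break  # nothing left to fix
        # Send the task, the previous draft and the feedback back to the model
        draft = generate(f"{task}\n\nPrevious answer:\n{draft}\n\nFeedback:\n{feedback}\n\nPlease fix it.")
    return draft


def fake_llm(prompt: str) -> str:
    # Scripted stand-in for an LLM: the first draft has a bug, the revision fixes it
    if "Feedback" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


def run_tests(task: str, code: str) -> Optional[str]:
    namespace: Dict[str, object] = {}
    exec(code, namespace)  # demo only: never exec untrusted code outside a sandbox
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


final = reflect_and_revise("Write a function add(a, b).", fake_llm, run_tests)
print(final)
```

The loop stops as soon as the critique passes, so a correct first draft costs only one check.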

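Tool Use

This means giving the LLM the ability to use tools. When you ask an agent to do a math calculation, it might first write code that solves the problem, then call a tool to execute that code, and finally produce its answer from the code's output.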
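A minimal sketch of the math example, assuming the model's reply is already parsed into a dict naming a tool and its input (that format, and the names run_python, TOOLS, fake_llm and answer_with_tools, are hypothetical). The "python" tool runs the generated code and captures what it prints.

```python
import contextlib
import io


def run_python(code: str) -> str:
    # Hypothetical code-execution tool: run the code and return what it printed.
    # WARNING: exec runs arbitrary code; a real agent should use a sandbox.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


TOOLS = {"python": run_python}


def fake_llm(question: str) -> dict:
    # Stand-in for the model: it answers the math question by writing code
    # and asking for the "python" tool
    return {"tool": "python", "input": "print(sum(i * i for i in range(1, 11)))"}


def answer_with_tools(question: str) -> str:
    call = fake_llm(question)
    observation = TOOLS[call["tool"]](call["input"])  # the tool does the actual arithmetic
    return f"The answer is {observation}."


print(answer_with_tools("What is 1^2 + 2^2 + ... + 10^2?"))  # The answer is 385.
```

Letting code do the arithmetic avoids relying on the LLM to compute the sum token by token.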

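Planning

Many tasks cannot be completed in a single step, but an agent can decide which steps to take. Consider the example from "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face," Shen et al. (2023): if you want an agent to draw a picture of a girl based on a photo of a boy, in the same pose as the original, the task might be broken down into:

Detect the pose in the boy's photo
Draw a picture of a girl in the detected pose

An LLM can be fine-tuned or prompted (with few-shot prompting) to specify the plan by outputting a string such as "{tool: pose-detection, input: image.jpg, output: temp1} {tool: pose-to-image, input: temp1, output: final.jpg}". This structured output specifies the two steps to take; software then calls the pose-detection tool, followed by the pose-to-image tool, to complete the task.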
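The "software" that turns such a plan string into tool calls can be sketched as a small parser plus executor. This is an illustration under assumptions, not HuggingGPT's code: the regex assumes exactly the {tool, input, output} format shown above, and the two tools are placeholders that only build strings instead of running vision models.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches one step such as "{tool: pose-detection, input: image.jpg, output: temp1}"
STEP = re.compile(r"\{tool:\s*([^,]+),\s*input:\s*([^,]+),\s*output:\s*([^}]+)\}")


def parse_plan(plan: str) -> List[Tuple[str, ...]]:
    return [tuple(part.strip() for part in step) for step in STEP.findall(plan)]


def execute_plan(plan: str, tools: Dict[str, Callable[[str], str]],
                 files: Dict[str, str]) -> Dict[str, str]:
    store = dict(files)  # named inputs and intermediate outputs, e.g. "image.jpg", "temp1"
    for tool_name, source, target in parse_plan(plan):
        # Run each step in order, feeding it an earlier step's output by name
        store[target] = tools[tool_name](store[source])
    return store


# Placeholder tools (hypothetical); real ones would call pose-detection / image models
tools = {
    "pose-detection": lambda image: f"pose({image})",
    "pose-to-image": lambda pose: f"girl_image[{pose}]",
}
plan = ("{tool: pose-detection, input: image.jpg, output: temp1} "
        "{tool: pose-to-image, input: temp1, output: final.jpg}")
result = execute_plan(plan, tools, {"image.jpg": "boy_photo"})
print(result["final.jpg"])  # girl_image[pose(boy_photo)]
```

Naming intermediate results (temp1) is what lets a later step consume an earlier step's output.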

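Multi-agent collaboration

For complex tasks such as writing software, a multi-agent approach breaks the task into subtasks handled by different roles (e.g., software engineer, product manager, designer, QA (quality assurance) engineer), with each agent completing a different subtask.

Different agents can be built by prompting an LLM to perform different tasks. For example, to build a software engineer agent, we might prompt the LLM: "You are an expert in writing clear, efficient code. Write code to perform the task . . ."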
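A minimal sketch of role-based agents built from one LLM, assuming a hypothetical llm(system_prompt, message) function. The role prompts other than the engineer's are invented for illustration, and the fixed PM, engineer, QA pipeline is just one simple way to hand work between agents; frameworks also use discussion or voting.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str, str], str]  # hypothetical signature: (system_prompt, message) -> reply

ROLES = {
    "product_manager": "You are a product manager. Turn the request into a short list of requirements.",
    "software_engineer": "You are an expert in writing clear, efficient code. Write code to perform the task.",
    "qa_engineer": "You are a QA engineer. Review the code against the requirements and report problems.",
}


def make_agent(role: str, llm: LLM) -> Callable[[str], str]:
    # Same underlying model; the system prompt is what turns it into a specific role
    system_prompt = ROLES[role]
    return lambda message: llm(system_prompt, message)


def run_team(request: str, llm: LLM) -> List[Tuple[str, str]]:
    transcript = []
    message = request
    for role in ("product_manager", "software_engineer", "qa_engineer"):
        message = make_agent(role, llm)(message)  # each agent works on the previous agent's output
        transcript.append((role, message))
    return transcript


def echo_llm(system_prompt: str, message: str) -> str:
    # Stub model: reports which role answered and what it received
    role = system_prompt.split(".")[0].replace("You are ", "")
    return f"{role} processed <{message}>"


for role, reply in run_team("Build a todo app", echo_llm):
    print(role, ":", reply)
```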

Agent Paradigms

Personally, I feel that building agents is mostly about designing the workflow; it is more like flow engineering.

ReAct

ReAct: Synergizing Reasoning and Acting in Language Models

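ReAct is the most basic of these paradigms. Its core idea is to interleave reasoning and acting: the model itself decides which tool it needs, calls that tool, and uses the output to continue reasoning.

There is a repo that implements a simple ReAct agent; see it for the full implementation. Here is a brief walkthrough.

First, define the tool class, using Google search as an example: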

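class Tools:
    def __init__(self) -> None:
        self.toolConfig = self._tools()

    def _tools(self):
        tools = [
            {
                'name_for_human': 'Google Search',
                'name_for_model': 'google_search',
                # TOOL_DESC (below) formats {description_for_model}, so every tool needs this key
                'description_for_model': 'Google Search is a general-purpose search engine for looking up information on the web.',
                'parameters': [
                    {
                        'name': 'search_query',
                        'description': 'search keywords or phrase',
                        'required': True,
                        'schema': {'type': 'string'},
                    }
                ],
            }
        ]
        return tools

    def google_search(self, search_query: str) -> str:
        # Body omitted in these notes; it must return the search results as a string
        ...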

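Next, build the system prompt. The prompt tells the model directly which tools it may call (build_system_input); the model then writes out which tool it wants to call, after which the agent parses the model's output (parse_latest_plugin_call) and calls the tool (call_plugin).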

import json5  # third-party (pip install json5): lenient JSON parser for the model's Action Input

TOOL_DESC = """{name_for_model}: Call this tool to interact with the {name_for_human} API. What is the {name_for_human} API useful for? {description_for_model} Parameters: {parameters} Format the arguments as a JSON object."""
REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

{tool_descs}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!
"""


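class Agent:
    def __init__(self, path: str = '') -> None:
        self.path = path
        self.tool = Tools()
        self.system_prompt = self.build_system_input()
        # InternLM2Chat is the model wrapper defined in the referenced repo (not shown here);
        # as used below, chat(message, history, system_prompt) returns (response, new_history)
        self.model = InternLM2Chat(path)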

    def build_system_input(self):
        tool_descs, tool_names = [], []
        for tool in self.tool.toolConfig:
            tool_descs.append(TOOL_DESC.format(**tool))
            tool_names.append(tool['name_for_model'])
        tool_descs = '\n\n'.join(tool_descs)
        tool_names = ','.join(tool_names)
        sys_prompt = REACT_PROMPT.format(tool_descs=tool_descs, tool_names=tool_names)
        return sys_prompt
    
    def parse_latest_plugin_call(self, text):
        plugin_name, plugin_args = '', ''
        i = text.rfind('\nAction:')
        j = text.rfind('\nAction Input:')
        k = text.rfind('\nObservation:')
        if 0 <= i < j:  # If the text has `Action` and `Action input`,
            if k < j:  # but does not contain `Observation`,
                text = text.rstrip() + '\nObservation:'  # Add it back.
            k = text.rfind('\nObservation:')
            plugin_name = text[i + len('\nAction:') : j].strip()
            plugin_args = text[j + len('\nAction Input:') : k].strip()
            text = text[:k]
        return plugin_name, plugin_args, text
    
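    def call_plugin(self, plugin_name, plugin_args):
        plugin_args = json5.loads(plugin_args)
        if plugin_name == 'google_search':
            return '\nObservation:' + self.tool.google_search(**plugin_args)
        # Unknown tool: report it to the model instead of returning None (which would break `+=`)
        return f'\nObservation: unknown tool {plugin_name!r}'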

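    def text_completion(self, text, history=None):
        history = [] if history is None else history  # avoid a shared mutable default argument
        text = "\nQuestion:" + text
        # First call: the model writes Thought / Action / Action Input
        response, his = self.model.chat(text, history, self.system_prompt)
        print(response)
        plugin_name, plugin_args, response = self.parse_latest_plugin_call(response)
        if plugin_name:
            # Run the tool and append its result as the Observation
            response += self.call_plugin(plugin_name, plugin_args)
        # Second call: continue from the updated history so the model still sees the question
        response, his = self.model.chat(response, his, self.system_prompt)
        return response, his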
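text_completion above makes at most one tool call before asking for the final answer, while the prompt allows the Thought/Action/Action Input/Observation cycle to repeat. The sketch below, which is not the repo's code, loops until the model writes "Final Answer:". The scripted model, the tool names lookup and multiply, and the one-line JSON argument format are assumptions for illustration, and the regex parser stands in for parse_latest_plugin_call.

```python
import json
import re
from typing import Callable, Dict

# Captures the tool name and its one-line JSON arguments from the model's latest step
ACTION = re.compile(r"Action:\s*(\S+)\s*\nAction Input:\s*(.+)")


def react_loop(question: str, llm: Callable[[str], str],
               tools: Dict[str, Callable[..., str]], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model writes the next Thought + Action (or Final Answer)
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION.search(step)
        if match is None:
            break  # neither an action nor an answer
        name, args = match.group(1), json.loads(match.group(2))
        if name in tools:
            observation = tools[name](**args)
        else:
            observation = f"unknown tool {name!r}"
        # Feed the tool result back so the next Thought can use it
        transcript += f"\nObservation: {observation}\n"
    return "No final answer within the step limit."


# Scripted stand-in for the LLM: picks the next step by counting observations so far
SCRIPT = [
    'Thought: I need the population first.\nAction: lookup\nAction Input: {"key": "population"}',
    'Thought: Now double it.\nAction: multiply\nAction Input: {"a": 2, "b": 1200}',
    "Thought: I now know the final answer\nFinal Answer: 2400",
]


def scripted_llm(transcript: str) -> str:
    return SCRIPT[transcript.count("Observation:")]


tools = {
    "lookup": lambda key: {"population": "1200"}[key],
    "multiply": lambda a, b: str(a * b),
}
print(react_loop("What is twice the town's population?", scripted_llm, tools))  # 2400
```

max_steps bounds the loop so a model that never produces a final answer cannot run forever.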

AutoGPT

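The AutoGPT paradigm sends the task to a task-execution agent A and stores the question together with A's result in memory. It then sends A's result to a task-creation agent B and stores B's result in memory as well. The memory is sent back to A, and this repeats until a stopping condition is met.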
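A minimal sketch of the loop just described; it is not AutoGPT's actual code. execute stands in for agent A and create_tasks for agent B (both hypothetical stubs here), and the stopping condition is simply "no tasks left" or an iteration cap.

```python
from typing import Callable, List


def autogpt_style_loop(objective: str,
                       execute: Callable[[str, List[str]], str],
                       create_tasks: Callable[[str, str, List[str]], List[str]],
                       max_iterations: int = 10) -> List[str]:
    memory: List[str] = []
    tasks = [objective]
    for _ in range(max_iterations):
        if not tasks:  # stopping condition: nothing left to do
            break
        task = tasks.pop(0)
        result = execute(task, memory)  # agent A executes the task, seeing the memory
        memory.append(f"{task} -> {result}")  # store the task and A's result
        new_tasks = create_tasks(objective, result, memory)  # agent B proposes follow-up tasks
        memory.append(f"new tasks: {new_tasks}")  # store B's result
        tasks.extend(new_tasks)
    return memory


def execute_stub(task: str, memory: List[str]) -> str:
    return f"done({task})"


def create_tasks_stub(objective: str, result: str, memory: List[str]) -> List[str]:
    # Split the objective once, then stop creating tasks
    return ["outline", "draft"] if result == f"done({objective})" else []


for line in autogpt_style_loop("write report", execute_stub, create_tasks_stub):
    print(line)
```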

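ReWOO

ReWOO (Reasoning WithOut Observation) first breaks the user's input into a complete plan up front, then executes the planned steps, and finally combines all the results into the final output.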
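A minimal sketch of the planner, worker, and solver stages. The plan format ("#E1 = Tool[argument]", where later steps refer to earlier results by their #E variable) is loosely modeled on the ReWOO paper but is an assumption here, as are the placeholder tools Search and Double and the lambda planner/solver.

```python
import re
from typing import Callable, Dict, List, Tuple

# One plan step per line, e.g. "#E2 = Double[#E1]"
PLAN_STEP = re.compile(r"(#E\d+)\s*=\s*(\w+)\[(.*)\]")


def parse_plan(plan: str) -> List[Tuple[str, str, str]]:
    return PLAN_STEP.findall(plan)


def rewoo(question: str,
          planner: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          solver: Callable[[str, str, Dict[str, str]], str]) -> str:
    plan = planner(question)  # 1. Planner: write the whole plan before any tool runs
    evidence: Dict[str, str] = {}
    for var, tool, arg in parse_plan(plan):  # 2. Worker: run the steps in order
        # Substitute earlier results; longest names first so "#E1" is not replaced inside "#E10"
        for name in sorted(evidence, key=len, reverse=True):
            arg = arg.replace(name, evidence[name])
        evidence[var] = tools[tool](arg)
    return solver(question, plan, evidence)  # 3. Solver: combine everything into the answer


PLAN = ("Plan: find the population.\n#E1 = Search[population of the town]\n"
        "Plan: double it.\n#E2 = Double[#E1]")
tools = {
    "Search": lambda query: "1200",  # placeholder search result
    "Double": lambda text: str(int(text) * 2),
}
answer = rewoo("What is twice the town's population?",
               planner=lambda q: PLAN,
               tools=tools,
               solver=lambda q, plan, ev: ev["#E2"])
print(answer)  # 2400
```

Compared with the ReAct loop, the model is called only for planning and solving, not once per tool result.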

posted @ 2024-04-21 10:46 vanilla阿草
