ToolBench: A New SFT Paradigm That Combines "Multi Steps CoT Chains" with "Tool Learning"

I. Background

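What most clearly sets humans apart from other animals is the ability to create and use tools, which lets us overcome the limits of our bodies and explore a wider world.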

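Foundation models are similar: if they rely only on the weights learned during training, their range of applications is quite limited. Tool learning was proposed to address this. Combining domain-specific tools with large foundation models can improve both efficiency and performance.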

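On the other hand, it has become increasingly clear that, whether one prompts a base model directly in a zero/few-shot manner or uses LangChain-style multi-step chained reasoning (Multi Steps CoT Chains), breaking a concrete task into a series of sub-steps, solving them one by one, and then aggregating the results is a more effective way for large models to reason.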

For these two reasons, attention has turned to combining "Multi Steps CoT Chains" with "tool learning" to create a new SFT paradigm.

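OpenBMB (Open Lab for Big Model Base), an open-source community supported by the Tsinghua University Natural Language Processing Lab among others, released the ToolBench project. It helps developers build open-source, large-scale, high-quality instruction-tuning data, with the goal of facilitating large language models that have general tool-use capability.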

The ToolBench repository provides the datasets, the training and evaluation scripts, and ToolLLaMA, a model fine-tuned on ToolBench. Its main features are:

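ToolBench supports both single-tool and multi-tool scenarios. The single-tool setting follows the LangChain prompt style, and the multi-tool setting follows the AutoGPT prompt style.
The responses in ToolBench include not only the final answer but also the model's chain of thought, the tool executions, and the tool execution results.
ToolBench reflects the complexity of real-world scenarios and supports multi-step tool calls.
Another notable strength is the diversity of its APIs, which are designed for real scenarios such as weather information, search, stock updates, and PowerPoint automation.
All data is generated automatically through the OpenAI API and then filtered by the project team, so the whole data creation process is easy to scale.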

References:

https://mp.weixin.qq.com/s/U0XeFDycMNILqajENvJEGg
https://github.com/OpenBMB/ToolBench/tree/master

II. Generating the SFT Training Dataset

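ToolBench's dataset was built through prompt engineering: the developers wrote prompt templates that follow the tool-call and CoT-chain format, labeled them through the GPT API, and merged the resulting completions with the prompts into a JSON dataset made up of an array of dicts.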

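To extend the dataset, a developer also starts from prompt engineering: convert the domain task into matching prompt templates, label them through the GPT API, and merge the results in the same way to obtain a new dataset.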

The dataset released by ToolBench covers both single-tool and multi-tool scenarios. Statistics for the single-tool scenario:

Tool	Query Num	Chains Num	Chains/Query
Weather	9827	23740	2.4
Chemical	8585	29916	3.5
Translation	10267	23011	2.2
Map	7305	23325	3.2
Stock	11805	32550	2.8
Meta analysis	2526	15725	6.2
Bing search	31089	102088	3.3
Wolfram	16130	56169	3.5
Database	1264	6347	5
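
The Chains/Query column is simply the chain count divided by the query count, rounded to one decimal place. As a quick sanity check, the short sketch below recomputes it from the numbers in the table; the variable and function names are illustrative only and do not come from the ToolBench repository.

```python
# (tool, query count, chain count), copied from the single-tool table above.
single_tool_stats = [
    ("Weather", 9827, 23740),
    ("Chemical", 8585, 29916),
    ("Translation", 10267, 23011),
    ("Map", 7305, 23325),
    ("Stock", 11805, 32550),
    ("Meta analysis", 2526, 15725),
    ("Bing search", 31089, 102088),
    ("Wolfram", 16130, 56169),
    ("Database", 1264, 6347),
]


def chains_per_query(queries: int, chains: int) -> float:
    """Average number of chain steps per query, to one decimal place."""
    return round(chains / queries, 1)


ratios = {name: chains_per_query(q, c) for name, q, c in single_tool_stats}
total_queries = sum(q for _, q, _ in single_tool_stats)
total_chains = sum(c for _, _, c in single_tool_stats)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
print(f"total: {total_queries} queries, {total_chains} chains")
```

Summed over the nine tools, this comes to 98,798 queries and 312,871 chain steps.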

Statistics for the multi-tool scenario:

Scenario	Tools	Query num	Sub-Query num	Chains num	Chains per Query
Meta_file	chemical-prop/meta_analysis/Slides Making/Wikipedia/file_operation/Bing_search	331	1197	5899	17.8
Multi_film	Wolfram/Film Search/Slides Making/Wikipedia/file_operation/Bing_search	795	2703	12445	15.7
Vacation_plan	google_places/wikipedia/weather/bing search	191	654	2742	14.4

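Let's walk through how the dataset is produced and processed, step by step.

We use the single-tool case as the example; the multi-tool case works on the same principle.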

0x1: Converting the original question into the "Multi Steps CoT Chains" and "tool learning" format

1. Expand the original question into a description of a chain of thought that includes tool calls (manual prompt augmentation)

Start from a basic question. Suppose the question is:

What will be the UV index for Miami tomorrow?

Then build the corresponding prompt template following the "Multi Steps CoT Chains" and "tool learning" paradigm.

------------------------------------------------------------------------------------------------------------------------------------------

"Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\n

get_weather_today: Get today's the weather. Your input should be a json (args json schema): {{\"location\" : string, }} The Action to trigger this API should be get_weather_today and the input parameters should be a json dict string. Pay attention to the type of parameters.\n

forecast_weather: Forecast weather in the upcoming days.. Your input should be a json (args json schema): {{\"location\" : string, \"days\" : integer, }} The Action to trigger this API should be forecast_weather and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\n

Use the following format:\n\n

Question: the input question you must answer\n

Thought: you should always think about what to do\n

Action: the action to take, should be one of [get_weather_today, forecast_weather]\n

Action Input: the input to the action\n

Observation: the result of the action\n

... (this Thought/Action/Action Input/Observation can repeat N times)\n

Thought: I now know the final answer\n

Final Answer: the final answer to the original input question\n\n

Begin! Remember: (1) Follow the format, i.e,\n

Thought:\n

Action:\n

Action Input:\n

Observation:\n

Final Answer:\n

(2) Provide as much as useful information in your Final Answer. (3) Do not make up anything, and if your Observation has no link, DO NOT hallucihate one. (4) If you have enough information and want to stop the process, please use \n

Thought: I have got enough information\n

Final Answer: **your response. \n

The Action: MUST be one of the following:get_weather_today; forecast_weather\n

Question: {input}\n

Agent scratchpad (history actions):\n"

------------------------------------------------------------------------------------------------------------------------------------------

Filling the question above into {input} gives a complete prompt.

------------------------------------------------------------------------------------------------------------------------------------------

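"Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\n

get_weather_today: Get today's the weather. Your input should be a json (args json schema): {{\"location\" : string, }} The Action to trigger this API should be get_weather_today and the input parameters should be a json dict string. Pay attention to the type of parameters.\n

forecast_weather: Forecast weather in the upcoming days.. Your input should be a json (args json schema): {{\"location\" : string, \"days\" : integer, }} The Action to trigger this API should be forecast_weather and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\n

Use the following format:\n\n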

Question: the input question you must answer\n

Thought: you should always think about what to do\n

Action: the action to take, should be one of [get_weather_today, forecast_weather]\n

Action Input: the input to the action\n

Observation: the result of the action\n

... (this Thought/Action/Action Input/Observation can repeat N times)\n

Thought: I now know the final answer\n

Final Answer: the final answer to the original input question\n\n

Begin! Remember: (1) Follow the format, i.e,\n

Thought:\n

Action:\n

Action Input:\n

Observation:\n

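Final Answer:\n

(2) Provide as much as useful information in your Final Answer. (3) Do not make up anything, and if your Observation has no link, DO NOT hallucihate one. (4) If you have enough information and want to stop the process, please use \n

Thought: I have got enough information\n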

Final Answer: **your response. \n

The Action: MUST be one of the following:get_weather_today; forecast_weather\n

Question: What will be the UV index for Miami tomorrow?\n

Agent scratchpad (history actions):\n"

------------------------------------------------------------------------------------------------------------------------------------------

Note that the text above is essentially a programming syntax; it just happens to be expressed in natural language.
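
The substitution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TEMPLATE` is a shortened stand-in for the weather template above, and `fill_template` is a hypothetical helper, not a function from the ToolBench repository. The processed sample shown in section 0x2 still contains the doubled braces `{{ ... }}`, so the sketch uses plain string replacement; `str.format` would collapse `{{` into `{`.

```python
# Shortened stand-in for the weather template above; only the parts needed
# to show the substitution are kept.
TEMPLATE = (
    "Answer the following questions as best you can. "
    "Specifically, you have access to the following APIs:\n\n"
    "forecast_weather: Forecast weather in the upcoming days.. "
    "Your input should be a json (args json schema): "
    '{{"location" : string, "days" : integer, }}\n\n'
    "Question: {input}\n"
    "Agent scratchpad (history actions):\n"
)


def fill_template(template: str, question: str) -> str:
    """Substitute the question for the literal {input} placeholder."""
    return template.replace("{input}", question)


prompt = fill_template(TEMPLATE, "What will be the UV index for Miami tomorrow?")
print(prompt)
```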

2. Using few-shot and instruction prompts, feed the prompt to GPT to obtain a condensed prompt in the "Multi Steps CoT Chains" and "tool learning" format (the formalized prompt)

------------------------------------------------------------------------------------------------------------------------------------------

Thought: I need to get the UV index for Miami tomorrow.

Action: forecast_weather

Action Input: {"location" : "Miami", "days" : 1}

Observation: The API returns the forecast weather for Miami in the next day, including the UV index.

Thought: I now know the final answer.

Final Answer: The forecast weather API with location Miami and one day ahead predicts that the UV index for tomorrow is X.

------------------------------------------------------------------------------------------------------------------------------------------

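Note that at this point we are still doing prompt engineering: no tool has actually been called yet, so the Observation and the Final Answer are only placeholders (the UV index is still "X").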
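
Since nothing after the first Action Input is backed by a real tool call, only the first Thought/Action/Action Input is usable at this stage. A minimal parsing sketch follows; `parse_first_step` and its regular expressions are assumptions made for illustration, not ToolBench code.

```python
import json
import re

# The formalized output shown above, joined into one string.
RAW = """Thought: I need to get the UV index for Miami tomorrow.

Action: forecast_weather

Action Input: {"location" : "Miami", "days" : 1}

Observation: The API returns the forecast weather for Miami in the next day, including the UV index.

Thought: I now know the final answer.

Final Answer: The forecast weather API with location Miami and one day ahead predicts that the UV index for tomorrow is X."""


def parse_first_step(text: str) -> dict:
    """Keep the first Thought/Action/Action Input and drop everything
    from the first Observation onward, which the model only guessed."""
    head = text.split("Observation:", 1)[0]
    thought = re.search(r"Thought:\s*(.+)", head)
    action = re.search(r"Action:\s*(.+)", head)
    action_input = re.search(r"Action Input:\s*(\{.*\})", head, re.DOTALL)
    if not (thought and action and action_input):
        raise ValueError("text does not follow the Thought/Action/Action Input format")
    return {
        "thought": thought.group(1).strip(),
        "action": action.group(1).strip(),
        "action_input": json.loads(action_input.group(1)),
    }


step = parse_first_step(RAW)
print(step["action"], step["action_input"])
```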

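3. Use GPT Plugins to obtain the completions for the prompt from the previous step (the chain-of-thought execution result, including the API call result)

The prompt is as follows: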

------------------------------------------------------------------------------------------------------------------------------------------

Thought: I need to get the UV index for Miami tomorrow.

Action: forecast_weather

Action Input: {"location" : "Miami", "days" : 1}

Observation: The API returns the forecast weather for Miami in the next day, including the UV index.

Thought: I now know the final answer.

Final Answer: The forecast weather API with location Miami and one day ahead predicts that the UV index for tomorrow is X.

------------------------------------------------------------------------------------------------------------------------------------------

The resulting completion:

------------------------------------------------------------------------------------------------------------------------------------------

"\"The weather forecast for Miami at 1 days later is: \\n

over all weather: Moderate rain,\\n

max temperature: 30.3(C), 86.5(F),\\n

min temperature: 26.0(C), 78.8(F),\\n

average temperature: 28.2(C), 82.7(F),\\n

max wind speed: 24.1(kph), 15.0(mph),\\n

total precipitation: 7.9(mm), 0.31(inch),\\n

will rain today: 1,\\n

chance of rain: 89,\\n

total snow: 0.0(cm),\\n

will snow today: 0,\\n

chance of snow: 0,\\n

average visibility: 9.6(km), 5.0(miles),\\n

average humidity: 68.0,\\n

UV index: 6.0,\\n

sunrise time: 06:46 AM,\\n

sunset time: 07:51 PM,\\n

moonrise time: 01:40 PM,\\n

moonset time: 02:46 AM,\\n\""

------------------------------------------------------------------------------------------------------------------------------------------
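
Conceptually, this step maps the Action name to a callable and passes it the Action Input decoded from JSON. The sketch below shows one way to do that. The `forecast_weather` function is a hypothetical stub that returns a canned string (with the UV value taken from the output above) instead of calling a weather service, and `run_action` is an illustrative name rather than anything defined by ToolBench.

```python
import json
from typing import Callable, Dict


def forecast_weather(location: str, days: int) -> str:
    """Hypothetical stub; a real tool would query a weather service here."""
    return (
        f"The weather forecast for {location} at {days} days later is: \n"
        "UV index: 6.0,\n"
    )


# Registry of callable tools, keyed by the Action name used in the prompt.
TOOLS: Dict[str, Callable[..., str]] = {"forecast_weather": forecast_weather}


def run_action(action: str, action_input: str) -> str:
    """Dispatch one Action and return the text used as the Observation.
    Errors are returned as text so the model could react to them."""
    if action not in TOOLS:
        return f"Unknown action: {action}"
    try:
        kwargs = json.loads(action_input)
    except json.JSONDecodeError as exc:
        return f"Invalid Action Input: {exc}"
    if not isinstance(kwargs, dict):
        return "Invalid Action Input: expected a JSON object"
    return TOOLS[action](**kwargs)


observation = run_action("forecast_weather", '{"location" : "Miami", "days" : 1}')
print(observation)
```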

4. Use GPT to summarize the sub-step completions of the whole "Multi Steps CoT Chains" and "tool learning" process into a final answer

The final answer is:

The UV index for Miami tomorrow is 6.0.

At this point, the whole chain-of-thought and tool-call process has produced one complete run.
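
Taken together, the four steps form a loop: ask the model for the next step, run the requested tool, append the Observation to the scratchpad, and stop once a Final Answer appears. The sketch below illustrates that loop under heavy simplification. `scripted_model` is a hypothetical stand-in for the LLM, `fake_forecast_weather` is a stub returning a canned value, and the step limit of 7 is borrowed from the "max 7 times" wording in the translation template shown later; none of these names come from ToolBench.

```python
import json
import re
from typing import Callable

MARKER = "Agent scratchpad (history actions):\n"


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it requests the tool first and
    answers once an Observation is present in the scratchpad."""
    scratchpad = prompt.split(MARKER, 1)[1]
    if "Observation:" in scratchpad:
        return (
            "Thought: I now know the final answer.\n"
            "Final Answer: The UV index for Miami tomorrow is 6.0."
        )
    return (
        "Thought: I need to get the UV index for Miami tomorrow.\n"
        "Action: forecast_weather\n"
        'Action Input: {"location": "Miami", "days": 1}\n'
    )


def fake_forecast_weather(location: str, days: int) -> str:
    """Stub tool returning a canned observation."""
    return "UV index: 6.0"


TOOLS = {"forecast_weather": fake_forecast_weather}


def run_agent(question: str, model: Callable[[str], str], max_steps: int = 7) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        prompt = f"Question: {question}\n{MARKER}{scratchpad}"
        reply = model(prompt)
        final = re.search(r"Final Answer:\s*(.+)", reply, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.+)", reply).group(1).strip()
        raw_input = re.search(r"Action Input:\s*(\{.*\})", reply, re.DOTALL).group(1)
        observation = TOOLS[action](**json.loads(raw_input))
        # Record the step and its result so the next call can see them.
        scratchpad += f"{reply.rstrip()}\nObservation: {observation}\n"
    return "No final answer within the step limit."


answer = run_agent("What will be the UV index for Miami tomorrow?", scripted_model)
print(answer)
```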

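Each line of the data file is a JSON dict containing the templated prompt used for data creation, the human instruction (query) for tool use, the intermediate thought/tool-execution loops, and the final answer.

Below is an example of generated single-tool data.

Tool Description: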
BMTools Tool_name: translation
Tool action: get_translation
action_input: {"text": target texts, "tgt_lang": target language}

Generated Data:
{
    "prompt": "Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\nget_translation: . Your input should be a json (args json schema): {{\"text\" : string, \"tgt_lang\" : string, }} The Action to trigger this API should be get_translation and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [get_translation]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times, max 7 times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin! Remember: (1) Follow the format, i.e,\nThought:\nAction:\nAction Input:\nObservation:\nFinal Answer:\n (2) Provide as much as useful information in your Final Answer. (3) Do not make up anything, and if your Observation has no link, DO NOT hallucihate one. (4) If you have enough information and want to stop the process, please use \nThought: I have got enough information\nFinal Answer: **your response. \n The Action: MUST be one of the following:get_translation\nQuestion: {input}\n Agent scratchpad (history actions):\n {agent_scratchpad}",
    "query": "My intention is to convert the data provided in ما هي الأقسام الثلاثة للقوات المسلحة؟ into Arabic(ara).\n",
    "chains": [
        {
            "thought": "I need to use the get_translation API to convert the text into Arabic.",
            "action": "get_translation",
            "action_input": "{\"text\": \"What are the three branches of the military?\", \"tgt_lang\": \"ara\"}",
            "observation": "\"ما هي الفروع الثلاثة للجيش ؟\""
        }
    ],
    "answer": "The translation of \"What are the three branches of the military?\" into Arabic is \"ما هي الفروع الثلاثة للجيش ؟\"."
}
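
One detail worth noticing in this record is that `action_input` is itself a JSON string nested inside the JSON record, so it has to be decoded a second time. The following sketch checks a record for the four top-level keys and the four per-step keys seen in the example. The key sets are inferred from this single example, `validate_record` is a hypothetical helper, and `sample_line` is an abbreviated stand-in whose prompt, query, and observation are placeholders.

```python
import json

# Keys inferred from the single-tool example above (an assumption).
REQUIRED_KEYS = {"prompt", "query", "chains", "answer"}
STEP_KEYS = {"thought", "action", "action_input", "observation"}


def validate_record(line: str) -> dict:
    """Parse one record and check the structure seen in the example."""
    record = json.loads(line)
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for i, step in enumerate(record["chains"]):
        missing_step = STEP_KEYS - set(step)
        if missing_step:
            raise ValueError(f"step {i} is missing {sorted(missing_step)}")
        if step["action_input"]:
            # action_input is a JSON string inside the JSON record.
            json.loads(step["action_input"])
    return record


# Abbreviated stand-in for the translation record; placeholder text in <...>.
sample_line = json.dumps({
    "prompt": "<templated prompt>\nQuestion: {input}\n {agent_scratchpad}",
    "query": "<translation request>\n",
    "chains": [{
        "thought": "I need to use the get_translation API to convert the text into Arabic.",
        "action": "get_translation",
        "action_input": "{\"text\": \"What are the three branches of the military?\", \"tgt_lang\": \"ara\"}",
        "observation": "<translated text>",
    }],
    "answer": "<final answer>",
})

record = validate_record(sample_line)
args = json.loads(record["chains"][0]["action_input"])
print(record["chains"][0]["action"], args["tgt_lang"])
```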

An example of multi-tool data:

{
        "query": "Can you suggest some travel insurance options for a backpacking trip to Southeast Asia?\n",
        "steps": [
            {
                "assistant_reply": "{\n    \"thoughts\": {\n        \"text\": \"I think I can use the Bing_search command to find travel insurance options for a backpacking trip to Southeast Asia.\",\n        \"reasoning\": \"Bing_search can help me find information on travel insurance options for backpacking trips in Southeast Asia by searching for relevant websites and articles.\"\n    },\n    \"command\": {\n        \"name\": \"Bing_search\",\n        \"args\": {\n            \"goal\": \"Find travel insurance options for a backpacking trip to Southeast Asia.\"\n        }\n    }\n}",
                "chains": [
                    {
                        "thought": "I need to search for travel insurance options for backpacking in Southeast Asia.",
                        "action": "search_top3",
                        "action_input": "{\"key_words\": \"backpacking travel insurance Southeast Asia\"}",
                        "observation": "\"page: 1\\ntitle: Backpacker Travel Insurance - The Essential Guide\\nsummary: Popular Backpacker Travel Insurance Providers: Quick Answers! Do I Need Travel Insurance for Backpacking? Is Backpacker Insurance Different From Travel Insurance? What Travel Insurance SHOULD Cover; What Most Travel Insurers DON\u2019T Cover; TOP 3 Most Popular Backpacking Insurance Companies With Our Readers! 1. World Nomads; 2.\\npage: 2\\ntitle: Asia Travel Insurance | Asia Backpacker Travel Insurance\\nsummary: More than just being a cheap travel destination, South East Asia has a wealth of history, some stunning architecture & extremely friendly locals. Our Asia travel insurance provides you with all the cover you need if you are heading to Asia - whether that be to Vietnam or Thailand, China or even Mongolia!\\npage: 3\\ntitle: Backpacking Southeast Asia Travel Guide (Budget Tips \u2022 2023)\\nsummary: Travel Insurance for Southeast Asia World Nomads \u2019 mission is to support and encourage travellers to explore their boundaries. They offer simple & flexible travel insurance, and safety advice to help you travel confidently.\\n\""
                    },
                    {
                        "thought": "I need more detailed information about the travel insurance providers listed in the search results.",
                        "action": "load_page_index",
                        "action_input": "{\"idx\": 0}",
                        "observation": "\"Toms AsiaToms AsiaAre you looking to hit the road and explore the magnificent sites of Southeast Asia for 1 month? Backpacking is the ideal way to traverse the idyllic forests, beaches and cities that the region has to offer. With a range of activities to try, friendly locals and a wonderful variety of culture, Southeast Asia is the perfect destination for an adventure-packed one month trip. This article will provide valuable tips and advice to ensure you make the most of your one month journey around Southeast Asia. So lets get into backpacking Southeast Asia for 1 Month.BudgetMake sure you have an appropriate budget for your trip that allows for backpacking Southeast Asia for 1 month and all the necessary travel expenses such as accommodation, transport, food and activities.As someone who loves to travel and explore new cultures, I am very excited to embark on my next journey- a backpacking trip through Southeast Asia. However, planning and executing a successful budget-friendly trip is no small feat. To ensure my trip is enjoyable, memorable, and actually cost-effective, I will need to plan meticulously and really research my costs in depth.Before I even consider travelling to Southeast Asia, I must first determine my budget backpacking Southeast Asia for 1 month. This will help me prioritize my spending, as well as narrow down my travel destinations. By having a good understanding of my financial limits, I can ensure that my trip is not only within my budget, but that I a\""
                    },
                    {
                        "thought": "This page is not related to the travel insurance providers, I should try loading the second or third page.",
                        "action": "load_page_index",
                        "action_input": "{\"idx\":1}",
                        "observation": "\"Sure to sweep you off your feet in any season, Napa Valley is the ultimate destination for romanceVisit Napa Valley for the perfect conference, incentive trip, social group event, corporate event or team building meeting. Unique meeting spaces create an inspiring atmosphere.Give Your Clients the Ultimate Napa Valley Experience -\u00a0information for lodging, wineries, restaurants, arts, transportation, special offers, events and moreLearn all about Visit Napa Valley the organization, our mission, our strategy, partners and much more while promoting our area as the premier destination it isSure to sweep you off your feet in any season, Napa Valley is the ultimate destination for romanceVisit Napa Valley for the perfect conference, incentive trip, social group event, corporate event or team building meeting. Unique meeting spaces create an inspiring atmosphere.Give Your Clients the Ultimate Napa Valley Experience -\u00a0information for lodging, wineries, restaurants, arts, transportation, special offers, events and moreLearn all about Visit Napa Valley the organization, our mission, our strategy, partners and much more while promoting our area as the premier destination it isFrom intimate boutique inns to full-service luxury resorts, and everything in between, Napa Valley properties are ready to welcome you with world-class hospitality.Let\u2019s raise a glass, because our legendary Napa Valley wine country has only grown better with time. Offering world-class wine, winemakers, and wineries, \""
                    },
                    {
                        "thought": "The second page is not related to travel insurance either, I should try the third one.",
                        "action": "load_page_index",
                        "action_input": "{\"idx\":2}",
                        "observation": "\"Timeout for loading this page, Please try to load another one or search again.\""
                    },
                    {
                        "thought": "",
                        "action": "",
                        "action_input": "",
                        "observation": "I was not able to load the third page for more information, but World Nomads was listed as the top recommended travel insurance provider for backpacking in Southeast Asia in the search results.\nFinal Answer: World Nomads is the top recommended travel insurance provider for backpacking in Southeast Asia."
                    }
                ],
                "answer": "Tool Bing_search returned: {'input': '{\"goal\": \"Find travel insurance options for a backpacking trip to Southeast Asia.\", \"history context\": \"[]\"}', 'output': 'World Nomads is the top recommended travel insurance provider for backpacking in Southeast Asia.'}"
            },
            {
                "assistant_reply": "{\n    \"thoughts\": {\n        \"text\": \"Based on the previous search results, I think I can use the Wikipedia command to learn more about World Nomads and confirm that it is a reliable travel insurance provider for backpacking in Southeast Asia.\",\n        \"reasoning\": \"Wikipedia can provide reliable information on various topics, including travel insurance providers and their reputations. I can use this command to verify the information found in the previous search.\"\n    },\n    \"command\": {\n        \"name\": \"Wikipedia\",\n        \"args\": {\n            \"goal\": \"Learn more about World Nomads and confirm its reliability as a travel insurance provider for backpacking in Southeast Asia.\"\n        }\n    }\n}",
                "chains": [
                    {
                        "thought": "I need to search for more information on World Nomads",
                        "action": "search",
                        "action_input": "{\"entity\": \"World Nomads\"}",
                        "observation": "\"Could not find World Nomads. Try some similar entities: ['World Nomad Games', 'Nomad', 'Worldnomads.com', 'Digital nomad', 'Eurasian nomads'].\""
                    },
                    {
                        "thought": "I should try searching for a related entity from the list of similar entities provided in the previous search result",
                        "action": "search",
                        "action_input": "{\"entity\": \"World Nomad Games\"}",
                        "observation": "\"World Nomad Games (Kyrgyz: \u0414\u04af\u0439\u043d\u04e9\u043b\u04af\u043a \u043a\u04e9\u0447\u043c\u04e9\u043d\u0434\u04e9\u0440 \u043e\u044e\u043d\u0434\u0430\u0440\u044b) are an international sport competition dedicated to ethnic sports practiced in Central Asia. The first three World Nomad Games were held in Cholpon-Ata, Kyrgyzstan. The fourth games were held in Iznik, Turkey between September 29th and October 2nd 2022.[1] More than 3000 athletes from 102 countries participated in the events.[2]. The Games are surrounded by a cultural and ethnical program. During the first edition of the Games, a yurt village was installed, and cultural events not linked with the Games' disciplines occurred, as well as other entertaining activities..\""
                    },
                    {
                        "thought": "The previous search didn't give me useful information about World Nomads. I should try disambiguating the name to find related entities.",
                        "action": "disambiguation",
                        "action_input": "{\"entity\": \"World Nomads\"}",
                        "observation": "\"Related entities to World Nomads: []\""
                    },
                    {
                        "thought": "Since the disambiguation API did not yield any related entities, I should try refining my search and try again",
                        "action": "search",
                        "action_input": "{\"entity\": \"World Nomads travel insurance\"}",
                        "observation": "\"Could not find World Nomads travel insurance. Try some similar entities: ['Digital nomad', 'Worldnomads.com (category Online insurance companies)', 'Travel visa', 'Nib Health Funds (category Health insurance in Australia)', 'Bleisure travel'].\""
                    },
                    {
                        "thought": "I should try searching for \"World Nomads travel insurance\" as a full entity name to get better results",
                        "action": "search",
                        "action_input": "{\"entity\": \"World Nomads travel insurance\"}",
                        "observation": "\"Could not find World Nomads travel insurance. Try some similar entities: ['Digital nomad', 'Worldnomads.com (category Online insurance companies)', 'Travel visa', 'Nib Health Funds (category Health insurance in Australia)', 'Rock Machine Motorcycle Club (category Articles with WorldCat identifiers)'].\""
                    },
                    {
                        "thought": "I still cannot find any information on World Nomads travel insurance. I should try looking up related keywords to see if I can find any relevant information.",
                        "action": "lookup",
                        "action_input": "{\"keyword\": \"backpacking travel insurance\"}",
                        "observation": "\"No more results.\""
                    },
                    {
                        "thought": "",
                        "action": "",
                        "action_input": "",
                        "observation": "I cannot find any relevant information on World Nomads or backpacking travel insurance. I should try using another tool or source to gather more information about travel insurance providers for backpacking in Southeast Asia.\nFinal Answer: No information found on World Nomads or any other travel insurance providers for backpacking in Southeast Asia through the given API tool. Further research is needed using other sources."
                    }
                ],
                "answer": "Tool Wikipedia returned: {'input': '{\"goal\": \"Learn more about World Nomads and confirm its reliability as a travel insurance provider for backpacking in Southeast Asia.\", \"history context\": \"[\\\\\"Tool Bing_search returned: {\\\\\\\\\\\\\"input\\\\\\\\\\\\\": \\\\\\\\\\\\\"{\"goal\": \"Find travel insurance options for a backpacking trip to Southeast Asia.\", \"history context\": \"[]\"}\\\\\\\\\\\\\", \\\\\\\\\\\\\"output\\\\\\\\\\\\\": \\\\\\\\\\\\\"World Nomads is the top recommended travel insurance provider for backpacking in Southeast Asia.\\\\\\\\\\\\\"}\\\\\"]\"}', 'output': 'No information found on World Nomads or any other travel insurance providers for backpacking in Southeast Asia through the given API tool. Further research is needed using other sources.'}"
            },
            {
                "assistant_reply": "{\n    \"thoughts\": {\n        \"text\": \"Since Wikipedia did not provide any information on World Nomads or other travel insurance providers for backpacking in Southeast Asia, I can use the google_places command to search for recommendations from backpacker communities and blogs.\",\n        \"reasoning\": \"Google Places API allows me to query for addresses and related information based on search queries, including backpacking communities and blogs that may have relevant information on travel insurance providers for Southeast Asia.\"\n    },\n    \"command\": {\n        \"name\": \"google_places\",\n        \"args\": {\n            \"goal\": \"Search for recommendations from backpacking communities and blogs on travel insurance providers for backpacking trips in Southeast Asia.\"\n        }\n    }\n}",
                "chains": [
                    {
                        "thought": "I need to use the search_places API to query the Google Places API with the appropriate input to search for recommendations on travel insurance providers for backpacking trips in Southeast Asia.",
                        "action": "search_places",
                        "action_input": "{\"query\": \"backpacking travel insurance Southeast Asia recommendations\"}",
                        "observation": "\"Google Places did not find any places that match the description\""
                    },
                    {
                        "thought": "",
                        "action": "",
                        "action_input": "",
                        "observation": "The Google Places API did not find any places that match the given query. Further research is needed using other sources.\nFinal Answer: No information found on travel insurance providers for backpacking in Southeast Asia through the given API tool. Further research is needed using other sources."
                    }
                ],
                "answer": "Tool google_places returned: {'input': '{\"goal\": \"Search for recommendations from backpacking communities and blogs on travel insurance providers for backpacking trips in Southeast Asia.\", \"history context\": \"\\\\\\\\\\\\\\\\\"{\"goal\": \"Find travel insurance options for a backpacking trip to Southeast Asia.\", \"history context\": \"[]\"}\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\", \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"output\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\": \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"World Nomads is the top recommended travel insurance provider for backpacking in Southeast Asia.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"}\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"]\"}\\\\\\\\\\\\\", \\\\\\\\\\\\\"output\\\\\\\\\\\\\": \\\\\\\\\\\\\"No information found on World Nomads or any other travel insurance providers for backpacking in Southeast Asia through the given API tool. Further research is needed using other sources.\\\\\\\\\\\\\"}\\\\\"]\"}', 'output': 'No information found on travel insurance providers for backpacking in Southeast Asia through the given API tool. Further research is needed using other sources.'}"
            }
        ]
    },
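
In the multi-tool record there are two levels: `assistant_reply` holds the AutoGPT-style planning decision (which tool to use and for what goal) as a nested JSON string, while `chains` holds the individual API calls made inside that tool, ending with an entry whose `action` is empty and whose `observation` carries the final answer. The sketch below pulls out both levels for one step. `summarize_step` is a hypothetical helper, and the `step` dict is an abbreviated copy of the first step above with long text replaced by placeholders.

```python
import json


def summarize_step(step: dict) -> dict:
    """Return the chosen tool, its goal, and the API calls made inside it."""
    reply = json.loads(step["assistant_reply"])  # nested JSON string
    command = reply["command"]
    # The closing chain entry has an empty action, so it is skipped here.
    tool_calls = [c["action"] for c in step["chains"] if c["action"]]
    return {
        "tool": command["name"],
        "goal": command["args"]["goal"],
        "tool_calls": tool_calls,
    }


# Abbreviated copy of the first step above; placeholder text in <...>.
step = {
    "assistant_reply": json.dumps({
        "thoughts": {"text": "<text>", "reasoning": "<reasoning>"},
        "command": {
            "name": "Bing_search",
            "args": {"goal": "Find travel insurance options for a backpacking trip to Southeast Asia."},
        },
    }),
    "chains": [
        {"thought": "<t>", "action": "search_top3", "action_input": "{}", "observation": "<o>"},
        {"thought": "<t>", "action": "load_page_index", "action_input": "{\"idx\": 0}", "observation": "<o>"},
        {"thought": "<t>", "action": "load_page_index", "action_input": "{\"idx\":1}", "observation": "<o>"},
        {"thought": "<t>", "action": "load_page_index", "action_input": "{\"idx\":2}", "observation": "<o>"},
        {"thought": "", "action": "", "action_input": "", "observation": "<final answer text>"},
    ],
}

summary = summarize_step(step)
print(summary["tool"], summary["tool_calls"])
```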

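0x2: Converting each "Multi Steps CoT Chains" and "tool learning" prompt-completion pair into a prompt template of the form <question, the human-defined chain-of-thought and tool-call decomposition for that question, GPT's final answer>

Taking the same question as the example, the processed data looks like this: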

 {
        "id": "STEP 0: What will be the UV index for Miami tomorrow?\n",
        "conversations": [
            {
                "from": "human",
                "value": "Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\nget_weather_today: Get today's the weather. Your input should be a json (args json schema): {{\"location\" : string, }} The Action to trigger this API should be get_weather_today and the input parameters should be a json dict string. Pay attention to the type of parameters.\nforecast_weather: Forecast weather in the upcoming days.. Your input should be a json (args json schema): {{\"location\" : string, \"days\" : integer, }} The Action to trigger this API should be forecast_weather and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [get_weather_today, forecast_weather]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin! Remember: (1) Follow the format, i.e,\nThought:\nAction:\nAction Input:\nObservation:\nFinal Answer:\n (2) Provide as much as useful information in your Final Answer. (3) Do not make up anything, and if your Observation has no link, DO NOT hallucihate one. (4) If you have enough information and want to stop the process, please use \nThought: I have got enough information\nFinal Answer: **your response. \n The Action: MUST be one of the following:get_weather_today; forecast_weather\nQuestion: What will be the UV index for Miami tomorrow?\n\n Agent scratchpad (history actions):\n "
            },
            {
                "from": "gpt",
                "value": "Thought: I need to get the UV index for Miami tomorrow.\nAction: forecast_weather\nAction Input: {\"location\": \"Miami\", \"days\": 1}\n"
            }
        ]
}

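The prompt template above represents the generation pattern we want SFT to align the LLM to, namely:

<the original question; a decomposition of that question into sub-steps following the chain-of-thought and tool-call flow; the step-by-step solution of each sub-step (which involves tool calls), yielding an answer for each sub-step; the final answer obtained by aggregating all sub-step answers>

In the example above, spending GPT tokens yields a batch of SFT training data in the "Multi Steps CoT Chains" and "tool learning" format. With this dataset in hand, a base model such as LLaMA can then be fine-tuned with SFT.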
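
The conversion from a raw record to this conversation format can be sketched as follows. The sketch is inferred only from the single "STEP 0" sample shown above, so treat it as a guess at the shape of the transformation rather than as ToolBench's preprocessing code. `to_sft_samples` is a hypothetical name, the record is an abbreviated stand-in, and the encoding of the closing Final Answer turn is not visible in the sample, so it is left out.

```python
def to_sft_samples(record: dict) -> list:
    """Emit one human/gpt pair per chain step. The human turn is the filled
    prompt plus the history so far; the gpt turn is the next step."""
    samples = []
    scratchpad = ""
    for i, step in enumerate(record["chains"]):
        human = (
            record["prompt"]
            .replace("{input}", record["query"])
            .replace("{agent_scratchpad}", scratchpad)
        )
        gpt = (
            f"Thought: {step['thought']}\n"
            f"Action: {step['action']}\n"
            f"Action Input: {step['action_input']}\n"
        )
        samples.append({
            "id": f"STEP {i}: {record['query']}",
            "conversations": [
                {"from": "human", "value": human},
                {"from": "gpt", "value": gpt},
            ],
        })
        # Later steps see this step and its observation as history.
        scratchpad += f"{gpt}Observation: {step['observation']}\n"
    return samples


# Abbreviated stand-in for a weather record; the prompt is shortened.
record = {
    "prompt": "Question: {input}\n Agent scratchpad (history actions):\n {agent_scratchpad}",
    "query": "What will be the UV index for Miami tomorrow?\n",
    "chains": [{
        "thought": "I need to get the UV index for Miami tomorrow.",
        "action": "forecast_weather",
        "action_input": '{"location": "Miami", "days": 1}',
        "observation": "UV index: 6.0",
    }],
    "answer": "The UV index for Miami tomorrow is 6.0.",
}

samples = to_sft_samples(record)
print(samples[0]["id"])
print(samples[0]["conversations"][1]["value"])
```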

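Once an SFT model with chain-of-thought reasoning and tool-calling ability has been obtained this way, the interaction flow is roughly as follows:

Step1: External input: What will be the UV index for Miami tomorrow?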

------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>

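Step2: The SFT model expands the input question into a chain of thought that incorporates external tool calls, producing a new prompt (in essence, it learns to take the original question as input and generate a new chain-of-thought text describing that question)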
"Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\n

get_weather_today: Get today's the weather. Your input should be a json (args json schema): {{\"location\" : string, }} The Action to trigger this API should be get_weather_today and the input parameters should be a json dict string. Pay attention to the type of parameters.\n

forecast_weather: Forecast weather in the upcoming days.. Your input should be a json (args json schema): {{\"location\" : string, \"days\" : integer, }} The Action to trigger this API should be forecast_weather and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\n

Use the following format:\n\n

Question: the input question you must answer\n

Thought: you should always think about what to do\n

Action: the action to take, should be one of [get_weather_today, forecast_weather]\n

Action Input: the input to the action\n

Observation: the result of the action\n

... (this Thought/Action/Action Input/Observation can repeat N times)\n

Thought: I now know the final answer\n

Final Answer: the final answer to the original input question\n\n

Begin! Remember: (1) Follow the format, i.e,\n

Thought:\n

Action:\n

Action Input:\n

Observation:\n

Final Answer:\n

(2) Provide as much as useful information in your Final Answer. (3) Do not make up anything, and if your Observation has no link, DO NOT hallucihate one. (4) If you have enough information and want to stop the process, please use \n

Thought: I have got enough information\n

Final Answer: **your response. \n

The Action: MUST be one of the following:get_weather_today; forecast_weather\n

Question: What will be the UV index for Miami tomorrow?\n

Agent scratchpad (history actions):\n"
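
As a rough illustration of how a prompt like the one above could be assembled from a list of tool descriptions, here is a minimal Python sketch. The names `ToolSpec` and `build_prompt`, and the abbreviated template text, are assumptions made for illustration; they are not taken from the ToolBench or BMTools code.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Hypothetical container for one API description."""
    name: str
    description: str
    args_schema: str  # e.g. '{{"location" : string, }}'


def build_prompt(tools: list, question: str) -> str:
    """Assemble a LangChain-style single-tool prompt (abbreviated template)."""
    api_lines = [
        f"{t.name}: {t.description} Your input should be a json "
        f"(args json schema): {t.args_schema} The Action to trigger this API "
        f"should be {t.name} and the input parameters should be a json dict string."
        for t in tools
    ]
    names = [t.name for t in tools]
    return (
        "Answer the following questions as best you can. "
        "Specifically, you have access to the following APIs:\n\n"
        + "\n".join(api_lines)
        + "\n\nUse the following format:\n\n"
        "Question: the input question you must answer\n"
        "Thought: you should always think about what to do\n"
        f"Action: the action to take, should be one of [{', '.join(names)}]\n"
        "Action Input: the input to the action\n"
        "Observation: the result of the action\n"
        "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
        "Thought: I now know the final answer\n"
        "Final Answer: the final answer to the original input question\n\n"
        f"The Action: MUST be one of the following:{'; '.join(names)}\n"
        f"Question: {question}\n"
        "Agent scratchpad (history actions):\n"
    )


tools = [
    ToolSpec("get_weather_today", "Get today's the weather.",
             '{{"location" : string, }}'),
    ToolSpec("forecast_weather", "Forecast weather in the upcoming days..",
             '{{"location" : string, "days" : integer, }}'),
]
prompt = build_prompt(tools, "What will be the UV index for Miami tomorrow?")
print(prompt)
```

Keeping the tool list as data means the same template can be reused for any other single-tool scenario by swapping in different `ToolSpec` entries.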

------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>

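Step 3: Feed the chain-of-thought prompt that includes the tool calls into GPT to obtain a chain-of-thought prompt in a formalized format. In this example, the returned result is: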
Thought: I need to get the UV index for Miami tomorrow.

Action: forecast_weather

Action Input: {"location" : "Miami", "days" : 1}

Observation: The API returns the forecast weather for Miami in the next day, including the UV index.

Thought: I now know the final answer.

Final Answer: The forecast weather API with location Miami and one day ahead predicts that the UV index for tomorrow is X.
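
A completion in this format is straightforward to post-process. The sketch below assumes the completion follows the Thought/Action/Action Input layout exactly and that the Action Input is valid JSON; it extracts the first tool call with a regular expression. `parse_step` is a hypothetical helper name, not part of ToolBench.

```python
import json
import re

# One "Action: ..." line followed (possibly after blank lines) by an
# "Action Input: {...}" line whose JSON object ends at a line break.
STEP_RE = re.compile(
    r"Action:\s*(?P<action>[^\n]+?)\s*\n+\s*"
    r"Action Input:\s*(?P<args>\{.*?\})\s*(?:\n|$)",
    re.DOTALL,
)


def parse_step(completion: str):
    """Return (action_name, args_dict) for the first tool call, or None."""
    match = STEP_RE.search(completion)
    if match is None:
        return None
    return match.group("action"), json.loads(match.group("args"))


completion = (
    "Thought: I need to get the UV index for Miami tomorrow.\n\n"
    "Action: forecast_weather\n\n"
    'Action Input: {"location" : "Miami", "days" : 1}\n\n'
    "Observation: The API returns the forecast weather for Miami.\n"
)
print(parse_step(completion))
```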

------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>

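Step 4: Feed the formalized prompt into GPT to obtain the chain-of-thought reasoning result that includes the result of the tool call. In this example, the returned result is: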
"\"The weather forecast for Miami at 1 days later is: \\n

over all weather: Moderate rain,\\n

max temperature: 30.3(C), 86.5(F),\\n

min temperature: 26.0(C), 78.8(F),\\n

average temperature: 28.2(C), 82.7(F),\\n

max wind speed: 24.1(kph), 15.0(mph),\\n

total precipitation: 7.9(mm), 0.31(inch),\\n

will rain today: 1,\\n

chance of rain: 89,\\n

total snow: 0.0(cm),\\n

will snow today: 0,\\n

chance of snow: 0,\\n

average visibility: 9.6(km), 5.0(miles),\\n

average humidity: 68.0,\\n

UV index: 6.0,\\n

sunrise time: 06:46 AM,\\n

sunset time: 07:51 PM,\\n

moonrise time: 01:40 PM,\\n

moonset time: 02:46 AM,\\n\""

------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>
------------------------------------------------------------------------------------------------>


Step 5: Use GPT to summarize the completions of all sub-steps of the “Multi Steps CoT Chains” and “tool learning” process into a final answer. In this example, the final answer is:
The UV index for Miami tomorrow is 6.0.
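
In the pipeline this summary is produced by GPT. Purely as an illustrative check that the final answer is grounded in the Step 4 observation, the following sketch parses the `key: value,` lines of an observation and looks up the UV index. `parse_observation` is a hypothetical helper, and the shortened observation string is taken from the example above.

```python
def parse_observation(text: str) -> dict:
    """Split the 'key: value,' lines of a tool observation into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "06:46 AM" survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().rstrip(",")
    return fields


observation = (
    "The weather forecast for Miami at 1 days later is: \n"
    "over all weather: Moderate rain,\n"
    "UV index: 6.0,\n"
    "sunrise time: 06:46 AM,\n"
)
fields = parse_observation(observation)
print(fields["UV index"])  # 6.0
```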

3. Model fine-tuning (SFT fine-tune)

0x1: Installation

git clone https://github.com/OpenBMB/ToolBench.git
cd ToolBench
pip install -r requirements.txt

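0x2: Data processing

The data processing logic was analyzed in the previous section, so it is not repeated here.

Download the latest released tool data, or prepare a new dataset of your own, and place it under data/original/.

For single-tool data, preprocess it for fine-tuning with the following command: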

python3 data/preprocess.py \
    --tool_mode single \
    --tool_data_path data/original/weather_demo.json \
    --output_path data/processed/weather_demo.json

For multi-tool data, use:

python data/preprocess.py \
    --tool_mode multi \
    --tool_data_path data/original/meta_file_demo.json \
    --output_path data/processed/meta_file_demo.json

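0x3: Training

The JSON-format SFT training corpus is first converted into one continuous piece of text.

SFT trains on the text token by token in a self-supervised manner, so any prompt template, however complex, must be converted and concatenated into a single complete piece of text before it is fed into SFT.

The processed conversations look like this (one example):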

------------------------------------------------------------------------------------------------------------------------------------------

conversations: [

'A chat between a curious user and an artificial intelligence assistant who can use external tools and APIs to solve the user\'s question. The assistant gives tools and APIs calling processes or final answer to the human\'s question. Human: Answer the following questions as best you can. Specifically, you have access to the following APIs:\n\n

get_weather_today: Get today\'s the weather. Your input should be a json (args json schema): {{"location" : string, }} The Action to trigger this API should be get_weather_today and the input parameters should be a json dict string. Pay attention to the type of parameters.\n

forecast_weather: Forecast weather in the upcoming days.. Your input should be a json (args json schema): {{"location" : string, "days" : integer, }} The Action to trigger this API should be forecast_weather and the input parameters should be a json dict string. Pay attention to the type of parameters.\n\n

Use the following format:\n\n

Question: the input question you must answer\n

Thought: you should always think about what to do\n

Action: the action to take, should be one of [get_weather_today, forecast_weather]\n

Action Input: the input to the action\n

Observation: the result of the action\n

... (this Thought/Action/Action Input/Observation can repeat N times)\n

Thought: I now know the final answer\n

Final Answer: the final answer to the original input question\n\n

Begin! Remember: (1) Follow the format, i.e,\n

Thought:\n

Action:\n

Action Input:\n

Observation:\n

Final Answer:\n

Thought: I have got enough information\n

Final Answer: **your response. \n

The Action: MUST be one of the following:get_weather_today; forecast_weather\n

Question: What is the expected wind speed in Miami tomorrow? \n\n

Agent scratchpad (history actions):\n

Thought: I need to use the get_weather_today API to get the wind speed in Miami tomorrow.\n

Action: get_weather_today\n

Action Input: {"location": "Miami"}\n

Observation: "Today\'s weather report for Miami is:\\n

overall: Patchy light rain with thunder,\\n

name: Miami,\\n

region: Florida,\\n

country: United States of America,\\n

localtime: 2023-04-28 13:20,\\n

temperature: 29.4(C), 84.9(F),\\n

percipitation: 0.2(mm), 0.01(inch),\\n

pressure: 1014.0(milibar),\\n

humidity: 61,\\n

cloud: 75,\\n

body temperature: 33.0(C), 91.3(F),\\n

wind speed: 31.3(kph), 19.5(mph),\\n

visibility: 16.0(km), 9.0(miles),\\n

UV index: 6.0,\\n"\n

Assistant: Thought: The get_weather_today API only provides today\'s weather report, not tomorrow\'s. I need to use the forecast_weather API to get the wind speed in Miami tomorrow.\n

Action: forecast_weather\n

Action Input: {"location": "Miami", "days": 1}\n</s>'

]

------------------------------------------------------------------------------------------------------------------------------------------
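
To make the concatenation step concrete, here is a minimal sketch that flattens one record into a single training string in the layout shown above: the system prompt, then the `Human:` and `Assistant:` turns, with `</s>` closing the assistant turn. The record layout (a `conversations` list with `from`/`value` keys) follows the common FastChat convention and is an assumption here; the actual preprocessing in the repository may differ in separators and details.

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant "
    "who can use external tools and APIs to solve the user's question. "
    "The assistant gives tools and APIs calling processes or final answer "
    "to the human's question."
)
ROLE_NAMES = {"human": "Human", "gpt": "Assistant"}  # assumed role keys
EOS = "</s>"


def flatten(record: dict) -> str:
    """Concatenate the system prompt and all turns into one training string."""
    parts = [SYSTEM]
    for turn in record["conversations"]:
        text = f"{ROLE_NAMES[turn['from']]}: {turn['value']}"
        if turn["from"] == "gpt":
            text += EOS  # mark the end of each assistant turn
        parts.append(text)
    return " ".join(parts)


record = {
    "conversations": [
        {"from": "human",
         "value": "Question: What is the expected wind speed in Miami tomorrow?\n"},
        {"from": "gpt",
         "value": "Thought: I need the forecast.\nAction: forecast_weather\n"
                  'Action Input: {"location": "Miami", "days": 1}\n'},
    ]
}
print(flatten(record))
```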

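The vectorized (tokenized) input is as follows:

[Figure: the vectorized input]

The code is based on FastChat. You can train ToolLLaMA-7b on 8 x V100 (32GB) with the following command: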

export PYTHONPATH=./
torchrun --nproc_per_node=8 --master_port=20001 toolbench/train/train_mem.py \
    --model_name_or_path huggyllama/llama-7b  \
    --data_path  data/processed/weather_demo.json \
    --bf16 False \
    --output_dir output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_steps 1500 \
    --save_total_limit 8 \
    --learning_rate 5e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 False \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
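
As a quick sanity check on the flags above, the effective (global) batch size per optimizer step is the product of the number of processes, the per-device batch size, and the gradient accumulation steps. The short sketch below only does this arithmetic, assuming a single node; the variable names simply mirror the command-line flags.

```python
nproc_per_node = 8                # --nproc_per_node
per_device_train_batch_size = 2   # --per_device_train_batch_size
gradient_accumulation_steps = 8   # --gradient_accumulation_steps

effective_batch_size = (
    nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 128 sequences per optimizer step
```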

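Note that SFT training only teaches the model to turn the input “question” into a series of chain-of-thought sub-steps and tool API calls; the actual results of those tool API calls still have to be fetched dynamically at run time.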
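
This division of labor, where the model proposes actions and the runtime supplies the observations, can be sketched as a small loop. Everything below is a stand-in: `make_fake_model` replays canned completions in place of the fine-tuned model, and `forecast_weather` returns a fixed string instead of calling a real API. It illustrates the control flow under these assumptions and is not ToolBench's inference code.

```python
import json
import re

# Flat JSON arguments only; nested objects would need a real parser.
ACTION_RE = re.compile(
    r"Action:\s*(\S+)\s*\n+\s*Action Input:\s*(\{.*?\})", re.DOTALL
)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def forecast_weather(location: str, days: int) -> str:
    """Stub tool: a real implementation would query a weather API."""
    return (f"The weather forecast for {location} at {days} days later is: "
            "UV index: 6.0")


TOOLS = {"forecast_weather": forecast_weather}


def make_fake_model(completions):
    """Return a callable that replays canned completions, one per call."""
    replies = iter(completions)
    return lambda prompt: next(replies)


def run_agent(question: str, model, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        completion = model(
            f"Question: {question}\n"
            f"Agent scratchpad (history actions):\n{scratchpad}"
        )
        final = FINAL_RE.search(completion)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(completion)
        if action is None:
            raise ValueError("completion has neither an action nor a final answer")
        name, args = action.group(1), json.loads(action.group(2))
        observation = TOOLS[name](**args)  # the tool result is fetched here
        scratchpad += f"{completion}Observation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


model = make_fake_model([
    "Thought: I need the forecast.\nAction: forecast_weather\n"
    'Action Input: {"location": "Miami", "days": 1}\n',
    "Thought: I have got enough information\n"
    "Final Answer: The UV index for Miami tomorrow is 6.0.",
])
print(run_agent("What will be the UV index for Miami tomorrow?", model))
```

The observation is appended to the scratchpad by the runtime, never generated by the model, which is why the fine-tuned model only has to learn the Thought/Action/Action Input part of each step.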

4. Model inference (Inference)

0x1: Inference with Command Line Interface

Prepare the API keys and the Python path:

source BMTools/secret_keys.sh
export PYTHONPATH=BMTools

The commands below require about 14 GB of GPU memory for ToolLLaMA-7B. Replace /path/to/ToolLLaMA/weights with the path to your converted ToolLLaMA weights.

# For single tool inference:
python toolbench/inference/inference_single_tool.py \
    --tool_name weather \
    --model_path /path/to/ToolLLaMA/weights

# for lora:
python toolbench/inference/inference_single_tool.py \
    --tool_name weather \
    --model_path /path/to/llama/weights \
    --lora_path /path/to/lora/weights

# For multi tools inference:
python toolbench/inference/inference_multi_tools.py \
    --model_path /path/to/ToolLLaMA/weights
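
The 14 GB figure quoted above is roughly what one would expect from the weights alone, assuming about 7 billion parameters stored in 16-bit precision (2 bytes each); activations and the KV cache come on top of that. The calculation below is a back-of-the-envelope estimate under those assumptions, not a measurement.

```python
num_params = 7e9       # assumed parameter count for a "7B" model
bytes_per_param = 2    # assumed 16-bit storage

weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.1f} GB")     # 14.0 GB (decimal gigabytes)
print(f"{weight_bytes / 2**30:.1f} GiB")  # 13.0 GiB
```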

5. Some similar projects

0x1: ReWOO

https://huggingface.co/rewoo

https://huggingface.co/datasets/rewoo/planner_instruction_tuning_2k/viewer/rewoo--planner_instruction_tuning_2k/train?row=0

https://arxiv.org/pdf/2305.18323.pdf

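6. Some thoughts and reflections - large models offer a new data-driven programming paradigm

Dr. Lu Qi (陆奇) has pointed out that large models change the way humans and computers interact. That view is framed from the consumer side, that is, from the user's perspective. The author believes that large models also carry a deeper significance for developers, or in other words for service providers.

The essence of the paradigm shift brought by large models is that they make deep neural networks programmable, and endless variations can be derived from this property. The author offers an analogy with programming languages and software development:

The large model itself is like a "meta programming language": on its own it has only very limited programming capability and can implement only the most basic functions.
Prompt engineering (CoT chains, question-answering mode, role play) essentially defines the syntax, semantics, format, API interaction style, and function calling conventions of a concrete programming language. There are two ways to do prompt engineering:
- few-shot prompt generation: generate text directly without changing the base model's parameters
- SFT based on a prompt dataset
Running SFT on the dataset produced by prompt engineering yields an SFT-LLM. In essence this turns the "meta programming language" of the large model into a "prompt-completions oriented programming language", and this new "programming language" conforms to the syntax, semantics, API interaction style, and function calling conventions defined by the prompt engineering. It is essentially a data-oriented NLP programming paradigm: developers can program against data in natural language with a very high degree of freedom, and the act of programming is the construction and optimization of prompt templates.
Prompt-completions are the bridge between external input and the large model's output.

Among the elements above, the variations of the "prompt-completions oriented programming language" are in theory unlimited; it is an extremely open search space. The wave of open-source large-model projects that sprang up after GPT-3.5/GPT-4 were opened up, and after the technique of "SFT based on task-oriented prompt-completions" was proposed, bears this out.

What is most exciting is not even the unlimited openness of the "prompt-completions oriented programming language", but the fact that the resulting "SFT-LLM on task-oriented prompt-completions" itself generalizes far better than traditional machine learning models or traditional software.

The unlimited openness along these two dimensions compounds, and as a result the large-model field is showing an unprecedented technological explosion.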

posted @ 2023-06-08 16:31 郑瀚Andrew


Han Zheng, Practitioners and Theoretical Researcher, Now working in Alibaba Cloud Corp, China

Welcome to contact me. Wechat：LittleHann，My email, 306211321@qq.com，Job mail：zhenghan.zh@alibaba-inc.com
