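Function calling with the llama-cpp-python web server

Recent versions of ollama implement function calling, but the handling still has some bugs. The llama-cpp-python web server builds on llama.cpp (through its Python bindings) and adds its own request handling, so it is more compatible with the OpenAI API and supports function calling through tools. Below is an example based on the llama-cpp-python web server. Note that the model itself must support function calling; qwen2, for example, does.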

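Install dependencies

The llama-cpp-python package includes the web server as an optional extra. The CMAKE_ARGS setting below turns off the LLaVA build.

CMAKE_ARGS="-DLLAVA_BUILD=OFF" pip install 'llama-cpp-python[server]'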
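Before starting the server, it can help to confirm that the install picked up the server dependencies. The following is a minimal sketch, not part of llama-cpp-python: the helper name `missing_modules` is made up for illustration, and the module list assumes the `[server]` extra brings in `fastapi` and `uvicorn`.

```python
import importlib.util

# Top-level modules expected after installing llama-cpp-python[server].
# The exact set of extras is an assumption; adjust it to your version.
REQUIRED = ["llama_cpp", "fastapi", "uvicorn"]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED)` returns an empty list when everything is importable, and otherwise lists what still needs to be installed.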

Start the service

Download the qwen2:7b model in GGUF format
The GGUF model can be downloaded directly with the huggingface_hub command-line tool.

huggingface-cli download Qwen/Qwen2-7B-Instruct-GGUF qwen2-7b-instruct-q4_0.gguf --local-dir .
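The same download can also be done from Python. This is a rough equivalent of the command above, assuming the `huggingface_hub` package is installed; the wrapper name `download_model` is hypothetical, while `hf_hub_download` is the library's own function.

```python
from pathlib import Path

# Repository and file name taken from the huggingface-cli command above.
REPO_ID = "Qwen/Qwen2-7B-Instruct-GGUF"
FILENAME = "qwen2-7b-instruct-q4_0.gguf"


def download_model(local_dir: str = ".") -> Path:
    """Download the GGUF file into local_dir and return its path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    return Path(path)
```

Calling `download_model()` needs network access and returns the location of the downloaded file, which can then be passed to the server's `--model` option.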

Start the web server

The --model path must point at the file downloaded in the previous step.

python3 -m llama_cpp.server --model ./qwen2-7b-instruct-q4_0.gguf --model_alias qwen2:7b --host 0.0.0.0
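Once the server is running, a quick way to check it is to list the models it exposes. The sketch below uses only the standard library and assumes the server listens on the default `http://localhost:8000` and serves the OpenAI-style `/v1/models` route; the helper names are made up for illustration.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def list_model_ids(base_url: str = "http://localhost:8000/v1/", timeout: float = 5.0):
    """Return the model ids reported by the server (the server must be running)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # An OpenAI-style listing keeps the entries under the "data" key.
    return [model["id"] for model in payload.get("data", [])]
```

If the alias from `--model_alias` took effect, `list_model_ids()` should include `qwen2:7b`, which is the name the client code passes as `model`.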

Accessing the server from code
A simple example that forces one specific function; auto mode is not very stable for now.

import json

import openai


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


# The client requires an API key; "demo" is just a placeholder value.
openai.api_key = "demo"
openai.base_url = "http://localhost:8000/v1/"

# Map tool names to the local Python functions that implement them.
funcs = {
    "add": add,
    "sub": sub,
}

# Tool definitions in the OpenAI JSON-schema format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers together",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First number"},
                    "b": {"type": "number", "description": "Second number"},
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "sub",
            "description": "Calculate the difference between two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First number"},
                    "b": {"type": "number", "description": "Second number"},
                },
                "required": ["a", "b"],
            },
        },
    },
]

response = openai.chat.completions.create(
    stream=False,
    model="qwen2:7b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Calculate the subtraction of 500 and 70 using the sub function."},
    ],
    tools=tools,
    # Force the model to call the sub function instead of using auto mode.
    tool_choice={"type": "function", "function": {"name": "sub"}},
)

message = response.choices[0].message
print(message)

# function_call is the legacy field of the OpenAI response format.
if message.function_call:
    result = message.function_call
    print(result)
    func = funcs.get(result.name)
    params = json.loads(result.arguments)
    print(func(**params))
else:
    print(message.content)
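The example reads the legacy `function_call` field. In the current OpenAI response format the calls arrive in `message.tool_calls`, so a dispatcher that reads that field may be more portable. The following is a minimal sketch under that assumption; `run_tool_call` and `run_message_tool_calls` are hypothetical helper names, and whether a given server version fills `tool_calls` should be checked against the printed message.

```python
import json


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


# Same name-to-function mapping as in the example above.
FUNCS = {"add": add, "sub": sub}


def run_tool_call(name, arguments, registry=FUNCS):
    """Look up a function by name and call it with JSON-encoded arguments."""
    func = registry.get(name)
    if func is None:
        raise ValueError(f"unknown function: {name}")
    params = json.loads(arguments)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(params, dict):
        raise ValueError("arguments must decode to a JSON object")
    return func(**params)


def run_message_tool_calls(message, registry=FUNCS):
    """Run every entry in message.tool_calls and return (call id, result) pairs."""
    results = []
    for call in getattr(message, "tool_calls", None) or []:
        value = run_tool_call(call.function.name, call.function.arguments, registry)
        results.append((call.id, value))
    return results
```

With the response from the example, `run_message_tool_calls(response.choices[0].message)` returns one pair per tool call, or an empty list when the model answered in plain text. Rejecting unknown names up front avoids calling `None` when the model invents a function.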

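Notes

Stable function calling matters. In my tests so far, auto mode almost never succeeds, while naming the function explicitly through tool_choice is very stable. It is also essential that the model itself supports function calling; quite a few open-source models already do. The llama-cpp-python web server is well worth using: it is fairly stable, and considerably more so than ollama.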
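One way to live with an unreliable auto mode is to try it first and fall back to a forced function only when no call comes back. This is a sketch of that idea, not something from llama-cpp-python or the OpenAI client: `create_with_fallback` is a hypothetical helper, and `client` is assumed to be an `openai.OpenAI`-style object (the `openai` module configured as in the example also works).

```python
def create_with_fallback(client, forced_name, **kwargs):
    """Try tool_choice="auto"; if no tool call comes back, force one function.

    kwargs are passed through to chat.completions.create (model, messages,
    tools, ...). Note that the fallback costs a second request.
    """
    response = client.chat.completions.create(tool_choice="auto", **kwargs)
    message = response.choices[0].message
    # Accept either the current tool_calls field or the legacy function_call.
    if getattr(message, "tool_calls", None) or getattr(message, "function_call", None):
        return response
    forced = {"type": "function", "function": {"name": forced_name}}
    return client.chat.completions.create(tool_choice=forced, **kwargs)
```

For the example above this would be called as `create_with_fallback(openai, "sub", model="qwen2:7b", messages=messages, tools=tools)`, where `messages` is the same list passed in the example.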

References

https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling
https://github.com/abetlen/llama-cpp-python/issues/1573

Posted on 2024-08-25 07:11 by 荣锋亮