Deploying the Qwen2.5-7B-Instruct model with llama.cpp

Qwen2.5-7B-Instruct is used as the example here; the process is similar for other LLMs.

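This workflow does not work for the VL (vision-language) models yet. I have seen this being discussed in the llama.cpp project and verified that the approach works; I will write it up later.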

The deployment steps are as follows:

1. Download Qwen2.5-7B-Instruct from ModelScope.

2. Build llama.cpp. Usually it is enough to run mkdir build, cd build, cmake .., and make -j8 in the llama.cpp directory; this generates a number of executables under ./build/bin.

3. In the llama.cpp project, find convert_hf_to_gguf.py and run:

python convert_hf_to_gguf.py ./model_path

This generates the file Qwen2.5-7B-Instruct-7.6B-F16.gguf in the model_path directory.

4. (Quantization, optional) If your machine is not powerful enough, you can quantize the model:

./llama-quantize ./model_path/Qwen2.5-7B-Instruct-7.6B-F16.gguf Qwen2.5-7B-Instruct-7.6B-Q4_K_M.gguf Q4_K

The output is the file Qwen2.5-7B-Instruct-7.6B-Q4_K_M.gguf.

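There are several quantization types. Q4_K (an alias for Q4_K_M) shrinks the model to roughly 1/3 of the original F16 size. Run llama-quantize without arguments to list all the available types.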
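A back-of-the-envelope check of the "about 1/3" figure: F16 stores 16 bits per weight, while Q4_K_M mixes several quantization types and averages a bit under 5 bits per weight. The 4.85 bits/weight below is an assumed average, not a number taken from llama.cpp, so real file sizes will differ somewhat:

```shell
# Rough size estimate: GB = (parameters in billions) * bits_per_weight / 8
params_b=7.6      # parameter count in billions (from the file name)
bpw_q4=4.85       # ASSUMED average bits/weight for Q4_K_M
f16_gb=$(awk -v p="$params_b" 'BEGIN { printf "%.1f", p * 16 / 8 }')
q4_gb=$(awk -v p="$params_b" -v b="$bpw_q4" 'BEGIN { printf "%.1f", p * b / 8 }')
ratio=$(awk -v b="$bpw_q4" 'BEGIN { printf "%.1f", 16 / b }')
# prints: F16 ~ 15.2 GB, Q4_K_M ~ 4.6 GB, ratio ~ 3.3
echo "F16 ~ ${f16_gb} GB, Q4_K_M ~ ${q4_gb} GB, ratio ~ ${ratio}"
```

Running `ls -lh` on the two .gguf files shows the actual sizes to compare against this estimate.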

5. Finally, run the GGUF file:

./llama-cli -m Qwen2.5-7B-Instruct-7.6B-Q4_K_M.gguf -p "You are a helpful assistant" -cnv

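Then just follow the prompts to start chatting. (-cnv enables conversation mode, in which the -p text is used as the system prompt.)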

posted @ 2024-11-17 23:44 Dsp Tian
