
Different models require different amounts of GPU memory (VRAM), so check which models your GPU can support before downloading.

1. The following script can download various models from Hugging Face: https://huggingface.co/models

download.py

# coding=utf-8
import time
from huggingface_hub import snapshot_download

# Model name (repo id) on Hugging Face
repo_id = "LinkSoul/Chinese-Llama-2-7b-4bit"
# Local storage path
local_dir = "E:\\work\\AI\\GPT\\llama_model_7b_4bit"
cache_dir = local_dir + "\\cache"

while True:
    try:
        snapshot_download(
            cache_dir=cache_dir,
            local_dir=local_dir,
            repo_id=repo_id,
            local_dir_use_symlinks=False,
            resume_download=True,
            allow_patterns=["*.model", "*.json", "*.bin",
                            "*.py", "*.md", "*.txt"],
            ignore_patterns=["*.safetensors", "*.msgpack",
                             "*.h5", "*.ot"],
        )
    except Exception as e:
        print(e)
        # time.sleep(5)
    else:
        print("Download complete")
        break
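The while/try loop is just a simple retry: if the download fails on a network error, the exception is printed and snapshot_download is called again, with resume_download=True letting it continue from where it stopped. Once repo_id and local_dir are set, run the script with python download.py.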

2. Local environment

To run the downloaded llama model you first need a conda virtual environment. The author installed Anaconda on a Windows machine; to create a virtual environment, run the following on the command line:

conda create -n LLM_env python=3.10

Python 3.10 is chosen here; it needs to match the PyTorch version installed later.
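After the environment is created, activate it so that the later installs go into it (standard conda usage):

conda activate LLM_env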

3. Check the CUDA version
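A common way to check is with the NVIDIA command-line tools, for example:

nvidia-smi
nvcc --version

nvidia-smi reports the highest CUDA version the installed driver supports (shown in the top-right corner of its output), while nvcc --version reports the installed CUDA toolkit version, if one is present. The PyTorch build chosen in the next step should not require a newer CUDA version than the driver supports.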

4. Install PyTorch; reference pages:

 https://blog.csdn.net/threestooegs/article/details/119531414 

https://pytorch.org/get-started/previous-versions/

The author installed version 1.13.1.
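As a sketch, assuming CUDA 11.7 and following the previous-versions page linked above, the pip install command looks roughly like this (swap the cu117 suffix for your own CUDA version):

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117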

Other packages may of course still be missing; just install whatever turns out to be needed. One possible pitfall here is installing the bitsandbytes library on Windows:

importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes

The installed bitsandbytes-windows version was too old; reinstall a newer build:

pip install --trusted-host github.com --trusted-host objects.githubusercontent.com https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.0-py3-none-win_amd64.whl
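A quick sanity check that the new build is visible to the environment, using the same importlib.metadata lookup that raised the error above:

python -c "import importlib.metadata; print(importlib.metadata.version('bitsandbytes'))"

This should print 0.41.0 (or newer) instead of raising PackageNotFoundError.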

 

5. PyCharm configuration

Create a new project in PyCharm and select the virtual environment created above as the project interpreter.

6. Write the test code

test.py

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# Local model path
# model_path = "E:\\work\\AI\\GPT\\llama_model"
model_path = "E:\\work\\AI\\GPT\\llama_model_4bit"
# model_path = "E:\\work\\AI\\GPT\\llama_model_7b_8bit"
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.device_count())
    device = torch.device("cuda:0")
    print(device)
else:
    print("No GPU available")

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Pick the loading mode from the directory name suffix
if model_path.endswith("4bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
elif model_path.endswith("8bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
else:
    # Full-precision weights: cast to fp16 and move to the GPU
    model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Llama-2 chat prompt template
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("Hello, what is the meaning of life?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(),
                              max_new_tokens=4096, streamer=streamer)
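The TextStreamer already prints the model's answer to the console token by token. If the generated text is also needed as a Python string, the returned ids can be decoded; a minimal sketch:

output_text = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(output_text)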

 

webui

There is another way to run the model locally, through a web page (web UI).

References:

https://www.cnblogs.com/zhizhixiaoxia/p/17414798.html

https://github.com/oobabooga/text-generation-webui/tree/main
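As a rough sketch only (the exact steps and flags are described in the two links above, so treat the commands here as assumptions rather than the project's documented procedure), running text-generation-webui from source typically looks like:

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py

After the server starts, the web UI is opened in a browser, and downloaded models placed under the repository's models directory can be selected and loaded there.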

 

 