Different models need different amounts of GPU memory, so before downloading anything, check which models your GPU can actually support.
1. The following script can download models from Hugging Face (https://huggingface.co/models):
download.py
import time
from huggingface_hub import snapshot_download

# model name on Hugging Face
repo_id = "LinkSoul/Chinese-Llama-2-7b-4bit"
# local storage path
local_dir = "E:\\work\\AI\\GPT\\llama_model_7b_4bit"
cache_dir = local_dir + "\\cache"

# retry in a loop until the snapshot download succeeds
while True:
    try:
        snapshot_download(cache_dir=cache_dir,
                          local_dir=local_dir,
                          repo_id=repo_id,
                          local_dir_use_symlinks=False,
                          resume_download=True,
                          allow_patterns=["*.model", "*.json", "*.bin", "*.py", "*.md", "*.txt"],
                          ignore_patterns=["*.safetensors", "*.msgpack", "*.h5", "*.ot"],
                          )
    except Exception as e:
        print(e)
        # time.sleep(5)
    else:
        print('Download complete')
        break
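The script retries in a loop because large downloads from Hugging Face are easily interrupted, and snapshot_download with resume_download=True picks up where it left off. To use it, install huggingface_hub into whatever Python environment you run it from and start the file:

pip install huggingface_hub
python download.py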
2. Local environment
Running the downloaded Llama model requires a conda virtual environment. I installed Anaconda on a Windows machine; to create a virtual environment, enter the following on the command line:
conda create -n LLM_env python=3.10
Python 3.10 was chosen here; it needs to match the PyTorch version installed later.
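Activate the environment before installing anything into it:

conda activate LLM_env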
3. Check the CUDA version
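The easiest check is running nvidia-smi on the command line; its header shows the driver version, the highest CUDA version the driver supports, and the GPU's total memory (which also answers the question at the top of this post about which models your card can hold):

nvidia-smi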
4. Install PyTorch; see these pages for reference:
https://blog.csdn.net/threestooegs/article/details/119531414
and
https://pytorch.org/get-started/previous-versions/
I installed version 1.13.1.
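For reference, the pip command listed on the previous-versions page for 1.13.1 looks like the one below; it assumes CUDA 11.7, so substitute the wheel index that matches the CUDA version you found in step 3:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117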
Of course, other packages may still be missing; just install whatever the error messages ask for. One potential pitfall on Windows is the bitsandbytes library:

importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes

If you hit this error, the installed bitsandbytes-windows version is too old; reinstall a newer build:

pip install --trusted-host github.com --trusted-host objects.githubusercontent.com https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.0-py3-none-win_amd64.whl
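After installing the wheel, a quick sanity check (run inside the LLM_env environment) is to import the package and print its version; it should report 0.41.0:

python -c "import bitsandbytes as bnb; print(bnb.__version__)"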
5. PyCharm configuration
Create a new project in PyCharm and select the virtual environment created above as the interpreter.
6. Write the test code
test.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# local model path
# model_path = "E:\\work\\AI\\GPT\\llama_model"
model_path = "E:\\work\\AI\\GPT\\llama_model_4bit"
# model_path = "E:\\work\\AI\\GPT\\llama_model_7b_8bit"

print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.device_count())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
else:
    print('No GPU available')

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# load the model in 4-bit, 8-bit, or fp16 depending on the path suffix
if model_path.endswith("4bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
elif model_path.endswith("8bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
else:
    model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()

# stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Llama-2 chat prompt template with the default system prompt
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("Hello, what is the meaning of life?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(),
                              max_new_tokens=4096, streamer=streamer)
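The TextStreamer already prints the reply token by token, so nothing more is strictly needed. If you also want the full reply as a string (for logging, say), you can append something like the following to the end of test.py; batch_decode turns the generated ids back into text:

# optional: decode the complete output after generation finishes
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(output)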
webui
There is another way to run the model locally, through a web page in the browser.
References:
https://www.cnblogs.com/zhizhixiaoxia/p/17414798.html
https://github.com/oobabooga/text-generation-webui/tree/main
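Roughly, the idea is to clone the repo, install its requirements into a suitable environment, put the downloaded model folder under models/, and start the server. The commands below are only a sketch based on older versions of the repo (the server script name and the --model flag may have changed since), so check the README for the current instructions:

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py --model Chinese-Llama-2-7b-4bit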