Llama3的本地部署

链接地址：

github: https://github.com/meta-llama/llama3

huggingface: https://huggingface.co/meta-llama

metaAI: https://llama.meta.com/

Llama3是Meta于2024年4月18日开源的LLM，目前开放了8B和70B两个版本，两个版本均支持最大为8192个token的序列长度( GPT-4支持128K = = )

Llama3在Meta自制的两个24K GPU集群上进行预训练，使用15T的训练数据，其中5%为非英文数据，故Llama3的中文能力稍弱，Meta认为Llama3是目前最强的开源大模型

Llama3如果单纯使用的话推荐用Ollama进行部署，简单方便，我有具体的下游任务，所以需要自己微调

- 部署准备

-- 硬件

在本地对8B版本的Llama3进行了部署测试，硬件配置为

CPU i7-12700F
GPU NVIDIA Geforce RTX 3060 12G
RAM 32GB * 2

-- 环境

Llama3的部署环境对各个包的版本需求有些严格，需要注意，否则会报各种错误，环境列表附在最后(去最上面的github里找也可，我环境里可能有单纯部署之外用不到的包)，其中最需要注意的是transformers的版本，需要大于4.39.0 ( 我用的4.40.1 )，

因为Llama3比较新，老版本的transformers里没有Llama3的模型和分词器，另外就是pytorch和cuda的版本，torch 2.1.0 + cu118，主要是transformers对cuda版本有要求，部署过程中遇到的多数错误都是包的版本问题。

-- 模型权重

模型的权重可以去最上的meta或huggingface链接去下载，但是需要获得meta的授权，注册账号提个申请(玛德，给我拒了)

也可使用魔搭的库进行下载，还挺快的，推荐，下载代码如下

from modelscope import snapshot_download

cache_dir = r'D:\data\Llama3\LLM-Research'
model_dir = snapshot_download('LLM-Research/Meta-Llama-3-8B-Instruct',cache_dir=cache_dir)

# Meta-Llama-3-70B-Instruct 70B的名称

运行代码下载即可，cache_dir为权重文件的缓存路径，8B下载好的文件大小为14.9G，70B的为131G，预留好足够的空间

下载好的路径下有这些东西

- 模型推理

模型的推理按照下面代码执行即可，比较简单，我没写UI之类的，能用就行

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_dir = r'D:\data\Llama3\LLM-Research\Meta-Llama-3-8B-Instruct'
device = 'cuda'


tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype='auto', device_map=device)

while 1:
    print(f'Enter a prompt to generate a response:')
    prompt = input()
    messages = [
        {'role': 'system', 'content': 'aaa'},
        {'role': 'user', 'content': prompt}
    ]

    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    model_input = tokenizer([text], return_tensors='pt').to(device)
    attention_mask = torch.ones(model_input.input_ids.shape, dtype=torch.long, device=device)
    generated_ids = model.generate(
        model_input.input_ids,
        max_new_tokens=512,
        attention_mask=attention_mask,  
        pad_token_id=tokenizer.eos_token_id,  
    )

    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_input.input_ids, generated_ids)]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    print(f'{response} \n')

运行成功，Llama3的简单部署成功

accelerate==0.29.3
addict==2.4.0
aiohttp==3.9.5
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.2
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
Babel==2.15.0
backcall==0.2.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
comm==0.2.2
crcmod==1.7
cryptography==42.0.5
datasets==2.18.0
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.8
einops==0.8.0
exceptiongroup==1.2.1
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.4
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2024.2.0
gast==0.5.4
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.22.2
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
ipykernel==6.29.4
ipython==8.12.3
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==0.10.0
json5==0.9.25
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.1
jupyter_core==5.7.2
jupyter_server==2.14.0
jupyter_server_terminals==0.5.3
jupyterlab==4.1.8
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.1
jupyterlab_widgets==3.0.10
MarkupSafe==2.1.5
matplotlib-inline==0.1.7
mistune==3.0.2
modelscope==1.14.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.1
notebook==7.1.3
notebook_shim==0.2.4
numpy==1.24.4
oss2==2.18.4
overrides==7.7.0
packaging==24.0
pandas==2.0.3
pandocfilters==1.5.1
parso==0.8.4
pickleshare==0.7.5
pillow==10.3.0
pkgutil_resolve_name==1.3.10
platformdirs==4.2.1
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
pure-eval==0.2.2
pyarrow==16.0.0
pyarrow-hotfix==0.6
pycparser==2.22
pycryptodome==3.20.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytorch-cuda==12.1
pytz==2024.1
pywin32==306
pywinpty==2.0.13
PyYAML==6.0.1
pyzmq==26.0.3
qtconsole==5.5.2
QtPy==2.4.1
referencing==0.35.1
regex==2024.4.28
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
safetensors==0.4.3
scipy==1.10.1
Send2Trash==1.8.3
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.5
stack-data==0.6.3
sympy==1.12
terminado==0.18.1
tinycss2==1.3.0
tokenizers==0.19.1
tomli==2.0.1
torch==2.1.0+cu118
torchaudio==2.1.0
torchvision==0.16.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.3
transformers==4.40.1
types-python-dateutil==2.9.0.20240316
typing_extensions==4.11.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
widgetsnbextension==4.0.10
xlrd==1.2.0
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.18.1

posted @ 2024-05-14 17:52 Liang-ml 阅读(7175) 评论(0) 收藏举报

刷新页面返回顶部

Liang-ml

Llama3的本地部署

公告