A quantization method for the Qwen2 model when loading it with transformers

GitHub repository: Qwen2

Notes on some issues encountered while debugging Qwen-7B

Official sample code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens from each sequence so only the newly
# generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
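The list comprehension above strips the prompt tokens from each output sequence before decoding, since `model.generate` returns the prompt followed by the continuation. Its behavior can be illustrated with plain Python lists (a toy sketch with made-up token IDs, no model involved):

```python
# Each "output" begins with a copy of its "input" (the prompt tokens),
# followed by the newly generated tokens.
input_ids_batch = [[101, 7592], [101, 2054, 2003]]
output_ids_batch = [[101, 7592, 999, 102], [101, 2054, 2003, 2023, 102]]

# Keep only the tokens generated after the prompt, exactly as the
# sample code does with tensors
new_tokens = [
    out[len(inp):] for inp, out in zip(input_ids_batch, output_ids_batch)
]
print(new_tokens)  # [[999, 102], [2023, 102]]
```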

Configuration parameters

Specifying the quantization method

"""
int4 量化代码
"""
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig
)
import sys
model_name_or_path = sys.argv[1]
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=None,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type='nf4'
)
tokenizer = AutoTokenizer.from_pretrained(
model_name_or_path, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
model_name_or_path,
device_map="auto",
quantization_config=quantization_config,
trust_remote_code=True).eval()
system = input('system:')
history = None
while True:
question = input('user:')
if question == 'clear':
system = input('system:')
history = None
continue
response, history = model.chat(tokenizer=tokenizer, query=query, system=system, history=history)
print(response)
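To build some intuition for what `load_in_4bit` does, here is a toy sketch of naive symmetric 4-bit quantization on a small weight list. This is plain Python for illustration only; bitsandbytes' NF4 actually uses a fixed non-uniform codebook with per-block absmax scaling, not this uniform scheme:

```python
def quantize_4bit(weights):
    """Naive symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7  # absmax scaling
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.21, -0.53, 0.07, 0.94, -0.88]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)

# Each weight now costs 4 bits instead of 16/32, at the price of a
# rounding error of at most half a quantization step per weight.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_err, 4))
```

Double quantization ("use_double_quant") goes one step further and quantizes the per-block `scale` values themselves, which is where the additional memory saving comes from.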

Issues

Because this approach depends on the bitsandbytes library, which I had never installed, it failed with 'importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes'. Solution: simply run pip install bitsandbytes to install the missing library.
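The error comes from transformers probing installed package metadata at import time via the standard library. The mechanism can be reproduced directly (the package name below is deliberately nonexistent):

```python
import importlib.metadata

try:
    # Querying metadata for a package that is not installed raises the
    # same exception transformers surfaces when bitsandbytes is missing
    importlib.metadata.version("definitely-not-an-installed-package-xyz")
except importlib.metadata.PackageNotFoundError as e:
    print("not installed:", e)
```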

Posted by Kevinarcsin001