Transformers-pipeline

HF Transformers Pipelines

The Pipelines interface

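Task name	Parameter	Description
sentiment-analysis	model	Name or path of the model to use.
	tokenizer	Name or path of the tokenizer to use.
	framework	Deep learning framework to use: `"pt"` for PyTorch, `"tf"` for TensorFlow.
	device	Device to run on: `-1` for CPU, `0` for the first GPU.
text-generation	model	Name or path of the model to use.
	tokenizer	Name or path of the tokenizer to use.
	max_length	Maximum length of the generated sequence in tokens, prompt included.
	do_sample	Whether to sample randomly when generating text instead of decoding greedily.
	top_p	Nucleus sampling threshold that controls the diversity of the generated text; it takes effect when `do_sample=True`.
translation_en_to_fr	model	Name or path of the model to use.
	tokenizer	Name or path of the tokenizer to use.
	src_lang	Language of the input text.
	tgt_lang	Desired output language.
question-answering	model	Name or path of the model to use.
	tokenizer	Name or path of the tokenizer to use.
	context	Text that provides the context (passed when the pipeline is called).
	question	The question to answer (passed when the pipeline is called).
text-classification	model	Name or path of the model to use.
	tokenizer	Name or path of the tokenizer to use.
	return_all_scores	Whether to return the scores of all classes; defaults to `False`. Deprecated in recent releases in favor of `top_k=None`.
	top_k	Return the `k` highest-scoring classes; defaults to `1`.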

Full list of tasks supported by Pipelines: https://huggingface.co/docs/transformers/task_summary
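The parameters in the table are not all passed at the same moment: `model`, `tokenizer` and `device` are given when the pipeline is built, while generation settings such as `max_length`, `do_sample` and `top_p` can be given when it is called. Here is a minimal sketch of that split. The helper names `build_generator` and `generate` are hypothetical, `gpt2` is only an example checkpoint, and `transformers` is imported lazily so the helpers can be defined without loading the library.

```python
def build_generator(model_name="gpt2", device=-1):
    """Build a text-generation pipeline (construction-time parameters)."""
    # Imported inside the function so that defining it does not load the library.
    from transformers import pipeline

    return pipeline("text-generation", model=model_name, device=device)


def generate(generator, prompt, max_length=50, top_p=0.9):
    """Call a pipeline object with generation settings (call-time parameters)."""
    return generator(prompt, max_length=max_length, do_sample=True, top_p=top_p)
```

Typical use would be `generate(build_generator(), "Harbin in winter is")`, which downloads the model on first call.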

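Customizing the model download path in transformers

Calling pipeline downloads the model and caches it locally. The cache path can be customized through environment variables, configured as follows: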

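import os

# Set these before importing transformers; the cache location is read when the library is imported.
os.environ['HF_HOME'] = '/mnt/new_volume/hf'
os.environ['HF_HUB_CACHE'] = '/mnt/new_volume/hf/hub'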
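The same configuration can be wrapped in a small helper that also creates the directory and derives the hub cache from the root. This is only a sketch: `configure_hf_cache` is a hypothetical name, and a temporary directory stands in for a real location such as /mnt/new_volume/hf.

```python
import os
import tempfile
from pathlib import Path


def configure_hf_cache(root):
    """Point the Hugging Face cache at `root` and return the hub cache path."""
    root = Path(root)
    hub_cache = root / "hub"
    hub_cache.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(root)
    os.environ["HF_HUB_CACHE"] = str(hub_cache)
    return hub_cache


# A temporary directory is used here so the example runs anywhere.
cache_root = Path(tempfile.mkdtemp()) / "hf"
hub_cache = configure_hf_cache(cache_root)
print(os.environ["HF_HOME"])
print(os.environ["HF_HUB_CACHE"])

# Import transformers only after this point so the settings take effect.
```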

Interface examples (the default model handles Chinese poorly, so the outputs below may be wrong)

from transformers import pipeline

# When only the task is given, the default model is used (not recommended)
pipe = pipeline("sentiment-analysis")
pipe("哈尔滨好冷")

output:
    [{'label': 'NEGATIVE', 'score': 0.8832131028175354}]


pipe("这道菜味道不错")
output:
    [{'label': 'NEGATIVE', 'score': 0.8870086669921875}]


# With English input, the text classification result improves immediately
pipe("You learn things really quickly. You understand the theory class as soon as it is taught.")

output:
    [{'label': 'POSITIVE', 'score': 0.9961802959442139}]

Batch inference

text_list = [
    "哈尔滨冰雪大世界很好玩",
    "I like Harbin",
    "You are very good at playing ball."
]

pipe(text_list)

output:
    [{'label': 'NEGATIVE', 'score': 0.84312504529953},
     {'label': 'POSITIVE', 'score': 0.6818807125091553},
     {'label': 'POSITIVE', 'score': 0.999847412109375}]
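The pipeline returns one result per input, in the same order, so results can be matched back to their texts with `zip`. The sketch below reuses the output shown above as plain data (no model is loaded). The helper `pair_predictions` and the `0.8` threshold are assumptions made for illustration.

```python
text_list = [
    "哈尔滨冰雪大世界很好玩",
    "I like Harbin",
    "You are very good at playing ball.",
]

# Predictions copied from the batch output above, in the same order as text_list.
predictions = [
    {"label": "NEGATIVE", "score": 0.84312504529953},
    {"label": "POSITIVE", "score": 0.6818807125091553},
    {"label": "POSITIVE", "score": 0.999847412109375},
]


def pair_predictions(texts, preds, min_score=0.8):
    """Attach each prediction to its input text and flag low-confidence results."""
    return [
        {
            "text": text,
            "label": pred["label"],
            "score": round(pred["score"], 4),
            "confident": pred["score"] >= min_score,
        }
        for text, pred in zip(texts, preds)
    ]


for row in pair_predictions(text_list, predictions):
    print(row)
```

Note that the Chinese sentence passes the threshold even though its label looks wrong, so a score cutoff does not make up for a model that handles the language poorly.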

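Token classification

Token classification is a natural language processing (NLP) task that assigns each word or token in a text to one of a set of predefined categories. It is useful in many applications, for example:

Part-of-speech tagging
- Assign a part-of-speech tag, such as noun, verb, or adjective, to each word in a sentence.
Named entity recognition (NER)
- Identify specific entities in a text, such as person, location, and organization names, and classify them.
Token-level sentiment analysis
- Classify each token in a text as positive, negative, or neutral. (Sentiment analysis of a whole sentence, as in the examples above, is sequence classification rather than token classification.)
Semantic role labeling
- Identify the action (predicate) in a sentence and the participants related to it (such as the agent and the patient).

In machine learning and deep learning, token classification is usually solved with pretrained models: a model is first pretrained on a large corpus and then fine-tuned on the specific task. The Hugging Face Transformers library provides many pretrained models that can be used for token classification.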
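To make "one label per token" concrete, the sketch below labels every word of a sentence from a few character spans. It is not a model: the spans are hand-written for illustration, `label_words` is a hypothetical helper, and words are split on whitespace rather than by a real tokenizer.

```python
import re

text = "Hugging Face is a French company based in New York City."

# Hand-written character spans (start, end, label), for illustration only.
entity_spans = [(0, 12, "ORG"), (18, 24, "MISC"), (42, 55, "LOC")]


def label_words(text, spans, outside="O"):
    """Give every whitespace-separated word the label of the span it overlaps."""
    labeled = []
    for match in re.finditer(r"\S+", text):
        label = outside
        for start, end, name in spans:
            # Two ranges overlap when each one starts before the other ends.
            if match.start() < end and match.end() > start:
                label = name
                break
        labeled.append((match.group(), label))
    return labeled


for word, label in label_words(text, entity_spans):
    print(f"{word}\t{label}")
```

Words that belong to no span receive the label `O` ("outside"), the usual convention in tagging schemes.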

Named entity recognition (ner)

from transformers import pipeline

classifier = pipeline(task="ner")


preds = classifier("Hugging Face is a French company based in New York City.")
preds = [
    {
        "entity": pred["entity"],
        "score": round(pred["score"], 4),
        "index": pred["index"],
        "word": pred["word"],
        "start": pred["start"],
        "end": pred["end"],
    }
    for pred in preds
]
print(*preds, sep="\n")



output:
    {'entity': 'I-ORG', 'score': 0.9968, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
    {'entity': 'I-ORG', 'score': 0.9293, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
    {'entity': 'I-ORG', 'score': 0.9763, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
    {'entity': 'I-MISC', 'score': 0.9983, 'index': 6, 'word': 'French', 'start': 18, 'end': 24}
    {'entity': 'I-LOC', 'score': 0.999, 'index': 10, 'word': 'New', 'start': 42, 'end': 45}
    {'entity': 'I-LOC', 'score': 0.9987, 'index': 11, 'word': 'York', 'start': 46, 'end': 50}
    {'entity': 'I-LOC', 'score': 0.9992, 'index': 12, 'word': 'City', 'start': 51, 'end': 55}

Merging entities

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
classifier = pipeline(task="ner", aggregation_strategy="simple")
classifier("Hugging Face is a French company based in New York City.")

output:
    [{'entity_group': 'ORG',
      'score': 0.96746373,
      'word': 'Hugging Face',
      'start': 0,
      'end': 12},
     {'entity_group': 'MISC',
      'score': 0.99828726,
      'word': 'French',
      'start': 18,
      'end': 24},
     {'entity_group': 'LOC',
      'score': 0.99896103,
      'word': 'New York City',
      'start': 42,
      'end': 55}]
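As a rough picture of what the grouping does, the sketch below merges the token-level predictions by hand: neighbouring tokens with the same entity type become one group, the score is averaged, and the word is recovered by slicing the original sentence. This is an approximation written for illustration, not the library's implementation; it ignores the difference between `B-` and `I-` tags, and `group_entities` is a hypothetical helper.

```python
from statistics import mean

text = "Hugging Face is a French company based in New York City."

# Token-level predictions copied from the ungrouped NER output.
tokens = [
    {"entity": "I-ORG", "score": 0.9968, "index": 1, "start": 0, "end": 2},
    {"entity": "I-ORG", "score": 0.9293, "index": 2, "start": 2, "end": 7},
    {"entity": "I-ORG", "score": 0.9763, "index": 3, "start": 8, "end": 12},
    {"entity": "I-MISC", "score": 0.9983, "index": 6, "start": 18, "end": 24},
    {"entity": "I-LOC", "score": 0.999, "index": 10, "start": 42, "end": 45},
    {"entity": "I-LOC", "score": 0.9987, "index": 11, "start": 46, "end": 50},
    {"entity": "I-LOC", "score": 0.9992, "index": 12, "start": 51, "end": 55},
]


def group_entities(text, tokens):
    """Merge adjacent tokens that share an entity type into one entity."""
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "I-ORG" -> "ORG"
        last = groups[-1] if groups else None
        if last and last["label"] == label and tok["index"] == last["index"] + 1:
            # Same type and directly following the previous token: extend the group.
            last["scores"].append(tok["score"])
            last["end"] = tok["end"]
            last["index"] = tok["index"]
        else:
            groups.append({"label": label, "scores": [tok["score"]],
                           "start": tok["start"], "end": tok["end"],
                           "index": tok["index"]})
    return [
        {"entity_group": g["label"], "score": round(mean(g["scores"]), 4),
         "word": text[g["start"]:g["end"]], "start": g["start"], "end": g["end"]}
        for g in groups
    ]


for group in group_entities(text, tokens):
    print(group)
```

The averaged scores land close to the grouped scores printed by the pipeline; small differences come from the rounding already applied to the token-level scores.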

Question Answering

from transformers import pipeline

question_answerer = pipeline(task="question-answering")



preds = question_answerer(
    question="What is the name of the repository?",
    context="The name of the repository is huggingface/transformers",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)



output:
score: 0.9327, start: 30, end: 54, answer: huggingface/transformers




preds = question_answerer(
    question="What is the capital of China?",
    context="On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing.",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)


output:
score: 0.9458, start: 115, end: 122, answer: Beijing
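The `start` and `end` values in these results are character offsets into `context`, with `end` exclusive, so the answer is simply a slice of the context. The short check below uses the two outputs shown above as plain data (no model is loaded); `answer_from_span` is a hypothetical helper name.

```python
def answer_from_span(context, start, end):
    """Return the substring of `context` covered by a predicted answer span."""
    return context[start:end]


repo_context = "The name of the repository is huggingface/transformers"
capital_context = (
    "On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed "
    "the People's Republic of China in Tiananmen Square, Beijing."
)

# Offsets taken from the two printed results.
print(answer_from_span(repo_context, 30, 54))
print(answer_from_span(capital_context, 115, 122))
```

This is an extractive setup: the model can only return text that already appears in the context, which is why the second question works only because "Beijing" occurs in the passage.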

posted @ 2024-12-14 17:24 MKY-门可意