Transformers--4-37-中文文档-二-

Transformers 4.37 中文文档（二）

原文：huggingface.co/docs/transformers

因果语言建模

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/language_modeling

语言建模有两种类型，因果和掩码。本指南说明了因果语言建模。因果语言模型经常用于文本生成。您可以将这些模型用于创意应用，如选择自己的文本冒险或智能编码助手，如 Copilot 或 CodeParrot。

www.youtube-nocookie.com/embed/Vpjb1lu0MDk

因果语言建模预测令牌序列中的下一个令牌，模型只能关注左侧的令牌。这意味着模型无法看到未来的令牌。GPT-2 是因果语言模型的一个例子。

本指南将向您展示如何：

在ELI5数据集的r/askscience子集上微调DistilGPT2。
使用您微调的模型进行推理。

您可以按照本指南中的相同步骤微调其他架构以进行因果语言建模。选择以下架构之一：

BART, BERT, Bert Generation, BigBird, BigBird-Pegasus, BioGpt, Blenderbot, BlenderbotSmall, BLOOM, CamemBERT, CodeLlama, CodeGen, CPM-Ant, CTRL, Data2VecText, ELECTRA, ERNIE, Falcon, Fuyu, GIT, GPT-Sw3, OpenAI GPT-2, GPTBigCode, GPT Neo, GPT NeoX, GPT NeoX Japanese, GPT-J, LLaMA, Marian, mBART, MEGA, Megatron-BERT, Mistral, Mixtral, MPT, MusicGen, MVP, OpenLlama, OpenAI GPT, OPT, Pegasus, Persimmon, Phi, PLBart, ProphetNet, QDQBert, Qwen2, Reformer, RemBERT, RoBERTa, RoBERTa-PreLayerNorm, RoCBert, RoFormer, RWKV, Speech2Text2, Transformer-XL, TrOCR, Whisper, XGLM, XLM, XLM-ProphetNet, XLM-RoBERTa, XLM-RoBERTa-XL, XLNet, X-MOD

在开始之前，请确保您已安装所有必要的库。

pip install transformers datasets evaluate

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 ELI5 数据集

首先加载🤗数据集库中 r/askscience 子集的 ELI5 数据集的较小子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from datasets import load_dataset

>>> eli5 = load_dataset("eli5", split="train_asks[:5000]")

使用train_test_split方法将数据集的train_asks拆分为训练集和测试集：

>>> eli5 = eli5.train_test_split(test_size=0.2)

然后看一个例子：

>>> eli5["train"][0]
{'answers': {'a_id': ['c3d1aib', 'c3d4lya'],
  'score': [6, 3],
  'text': ["The velocity needed to remain in orbit is equal to the square root of Newton's constant times the mass of earth divided by the distance from the center of the earth. I don't know the altitude of that specific mission, but they're usually around 300 km. That means he's going 7-8 km/s.\n\nIn space there are no other forces acting on either the shuttle or the guy, so they stay in the same position relative to each other. If he were to become unable to return to the ship, he would presumably run out of oxygen, or slowly fall into the atmosphere and burn up.",
   "Hope you don't mind me asking another question, but why aren't there any stars visible in this photo?"]},
 'answers_urls': {'url': []},
 'document': '',
 'q_id': 'nyxfp',
 'selftext': '_URL_0_\n\nThis was on the front page earlier and I have a few questions about it. Is it possible to calculate how fast the astronaut would be orbiting the earth? Also how does he stay close to the shuttle so that he can return safely, i.e is he orbiting at the same speed and can therefore stay next to it? And finally if his propulsion system failed, would he eventually re-enter the atmosphere and presumably die?',
 'selftext_urls': {'url': ['http://apod.nasa.gov/apod/image/1201/freeflyer_nasa_3000.jpg']},
 'subreddit': 'askscience',
 'title': 'Few questions about this space walk photograph.',
 'title_urls': {'url': []}}

虽然这看起来很多，但您实际上只对text字段感兴趣。语言建模任务的有趣之处在于您不需要标签（也称为无监督任务），因为下一个词就是标签。

预处理

www.youtube-nocookie.com/embed/ma1TrR7gE7I

下一步是加载一个 DistilGPT2 分词器来处理text子字段：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

从上面的例子中，您会注意到text字段实际上是嵌套在answers中的。这意味着您需要使用flatten方法从其嵌套结构中提取text子字段：

>>> eli5 = eli5.flatten()
>>> eli5["train"][0]
{'answers.a_id': ['c3d1aib', 'c3d4lya'],
 'answers.score': [6, 3],
 'answers.text': ["The velocity needed to remain in orbit is equal to the square root of Newton's constant times the mass of earth divided by the distance from the center of the earth. I don't know the altitude of that specific mission, but they're usually around 300 km. That means he's going 7-8 km/s.\n\nIn space there are no other forces acting on either the shuttle or the guy, so they stay in the same position relative to each other. If he were to become unable to return to the ship, he would presumably run out of oxygen, or slowly fall into the atmosphere and burn up.",
  "Hope you don't mind me asking another question, but why aren't there any stars visible in this photo?"],
 'answers_urls.url': [],
 'document': '',
 'q_id': 'nyxfp',
 'selftext': '_URL_0_\n\nThis was on the front page earlier and I have a few questions about it. Is it possible to calculate how fast the astronaut would be orbiting the earth? Also how does he stay close to the shuttle so that he can return safely, i.e is he orbiting at the same speed and can therefore stay next to it? And finally if his propulsion system failed, would he eventually re-enter the atmosphere and presumably die?',
 'selftext_urls.url': ['http://apod.nasa.gov/apod/image/1201/freeflyer_nasa_3000.jpg'],
 'subreddit': 'askscience',
 'title': 'Few questions about this space walk photograph.',
 'title_urls.url': []}

现在，每个子字段都是一个单独的列，由answers前缀指示，而text字段现在是一个列表。不要单独对每个句子进行分词，而是将列表转换为字符串，以便可以联合对它们进行分词。

这是一个用于连接每个示例的字符串列表并对结果进行分词的第一个预处理函数：

>>> def preprocess_function(examples):
...     return tokenizer([" ".join(x) for x in examples["answers.text"]])

要在整个数据集上应用此预处理函数，请使用🤗数据集的map方法。通过设置batched=True以一次处理数据集的多个元素，并使用num_proc增加进程数量，可以加快map函数的速度。删除您不需要的任何列：

>>> tokenized_eli5 = eli5.map(
...     preprocess_function,
...     batched=True,
...     num_proc=4,
...     remove_columns=eli5["train"].column_names,
... )

该数据集包含令牌序列，但其中一些比模型的最大输入长度更长。

现在可以使用第二个预处理函数

连接所有序列
将连接的序列拆分为由block_size定义的较短块，该块应比最大输入长度短且足够短以适应您的 GPU RAM。

>>> block_size = 128

>>> def group_texts(examples):
...     # Concatenate all texts.
...     concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
...     total_length = len(concatenated_examples[list(examples.keys())[0]])
...     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
...     # customize this part to your needs.
...     if total_length >= block_size:
...         total_length = (total_length // block_size) * block_size
...     # Split by chunks of block_size.
...     result = {
...         k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
...         for k, t in concatenated_examples.items()
...     }
...     result["labels"] = result["input_ids"].copy()
...     return result

在整个数据集上应用group_texts函数：

>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)

现在使用 DataCollatorForLanguageModeling 创建一批示例。在整理过程中，将句子动态填充到批次中的最长长度，而不是将整个数据集填充到最大长度。

Pytorch 隐藏 Pytorch 内容

使用结束序列标记作为填充标记，并设置mlm=False。这将使用输入作为标签，向右移动一个元素：

>>> from transformers import DataCollatorForLanguageModeling

>>> tokenizer.pad_token = tokenizer.eos_token
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

TensorFlow 隐藏 TensorFlow 内容

使用结束序列标记作为填充标记，并设置mlm=False。这将使用输入作为标签，向右移动一个元素：

>>> from transformers import DataCollatorForLanguageModeling

>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看基本教程!

您现在可以开始训练模型了！使用 AutoModelForCausalLM 加载 DistilGPT2：

>>> from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")

此时，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。
将训练参数传递给 Trainer，以及模型、数据集和数据整理器。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_eli5_clm-model",
...     evaluation_strategy="epoch",
...     learning_rate=2e-5,
...     weight_decay=0.01,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=lm_dataset["train"],
...     eval_dataset=lm_dataset["test"],
...     data_collator=data_collator,
... )

>>> trainer.train()

训练完成后，使用 evaluate()方法评估您的模型并获取其困惑度：

>>> import math

>>> eval_results = trainer.evaluate()
>>> print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
Perplexity: 49.61

然后使用 push_to_hub()方法将您的模型分享到 Hub，这样每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlow 隐藏 TensorFlow 内容

如果您不熟悉如何使用 Keras 微调模型，请查看基础教程！

要在 TensorFlow 中微调模型，请首先设置优化器函数、学习率调度和一些训练超参数：

>>> from transformers import create_optimizer, AdamWeightDecay

>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)

然后，您可以使用 TFAutoModelForCausalLM 加载 DistilGPT2：

>>> from transformers import TFAutoModelForCausalLM

>>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")

使用 prepare_tf_dataset()将数据集转换为tf.data.Dataset格式：

>>> tf_train_set = model.prepare_tf_dataset(
...     lm_dataset["train"],
...     shuffle=True,
...     batch_size=16,
...     collate_fn=data_collator,
... )

>>> tf_test_set = model.prepare_tf_dataset(
...     lm_dataset["test"],
...     shuffle=False,
...     batch_size=16,
...     collate_fn=data_collator,
... )

使用compile为训练配置模型。请注意，Transformers 模型都有一个默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要：

>>> import tensorflow as tf

>>> model.compile(optimizer=optimizer)  # No loss argument!

这可以通过在 PushToHubCallback 中指定将模型和标记器推送到何处来完成：

>>> from transformers.keras_callbacks import PushToHubCallback

>>> callback = PushToHubCallback(
...     output_dir="my_awesome_eli5_clm-model",
...     tokenizer=tokenizer,
... )

最后，您已经准备好开始训练您的模型了！使用fit调用您的训练和验证数据集，时代数量以及微调模型的回调：

>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])

训练完成后，您的模型会自动上传到 Hub，这样每个人都可以使用它！

有关如何为因果语言建模微调模型的更深入示例，请查看相应的PyTorch 笔记本或TensorFlow 笔记本。

推理

很好，现在您已经微调了一个模型，可以用于推理！

想出一个您想要从中生成文本的提示：

>>> prompt = "Somatic hypermutation allows the immune system to"

尝试使用 pipeline()来进行推理是尝试微调模型的最简单方法。实例化一个用于文本生成的pipeline，并将文本传递给它：

>>> from transformers import pipeline

>>> generator = pipeline("text-generation", model="my_awesome_eli5_clm-model")
>>> generator(prompt)
[{'generated_text': "Somatic hypermutation allows the immune system to be able to effectively reverse the damage caused by an infection.\n\n\nThe damage caused by an infection is caused by the immune system's ability to perform its own self-correcting tasks."}]

Pytorch 隐藏 Pytorch 内容

对文本进行标记化，并将input_ids返回为 PyTorch 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_clm-model")
>>> inputs = tokenizer(prompt, return_tensors="pt").input_ids

使用 generate()方法生成文本。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看文本生成策略页面。

>>> from transformers import AutoModelForCausalLM

>>> model = AutoModelForCausalLM.from_pretrained("my_awesome_eli5_clm-model")
>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

将生成的标记 ID 解码回文本：

>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["Somatic hypermutation allows the immune system to react to drugs with the ability to adapt to a different environmental situation. In other words, a system of 'hypermutation' can help the immune system to adapt to a different environmental situation or in some cases even a single life. In contrast, researchers at the University of Massachusetts-Boston have found that 'hypermutation' is much stronger in mice than in humans but can be found in humans, and that it's not completely unknown to the immune system. A study on how the immune system"]

TensorFlow 隐藏 TensorFlow 内容

对文本进行标记化，并将input_ids返回为 TensorFlow 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_clm-model")
>>> inputs = tokenizer(prompt, return_tensors="tf").input_ids

使用 generate()方法创建摘要。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看文本生成策略页面。

>>> from transformers import TFAutoModelForCausalLM

>>> model = TFAutoModelForCausalLM.from_pretrained("my_awesome_eli5_clm-model")
>>> outputs = model.generate(input_ids=inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

将生成的标记 ID 解码回文本：

>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Somatic hypermutation allows the immune system to detect the presence of other viruses as they become more prevalent. Therefore, researchers have identified a high proportion of human viruses. The proportion of virus-associated viruses in our study increases with age. Therefore, we propose a simple algorithm to detect the presence of these new viruses in our samples as a sign of improved immunity. A first study based on this algorithm, which will be published in Science on Friday, aims to show that this finding could translate into the development of a better vaccine that is more effective for']

遮蔽语言建模

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/masked_language_modeling

www.youtube-nocookie.com/embed/mqElG5QJWUg

遮蔽语言建模预测序列中的一个遮蔽标记，模型可以双向关注标记。这意味着模型可以完全访问左侧和右侧的标记。遮蔽语言建模非常适合需要对整个序列进行良好上下文理解的任务。BERT 就是一个遮蔽语言模型的例子。

本指南将向您展示如何：

在r/askscience ELI5 数据集的子集上对DistilRoBERTa进行微调。
使用您微调的模型进行推断。

您可以按照本指南中的相同步骤对其他架构进行遮蔽语言建模的微调。选择以下架构之一：

ALBERT, BART, BERT, BigBird, CamemBERT, ConvBERT, Data2VecText, DeBERTa, DeBERTa-v2, DistilBERT, ELECTRA, ERNIE, ESM, FlauBERT, FNet, Funnel Transformer, I-BERT, LayoutLM, Longformer, LUKE, mBART, MEGA, Megatron-BERT, MobileBERT, MPNet, MRA, MVP, Nezha, Nyströmformer, Perceiver, QDQBert, Reformer, RemBERT, RoBERTa, RoBERTa-PreLayerNorm, RoCBert, RoFormer, SqueezeBERT, TAPAS, Wav2Vec2, XLM, XLM-RoBERTa, XLM-RoBERTa-XL, X-MOD, YOSO

在开始之前，请确保已安装所有必要的库：

pip install transformers datasets evaluate

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 ELI5 数据集

首先加载来自🤗数据集库的 ELI5 数据集的 r/askscience 子集的较小子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from datasets import load_dataset

>>> eli5 = load_dataset("eli5", split="train_asks[:5000]")

使用train_test_split方法将数据集的train_asks分割为训练集和测试集：

>>> eli5 = eli5.train_test_split(test_size=0.2)

然后看一个例子：

>>> eli5["train"][0]
{'answers': {'a_id': ['c3d1aib', 'c3d4lya'],
  'score': [6, 3],
  'text': ["The velocity needed to remain in orbit is equal to the square root of Newton's constant times the mass of earth divided by the distance from the center of the earth. I don't know the altitude of that specific mission, but they're usually around 300 km. That means he's going 7-8 km/s.\n\nIn space there are no other forces acting on either the shuttle or the guy, so they stay in the same position relative to each other. If he were to become unable to return to the ship, he would presumably run out of oxygen, or slowly fall into the atmosphere and burn up.",
   "Hope you don't mind me asking another question, but why aren't there any stars visible in this photo?"]},
 'answers_urls': {'url': []},
 'document': '',
 'q_id': 'nyxfp',
 'selftext': '_URL_0_\n\nThis was on the front page earlier and I have a few questions about it. Is it possible to calculate how fast the astronaut would be orbiting the earth? Also how does he stay close to the shuttle so that he can return safely, i.e is he orbiting at the same speed and can therefore stay next to it? And finally if his propulsion system failed, would he eventually re-enter the atmosphere and presumably die?',
 'selftext_urls': {'url': ['http://apod.nasa.gov/apod/image/1201/freeflyer_nasa_3000.jpg']},
 'subreddit': 'askscience',
 'title': 'Few questions about this space walk photograph.',
 'title_urls': {'url': []}}

虽然这看起来很多，但您实际上只对text字段感兴趣。语言建模任务的有趣之处在于您不需要标签（也称为无监督任务），因为下一个词就是标签。

预处理

www.youtube-nocookie.com/embed/8PmhEIXhBvI

对于遮蔽语言建模，下一步是加载一个 DistilRoBERTa 分词器来处理text子字段：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

从上面的示例中，您会注意到text字段实际上是嵌套在answers内部的。这意味着您需要使用flatten方法从其嵌套结构中提取text子字段：

>>> eli5 = eli5.flatten()
>>> eli5["train"][0]
{'answers.a_id': ['c3d1aib', 'c3d4lya'],
 'answers.score': [6, 3],
 'answers.text': ["The velocity needed to remain in orbit is equal to the square root of Newton's constant times the mass of earth divided by the distance from the center of the earth. I don't know the altitude of that specific mission, but they're usually around 300 km. That means he's going 7-8 km/s.\n\nIn space there are no other forces acting on either the shuttle or the guy, so they stay in the same position relative to each other. If he were to become unable to return to the ship, he would presumably run out of oxygen, or slowly fall into the atmosphere and burn up.",
  "Hope you don't mind me asking another question, but why aren't there any stars visible in this photo?"],
 'answers_urls.url': [],
 'document': '',
 'q_id': 'nyxfp',
 'selftext': '_URL_0_\n\nThis was on the front page earlier and I have a few questions about it. Is it possible to calculate how fast the astronaut would be orbiting the earth? Also how does he stay close to the shuttle so that he can return safely, i.e is he orbiting at the same speed and can therefore stay next to it? And finally if his propulsion system failed, would he eventually re-enter the atmosphere and presumably die?',
 'selftext_urls.url': ['http://apod.nasa.gov/apod/image/1201/freeflyer_nasa_3000.jpg'],
 'subreddit': 'askscience',
 'title': 'Few questions about this space walk photograph.',
 'title_urls.url': []}

现在，每个子字段都是一个单独的列，由answers前缀指示，而text字段现在是一个列表。不要单独对每个句子进行标记化，而是将列表转换为字符串，以便可以联合对它们进行标记化。

这是一个第一个预处理函数，用于连接每个示例的字符串列表并对结果进行标记化：

>>> def preprocess_function(examples):
...     return tokenizer([" ".join(x) for x in examples["answers.text"]])

要在整个数据集上应用此预处理函数，请使用🤗数据集map方法。通过设置batched=True以一次处理数据集的多个元素，并使用num_proc增加进程数量来加速map函数。删除您不需要的任何列：

>>> tokenized_eli5 = eli5.map(
...     preprocess_function,
...     batched=True,
...     num_proc=4,
...     remove_columns=eli5["train"].column_names,
... )

此数据集包含标记序列，但其中一些序列比模型的最大输入长度更长。

现在可以使用第二个预处理函数

连接所有序列
将连接的序列拆分成由block_size定义的较短块，该块应该既比最大输入长度短，又足够短以适应您的 GPU RAM。

>>> block_size = 128

>>> def group_texts(examples):
...     # Concatenate all texts.
...     concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
...     total_length = len(concatenated_examples[list(examples.keys())[0]])
...     # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
...     # customize this part to your needs.
...     if total_length >= block_size:
...         total_length = (total_length // block_size) * block_size
...     # Split by chunks of block_size.
...     result = {
...         k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
...         for k, t in concatenated_examples.items()
...     }
...     return result

在整个数据集上应用group_texts函数：

>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)

现在使用 DataCollatorForLanguageModeling 创建一批示例。在整理过程中，最好动态填充句子到批次中的最长长度，而不是将整个数据集填充到最大长度。

Pytorch 隐藏 Pytorch 内容

使用结束序列标记作为填充标记，并指定mlm_probability以在每次迭代数据时随机屏蔽标记：

>>> from transformers import DataCollatorForLanguageModeling

>>> tokenizer.pad_token = tokenizer.eos_token
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

TensorFlow 隐藏 TensorFlow 内容

使用结束序列标记作为填充标记，并指定mlm_probability以在每次迭代数据时随机屏蔽标记：

>>> from transformers import DataCollatorForLanguageModeling

>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15, return_tensors="tf")

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看基本教程这里！

您现在可以开始训练您的模型了！使用 AutoModelForMaskedLM 加载 DistilRoBERTa：

>>> from transformers import AutoModelForMaskedLM

>>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

此时，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，它指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。
将训练参数传递给 Trainer，以及模型、数据集和数据整理器。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_eli5_mlm_model",
...     evaluation_strategy="epoch",
...     learning_rate=2e-5,
...     num_train_epochs=3,
...     weight_decay=0.01,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=lm_dataset["train"],
...     eval_dataset=lm_dataset["test"],
...     data_collator=data_collator,
... )

>>> trainer.train()

训练完成后，使用 evaluate()方法评估您的模型并获得其困惑度：

>>> import math

>>> eval_results = trainer.evaluate()
>>> print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
Perplexity: 8.76

然后使用 push_to_hub()方法将您的模型共享到 Hub，这样每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlow 隐藏 TensorFlow 内容

如果您不熟悉使用 Keras 微调模型，请查看基本教程这里！

要在 TensorFlow 中微调模型，请首先设置优化器函数、学习率调度和一些训练超参数：

>>> from transformers import create_optimizer, AdamWeightDecay

>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)

然后，您可以使用 TFAutoModelForMaskedLM 加载 DistilRoBERTa：

>>> from transformers import TFAutoModelForMaskedLM

>>> model = TFAutoModelForMaskedLM.from_pretrained("distilroberta-base")

使用 prepare_tf_dataset()将您的数据集转换为tf.data.Dataset格式：

>>> tf_train_set = model.prepare_tf_dataset(
...     lm_dataset["train"],
...     shuffle=True,
...     batch_size=16,
...     collate_fn=data_collator,
... )

>>> tf_test_set = model.prepare_tf_dataset(
...     lm_dataset["test"],
...     shuffle=False,
...     batch_size=16,
...     collate_fn=data_collator,
... )

使用compile配置模型进行训练。请注意，Transformers 模型都有一个默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要指定一个：

>>> import tensorflow as tf

>>> model.compile(optimizer=optimizer)  # No loss argument!

这可以通过在 PushToHubCallback 中指定将模型和标记器推送到何处来完成：

>>> from transformers.keras_callbacks import PushToHubCallback

>>> callback = PushToHubCallback(
...     output_dir="my_awesome_eli5_mlm_model",
...     tokenizer=tokenizer,
... )

最后，您已经准备好开始训练您的模型了！使用您的训练和验证数据集、时代数和回调来微调模型，调用fit：

>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])

一旦训练完成，您的模型将自动上传到 Hub，以便每个人都可以使用它！

要了解如何为掩码语言建模微调模型的更深入示例，请查看相应的PyTorch 笔记本或TensorFlow 笔记本。

推理

很好，现在您已经微调了一个模型，可以用它进行推理了！

想出一些您希望模型用来填充空白的文本，并使用特殊的<mask>标记来指示空白：

>>> text = "The Milky Way is a <mask> galaxy."

尝试使用您微调过的模型进行推理的最简单方法是在 pipeline()中使用它。为填充掩码实例化一个pipeline，并将文本传递给它。如果需要，可以使用top_k参数指定要返回多少预测：

>>> from transformers import pipeline

>>> mask_filler = pipeline("fill-mask", "stevhliu/my_awesome_eli5_mlm_model")
>>> mask_filler(text, top_k=3)
[{'score': 0.5150994658470154,
  'token': 21300,
  'token_str': ' spiral',
  'sequence': 'The Milky Way is a spiral galaxy.'},
 {'score': 0.07087188959121704,
  'token': 2232,
  'token_str': ' massive',
  'sequence': 'The Milky Way is a massive galaxy.'},
 {'score': 0.06434620916843414,
  'token': 650,
  'token_str': ' small',
  'sequence': 'The Milky Way is a small galaxy.'}]

Pytorch 隐藏 Pytorch 内容

对文本进行标记化，并将input_ids作为 PyTorch 张量返回。您还需要指定<mask>标记的位置：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
>>> inputs = tokenizer(text, return_tensors="pt")
>>> mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

将您的输入传递给模型，并返回掩码标记的logits：

>>> from transformers import AutoModelForMaskedLM

>>> model = AutoModelForMaskedLM.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
>>> logits = model(**inputs).logits
>>> mask_token_logits = logits[0, mask_token_index, :]

然后返回概率最高的三个掩码标记并将它们打印出来：

>>> top_3_tokens = torch.topk(mask_token_logits, 3, dim=1).indices[0].tolist()

>>> for token in top_3_tokens:
...     print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.

TensorFlow 隐藏 TensorFlow 内容

对文本进行标记化，并将input_ids作为 TensorFlow 张量返回。您还需要指定<mask>标记的位置：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
>>> inputs = tokenizer(text, return_tensors="tf")
>>> mask_token_index = tf.where(inputs["input_ids"] == tokenizer.mask_token_id)[0, 1]

将您的输入传递给模型，并返回掩码标记的logits：

>>> from transformers import TFAutoModelForMaskedLM

>>> model = TFAutoModelForMaskedLM.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
>>> logits = model(**inputs).logits
>>> mask_token_logits = logits[0, mask_token_index, :]

然后返回概率最高的三个掩码标记并将它们打印出来：

>>> top_3_tokens = tf.math.top_k(mask_token_logits, 3).indices.numpy()

>>> for token in top_3_tokens:
...     print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.

翻译

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/translation

www.youtube-nocookie.com/embed/1JvfrvZgi6c

翻译将一个语言的文本序列转换为另一种语言。它是您可以将其制定为序列到序列问题的几个任务之一，这是一个从输入返回某些输出的强大框架，如翻译或摘要。翻译系统通常用于不同语言文本之间的翻译，但也可以用于语音或文本到语音或语音到文本之间的某种组合。

本指南将向您展示如何：

在OPUS Books数据集的英语-法语子集上微调T5以将英语文本翻译成法语。
使用您微调的模型进行推理。

本教程中演示的任务由以下模型架构支持：

BART, BigBird-Pegasus, Blenderbot, BlenderbotSmall, Encoder decoder, FairSeq Machine-Translation, GPTSAN-japanese, LED, LongT5, M2M100, Marian, mBART, MT5, MVP, NLLB, NLLB-MOE, Pegasus, PEGASUS-X, PLBart, ProphetNet, SeamlessM4T, SeamlessM4Tv2, SwitchTransformers, T5, UMT5, XLM-ProphetNet

在开始之前，请确保您已安装所有必要的库：

pip install transformers datasets evaluate sacrebleu

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区共享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 OPUS Books 数据集

首先加载🤗数据集库中OPUS Books数据集的英语-法语子集：

>>> from datasets import load_dataset

>>> books = load_dataset("opus_books", "en-fr")

使用train_test_split方法将数据集分割为训练集和测试集：

>>> books = books["train"].train_test_split(test_size=0.2)

然后看一个例子：

>>> books["train"][0]
{'id': '90560',
 'translation': {'en': 'But this lofty plateau measured only a few fathoms, and soon we reentered Our Element.',
  'fr': 'Mais ce plateau élevé ne mesurait que quelques toises, et bientôt nous fûmes rentrés dans notre élément.'}}

translation：文本的英语和法语翻译。

预处理

www.youtube-nocookie.com/embed/XAR8jnZZuUs

下一步是加载 T5 标记器以处理英语-法语语言对：

>>> from transformers import AutoTokenizer

>>> checkpoint = "t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)

您要创建的预处理函数需要：

在输入前加上提示，以便 T5 知道这是一个翻译任务。一些能够执行多个 NLP 任务的模型需要为特定任务提供提示。
将输入（英语）和目标（法语）分别进行标记化，因为无法使用在英语词汇上预训练的标记器对法语文本进行标记化。
将序列截断为max_length参数设置的最大长度。

>>> source_lang = "en"
>>> target_lang = "fr"
>>> prefix = "translate English to French: "

>>> def preprocess_function(examples):
...     inputs = [prefix + example[source_lang] for example in examples["translation"]]
...     targets = [example[target_lang] for example in examples["translation"]]
...     model_inputs = tokenizer(inputs, text_target=targets, max_length=128, truncation=True)
...     return model_inputs

要在整个数据集上应用预处理函数，请使用🤗数据集map方法。您可以通过设置batched=True来加速map函数，以一次处理数据集的多个元素：

>>> tokenized_books = books.map(preprocess_function, batched=True)

现在使用 DataCollatorForSeq2Seq 创建一批示例。在整理过程中，将句子动态填充到批次中的最长长度，而不是将整个数据集填充到最大长度，这样更有效。

Pytorch 隐藏 Pytorch 内容

>>> from transformers import DataCollatorForSeq2Seq

>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)

TensorFlow 隐藏 TensorFlow 内容

>>> from transformers import DataCollatorForSeq2Seq

>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")

评估

在训练过程中包含一个指标通常有助于评估模型的性能。您可以使用🤗Evaluate库快速加载评估方法。对于这个任务，加载SacreBLEU指标（查看🤗Evaluate 快速入门以了解如何加载和计算指标）：

>>> import evaluate

>>> metric = evaluate.load("sacrebleu")

然后创建一个函数，将您的预测和标签传递给compute以计算 SacreBLEU 分数：

>>> import numpy as np

>>> def postprocess_text(preds, labels):
...     preds = [pred.strip() for pred in preds]
...     labels = [[label.strip()] for label in labels]

...     return preds, labels

>>> def compute_metrics(eval_preds):
...     preds, labels = eval_preds
...     if isinstance(preds, tuple):
...         preds = preds[0]
...     decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

...     labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
...     decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

...     decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

...     result = metric.compute(predictions=decoded_preds, references=decoded_labels)
...     result = {"bleu": result["score"]}

...     prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
...     result["gen_len"] = np.mean(prediction_lens)
...     result = {k: round(v, 4) for k, v in result.items()}
...     return result

您的compute_metrics函数现在已经准备就绪，当您设置训练时会返回到它。

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉如何使用 Trainer 微调模型，请查看这里的基本教程！

现在您已经准备好开始训练您的模型了！使用 AutoModelForSeq2SeqLM 加载 T5：

>>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

此时，只剩下三个步骤：

在 Seq2SeqTrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。在每个 epoch 结束时，Trainer 将评估 SacreBLEU 指标并保存训练检查点。
将训练参数传递给 Seq2SeqTrainer，同时还包括模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = Seq2SeqTrainingArguments(
...     output_dir="my_awesome_opus_books_model",
...     evaluation_strategy="epoch",
...     learning_rate=2e-5,
...     per_device_train_batch_size=16,
...     per_device_eval_batch_size=16,
...     weight_decay=0.01,
...     save_total_limit=3,
...     num_train_epochs=2,
...     predict_with_generate=True,
...     fp16=True,
...     push_to_hub=True,
... )

>>> trainer = Seq2SeqTrainer(
...     model=model,
...     args=training_args,
...     train_dataset=tokenized_books["train"],
...     eval_dataset=tokenized_books["test"],
...     tokenizer=tokenizer,
...     data_collator=data_collator,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将模型共享到 Hub，以便每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlow 隐藏 TensorFlow 内容

如果您不熟悉如何使用 Keras 微调模型，请查看这里的基本教程！

要在 TensorFlow 中微调模型，请首先设置优化器函数、学习率调度和一些训练超参数：

>>> from transformers import AdamWeightDecay

>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)

然后您可以使用 TFAutoModelForSeq2SeqLM 加载 T5：

>>> from transformers import TFAutoModelForSeq2SeqLM

>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

使用 prepare_tf_dataset()将数据集转换为tf.data.Dataset格式：

>>> tf_train_set = model.prepare_tf_dataset(
...     tokenized_books["train"],
...     shuffle=True,
...     batch_size=16,
...     collate_fn=data_collator,
... )

>>> tf_test_set = model.prepare_tf_dataset(
...     tokenized_books["test"],
...     shuffle=False,
...     batch_size=16,
...     collate_fn=data_collator,
... )

使用compile配置模型进行训练。请注意，Transformers 模型都有一个默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要：

>>> import tensorflow as tf

>>> model.compile(optimizer=optimizer)  # No loss argument!

在开始训练之前，还有最后两件事要设置：从预测中计算 SacreBLEU 指标，并提供一种将模型推送到 Hub 的方法。这两个都可以通过使用 Keras 回调来完成。

将您的compute_metrics函数传递给 KerasMetricCallback：

>>> from transformers.keras_callbacks import KerasMetricCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)

指定将模型和分词器推送到 PushToHubCallback 的位置：

>>> from transformers.keras_callbacks import PushToHubCallback

>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="my_awesome_opus_books_model",
...     tokenizer=tokenizer,
... )

然后将您的回调捆绑在一起：

>>> callbacks = [metric_callback, push_to_hub_callback]

最后，你已经准备好开始训练你的模型了！调用fit与你的训练和验证数据集，时代的数量，以及你的回调来微调模型：

>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)

一旦训练完成，你的模型会自动上传到 Hub，这样每个人都可以使用它！

有关如何为翻译微调模型的更深入示例，请查看相应的PyTorch 笔记本或TensorFlow 笔记本。

推理

很好，现在你已经微调了一个模型，你可以用它进行推理！

想出一些你想要翻译成另一种语言的文本。对于 T5，你需要根据你正在处理的任务来添加前缀。对于从英语翻译成法语，你应该像下面所示添加前缀：

>>> text = "translate English to French: Legumes share resources with nitrogen-fixing bacteria."

尝试使用微调模型进行推理的最简单方法是在 pipeline()中使用它。为翻译实例化一个pipeline与您的模型，并将文本传递给它：

>>> from transformers import pipeline

>>> translator = pipeline("translation", model="my_awesome_opus_books_model")
>>> translator(text)
[{'translation_text': 'Legumes partagent des ressources avec des bactéries azotantes.'}]

如果你愿意，你也可以手动复制pipeline的结果：

Pytorch 隐藏 Pytorch 内容

对文本进行标记化，并将input_ids返回为 PyTorch 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_opus_books_model")
>>> inputs = tokenizer(text, return_tensors="pt").input_ids

使用 generate()方法创建翻译。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看 Text Generation API。

>>> from transformers import AutoModelForSeq2SeqLM

>>> model = AutoModelForSeq2SeqLM.from_pretrained("my_awesome_opus_books_model")
>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)

将生成的标记 ID 解码回文本：

>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lignées partagent des ressources avec des bactéries enfixant l'azote.'

TensorFlow 隐藏 TensorFlow 内容

对文本进行标记化，并将input_ids返回为 TensorFlow 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_opus_books_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids

使用 generate()方法创建翻译。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看 Text Generation API。

>>> from transformers import TFAutoModelForSeq2SeqLM

>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("my_awesome_opus_books_model")
>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)

将生成的标记 ID 解码回文本：

>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lugumes partagent les ressources avec des bactéries fixatrices d'azote.'

摘要

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/summarization

www.youtube-nocookie.com/embed/yHnr5Dk2zCI

摘要创建文档或文章的简短版本，捕捉所有重要信息。除了翻译之外，这是另一个可以被制定为序列到序列任务的任务的例子。摘要可以是：

抽取式：从文档中提取最相关的信息。
生成式：生成捕捉最相关信息的新文本。

本指南将向您展示如何：

在BillSum数据集的加利福尼亚州议案子集上对T5进行微调，用于生成摘要。
使用您微调的模型进行推断。

本教程中展示的任务由以下模型架构支持：

在开始之前，请确保您已安装所有必要的库：

pip install transformers datasets evaluate rouge_score

我们鼓励您登录到您的 Hugging Face 账户，这样您就可以上传和分享您的模型给社区。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 BillSum 数据集

首先加载🤗数据集库中较小的加利福尼亚州议案子集：

>>> from datasets import load_dataset

>>> billsum = load_dataset("billsum", split="ca_test")

将数据集分割成训练集和测试集，使用train_test_split方法：

>>> billsum = billsum.train_test_split(test_size=0.2)

然后看一个例子：

>>> billsum["train"][0]
{'summary': 'Existing law authorizes state agencies to enter into contracts for the acquisition of goods or services upon approval by the Department of General Services. Existing law sets forth various requirements and prohibitions for those contracts, including, but not limited to, a prohibition on entering into contracts for the acquisition of goods or services of $100,000 or more with a contractor that discriminates between spouses and domestic partners or same-sex and different-sex couples in the provision of benefits. Existing law provides that a contract entered into in violation of those requirements and prohibitions is void and authorizes the state or any person acting on behalf of the state to bring a civil action seeking a determination that a contract is in violation and therefore void. Under existing law, a willful violation of those requirements and prohibitions is a misdemeanor.\nThis bill would also prohibit a state agency from entering into contracts for the acquisition of goods or services of $100,000 or more with a contractor that discriminates between employees on the basis of gender identity in the provision of benefits, as specified. By expanding the scope of a crime, this bill would impose a state-mandated local program.\nThe California Constitution requires the state to reimburse local agencies and school districts for certain costs mandated by the state. Statutory provisions establish procedures for making that reimbursement.\nThis bill would provide that no reimbursement is required by this act for a specified reason.',
 'text': 'The people of the State of California do enact as follows:\n\n\nSECTION 1.\nSection 10295.35 is added to the Public Contract Code, to read:\n10295.35.\n(a) (1) Notwithstanding any other law, a state agency shall not enter into any contract for the acquisition of goods or services in the amount of one hundred thousand dollars ($100,000) or more with a contractor that, in the provision of benefits, discriminates between employees on the basis of an employee’s or dependent’s actual or perceived gender identity, including, but not limited to, the employee’s or dependent’s identification as transgender.\n(2) For purposes of this section, “contract” includes contracts with a cumulative amount of one hundred thousand dollars ($100,000) or more per contractor in each fiscal year.\n(3) For purposes of this section, an employee health plan is discriminatory if the plan is not consistent with Section 1365.5 of the Health and Safety Code and Section 10140 of the Insurance Code.\n(4) The requirements of this section shall apply only to those portions of a contractor’s operations that occur under any of the following conditions:\n(A) Within the state.\n(B) On real property outside the state if the property is owned by the state or if the state has a right to occupy the property, and if the contractor’s presence at that location is connected to a contract with the state.\n(C) Elsewhere in the United States where work related to a state contract is being performed.\n(b) Contractors shall treat as confidential, to the maximum extent allowed by law or by the requirement of the contractor’s insurance provider, any request by an employee or applicant for employment benefits or any documentation of eligibility for benefits submitted by an employee or applicant for employment.\n(c) After taking all reasonable measures to find a contractor that complies with this section, as determined by the state agency, the requirements of this section may be waived under any of the following circumstances:\n(1) There is only one prospective contractor willing to enter into a specific contract with the state agency.\n(2) The contract is necessary to respond to an emergency, as determined by the state agency, that endangers the public health, welfare, or safety, or the contract is necessary for the provision of essential services, and no entity that complies with the requirements of this section capable of responding to the emergency is immediately available.\n(3) The requirements of this section violate, or are inconsistent with, the terms or conditions of a grant, subvention, or agreement, if the agency has made a good faith attempt to change the terms or conditions of any grant, subvention, or agreement to authorize application of this section.\n(4) The contractor is providing wholesale or bulk water, power, or natural gas, the conveyance or transmission of the same, or ancillary services, as required for ensuring reliable services in accordance with good utility practice, if the purchase of the same cannot practically be accomplished through the standard competitive bidding procedures and the contractor is not providing direct retail services to end users.\n(d) (1) A contractor shall not be deemed to discriminate in the provision of benefits if the contractor, in providing the benefits, pays the actual costs incurred in obtaining the benefit.\n(2) If a contractor is unable to provide a certain benefit, despite taking reasonable measures to do so, the contractor shall not be deemed to discriminate in the provision of benefits.\n(e) (1) Every contract subject to this chapter shall contain a statement by which the contractor certifies that the contractor is in compliance with this section.\n(2) The department or other contracting agency shall enforce this section pursuant to its existing enforcement powers.\n(3) (A) If a contractor falsely certifies that it is in compliance with this section, the contract with that contractor shall be subject to Article 9 (commencing with Section 10420), unless, within a time period specified by the department or other contracting agency, the contractor provides to the department or agency proof that it has complied, or is in the process of complying, with this section.\n(B) The application of the remedies or penalties contained in Article 9 (commencing with Section 10420) to a contract subject to this chapter shall not preclude the application of any existing remedies otherwise available to the department or other contracting agency under its existing enforcement powers.\n(f) Nothing in this section is intended to regulate the contracting practices of any local jurisdiction.\n(g) This section shall be construed so as not to conflict with applicable federal laws, rules, or regulations. In the event that a court or agency of competent jurisdiction holds that federal law, rule, or regulation invalidates any clause, sentence, paragraph, or section of this code or the application thereof to any person or circumstances, it is the intent of the state that the court or agency sever that clause, sentence, paragraph, or section so that the remainder of this section shall remain in effect.\nSEC. 2.\nSection 10295.35 of the Public Contract Code shall not be construed to create any new enforcement authority or responsibility in the Department of General Services or any other contracting agency.\nSEC. 3.\nNo reimbursement is required by this act pursuant to Section 6 of Article XIII\u2009B of the California Constitution because the only costs that may be incurred by a local agency or school district will be incurred because this act creates a new crime or infraction, eliminates a crime or infraction, or changes the penalty for a crime or infraction, within the meaning of Section 17556 of the Government Code, or changes the definition of a crime within the meaning of Section 6 of Article XIII\u2009B of the California Constitution.',
 'title': 'An act to add Section 10295.35 to the Public Contract Code, relating to public contracts.'}

有两个字段您将要使用：

text：将成为模型输入的议案文本。
summary：text的简化版本，将成为模型的目标。

预处理

下一步是加载 T5 分词器来处理text和summary：

>>> from transformers import AutoTokenizer

>>> checkpoint = "t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)

您要创建的预处理函数需要：

在输入前加上提示，以便 T5 知道这是一个摘要任务。一些能够执行多个 NLP 任务的模型需要为特定任务提供提示。
在标记标签时使用关键字text_target参数。
截断序列，使其不超过由max_length参数设置的最大长度。

>>> prefix = "summarize: "

>>> def preprocess_function(examples):
...     inputs = [prefix + doc for doc in examples["text"]]
...     model_inputs = tokenizer(inputs, max_length=1024, truncation=True)

...     labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)

...     model_inputs["labels"] = labels["input_ids"]
...     return model_inputs

要在整个数据集上应用预处理函数，使用🤗数据集的map方法。通过设置batched=True来加速map函数，以一次处理数据集的多个元素：

>>> tokenized_billsum = billsum.map(preprocess_function, batched=True)

现在使用 DataCollatorForSeq2Seq 创建一批示例。在整理过程中，将句子动态填充到批次中的最长长度，而不是将整个数据集填充到最大长度。

Pytorch 隐藏 Pytorch 内容

>>> from transformers import DataCollatorForSeq2Seq

>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)

TensorFlow 隐藏 TensorFlow 内容

>>> from transformers import DataCollatorForSeq2Seq

>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")

评估

在训练过程中包含一个指标通常有助于评估模型的性能。您可以使用🤗 Evaluate库快速加载一个评估方法。对于这个任务，加载ROUGE指标（查看🤗 Evaluate 快速入门以了解如何加载和计算指标）：

>>> import evaluate

>>> rouge = evaluate.load("rouge")

然后创建一个函数，将您的预测和标签传递给compute以计算 ROUGE 指标：

>>> import numpy as np

>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
...     labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
...     decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

...     result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

...     prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
...     result["gen_len"] = np.mean(prediction_lens)

...     return {k: round(v, 4) for k, v in result.items()}

您的compute_metrics函数现在已经准备就绪，当您设置训练时会返回到它。

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看基本教程这里！

现在您已经准备好开始训练您的模型了！使用 AutoModelForSeq2SeqLM 加载 T5：

>>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

此时，只剩下三个步骤：

在 Seq2SeqTrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，它指定了保存模型的位置。您可以通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。在每个 epoch 结束时，Trainer 将评估 ROUGE 指标并保存训练检查点。
将训练参数传递给 Seq2SeqTrainer，同时还要传递模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = Seq2SeqTrainingArguments(
...     output_dir="my_awesome_billsum_model",
...     evaluation_strategy="epoch",
...     learning_rate=2e-5,
...     per_device_train_batch_size=16,
...     per_device_eval_batch_size=16,
...     weight_decay=0.01,
...     save_total_limit=3,
...     num_train_epochs=4,
...     predict_with_generate=True,
...     fp16=True,
...     push_to_hub=True,
... )

>>> trainer = Seq2SeqTrainer(
...     model=model,
...     args=training_args,
...     train_dataset=tokenized_billsum["train"],
...     eval_dataset=tokenized_billsum["test"],
...     tokenizer=tokenizer,
...     data_collator=data_collator,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，以便每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlow 隐藏 TensorFlow 内容

如果您不熟悉使用 Keras 微调模型，请查看基本教程这里！

要微调 TensorFlow 模型，首先设置一个优化器函数、学习率调度和一些训练超参数：

>>> from transformers import create_optimizer, AdamWeightDecay

>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)

然后您可以使用 TFAutoModelForSeq2SeqLM 加载 T5：

>>> from transformers import TFAutoModelForSeq2SeqLM

>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

使用 prepare_tf_dataset()将您的数据集转换为tf.data.Dataset格式：

>>> tf_train_set = model.prepare_tf_dataset(
...     tokenized_billsum["train"],
...     shuffle=True,
...     batch_size=16,
...     collate_fn=data_collator,
... )

>>> tf_test_set = model.prepare_tf_dataset(
...     tokenized_billsum["test"],
...     shuffle=False,
...     batch_size=16,
...     collate_fn=data_collator,
... )

使用compile为训练配置模型。请注意，Transformers 模型都有一个默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要：

>>> import tensorflow as tf

>>> model.compile(optimizer=optimizer)  # No loss argument!

在开始训练之前，还有最后两件事要设置：从预测中计算 ROUGE 分数，并提供一种将模型推送到 Hub 的方法。这两个都可以通过使用 Keras 回调来完成。

将您的compute_metrics函数传递给 KerasMetricCallback：

>>> from transformers.keras_callbacks import KerasMetricCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)

在 PushToHubCallback 中指定要推送模型和分词器的位置：

>>> from transformers.keras_callbacks import PushToHubCallback

>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="my_awesome_billsum_model",
...     tokenizer=tokenizer,
... )

然后将您的回调捆绑在一起：

>>> callbacks = [metric_callback, push_to_hub_callback]

最后，您已经准备好开始训练您的模型了！使用您的训练和验证数据集、epoch 数量和回调函数调用fit来微调模型：

>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)

训练完成后，您的模型将自动上传到 Hub，以便每个人都可以使用它！

有关如何为摘要微调模型的更深入示例，请查看相应的 PyTorch 笔记本或 TensorFlow 笔记本。

推理

很好，现在您已经对模型进行了微调，可以用于推理了！

想出一些您想要总结的文本。对于 T5，您需要根据您正在处理的任务为输入添加前缀。对于摘要，您应该像下面所示为输入添加前缀：

>>> text = "summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes."

尝试使用微调后的模型进行推理的最简单方法是在 pipeline() 中使用它。为摘要实例化一个带有您的模型的 pipeline，并将文本传递给它：

>>> from transformers import pipeline

>>> summarizer = pipeline("summarization", model="stevhliu/my_awesome_billsum_model")
>>> summarizer(text)
[{"summary_text": "The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country."}]

如果愿意，您也可以手动复制 pipeline 的结果：

Pytorch 隐藏 Pytorch 内容

对文本进行标记化，并将 input_ids 返回为 PyTorch 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_billsum_model")
>>> inputs = tokenizer(text, return_tensors="pt").input_ids

使用 generate() 方法来创建摘要。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看 Text Generation API。

>>> from transformers import AutoModelForSeq2SeqLM

>>> model = AutoModelForSeq2SeqLM.from_pretrained("stevhliu/my_awesome_billsum_model")
>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)

将生成的标记 id 解码回文本：

>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'

TensorFlow 隐藏 TensorFlow 内容

对文本进行标记化，并将 input_ids 返回为 TensorFlow 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_billsum_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids

使用 generate() 方法来创建摘要。有关不同文本生成策略和控制生成的参数的更多详细信息，请查看 Text Generation API。

>>> from transformers import TFAutoModelForSeq2SeqLM

>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("stevhliu/my_awesome_billsum_model")
>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)

将生成的标记 id 解码回文本：

>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'

多项选择

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/multiple_choice

多项选择任务类似于问题回答，不同之处在于提供了几个候选答案以及上下文，模型经过训练后会选择正确的答案。

本指南将向您展示如何：

在SWAG数据集的regular配置上对BERT进行微调，以在给定多个选项和一些上下文的情况下选择最佳答案。
使用您微调过的模型进行推理。

本教程中演示的任务由以下模型架构支持：

ALBERT、BERT、BigBird、CamemBERT、CANINE、ConvBERT、Data2VecText、DeBERTa-v2、DistilBERT、ELECTRA、ERNIE、ErnieM、FlauBERT、FNet、Funnel Transformer、I-BERT、Longformer、LUKE、MEGA、Megatron-BERT、MobileBERT、MPNet、MRA、Nezha、Nyströmformer、QDQBert、RemBERT、RoBERTa、RoBERTa-PreLayerNorm、RoCBert、RoFormer、SqueezeBERT、XLM、XLM-RoBERTa、XLM-RoBERTa-XL、XLNet、X-MOD、YOSO

在开始之前，请确保已安装所有必要的库：

pip install transformers datasets evaluate

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 SWAG 数据集

首先加载🤗 Datasets 库中 SWAG 数据集的regular配置：

>>> from datasets import load_dataset

>>> swag = load_dataset("swag", "regular")

然后看一个例子：

>>> swag["train"][0]
{'ending0': 'passes by walking down the street playing their instruments.',
 'ending1': 'has heard approaching them.',
 'ending2': "arrives and they're outside dancing and asleep.",
 'ending3': 'turns the lead singer watches the performance.',
 'fold-ind': '3416',
 'gold-source': 'gold',
 'label': 0,
 'sent1': 'Members of the procession walk down the street holding small horn brass instruments.',
 'sent2': 'A drum line',
 'startphrase': 'Members of the procession walk down the street holding small horn brass instruments. A drum line',
 'video-id': 'anetv_jkn6uvmqwh4'}

虽然这里看起来有很多字段，但实际上非常简单：

sent1和sent2：这些字段显示句子如何开始，如果将它们放在一起，您将得到startphrase字段。
ending：建议一个可能的句子结尾，但只有一个是正确的。
label：标识正确的句子结尾。

预处理

下一步是加载 BERT 分词器，以处理句子开头和四个可能的结尾：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

您要创建的预处理函数需要：

复制sent1字段四次，并将每个字段与sent2组合，以重新创建句子的开头方式。
将sent2与四个可能的句子结尾中的每一个结合起来。
将这两个列表展平，以便对它们进行标记化，然后在之后将它们展开，以便每个示例都有相应的input_ids、attention_mask和labels字段。

>>> ending_names = ["ending0", "ending1", "ending2", "ending3"]

>>> def preprocess_function(examples):
...     first_sentences = [[context] * 4 for context in examples["sent1"]]
...     question_headers = examples["sent2"]
...     second_sentences = [
...         [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
...     ]

...     first_sentences = sum(first_sentences, [])
...     second_sentences = sum(second_sentences, [])

...     tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
...     return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}

要在整个数据集上应用预处理函数，请使用🤗 Datasets map方法。您可以通过设置batched=True来加速map函数，以一次处理数据集的多个元素：

tokenized_swag = swag.map(preprocess_function, batched=True)

🤗 Transformers 没有用于多项选择的数据整理器，因此您需要调整 DataCollatorWithPadding 以创建一批示例。在整理期间，将句子动态填充到批次中的最长长度，而不是将整个数据集填充到最大长度，这样更有效。

DataCollatorForMultipleChoice将所有模型输入展平，应用填充，然后展平结果：

PytorchHide Pytorch 内容

>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import torch

>>> @dataclass
... class DataCollatorForMultipleChoice:
...     """
...     Data collator that will dynamically pad the inputs for multiple choice received.
...     """

...     tokenizer: PreTrainedTokenizerBase
...     padding: Union[bool, str, PaddingStrategy] = True
...     max_length: Optional[int] = None
...     pad_to_multiple_of: Optional[int] = None

...     def __call__(self, features):
...         label_name = "label" if "label" in features[0].keys() else "labels"
...         labels = [feature.pop(label_name) for feature in features]
...         batch_size = len(features)
...         num_choices = len(features[0]["input_ids"])
...         flattened_features = [
...             [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
...         ]
...         flattened_features = sum(flattened_features, [])

...         batch = self.tokenizer.pad(
...             flattened_features,
...             padding=self.padding,
...             max_length=self.max_length,
...             pad_to_multiple_of=self.pad_to_multiple_of,
...             return_tensors="pt",
...         )

...         batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
...         batch["labels"] = torch.tensor(labels, dtype=torch.int64)
...         return batch

TensorFlowHide TensorFlow 内容

>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import tensorflow as tf

>>> @dataclass
... class DataCollatorForMultipleChoice:
...     """
...     Data collator that will dynamically pad the inputs for multiple choice received.
...     """

...     tokenizer: PreTrainedTokenizerBase
...     padding: Union[bool, str, PaddingStrategy] = True
...     max_length: Optional[int] = None
...     pad_to_multiple_of: Optional[int] = None

...     def __call__(self, features):
...         label_name = "label" if "label" in features[0].keys() else "labels"
...         labels = [feature.pop(label_name) for feature in features]
...         batch_size = len(features)
...         num_choices = len(features[0]["input_ids"])
...         flattened_features = [
...             [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
...         ]
...         flattened_features = sum(flattened_features, [])

...         batch = self.tokenizer.pad(
...             flattened_features,
...             padding=self.padding,
...             max_length=self.max_length,
...             pad_to_multiple_of=self.pad_to_multiple_of,
...             return_tensors="tf",
...         )

...         batch = {k: tf.reshape(v, (batch_size, num_choices, -1)) for k, v in batch.items()}
...         batch["labels"] = tf.convert_to_tensor(labels, dtype=tf.int64)
...         return batch

评估

在训练过程中包含一个度量通常有助于评估模型的性能。您可以使用🤗 Evaluate库快速加载评估方法。对于此任务，加载accuracy度量（请参阅🤗 Evaluate quick tour以了解如何加载和计算度量）：

>>> import evaluate

>>> accuracy = evaluate.load("accuracy")

然后创建一个函数，将您的预测和标签传递给compute来计算准确性：

>>> import numpy as np

>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     predictions = np.argmax(predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=labels)

您的compute_metrics函数现在已经准备就绪，在设置训练时将返回到它。

训练

PytorchHide Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看这里的基本教程 here!

您现在可以开始训练您的模型了！使用 AutoModelForMultipleChoice 加载 BERT：

>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer

>>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

此时，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，它指定保存模型的位置。您将通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。在每个时代结束时，Trainer 将评估准确性并保存训练检查点。
将训练参数传递给 Trainer，同时还包括模型、数据集、标记器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_swag_model",
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     load_best_model_at_end=True,
...     learning_rate=5e-5,
...     per_device_train_batch_size=16,
...     per_device_eval_batch_size=16,
...     num_train_epochs=3,
...     weight_decay=0.01,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=tokenized_swag["train"],
...     eval_dataset=tokenized_swag["validation"],
...     tokenizer=tokenizer,
...     data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，以便每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlowHide TensorFlow 内容

如果您不熟悉使用 Keras 微调模型，请查看这里的基本教程 here!

要在 TensorFlow 中微调模型，请首先设置优化器函数、学习率调度和一些训练超参数：

>>> from transformers import create_optimizer

>>> batch_size = 16
>>> num_train_epochs = 2
>>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
>>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)

然后，您可以使用 TFAutoModelForMultipleChoice 加载 BERT：

>>> from transformers import TFAutoModelForMultipleChoice

>>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

使用 prepare_tf_dataset()将数据集转换为tf.data.Dataset格式：

>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
>>> tf_train_set = model.prepare_tf_dataset(
...     tokenized_swag["train"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

>>> tf_validation_set = model.prepare_tf_dataset(
...     tokenized_swag["validation"],
...     shuffle=False,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

使用compile配置模型进行训练。请注意，Transformers 模型都具有默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要：

>>> model.compile(optimizer=optimizer)  # No loss argument!

在开始训练之前，设置最后两件事是从预测中计算准确性，并提供一种将模型推送到 Hub 的方法。这两个都可以使用 Keras callbacks 来完成。

将您的compute_metrics函数传递给 KerasMetricCallback：

>>> from transformers.keras_callbacks import KerasMetricCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)

指定在 PushToHubCallback 中推送您的模型和分词器的位置：

>>> from transformers.keras_callbacks import PushToHubCallback

>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="my_awesome_model",
...     tokenizer=tokenizer,
... )

然后将您的回调捆绑在一起：

>>> callbacks = [metric_callback, push_to_hub_callback]

最后，您已经准备好开始训练您的模型了！调用fit，使用您的训练和验证数据集、时代数和回调来微调模型：

>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2, callbacks=callbacks)

训练完成后，您的模型将自动上传到 Hub，以便每个人都可以使用它！

要了解如何为多项选择微调模型的更深入示例，请查看相应的PyTorch 笔记本或TensorFlow 笔记本。

推理

很好，现在您已经对模型进行了微调，可以用于推理！

想出一些文本和两个候选答案：

>>> prompt = "France has a bread law, Le Décret Pain, with strict rules on what is allowed in a traditional baguette."
>>> candidate1 = "The law does not apply to croissants and brioche."
>>> candidate2 = "The law applies to baguettes."

Pytorch 隐藏 Pytorch 内容

将每个提示和候选答案对进行标记化，并返回 PyTorch 张量。您还应该创建一些标签：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="pt", padding=True)
>>> labels = torch.tensor(0).unsqueeze(0)

将您的输入和标签传递给模型，并返回logits：

>>> from transformers import AutoModelForMultipleChoice

>>> model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
>>> logits = outputs.logits

获取具有最高概率的类：

>>> predicted_class = logits.argmax().item()
>>> predicted_class
'0'

TensorFlow 隐藏 TensorFlow 内容

对每个提示和候选答案对进行标记化，并返回 TensorFlow 张量：

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="tf", padding=True)

将您的输入传递给模型，并返回logits：

>>> from transformers import TFAutoModelForMultipleChoice

>>> model = TFAutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
>>> outputs = model(inputs)
>>> logits = outputs.logits

获取具有最高概率的类：

>>> predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
>>> predicted_class
'0'

音频

音频分类

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/audio_classification

www.youtube-nocookie.com/embed/KWwzcmG98Ds

音频分类 - 就像文本一样 - 从输入数据中分配一个类标签输出。唯一的区别是，您有原始音频波形而不是文本输入。音频分类的一些实际应用包括识别说话者意图、语言分类，甚至通过声音识别动物物种。

本指南将向您展示如何：

在MInDS-14数据集上对Wav2Vec2进行微调，以分类说话者意图。
使用您微调的模型进行推断。

本教程中所示的任务由以下模型架构支持：

音频频谱变换器、Data2VecAudio、Hubert、SEW、SEW-D、UniSpeech、UniSpeechSat、Wav2Vec2、Wav2Vec2-BERT、Wav2Vec2-Conformer、WavLM、Whisper

在开始之前，请确保已安装所有必要的库：

pip install transformers datasets evaluate

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和分享您的模型给社区。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 MInDS-14 数据集

首先从🤗数据集库中加载 MInDS-14 数据集：

>>> from datasets import load_dataset, Audio

>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train")

使用train_test_split方法将数据集的train拆分为较小的训练集和测试集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间处理完整数据集。

>>> minds = minds.train_test_split(test_size=0.2)

然后查看数据集：

>>> minds
DatasetDict({
    train: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 450
    })
    test: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 113
    })
})

虽然数据集包含许多有用信息，比如lang_id和english_transcription，但在本指南中，您将专注于audio和intent_class。使用remove_columns方法删除其他列：

>>> minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])

现在看一个示例：

>>> minds["train"][0]
{'audio': {'array': array([ 0.        ,  0.        ,  0.        , ..., -0.00048828,
         -0.00024414, -0.00024414], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav',
  'sampling_rate': 8000},
 'intent_class': 2}

有两个领域：

audio：必须调用的语音信号的一维array，以加载和重新采样音频文件。
intent_class：表示说话者意图的类别 ID。

为了让模型更容易从标签 ID 中获取标签名称，创建一个将标签名称映射到整数以及反之的字典：

>>> labels = minds["train"].features["intent_class"].names
>>> label2id, id2label = dict(), dict()
>>> for i, label in enumerate(labels):
...     label2id[label] = str(i)
...     id2label[str(i)] = label

现在您可以将标签 ID 转换为标签名称：

>>> id2label[str(2)]
'app_error'

预处理

下一步是加载 Wav2Vec2 特征提取器来处理音频信号：

>>> from transformers import AutoFeatureExtractor

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

MInDS-14 数据集的采样率为 8000khz（您可以在其数据集卡片中找到此信息），这意味着您需要将数据集重新采样为 16000kHz 以使用预训练的 Wav2Vec2 模型：

>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
>>> minds["train"][0]
{'audio': {'array': array([ 2.2098757e-05,  4.6582241e-05, -2.2803260e-05, ...,
         -2.8419291e-04, -2.3305941e-04, -1.1425107e-04], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav',
  'sampling_rate': 16000},
 'intent_class': 2}

现在创建一个预处理函数，该函数：

调用audio列进行加载，并在必要时重新采样音频文件。
检查音频文件的采样率是否与模型预训练时的音频数据的采样率匹配。您可以在 Wav2Vec2 的模型卡片中找到此信息。
设置最大输入长度以批处理更长的输入而不截断它们。

>>> def preprocess_function(examples):
...     audio_arrays = [x["array"] for x in examples["audio"]]
...     inputs = feature_extractor(
...         audio_arrays, sampling_rate=feature_extractor.sampling_rate, max_length=16000, truncation=True
...     )
...     return inputs

要在整个数据集上应用预处理函数，请使用🤗 Datasets map函数。您可以通过设置batched=True来加速map，以一次处理数据集的多个元素。删除您不需要的列，并将intent_class重命名为label，因为这是模型期望的名称：

>>> encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
>>> encoded_minds = encoded_minds.rename_column("intent_class", "label")

评估

在训练过程中包含一个度量通常有助于评估模型的性能。您可以通过🤗 Evaluate库快速加载一个评估方法。对于这个任务，加载accuracy度量（查看🤗 Evaluate quick tour以了解如何加载和计算度量）：

>>> import evaluate

>>> accuracy = evaluate.load("accuracy")

然后创建一个函数，将您的预测和标签传递给compute以计算准确性：

>>> import numpy as np

>>> def compute_metrics(eval_pred):
...     predictions = np.argmax(eval_pred.predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)

您的compute_metrics函数现在已经准备就绪，当您设置训练时将返回到它。

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看这里的基本教程[../training#train-with-pytorch-trainer]！

现在您已经准备好开始训练您的模型了！使用 AutoModelForAudioClassification 加载 Wav2Vec2，以及预期标签的数量和标签映射：

>>> from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer

>>> num_labels = len(id2label)
>>> model = AutoModelForAudioClassification.from_pretrained(
...     "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label
... )

此时，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。在每个时代结束时，Trainer 将评估准确性并保存训练检查点。
将训练参数传递给 Trainer，以及模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_mind_model",
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     learning_rate=3e-5,
...     per_device_train_batch_size=32,
...     gradient_accumulation_steps=4,
...     per_device_eval_batch_size=32,
...     num_train_epochs=10,
...     warmup_ratio=0.1,
...     logging_steps=10,
...     load_best_model_at_end=True,
...     metric_for_best_model="accuracy",
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=encoded_minds["train"],
...     eval_dataset=encoded_minds["test"],
...     tokenizer=feature_extractor,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，这样每个人都可以使用您的模型：

>>> trainer.push_to_hub()

要了解如何为音频分类微调模型的更深入示例，请查看相应的PyTorch 笔记本。

推理

现在，您已经微调了一个模型，可以用它进行推理了！

加载您想要进行推理的音频文件。记得重新采样音频文件的采样率，以匹配模型的采样率（如果需要）！

>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> audio_file = dataset[0]["audio"]["path"]

尝试使用一个 pipeline()来进行推理的最简单方法是在其中使用微调后的模型。使用您的模型实例化一个用于音频分类的pipeline，并将音频文件传递给它：

>>> from transformers import pipeline

>>> classifier = pipeline("audio-classification", model="stevhliu/my_awesome_minds_model")
>>> classifier(audio_file)
[
    {'score': 0.09766869246959686, 'label': 'cash_deposit'},
    {'score': 0.07998877018690109, 'label': 'app_error'},
    {'score': 0.0781070664525032, 'label': 'joint_account'},
    {'score': 0.07667109370231628, 'label': 'pay_bill'},
    {'score': 0.0755252093076706, 'label': 'balance'}
]

如果您愿意，也可以手动复制pipeline的结果：

Pytorch 隐藏 Pytorch 内容

加载一个特征提取器来预处理音频文件，并将input返回为 PyTorch 张量：

>>> from transformers import AutoFeatureExtractor

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("stevhliu/my_awesome_minds_model")
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")

将您的输入传递给模型并返回 logits：

>>> from transformers import AutoModelForAudioClassification

>>> model = AutoModelForAudioClassification.from_pretrained("stevhliu/my_awesome_minds_model")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

获取具有最高概率的类，并使用模型的id2label映射将其转换为标签：

>>> import torch

>>> predicted_class_ids = torch.argmax(logits).item()
>>> predicted_label = model.config.id2label[predicted_class_ids]
>>> predicted_label
'cash_deposit'

自动语音识别

原文链接：huggingface.co/docs/transformers/v4.37.2/en/tasks/asr

www.youtube-nocookie.com/embed/TksaY_FDgnk

自动语音识别（ASR）将语音信号转换为文本，将一系列音频输入映射到文本输出。虚拟助手如 Siri 和 Alexa 使用 ASR 模型帮助用户日常，还有许多其他有用的用户界面应用，如实时字幕和会议记录。

本指南将向您展示如何：

在MInDS-14数据集上对Wav2Vec2进行微调，将音频转录为文本。
使用您微调的模型进行推理。

本教程中演示的任务由以下模型架构支持：

Data2VecAudio, Hubert, M-CTC-T, SEW, SEW-D, UniSpeech, UniSpeechSat, Wav2Vec2, Wav2Vec2-BERT, Wav2Vec2-Conformer, WavLM

在开始之前，请确保已安装所有必要的库：

pip install transformers datasets evaluate jiwer

我们鼓励您登录您的 Hugging Face 账户，这样您就可以上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 MInDS-14 数据集

首先加载来自🤗数据集库的MInDS-14数据集的较小子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from datasets import load_dataset, Audio

>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train[:100]")

使用~Dataset.train_test_split方法将数据集的train拆分为训练集和测试集：

>>> minds = minds.train_test_split(test_size=0.2)

然后查看数据集：

>>> minds
DatasetDict({
    train: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 16
    })
    test: Dataset({
        features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
        num_rows: 4
    })
})

虽然数据集包含许多有用信息，如lang_id和english_transcription，但在本指南中，您将专注于audio和transcription。使用remove_columns方法删除其他列：

>>> minds = minds.remove_columns(["english_transcription", "intent_class", "lang_id"])

再次查看示例：

>>> minds["train"][0]
{'audio': {'array': array([-0.00024414,  0.        ,  0.        , ...,  0.00024414,
          0.00024414,  0.00024414], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
  'sampling_rate': 8000},
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
 'transcription': "hi I'm trying to use the banking app on my phone and currently my checking and savings account balance is not refreshing"}

有两个字段：

audio：必须调用的语音信号的一维array，用于加载和重采样音频文件。
transcription：目标文本。

预处理

接下来的步骤是加载一个 Wav2Vec2 处理器来处理音频信号：

>>> from transformers import AutoProcessor

>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base")

MInDS-14 数据集的采样率为 8000kHz（您可以在其数据集卡片中找到此信息），这意味着您需要将数据集重采样为 16000kHz 以使用预训练的 Wav2Vec2 模型：

>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
>>> minds["train"][0]
{'audio': {'array': array([-2.38064706e-04, -1.58618059e-04, -5.43987835e-06, ...,
          2.78103951e-04,  2.38446111e-04,  1.18740834e-04], dtype=float32),
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
  'sampling_rate': 16000},
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602ba9e2963e11ccd901cd4f.wav',
 'transcription': "hi I'm trying to use the banking app on my phone and currently my checking and savings account balance is not refreshing"}

如上所示的transcription，文本包含大小写混合的字符。Wav2Vec2 分词器只训练大写字符，所以您需要确保文本与分词器的词汇匹配：

>>> def uppercase(example):
...     return {"transcription": example["transcription"].upper()}

>>> minds = minds.map(uppercase)

现在创建一个预处理函数，它：

调用audio列加载和重采样音频文件。
从音频文件中提取input_values并使用处理器对transcription列进行标记。

>>> def prepare_dataset(batch):
...     audio = batch["audio"]
...     batch = processor(audio["array"], sampling_rate=audio["sampling_rate"], text=batch["transcription"])
...     batch["input_length"] = len(batch["input_values"][0])
...     return batch

要在整个数据集上应用预处理函数，使用🤗数据集map函数。您可以通过增加num_proc参数来加快map的速度。使用remove_columns方法删除不需要的列：

>>> encoded_minds = minds.map(prepare_dataset, remove_columns=minds.column_names["train"], num_proc=4)

🤗 Transformers 没有用于 ASR 的数据整理器，因此您需要调整 DataCollatorWithPadding 以创建一批示例。它还会动态填充您的文本和标签到其批次中最长元素的长度（而不是整个数据集），以使它们具有统一的长度。虽然可以通过在tokenizer函数中设置padding=True来填充文本，但动态填充更有效。

与其他数据整理器不同，这个特定的数据整理器需要对input_values和labels应用不同的填充方法：

>>> import torch

>>> from dataclasses import dataclass, field
>>> from typing import Any, Dict, List, Optional, Union

>>> @dataclass
... class DataCollatorCTCWithPadding:
...     processor: AutoProcessor
...     padding: Union[bool, str] = "longest"

...     def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
...         # split inputs and labels since they have to be of different lengths and need
...         # different padding methods
...         input_features = [{"input_values": feature["input_values"][0]} for feature in features]
...         label_features = [{"input_ids": feature["labels"]} for feature in features]

...         batch = self.processor.pad(input_features, padding=self.padding, return_tensors="pt")

...         labels_batch = self.processor.pad(labels=label_features, padding=self.padding, return_tensors="pt")

...         # replace padding with -100 to ignore loss correctly
...         labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

...         batch["labels"] = labels

...         return batch

现在实例化您的DataCollatorForCTCWithPadding：

>>> data_collator = DataCollatorCTCWithPadding(processor=processor, padding="longest")

评估

在训练过程中包含一个指标通常有助于评估模型的性能。您可以使用🤗 Evaluate库快速加载一个评估方法。对于这个任务，加载word error rate (WER)指标（查看🤗 Evaluate 快速入门以了解如何加载和计算指标）：

>>> import evaluate

>>> wer = evaluate.load("wer")

然后创建一个函数，将您的预测和标签传递给compute以计算 WER：

>>> import numpy as np

>>> def compute_metrics(pred):
...     pred_logits = pred.predictions
...     pred_ids = np.argmax(pred_logits, axis=-1)

...     pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

...     pred_str = processor.batch_decode(pred_ids)
...     label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

...     wer = wer.compute(predictions=pred_str, references=label_str)

...     return {"wer": wer}

您的compute_metrics函数已经准备就绪，当您设置训练时会返回到它。

训练

PytorchHide Pytorch 内容

如果您不熟悉使用 Trainer 微调模型，请查看这里的基本教程[training#train-with-pytorch-trainer]！

现在您已经准备好开始训练您的模型了！使用 AutoModelForCTC 加载 Wav2Vec2。指定要应用的减少量，使用ctc_loss_reduction参数。通常最好使用平均值而不是默认的求和：

>>> from transformers import AutoModelForCTC, TrainingArguments, Trainer

>>> model = AutoModelForCTC.from_pretrained(
...     "facebook/wav2vec2-base",
...     ctc_loss_reduction="mean",
...     pad_token_id=processor.tokenizer.pad_token_id,
... )

此时，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。唯一必需的参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传模型）。在每个时代结束时，Trainer 将评估 WER 并保存训练检查点。
将训练参数传递给 Trainer，同时还需要传递模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_asr_mind_model",
...     per_device_train_batch_size=8,
...     gradient_accumulation_steps=2,
...     learning_rate=1e-5,
...     warmup_steps=500,
...     max_steps=2000,
...     gradient_checkpointing=True,
...     fp16=True,
...     group_by_length=True,
...     evaluation_strategy="steps",
...     per_device_eval_batch_size=8,
...     save_steps=1000,
...     eval_steps=1000,
...     logging_steps=25,
...     load_best_model_at_end=True,
...     metric_for_best_model="wer",
...     greater_is_better=False,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=encoded_minds["train"],
...     eval_dataset=encoded_minds["test"],
...     tokenizer=processor,
...     data_collator=data_collator,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，以便每个人都可以使用您的模型：

>>> trainer.push_to_hub()

要了解如何为自动语音识别微调模型的更深入示例，请查看这篇博客post以获取英语 ASR，以及这篇post以获取多语言 ASR。

推理

很好，现在您已经微调了一个模型，可以用它进行推理！

加载要运行推理的音频文件。记得重新采样音频文件的采样率以匹配模型的采样率（如果需要的话）！

>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> audio_file = dataset[0]["audio"]["path"]

尝试使用 pipeline()来进行推理是尝试您微调模型的最简单方法。使用您的模型实例化一个用于自动语音识别的pipeline，并将音频文件传递给它：

>>> from transformers import pipeline

>>> transcriber = pipeline("automatic-speech-recognition", model="stevhliu/my_awesome_asr_minds_model")
>>> transcriber(audio_file)
{'text': 'I WOUD LIKE O SET UP JOINT ACOUNT WTH Y PARTNER'}

转录结果还不错，但可以更好！尝试在更多示例上微调您的模型，以获得更好的结果！

如果您愿意，也可以手动复制pipeline的结果：

Pytorch 隐藏 Pytorch 内容

加载处理器以预处理音频文件和转录，并将input返回为 PyTorch 张量：

>>> from transformers import AutoProcessor

>>> processor = AutoProcessor.from_pretrained("stevhliu/my_awesome_asr_mind_model")
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")

将您的输入传递给模型并返回 logits：

>>> from transformers import AutoModelForCTC

>>> model = AutoModelForCTC.from_pretrained("stevhliu/my_awesome_asr_mind_model")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

获取具有最高概率的预测input_ids，并使用处理器将预测的input_ids解码回文本：

>>> import torch

>>> predicted_ids = torch.argmax(logits, dim=-1)
>>> transcription = processor.batch_decode(predicted_ids)
>>> transcription
['I WOUL LIKE O SET UP JOINT ACOUNT WTH Y PARTNER']

计算机视觉

图像分类

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/image_classification

www.youtube-nocookie.com/embed/tjAIM7BOYhw

图像分类为图像分配一个标签或类别。与文本或音频分类不同，输入是组成图像的像素值。图像分类有许多应用，例如在自然灾害后检测损坏、监测作物健康或帮助筛查医学图像中的疾病迹象。

本指南说明了如何：

在Food-101数据集上对 ViT 进行微调，以对图像中的食物项目进行分类。
使用您微调的模型进行推断。

本教程中所示的任务由以下模型架构支持：

BEiT、BiT、ConvNeXT、ConvNeXTV2、CvT、Data2VecVision、DeiT、DiNAT、DINOv2、EfficientFormer、EfficientNet、FocalNet、ImageGPT、LeViT、MobileNetV1、MobileNetV2、MobileViT、MobileViTV2、NAT、Perceiver、PoolFormer、PVT、RegNet、ResNet、SegFormer、SwiftFormer、Swin Transformer、Swin Transformer V2、VAN、ViT、ViT Hybrid、ViTMSN

在开始之前，请确保您已安装所有必要的库：

pip install transformers datasets evaluate

我们鼓励您登录您的 Hugging Face 帐户，以便上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 Food-101 数据集

首先从🤗数据集库中加载 Food-101 数据集的一个较小子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from datasets import load_dataset

>>> food = load_dataset("food101", split="train[:5000]")

使用train_test_split方法将数据集的train拆分为训练集和测试集：

>>> food = food.train_test_split(test_size=0.2)

然后看一个示例：

>>> food["train"][0]
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x512 at 0x7F52AFC8AC50>,
 'label': 79}

数据集中的每个示例都有两个字段：

image：食物项目的 PIL 图像
label：食物项目的标签类别

为了使模型更容易从标签 ID 获取标签名称，创建一个将标签名称映射到整数及反之的字典：

>>> labels = food["train"].features["label"].names
>>> label2id, id2label = dict(), dict()
>>> for i, label in enumerate(labels):
...     label2id[label] = str(i)
...     id2label[str(i)] = label

现在您可以将标签 ID 转换为标签名称：

>>> id2label[str(79)]
'prime_rib'

预处理

下一步是加载一个 ViT 图像处理器，将图像处理为张量：

>>> from transformers import AutoImageProcessor

>>> checkpoint = "google/vit-base-patch16-224-in21k"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)

PytorchHide Pytorch content

对图像应用一些图像转换，使模型更具抗过拟合能力。在这里，您将使用 torchvision 的transforms模块，但您也可以使用您喜欢的任何图像库。

裁剪图像的随机部分，调整大小，并使用图像的均值和标准差进行归一化：

>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
>>> size = (
...     image_processor.size["shortest_edge"]
...     if "shortest_edge" in image_processor.size
...     else (image_processor.size["height"], image_processor.size["width"])
... )
>>> _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])

然后创建一个预处理函数来应用转换并返回pixel_values - 图像的模型输入：

>>> def transforms(examples):
...     examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
...     del examples["image"]
...     return examples

要在整个数据集上应用预处理函数，请使用🤗数据集的with_transform方法。当加载数据集的元素时，转换会即时应用：

>>> food = food.with_transform(transforms)

现在使用 DefaultDataCollator 创建一批示例。与🤗 Transformers 中的其他数据整理器不同，DefaultDataCollator不会应用额外的预处理，如填充。

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator()

TensorFlow 隐藏 TensorFlow 内容

为了避免过拟合并使模型更加健壮，在数据集的训练部分添加一些数据增强。在这里，我们使用 Keras 预处理层来定义训练数据（包括数据增强）的转换，以及验证数据（仅中心裁剪、调整大小和归一化）的转换。您可以使用tf.image或您喜欢的任何其他库。

>>> from tensorflow import keras
>>> from tensorflow.keras import layers

>>> size = (image_processor.size["height"], image_processor.size["width"])

>>> train_data_augmentation = keras.Sequential(
...     [
...         layers.RandomCrop(size[0], size[1]),
...         layers.Rescaling(scale=1.0 / 127.5, offset=-1),
...         layers.RandomFlip("horizontal"),
...         layers.RandomRotation(factor=0.02),
...         layers.RandomZoom(height_factor=0.2, width_factor=0.2),
...     ],
...     name="train_data_augmentation",
... )

>>> val_data_augmentation = keras.Sequential(
...     [
...         layers.CenterCrop(size[0], size[1]),
...         layers.Rescaling(scale=1.0 / 127.5, offset=-1),
...     ],
...     name="val_data_augmentation",
... )

接下来，创建函数将适当的转换应用于一批图像，而不是一次一个图像。

>>> import numpy as np
>>> import tensorflow as tf
>>> from PIL import Image

>>> def convert_to_tf_tensor(image: Image):
...     np_image = np.array(image)
...     tf_image = tf.convert_to_tensor(np_image)
...     # `expand_dims()` is used to add a batch dimension since
...     # the TF augmentation layers operates on batched inputs.
...     return tf.expand_dims(tf_image, 0)

>>> def preprocess_train(example_batch):
...     """Apply train_transforms across a batch."""
...     images = [
...         train_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
...     ]
...     example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
...     return example_batch

... def preprocess_val(example_batch):
...     """Apply val_transforms across a batch."""
...     images = [
...         val_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
...     ]
...     example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
...     return example_batch

使用🤗 Datasets set_transform在运行时应用转换：

food["train"].set_transform(preprocess_train)
food["test"].set_transform(preprocess_val)

作为最后的预处理步骤，使用DefaultDataCollator创建一批示例。与🤗 Transformers 中的其他数据整理器不同，DefaultDataCollator不会应用额外的预处理，如填充。

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator(return_tensors="tf")

评估

在训练过程中包含一个度量通常有助于评估模型的性能。您可以使用🤗 Evaluate库快速加载评估方法。对于此任务，加载accuracy度量（查看🤗 Evaluate 快速导览以了解如何加载和计算度量）：

>>> import evaluate

>>> accuracy = evaluate.load("accuracy")

然后创建一个函数，将您的预测和标签传递给compute以计算准确性：

>>> import numpy as np

>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     predictions = np.argmax(predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=labels)

您的compute_metrics函数现在已经准备就绪，当您设置训练时会返回到它。

训练

Pytorch 隐藏 Pytorch 内容

如果您不熟悉使用 Trainer 对模型进行微调，请查看基本教程这里！

现在您可以开始训练您的模型了！使用 AutoModelForImageClassification 加载 ViT。指定标签数量以及预期标签数量和标签映射：

>>> from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

>>> model = AutoModelForImageClassification.from_pretrained(
...     checkpoint,
...     num_labels=len(labels),
...     id2label=id2label,
...     label2id=label2id,
... )

在这一点上，只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。重要的是不要删除未使用的列，因为那会删除image列。没有image列，您就无法创建pixel_values。设置remove_unused_columns=False以防止这种行为！唯一的其他必需参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传您的模型）。在每个 epoch 结束时，Trainer 将评估准确性并保存训练检查点。
将训练参数传递给 Trainer，以及模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_food_model",
...     remove_unused_columns=False,
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     learning_rate=5e-5,
...     per_device_train_batch_size=16,
...     gradient_accumulation_steps=4,
...     per_device_eval_batch_size=16,
...     num_train_epochs=3,
...     warmup_ratio=0.1,
...     logging_steps=10,
...     load_best_model_at_end=True,
...     metric_for_best_model="accuracy",
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     data_collator=data_collator,
...     train_dataset=food["train"],
...     eval_dataset=food["test"],
...     tokenizer=image_processor,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，这样每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlow 隐藏 TensorFlow 内容

如果您不熟悉使用 Keras 微调模型，请先查看基本教程！

要在 TensorFlow 中微调模型，请按照以下步骤进行：

定义训练超参数，并设置优化器和学习率调度。
实例化一个预训练模型。
将🤗数据集转换为tf.data.Dataset。
编译您的模型。
添加回调并使用fit()方法运行训练。
将您的模型上传到🤗 Hub 以与社区共享。

首先定义超参数、优化器和学习率调度：

>>> from transformers import create_optimizer

>>> batch_size = 16
>>> num_epochs = 5
>>> num_train_steps = len(food["train"]) * num_epochs
>>> learning_rate = 3e-5
>>> weight_decay_rate = 0.01

>>> optimizer, lr_schedule = create_optimizer(
...     init_lr=learning_rate,
...     num_train_steps=num_train_steps,
...     weight_decay_rate=weight_decay_rate,
...     num_warmup_steps=0,
... )

然后，使用 TFAutoModelForImageClassification 加载 ViT 以及标签映射：

>>> from transformers import TFAutoModelForImageClassification

>>> model = TFAutoModelForImageClassification.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
... )

使用to_tf_dataset和您的data_collator将数据集转换为tf.data.Dataset格式：

>>> # converting our train dataset to tf.data.Dataset
>>> tf_train_dataset = food["train"].to_tf_dataset(
...     columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
... )

>>> # converting our test dataset to tf.data.Dataset
>>> tf_eval_dataset = food["test"].to_tf_dataset(
...     columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
... )

使用compile()配置模型进行训练：

>>> from tensorflow.keras.losses import SparseCategoricalCrossentropy

>>> loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
>>> model.compile(optimizer=optimizer, loss=loss)

要从预测中计算准确性并将模型推送到🤗 Hub，请使用 Keras 回调。将您的compute_metrics函数传递给 KerasMetricCallback，并使用 PushToHubCallback 上传模型：

>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_eval_dataset)
>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="food_classifier",
...     tokenizer=image_processor,
...     save_strategy="no",
... )
>>> callbacks = [metric_callback, push_to_hub_callback]

最后，您已经准备好训练您的模型了！使用您的训练和验证数据集、时代数和回调来微调模型调用fit()：

>>> model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=num_epochs, callbacks=callbacks)
Epoch 1/5
250/250 [==============================] - 313s 1s/step - loss: 2.5623 - val_loss: 1.4161 - accuracy: 0.9290
Epoch 2/5
250/250 [==============================] - 265s 1s/step - loss: 0.9181 - val_loss: 0.6808 - accuracy: 0.9690
Epoch 3/5
250/250 [==============================] - 252s 1s/step - loss: 0.3910 - val_loss: 0.4303 - accuracy: 0.9820
Epoch 4/5
250/250 [==============================] - 251s 1s/step - loss: 0.2028 - val_loss: 0.3191 - accuracy: 0.9900
Epoch 5/5
250/250 [==============================] - 238s 949ms/step - loss: 0.1232 - val_loss: 0.3259 - accuracy: 0.9890

恭喜！您已经对模型进行了微调，并在🤗 Hub 上共享。现在您可以用它进行推理！

要了解如何为图像分类微调模型的更深入示例，请查看相应的PyTorch 笔记本。

推理

太棒了，现在您已经对模型进行了微调，可以用于推理！

加载要运行推理的图像：

>>> ds = load_dataset("food101", split="validation[:10]")
>>> image = ds["image"][0]

尝试使用您微调的模型进行推理的最简单方法是在 pipeline()中使用它。使用您的模型实例化一个用于图像分类的pipeline，并将图像传递给它：

>>> from transformers import pipeline

>>> classifier = pipeline("image-classification", model="my_awesome_food_model")
>>> classifier(image)
[{'score': 0.31856709718704224, 'label': 'beignets'},
 {'score': 0.015232225880026817, 'label': 'bruschetta'},
 {'score': 0.01519392803311348, 'label': 'chicken_wings'},
 {'score': 0.013022331520915031, 'label': 'pork_chop'},
 {'score': 0.012728818692266941, 'label': 'prime_rib'}]

如果愿意，您也可以手动复制pipeline的结果：

PytorchHide Pytorch 内容

加载图像处理器以预处理图像并将input返回为 PyTorch 张量：

>>> from transformers import AutoImageProcessor
>>> import torch

>>> image_processor = AutoImageProcessor.from_pretrained("my_awesome_food_model")
>>> inputs = image_processor(image, return_tensors="pt")

将输入传递给模型并返回 logits：

>>> from transformers import AutoModelForImageClassification

>>> model = AutoModelForImageClassification.from_pretrained("my_awesome_food_model")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

获取具有最高概率的预测标签，并使用模型的id2label映射将其转换为标签：

>>> predicted_label = logits.argmax(-1).item()
>>> model.config.id2label[predicted_label]
'beignets'

TensorFlowHide TensorFlow 内容

加载图像处理器以预处理图像并将input返回为 TensorFlow 张量：

>>> from transformers import AutoImageProcessor

>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/food_classifier")
>>> inputs = image_processor(image, return_tensors="tf")

将输入传递给模型并返回 logits：

>>> from transformers import TFAutoModelForImageClassification

>>> model = TFAutoModelForImageClassification.from_pretrained("MariaK/food_classifier")
>>> logits = model(**inputs).logits

获取具有最高概率的预测标签，并使用模型的id2label映射将其转换为标签：

>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
>>> model.config.id2label[predicted_class_id]
'beignets'

图像分割

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/semantic_segmentation

www.youtube-nocookie.com/embed/dKE8SIt9C-w

图像分割模型将图像中对应不同感兴趣区域的区域分开。这些模型通过为每个像素分配一个标签来工作。有几种类型的分割：语义分割、实例分割和全景分割。

在本指南中，我们将：

查看不同类型的分割。
有一个用于语义分割的端到端微调示例。

在开始之前，请确保已安装所有必要的库：

pip install -q datasets transformers evaluate

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区分享您的模型。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

分割类型

语义分割为图像中的每个像素分配一个标签或类。让我们看一下语义分割模型的输出。它将为图像中遇到的每个对象实例分配相同的类，例如，所有猫都将被标记为“cat”而不是“cat-1”、“cat-2”。我们可以使用 transformers 的图像分割管道快速推断一个语义分割模型。让我们看一下示例图像。

from transformers import pipeline
from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/segmentation_input.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image

我们将使用nvidia/segformer-b1-finetuned-cityscapes-1024-1024。

semantic_segmentation = pipeline("image-segmentation", "nvidia/segformer-b1-finetuned-cityscapes-1024-1024")
results = semantic_segmentation(image)
results

分割管道输出包括每个预测类的掩码。

[{'score': None,
  'label': 'road',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'sidewalk',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'building',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'wall',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'pole',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'traffic sign',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'vegetation',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'terrain',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'sky',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]

查看汽车类的掩码，我们可以看到每辆汽车都被分类为相同的掩码。

results[-1]["mask"]

在实例分割中，目标不是对每个像素进行分类，而是为给定图像中的每个对象实例预测一个掩码。它的工作方式与目标检测非常相似，其中每个实例都有一个边界框，而这里有一个分割掩码。我们将使用facebook/mask2former-swin-large-cityscapes-instance。

instance_segmentation = pipeline("image-segmentation", "facebook/mask2former-swin-large-cityscapes-instance")
results = instance_segmentation(Image.open(image))
results

如下所示，有多辆汽车被分类，除了属于汽车和人实例的像素之外，没有对其他像素进行分类。

[{'score': 0.999944,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999945,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999652,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.903529,
  'label': 'person',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]

查看下面的一辆汽车掩码。

results[2]["mask"]

全景分割结合了语义分割和实例分割，其中每个像素被分类为一个类和该类的一个实例，并且每个类的每个实例有多个掩码。我们可以使用facebook/mask2former-swin-large-cityscapes-panoptic。

panoptic_segmentation = pipeline("image-segmentation", "facebook/mask2former-swin-large-cityscapes-panoptic")
results = panoptic_segmentation(Image.open(image))
results

如下所示，我们有更多的类。稍后我们将说明，每个像素都被分类为其中的一个类。

[{'score': 0.999981,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999958,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.99997,
  'label': 'vegetation',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999575,
  'label': 'pole',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999958,
  'label': 'building',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999634,
  'label': 'road',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.996092,
  'label': 'sidewalk',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999221,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.99987,
  'label': 'sky',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]

让我们对所有类型的分割进行一次并排比较。

看到所有类型的分割，让我们深入研究为语义分割微调模型。

语义分割的常见实际应用包括训练自动驾驶汽车识别行人和重要的交通信息，识别医学图像中的细胞和异常，以及监测卫星图像中的环境变化。

为分割微调模型

我们现在将：

在SceneParse150数据集上对SegFormer进行微调。
使用您微调的模型进行推断。

本教程中演示的任务由以下模型架构支持：

BEiT, Data2VecVision, DPT, MobileNetV2, MobileViT, MobileViTV2, SegFormer, UPerNet

加载 SceneParse150 数据集

首先从 🤗 数据集库中加载 SceneParse150 数据集的一个较小子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from datasets import load_dataset

>>> ds = load_dataset("scene_parse_150", split="train[:50]")

使用 train_test_split 方法将数据集的 train 分割为训练集和测试集：

>>> ds = ds.train_test_split(test_size=0.2)
>>> train_ds = ds["train"]
>>> test_ds = ds["test"]

然后看一个例子：

>>> train_ds[0]
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x683 at 0x7F9B0C201F90>,
 'annotation': <PIL.PngImagePlugin.PngImageFile image mode=L size=512x683 at 0x7F9B0C201DD0>,
 'scene_category': 368}

image：场景的 PIL 图像。
annotation：分割地图的 PIL 图像，也是模型的目标。
scene_category：描述图像场景的类别 id，如“厨房”或“办公室”。在本指南中，您只需要 image 和 annotation，两者都是 PIL 图像。

您还需要创建一个将标签 id 映射到标签类的字典，这在稍后设置模型时会很有用。从 Hub 下载映射并创建 id2label 和 label2id 字典：

>>> import json
>>> from huggingface_hub import cached_download, hf_hub_url

>>> repo_id = "huggingface/label-files"
>>> filename = "ade20k-id2label.json"
>>> id2label = json.load(open(cached_download(hf_hub_url(repo_id, filename, repo_type="dataset")), "r"))
>>> id2label = {int(k): v for k, v in id2label.items()}
>>> label2id = {v: k for k, v in id2label.items()}
>>> num_labels = len(id2label)

自定义数据集

如果您更喜欢使用 run_semantic_segmentation.py 脚本而不是笔记本实例进行训练，您也可以创建并使用自己的数据集。该脚本需要：

一个包含两个 Image 列“image”和“label”的 DatasetDict。

from datasets import Dataset, DatasetDict, Image

image_paths_train = ["path/to/image_1.jpg/jpg", "path/to/image_2.jpg/jpg", ..., "path/to/image_n.jpg/jpg"]
label_paths_train = ["path/to/annotation_1.png", "path/to/annotation_2.png", ..., "path/to/annotation_n.png"]

image_paths_validation = [...]
label_paths_validation = [...]

def create_dataset(image_paths, label_paths):
    dataset = Dataset.from_dict({"image": sorted(image_paths),
                                "label": sorted(label_paths)})
    dataset = dataset.cast_column("image", Image())
    dataset = dataset.cast_column("label", Image())
    return dataset

# step 1: create Dataset objects
train_dataset = create_dataset(image_paths_train, label_paths_train)
validation_dataset = create_dataset(image_paths_validation, label_paths_validation)

# step 2: create DatasetDict
dataset = DatasetDict({
     "train": train_dataset,
     "validation": validation_dataset,
     }
)

# step 3: push to Hub (assumes you have ran the huggingface-cli login command in a terminal/notebook)
dataset.push_to_hub("your-name/dataset-repo")

# optionally, you can push to a private repo on the Hub
# dataset.push_to_hub("name of repo on the hub", private=True)

一个 id2label 字典，将类整数映射到它们的类名

import json
# simple example
id2label = {0: 'cat', 1: 'dog'}
with open('id2label.json', 'w') as fp:
json.dump(id2label, fp)

例如，查看这个示例数据集，该数据集是使用上述步骤创建的。

预处理

下一步是加载一个 SegFormer 图像处理器，准备图像和注释以供模型使用。某些数据集，如此类数据集，使用零索引作为背景类。但是，背景类实际上不包括在 150 个类中，因此您需要设置 reduce_labels=True，从所有标签中减去一个。零索引被替换为 255，因此 SegFormer 的损失函数会忽略它：

>>> from transformers import AutoImageProcessor

>>> checkpoint = "nvidia/mit-b0"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint, reduce_labels=True)

Pytorch 隐藏 Pytorch 内容

通常会对图像数据集应用一些数据增强，以使模型更具抗过拟合能力。在本指南中，您将使用 ColorJitter 函数从 torchvision 随机更改图像的颜色属性，但您也可以使用任何您喜欢的图像库。

>>> from torchvision.transforms import ColorJitter

>>> jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1)

现在创建两个预处理函数，准备图像和注释以供模型使用。这些函数将图像转换为 pixel_values，将注释转换为 labels。对于训练集，在将图像提供给图像处理器之前应用 jitter。对于测试集，图像处理器裁剪和规范化 images，仅裁剪 labels，因为在测试期间不应用数据增强。

>>> def train_transforms(example_batch):
...     images = [jitter(x) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

>>> def val_transforms(example_batch):
...     images = [x for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

要在整个数据集上应用 jitter，请使用 🤗 数据集 set_transform 函数。变换是实时应用的，速度更快，占用的磁盘空间更少：

>>> train_ds.set_transform(train_transforms)
>>> test_ds.set_transform(val_transforms)

TensorFlow 隐藏 TensorFlow 内容

对图像数据集应用一些数据增强是常见的，可以使模型更具抗过拟合能力。在本指南中，您将使用tf.image来随机更改图像的颜色属性，但您也可以使用任何您喜欢的图像库。定义两个单独的转换函数：

包括图像增强的训练数据转换
验证数据转换仅转置图像，因为🤗 Transformers 中的计算机视觉模型期望通道优先布局

>>> import tensorflow as tf

>>> def aug_transforms(image):
...     image = tf.keras.utils.img_to_array(image)
...     image = tf.image.random_brightness(image, 0.25)
...     image = tf.image.random_contrast(image, 0.5, 2.0)
...     image = tf.image.random_saturation(image, 0.75, 1.25)
...     image = tf.image.random_hue(image, 0.1)
...     image = tf.transpose(image, (2, 0, 1))
...     return image

>>> def transforms(image):
...     image = tf.keras.utils.img_to_array(image)
...     image = tf.transpose(image, (2, 0, 1))
...     return image

接下来，创建两个预处理函数，用于为模型准备图像和注释的批处理。这些函数应用图像转换，并使用之前加载的image_processor将图像转换为pixel_values，将注释转换为labels。ImageProcessor还负责调整大小和规范化图像。

>>> def train_transforms(example_batch):
...     images = [aug_transforms(x.convert("RGB")) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

>>> def val_transforms(example_batch):
...     images = [transforms(x.convert("RGB")) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

要在整个数据集上应用预处理转换，使用🤗 Datasets set_transform函数。转换是实时应用的，速度更快，占用的磁盘空间更少：

>>> train_ds.set_transform(train_transforms)
>>> test_ds.set_transform(val_transforms)

评估

在训练过程中包含一个度量标准通常有助于评估模型的性能。您可以使用🤗 Evaluate库快速加载一个评估方法。对于这个任务，加载mean Intersection over Union (IoU)度量标准（查看🤗 Evaluate quick tour以了解如何加载和计算度量标准）：

>>> import evaluate

>>> metric = evaluate.load("mean_iou")

然后创建一个函数来compute度量标准。您的预测需要首先转换为 logits，然后重新调整形状以匹配标签的大小，然后才能调用compute：

PytorchHide Pytorch 内容

>>> import numpy as np
>>> import torch
>>> from torch import nn

>>> def compute_metrics(eval_pred):
...     with torch.no_grad():
...         logits, labels = eval_pred
...         logits_tensor = torch.from_numpy(logits)
...         logits_tensor = nn.functional.interpolate(
...             logits_tensor,
...             size=labels.shape[-2:],
...             mode="bilinear",
...             align_corners=False,
...         ).argmax(dim=1)

...         pred_labels = logits_tensor.detach().cpu().numpy()
...         metrics = metric.compute(
...             predictions=pred_labels,
...             references=labels,
...             num_labels=num_labels,
...             ignore_index=255,
...             reduce_labels=False,
...         )
...         for key, value in metrics.items():
...             if isinstance(value, np.ndarray):
...                 metrics[key] = value.tolist()
...         return metrics

TensorFlowHide TensorFlow 内容

>>> def compute_metrics(eval_pred):
...     logits, labels = eval_pred
...     logits = tf.transpose(logits, perm=[0, 2, 3, 1])
...     logits_resized = tf.image.resize(
...         logits,
...         size=tf.shape(labels)[1:],
...         method="bilinear",
...     )

...     pred_labels = tf.argmax(logits_resized, axis=-1)
...     metrics = metric.compute(
...         predictions=pred_labels,
...         references=labels,
...         num_labels=num_labels,
...         ignore_index=-1,
...         reduce_labels=image_processor.do_reduce_labels,
...     )

...     per_category_accuracy = metrics.pop("per_category_accuracy").tolist()
...     per_category_iou = metrics.pop("per_category_iou").tolist()

...     metrics.update({f"accuracy_{id2label[i]}": v for i, v in enumerate(per_category_accuracy)})
...     metrics.update({f"iou_{id2label[i]}": v for i, v in enumerate(per_category_iou)})
...     return {"val_" + k: v for k, v in metrics.items()}

您的compute_metrics函数现在已经准备就绪，当您设置训练时会再次用到它。

训练

PytorchHide Pytorch 内容

如果您不熟悉如何使用 Trainer 对模型进行微调，请查看这里的基本教程[../training#finetune-with-trainer]！

您现在已经准备好开始训练您的模型了！使用 AutoModelForSemanticSegmentation 加载 SegFormer，并将模型传递给标签 id 和标签类之间的映射：

>>> from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer

>>> model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint, id2label=id2label, label2id=label2id)

目前只剩下三个步骤：

在 TrainingArguments 中定义您的训练超参数。重要的是不要删除未使用的列，因为这会删除image列。没有image列，您就无法创建pixel_values。设置remove_unused_columns=False以防止这种行为！另一个必需的参数是output_dir，指定保存模型的位置。通过设置push_to_hub=True将此模型推送到 Hub（您需要登录 Hugging Face 才能上传您的模型）。在每个 epoch 结束时，Trainer 将评估 IoU 度量标准并保存训练检查点。
将训练参数传递给 Trainer，同时还需要传递模型、数据集、分词器、数据整理器和compute_metrics函数。
调用 train()来微调您的模型。

>>> training_args = TrainingArguments(
...     output_dir="segformer-b0-scene-parse-150",
...     learning_rate=6e-5,
...     num_train_epochs=50,
...     per_device_train_batch_size=2,
...     per_device_eval_batch_size=2,
...     save_total_limit=3,
...     evaluation_strategy="steps",
...     save_strategy="steps",
...     save_steps=20,
...     eval_steps=20,
...     logging_steps=1,
...     eval_accumulation_steps=5,
...     remove_unused_columns=False,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=train_ds,
...     eval_dataset=test_ds,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，这样每个人都可以使用您的模型：

>>> trainer.push_to_hub()

TensorFlowHide TensorFlow 内容

如果您不熟悉使用 Keras 进行模型微调，请先查看基本教程！

要在 TensorFlow 中微调模型，请按照以下步骤进行：

定义训练超参数，并设置优化器和学习率调度。
实例化一个预训练模型。
将一个🤗数据集转换为tf.data.Dataset。
编译您的模型。
添加回调以计算指标并将您的模型上传到🤗 Hub
使用fit()方法运行训练。

首先定义超参数、优化器和学习率调度：

>>> from transformers import create_optimizer

>>> batch_size = 2
>>> num_epochs = 50
>>> num_train_steps = len(train_ds) * num_epochs
>>> learning_rate = 6e-5
>>> weight_decay_rate = 0.01

>>> optimizer, lr_schedule = create_optimizer(
...     init_lr=learning_rate,
...     num_train_steps=num_train_steps,
...     weight_decay_rate=weight_decay_rate,
...     num_warmup_steps=0,
... )

然后，使用 TFAutoModelForSemanticSegmentation 加载 SegFormer 以及标签映射，并使用优化器对其进行编译。请注意，Transformers 模型都有一个默认的与任务相关的损失函数，因此除非您想要指定一个，否则不需要指定：

>>> from transformers import TFAutoModelForSemanticSegmentation

>>> model = TFAutoModelForSemanticSegmentation.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
... )
>>> model.compile(optimizer=optimizer)  # No loss argument!

使用to_tf_dataset和 DefaultDataCollator 将您的数据集转换为tf.data.Dataset格式：

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator(return_tensors="tf")

>>> tf_train_dataset = train_ds.to_tf_dataset(
...     columns=["pixel_values", "label"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

>>> tf_eval_dataset = test_ds.to_tf_dataset(
...     columns=["pixel_values", "label"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

要从预测中计算准确率并将您的模型推送到🤗 Hub，请使用 Keras 回调。将您的compute_metrics函数传递给 KerasMetricCallback，并使用 PushToHubCallback 来上传模型：

>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback

>>> metric_callback = KerasMetricCallback(
...     metric_fn=compute_metrics, eval_dataset=tf_eval_dataset, batch_size=batch_size, label_cols=["labels"]
... )

>>> push_to_hub_callback = PushToHubCallback(output_dir="scene_segmentation", tokenizer=image_processor)

>>> callbacks = [metric_callback, push_to_hub_callback]

最后，您已经准备好训练您的模型了！使用您的训练和验证数据集、时代数量和回调来调用fit()来微调模型：

>>> model.fit(
...     tf_train_dataset,
...     validation_data=tf_eval_dataset,
...     callbacks=callbacks,
...     epochs=num_epochs,
... )

恭喜！您已经对模型进行了微调并在🤗 Hub 上分享了它。现在您可以用它进行推理！

推理

很好，现在您已经对模型进行了微调，可以用它进行推理！

加载一张图片进行推理：

>>> image = ds[0]["image"]
>>> image

Pytorch 隐藏 Pytorch 内容

现在我们将看到如何在没有管道的情况下进行推理。使用图像处理器处理图像，并将pixel_values放在 GPU 上：

>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use GPU if available, otherwise use a CPU
>>> encoding = image_processor(image, return_tensors="pt")
>>> pixel_values = encoding.pixel_values.to(device)

将输入传递给模型并返回logits：

>>> outputs = model(pixel_values=pixel_values)
>>> logits = outputs.logits.cpu()

接下来，将 logits 重新缩放到原始图像大小：

>>> upsampled_logits = nn.functional.interpolate(
...     logits,
...     size=image.size[::-1],
...     mode="bilinear",
...     align_corners=False,
... )

>>> pred_seg = upsampled_logits.argmax(dim=1)[0]

TensorFlow 隐藏 TensorFlow 内容

加载一个图像处理器来预处理图像并将输入返回为 TensorFlow 张量：

>>> from transformers import AutoImageProcessor

>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/scene_segmentation")
>>> inputs = image_processor(image, return_tensors="tf")

将输入传递给模型并返回logits：

>>> from transformers import TFAutoModelForSemanticSegmentation

>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("MariaK/scene_segmentation")
>>> logits = model(**inputs).logits

接下来，将 logits 重新缩放到原始图像大小，并在类维度上应用 argmax：

>>> logits = tf.transpose(logits, [0, 2, 3, 1])

>>> upsampled_logits = tf.image.resize(
...     logits,
...     # We reverse the shape of `image` because `image.size` returns width and height.
...     image.size[::-1],
... )

>>> pred_seg = tf.math.argmax(upsampled_logits, axis=-1)[0]

要可视化结果，加载数据集颜色调色板作为ade_palette()，将每个类映射到它们的 RGB 值。然后您可以组合并绘制您的图像和预测的分割地图：

>>> import matplotlib.pyplot as plt
>>> import numpy as np

>>> color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
>>> palette = np.array(ade_palette())
>>> for label, color in enumerate(palette):
...     color_seg[pred_seg == label, :] = color
>>> color_seg = color_seg[..., ::-1]  # convert to BGR

>>> img = np.array(image) * 0.5 + color_seg * 0.5  # plot the image with the segmentation map
>>> img = img.astype(np.uint8)

>>> plt.figure(figsize=(15, 10))
>>> plt.imshow(img)
>>> plt.show()

视频分类

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/video_classification

视频分类是将标签或类别分配给整个视频的任务。预期每个视频只有一个类别。视频分类模型将视频作为输入，并返回关于视频属于哪个类别的预测。这些模型可用于对视频内容进行分类。视频分类的现实应用是动作/活动识别，对于健身应用非常有用。对于视力受损的个体，尤其是在通勤时，这也是有帮助的。

本指南将向您展示如何：

在UCF101数据集的子集上对VideoMAE进行微调。
使用您微调的模型进行推断。

本教程中所示的任务由以下模型架构支持：

TimeSformer, VideoMAE, ViViT

在开始之前，请确保您已安装所有必要的库：

pip install -q pytorchvideo transformers evaluate

您将使用PyTorchVideo（称为pytorchvideo）来处理和准备视频。

我们鼓励您登录您的 Hugging Face 帐户，这样您就可以上传和与社区分享您的模型。提示时，请输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 UCF101 数据集

首先加载UCF-101 数据集的子集。这将让您有机会进行实验，并确保一切正常，然后再花更多时间在完整数据集上进行训练。

>>> from huggingface_hub import hf_hub_download

>>> hf_dataset_identifier = "sayakpaul/ucf101-subset"
>>> filename = "UCF101_subset.tar.gz"
>>> file_path = hf_hub_download(repo_id=hf_dataset_identifier, filename=filename, repo_type="dataset")

在下载子集后，您需要提取压缩存档：

>>> import tarfile

>>> with tarfile.open(file_path) as t:
...      t.extractall(".")

在高层次上，数据集的组织方式如下：

UCF101_subset/
    train/
        BandMarching/
            video_1.mp4
            video_2.mp4
            ...
        Archery
            video_1.mp4
            video_2.mp4
            ...
        ...
    val/
        BandMarching/
            video_1.mp4
            video_2.mp4
            ...
        Archery
            video_1.mp4
            video_2.mp4
            ...
        ...
    test/
        BandMarching/
            video_1.mp4
            video_2.mp4
            ...
        Archery
            video_1.mp4
            video_2.mp4
            ...
        ...

（排序后的）视频路径看起来像这样：

...
'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g07_c04.avi',
'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g07_c06.avi',
'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi',
'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g09_c02.avi',
'UCF101_subset/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g09_c06.avi'
...

您会注意到有属于同一组/场景的视频片段，其中组在视频文件路径中用g表示。例如，v_ApplyEyeMakeup_g07_c04.avi和v_ApplyEyeMakeup_g07_c06.avi。

对于验证和评估拆分，您不希望从同一组/场景中获取视频片段，以防止数据泄漏。本教程中使用的子集考虑了这些信息。

接下来，您将推导数据集中存在的标签集。还要创建两个在初始化模型时有用的字典：

label2id：将类名映射到整数。
id2label：将整数映射到类名。

>>> class_labels = sorted({str(path).split("/")[2] for path in all_video_file_paths})
>>> label2id = {label: i for i, label in enumerate(class_labels)}
>>> id2label = {i: label for label, i in label2id.items()}

>>> print(f"Unique classes: {list(label2id.keys())}.")

# Unique classes: ['ApplyEyeMakeup', 'ApplyLipstick', 'Archery', 'BabyCrawling', 'BalanceBeam', 'BandMarching', 'BaseballPitch', 'Basketball', 'BasketballDunk', 'BenchPress'].

有 10 个独特的类别。每个类别在训练集中有 30 个视频。

加载一个模型进行微调

从预训练的检查点和其关联的图像处理器实例化一个视频分类模型。模型的编码器带有预训练参数，分类头是随机初始化的。当为我们的数据集编写预处理流水线时，图像处理器会派上用场。

>>> from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

>>> model_ckpt = "MCG-NJU/videomae-base"
>>> image_processor = VideoMAEImageProcessor.from_pretrained(model_ckpt)
>>> model = VideoMAEForVideoClassification.from_pretrained(
...     model_ckpt,
...     label2id=label2id,
...     id2label=id2label,
...     ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
... )

当模型加载时，您可能会注意到以下警告：

Some weights of the model checkpoint at MCG-NJU/videomae-base were not used when initializing VideoMAEForVideoClassification: [..., 'decoder.decoder_layers.1.attention.output.dense.bias', 'decoder.decoder_layers.2.attention.attention.key.weight']
- This IS expected if you are initializing VideoMAEForVideoClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing VideoMAEForVideoClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of VideoMAEForVideoClassification were not initialized from the model checkpoint at MCG-NJU/videomae-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

警告告诉我们，我们正在丢弃一些权重（例如classifier层的权重和偏差），并随机初始化其他一些权重和偏差（新classifier层的权重和偏差）。在这种情况下，这是预期的，因为我们正在添加一个新的头部，我们没有预训练的权重，所以库警告我们在使用它进行推断之前应该微调这个模型，这正是我们要做的。

请注意，此检查点在此任务上表现更好，因为该检查点是在一个具有相当大领域重叠的类似下游任务上微调得到的。您可以查看此检查点，该检查点是通过微调MCG-NJU/videomae-base-finetuned-kinetics获得的。

为训练准备数据集

为了对视频进行预处理，您将利用PyTorchVideo 库。首先导入我们需要的依赖项。

>>> import pytorchvideo.data

>>> from pytorchvideo.transforms import (
...     ApplyTransformToKey,
...     Normalize,
...     RandomShortSideScale,
...     RemoveKey,
...     ShortSideScale,
...     UniformTemporalSubsample,
... )

>>> from torchvision.transforms import (
...     Compose,
...     Lambda,
...     RandomCrop,
...     RandomHorizontalFlip,
...     Resize,
... )

对于训练数据集的转换，使用统一的时间子采样、像素归一化、随机裁剪和随机水平翻转的组合。对于验证和评估数据集的转换，保持相同的转换链，除了随机裁剪和水平翻转。要了解这些转换的详细信息，请查看PyTorchVideo 的官方文档。

使用与预训练模型相关联的image_processor来获取以下信息：

用于归一化视频帧像素的图像均值和标准差。
将视频帧调整为的空间分辨率。

首先定义一些常量。

>>> mean = image_processor.image_mean
>>> std = image_processor.image_std
>>> if "shortest_edge" in image_processor.size:
...     height = width = image_processor.size["shortest_edge"]
>>> else:
...     height = image_processor.size["height"]
...     width = image_processor.size["width"]
>>> resize_to = (height, width)

>>> num_frames_to_sample = model.config.num_frames
>>> sample_rate = 4
>>> fps = 30
>>> clip_duration = num_frames_to_sample * sample_rate / fps

现在，分别定义数据集特定的转换和数据集。从训练集开始：

>>> train_transform = Compose(
...     [
...         ApplyTransformToKey(
...             key="video",
...             transform=Compose(
...                 [
...                     UniformTemporalSubsample(num_frames_to_sample),
...                     Lambda(lambda x: x / 255.0),
...                     Normalize(mean, std),
...                     RandomShortSideScale(min_size=256, max_size=320),
...                     RandomCrop(resize_to),
...                     RandomHorizontalFlip(p=0.5),
...                 ]
...             ),
...         ),
...     ]
... )

>>> train_dataset = pytorchvideo.data.Ucf101(
...     data_path=os.path.join(dataset_root_path, "train"),
...     clip_sampler=pytorchvideo.data.make_clip_sampler("random", clip_duration),
...     decode_audio=False,
...     transform=train_transform,
... )

相同的工作流程顺序可以应用于验证集和评估集：

>>> val_transform = Compose(
...     [
...         ApplyTransformToKey(
...             key="video",
...             transform=Compose(
...                 [
...                     UniformTemporalSubsample(num_frames_to_sample),
...                     Lambda(lambda x: x / 255.0),
...                     Normalize(mean, std),
...                     Resize(resize_to),
...                 ]
...             ),
...         ),
...     ]
... )

>>> val_dataset = pytorchvideo.data.Ucf101(
...     data_path=os.path.join(dataset_root_path, "val"),
...     clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", clip_duration),
...     decode_audio=False,
...     transform=val_transform,
... )

>>> test_dataset = pytorchvideo.data.Ucf101(
...     data_path=os.path.join(dataset_root_path, "test"),
...     clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", clip_duration),
...     decode_audio=False,
...     transform=val_transform,
... )

注意：上述数据集管道取自官方 PyTorchVideo 示例。我们使用pytorchvideo.data.Ucf101()函数，因为它专为 UCF-101 数据集定制。在内部，它返回一个pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset对象。LabeledVideoDataset类是 PyTorchVideo 数据集中所有视频相关内容的基类。因此，如果您想使用 PyTorchVideo 不支持的自定义数据集，可以相应地扩展LabeledVideoDataset类。请参考data API 文档以了解更多。此外，如果您的数据集遵循类似的结构（如上所示），那么使用pytorchvideo.data.Ucf101()应该可以正常工作。

您可以访问num_videos参数以了解数据集中的视频数量。

>>> print(train_dataset.num_videos, val_dataset.num_videos, test_dataset.num_videos)
# (300, 30, 75)

可视化预处理后的视频以进行更好的调试

>>> import imageio
>>> import numpy as np
>>> from IPython.display import Image

>>> def unnormalize_img(img):
...     """Un-normalizes the image pixels."""
...     img = (img * std) + mean
...     img = (img * 255).astype("uint8")
...     return img.clip(0, 255)

>>> def create_gif(video_tensor, filename="sample.gif"):
...     """Prepares a GIF from a video tensor.
...     
...     The video tensor is expected to have the following shape:
...     (num_frames, num_channels, height, width).
...     """
...     frames = []
...     for video_frame in video_tensor:
...         frame_unnormalized = unnormalize_img(video_frame.permute(1, 2, 0).numpy())
...         frames.append(frame_unnormalized)
...     kargs = {"duration": 0.25}
...     imageio.mimsave(filename, frames, "GIF", **kargs)
...     return filename

>>> def display_gif(video_tensor, gif_name="sample.gif"):
...     """Prepares and displays a GIF from a video tensor."""
...     video_tensor = video_tensor.permute(1, 0, 2, 3)
...     gif_filename = create_gif(video_tensor, gif_name)
...     return Image(filename=gif_filename)

>>> sample_video = next(iter(train_dataset))
>>> video_tensor = sample_video["video"]
>>> display_gif(video_tensor)

训练模型

利用🤗 Transformers 中的Trainer来训练模型。要实例化一个Trainer，您需要定义训练配置和一个评估指标。最重要的是TrainingArguments，这是一个包含所有属性以配置训练的类。它需要一个输出文件夹名称，用于保存模型的检查点。它还有助于将模型存储库中的所有信息同步到🤗 Hub 中。

大多数训练参数都是不言自明的，但这里有一个非常重要的参数是remove_unused_columns=False。这个参数将删除模型调用函数未使用的任何特征。默认情况下是True，因为通常最好删除未使用的特征列，这样更容易将输入解压缩到模型的调用函数中。但是，在这种情况下，您需要未使用的特征（特别是‘video’）以便创建pixel_values（这是我们的模型在输入中期望的一个必需键）。

>>> from transformers import TrainingArguments, Trainer

>>> model_name = model_ckpt.split("/")[-1]
>>> new_model_name = f"{model_name}-finetuned-ucf101-subset"
>>> num_epochs = 4

>>> args = TrainingArguments(
...     new_model_name,
...     remove_unused_columns=False,
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     learning_rate=5e-5,
...     per_device_train_batch_size=batch_size,
...     per_device_eval_batch_size=batch_size,
...     warmup_ratio=0.1,
...     logging_steps=10,
...     load_best_model_at_end=True,
...     metric_for_best_model="accuracy",
...     push_to_hub=True,
...     max_steps=(train_dataset.num_videos // batch_size) * num_epochs,
... )

pytorchvideo.data.Ucf101()返回的数据集没有实现__len__方法。因此，在实例化TrainingArguments时，我们必须定义max_steps。

接下来，您需要定义一个函数来计算从预测中得出的指标，该函数将使用您现在将加载的metric。您唯一需要做的预处理是取出我们预测的 logits 的 argmax：

import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)

关于评估的说明：

在VideoMAE 论文中，作者使用以下评估策略。他们在测试视频的几个剪辑上评估模型，并对这些剪辑应用不同的裁剪，并报告聚合得分。然而，出于简单和简洁的考虑，我们在本教程中不考虑这一点。

此外，定义一个collate_fn，用于将示例批处理在一起。每个批次包括 2 个键，即pixel_values和labels。

>>> def collate_fn(examples):
...     # permute to (num_frames, num_channels, height, width)
...     pixel_values = torch.stack(
...         [example["video"].permute(1, 0, 2, 3) for example in examples]
...     )
...     labels = torch.tensor([example["label"] for example in examples])
...     return {"pixel_values": pixel_values, "labels": labels}

然后，将所有这些与数据集一起传递给Trainer：

>>> trainer = Trainer(
...     model,
...     args,
...     train_dataset=train_dataset,
...     eval_dataset=val_dataset,
...     tokenizer=image_processor,
...     compute_metrics=compute_metrics,
...     data_collator=collate_fn,
... )

您可能想知道为什么在预处理数据时将image_processor作为标记器传递。这只是为了确保图像处理器配置文件（存储为 JSON）也将上传到 Hub 上的存储库中。

现在通过调用train方法对我们的模型进行微调：

>>> train_results = trainer.train()

训练完成后，使用 push_to_hub()方法将您的模型共享到 Hub，以便每个人都可以使用您的模型：

>>> trainer.push_to_hub()

推断

很好，现在您已经对模型进行了微调，可以将其用于推断！

加载视频进行推断：

>>> sample_test_video = next(iter(test_dataset))

尝试使用您微调的模型进行推断的最简单方法是在pipeline中使用它。使用您的模型实例化一个视频分类的pipeline，并将视频传递给它：

>>> from transformers import pipeline

>>> video_cls = pipeline(model="my_awesome_video_cls_model")
>>> video_cls("https://huggingface.co/datasets/sayakpaul/ucf101-subset/resolve/main/v_BasketballDunk_g14_c06.avi")
[{'score': 0.9272987842559814, 'label': 'BasketballDunk'},
 {'score': 0.017777055501937866, 'label': 'BabyCrawling'},
 {'score': 0.01663011871278286, 'label': 'BalanceBeam'},
 {'score': 0.009560945443809032, 'label': 'BandMarching'},
 {'score': 0.0068979403004050255, 'label': 'BaseballPitch'}]

如果愿意，您也可以手动复制pipeline的结果。

>>> def run_inference(model, video):
...     # (num_frames, num_channels, height, width)
...     perumuted_sample_test_video = video.permute(1, 0, 2, 3)
...     inputs = {
...         "pixel_values": perumuted_sample_test_video.unsqueeze(0),
...         "labels": torch.tensor(
...             [sample_test_video["label"]]
...         ),  # this can be skipped if you don't have labels available.
...     }

...     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
...     inputs = {k: v.to(device) for k, v in inputs.items()}
...     model = model.to(device)

...     # forward pass
...     with torch.no_grad():
...         outputs = model(**inputs)
...         logits = outputs.logits

...     return logits

现在，将您的输入传递给模型并返回logits：

>>> logits = run_inference(trained_model, sample_test_video["video"])

解码logits，我们得到：

>>> predicted_class_idx = logits.argmax(-1).item()
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
# Predicted class: BasketballDunk

目标检测

原始文本：huggingface.co/docs/transformers/v4.37.2/en/tasks/object_detection

目标检测是计算机视觉任务，用于检测图像中的实例（如人类、建筑物或汽车）。目标检测模型接收图像作为输入，并输出检测到的对象的边界框的坐标和相关标签。一幅图像可以包含多个对象，每个对象都有自己的边界框和标签（例如，它可以有一辆汽车和一座建筑物），每个对象可以出现在图像的不同部分（例如，图像可以有几辆汽车）。这个任务通常用于自动驾驶，用于检测行人、道路标志和交通灯等。其他应用包括在图像中计数对象、图像搜索等。

在本指南中，您将学习如何：

对DETR进行微调，这是一个将卷积主干与编码器-解码器 Transformer 结合的模型，在CPPE-5数据集上进行训练。
使用您微调的模型进行推断。

本教程中所示的任务由以下模型架构支持：

条件 DETR, 可变 DETR, DETA, DETR, 表格 Transformer, YOLOS

在开始之前，请确保已安装所有必要的库：

pip install -q datasets transformers evaluate timm albumentations

您将使用🤗数据集从 Hugging Face Hub 加载数据集，🤗转换器来训练您的模型，并使用albumentations来增强数据。目前需要使用timm来加载 DETR 模型的卷积主干。

我们鼓励您与社区分享您的模型。登录到您的 Hugging Face 帐户并将其上传到 Hub。在提示时，输入您的令牌以登录：

>>> from huggingface_hub import notebook_login

>>> notebook_login()

加载 CPPE-5 数据集

CPPE-5 数据集包含带有注释的图像，用于识别 COVID-19 大流行背景下的医疗个人防护装备（PPE）。

首先加载数据集：

>>> from datasets import load_dataset

>>> cppe5 = load_dataset("cppe-5")
>>> cppe5
DatasetDict({
    train: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 29
    })
})

您将看到这个数据集已经带有一个包含 1000 张图像的训练集和一个包含 29 张图像的测试集。

熟悉数据，探索示例的外观。

>>> cppe5["train"][0]
{'image_id': 15,
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=943x663 at 0x7F9EC9E77C10>,
 'width': 943,
 'height': 663,
 'objects': {'id': [114, 115, 116, 117],
  'area': [3796, 1596, 152768, 81002],
  'bbox': [[302.0, 109.0, 73.0, 52.0],
   [810.0, 100.0, 57.0, 28.0],
   [160.0, 31.0, 248.0, 616.0],
   [741.0, 68.0, 202.0, 401.0]],
  'category': [4, 4, 0, 0]}}

数据集中的示例具有以下字段：

image_id：示例图像 id
image：包含图像的PIL.Image.Image对象
width：图像的宽度
height：图像的高度
objects：包含图像中对象的边界框元数据的字典：
- id：注释 id
- area：边界框的面积
- bbox：对象的边界框（以COCO 格式）
- category：对象的类别，可能的值包括防护服（0）、面罩（1）、手套（2）、护目镜（3）和口罩（4）

您可能会注意到bbox字段遵循 COCO 格式，这是 DETR 模型期望的格式。然而，objects内部字段的分组与 DETR 所需的注释格式不同。在使用此数据进行训练之前，您需要应用一些预处理转换。

为了更好地理解数据，可视化数据集中的一个示例。

>>> import numpy as np
>>> import os
>>> from PIL import Image, ImageDraw

>>> image = cppe5["train"][0]["image"]
>>> annotations = cppe5["train"][0]["objects"]
>>> draw = ImageDraw.Draw(image)

>>> categories = cppe5["train"].features["objects"].feature["category"].names

>>> id2label = {index: x for index, x in enumerate(categories, start=0)}
>>> label2id = {v: k for k, v in id2label.items()}

>>> for i in range(len(annotations["id"])):
...     box = annotations["bbox"][i]
...     class_idx = annotations["category"][i]
...     x, y, w, h = tuple(box)
...     # Check if coordinates are normalized or not
...     if max(box) > 1.0:
...         # Coordinates are un-normalized, no need to re-scale them
...         x1, y1 = int(x), int(y)
...         x2, y2 = int(x + w), int(y + h)
...     else:
...         # Coordinates are normalized, re-scale them
...         x1 = int(x * width)
...         y1 = int(y * height)
...         x2 = int((x + w) * width)
...         y2 = int((y + h) * height)
...     draw.rectangle((x, y, x + w, y + h), outline="red", width=1)
...     draw.text((x, y), id2label[class_idx], fill="white")

>>> image

要可视化带有关联标签的边界框，您可以从数据集的元数据中获取标签，特别是category字段。您还需要创建映射标签 id 到标签类别（id2label）以及反向映射（label2id）的字典。在设置模型时，您可以稍后使用它们。包括这些映射将使您的模型在 Hugging Face Hub 上共享时可以被其他人重复使用。请注意，上述代码中绘制边界框的部分假定它是以XYWH（x，y 坐标和框的宽度和高度）格式。对于其他格式如(x1，y1，x2，y2)可能无法正常工作。

作为熟悉数据的最后一步，探索可能存在的问题。目标检测数据集的一个常见问题是边界框“拉伸”到图像边缘之外。这种“失控”的边界框可能会在训练过程中引发错误，应在此阶段加以解决。在这个数据集中有一些示例存在这个问题。为了简化本指南中的操作，我们将这些图像从数据中删除。

>>> remove_idx = [590, 821, 822, 875, 876, 878, 879]
>>> keep = [i for i in range(len(cppe5["train"])) if i not in remove_idx]
>>> cppe5["train"] = cppe5["train"].select(keep)

预处理数据

要微调模型，您必须预处理您计划使用的数据，以精确匹配预训练模型使用的方法。AutoImageProcessor 负责处理图像数据以创建pixel_values，pixel_mask和labels，供 DETR 模型训练。图像处理器具有一些属性，您无需担心：

image_mean = [0.485, 0.456, 0.406 ]
image_std = [0.229, 0.224, 0.225]

这些是用于在模型预训练期间对图像进行归一化的均值和标准差。在进行推理或微调预训练图像模型时，这些值至关重要。

从要微调的模型相同的检查点实例化图像处理器。

>>> from transformers import AutoImageProcessor

>>> checkpoint = "facebook/detr-resnet-50"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)

在将图像传递给image_processor之前，对数据集应用两个预处理转换：

增强图像
重新格式化注释以满足 DETR 的期望

首先，为了确保模型不会在训练数据上过拟合，您可以使用任何数据增强库进行图像增强。这里我们使用Albumentations...此库确保转换影响图像并相应更新边界框。🤗数据集库文档有一个详细的关于如何为目标检测增强图像的指南，它使用相同的数据集作为示例。在这里应用相同的方法，将每个图像调整为(480, 480)，水平翻转并增加亮度：

>>> import albumentations
>>> import numpy as np
>>> import torch

>>> transform = albumentations.Compose(
...     [
...         albumentations.Resize(480, 480),
...         albumentations.HorizontalFlip(p=1.0),
...         albumentations.RandomBrightnessContrast(p=1.0),
...     ],
...     bbox_params=albumentations.BboxParams(format="coco", label_fields=["category"]),
... )

image_processor期望注释采用以下格式：{'image_id': int, 'annotations': List[Dict]}，其中每个字典是一个 COCO 对象注释。让我们添加一个函数来为单个示例重新格式化注释：

>>> def formatted_anns(image_id, category, area, bbox):
...     annotations = []
...     for i in range(0, len(category)):
...         new_ann = {
...             "image_id": image_id,
...             "category_id": category[i],
...             "isCrowd": 0,
...             "area": area[i],
...             "bbox": list(bbox[i]),
...         }
...         annotations.append(new_ann)

...     return annotations

现在，您可以将图像和注释转换组合在一起，用于一批示例：

>>> # transforming a batch
>>> def transform_aug_ann(examples):
...     image_ids = examples["image_id"]
...     images, bboxes, area, categories = [], [], [], []
...     for image, objects in zip(examples["image"], examples["objects"]):
...         image = np.array(image.convert("RGB"))[:, :, ::-1]
...         out = transform(image=image, bboxes=objects["bbox"], category=objects["category"])

...         area.append(objects["area"])
...         images.append(out["image"])
...         bboxes.append(out["bboxes"])
...         categories.append(out["category"])

...     targets = [
...         {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
...         for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
...     ]

...     return image_processor(images=images, annotations=targets, return_tensors="pt")

使用🤗数据集的with_transform方法将此预处理函数应用于整个数据集。此方法在加载数据集元素时动态应用转换。

此时，您可以检查数据集经过转换后的示例是什么样子。您应该看到一个带有pixel_values的张量，一个带有pixel_mask的张量和labels。

>>> cppe5["train"] = cppe5["train"].with_transform(transform_aug_ann)
>>> cppe5["train"][15]
{'pixel_values': tensor([[[ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9638, -1.9638, -1.9638],
          ...,
          [-1.5699, -1.5699, -1.5699,  ..., -1.9980, -1.9980, -1.9980],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809]],

         [[ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8256, -1.8256, -1.8256],
          ...,
          [-1.3179, -1.3179, -1.3179,  ..., -1.8606, -1.8606, -1.8606],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431]],

         [[ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6302, -1.6302, -1.6302],
          ...,
          [-1.0201, -1.0201, -1.0201,  ..., -1.5604, -1.5604, -1.5604],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430]]]),
 'pixel_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         ...,
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1]]),
 'labels': {'size': tensor([800, 800]), 'image_id': tensor([756]), 'class_labels': tensor([4]), 'boxes': tensor([[0.7340, 0.6986, 0.3414, 0.5944]]), 'area': tensor([519544.4375]), 'iscrowd': tensor([0]), 'orig_size': tensor([480, 480])}}

您已成功增强了单个图像并准备好它们的注释。然而，预处理还没有完成。在最后一步中，创建一个自定义的collate_fn来将图像批量处理在一起。将图像（现在是pixel_values）填充到批次中最大的图像，并创建一个相应的pixel_mask来指示哪些像素是真实的（1），哪些是填充的（0）。

>>> def collate_fn(batch):
...     pixel_values = [item["pixel_values"] for item in batch]
...     encoding = image_processor.pad(pixel_values, return_tensors="pt")
...     labels = [item["labels"] for item in batch]
...     batch = {}
...     batch["pixel_values"] = encoding["pixel_values"]
...     batch["pixel_mask"] = encoding["pixel_mask"]
...     batch["labels"] = labels
...     return batch

训练 DETR 模型

在前几节中，您已经完成了大部分繁重的工作，现在您已经准备好训练您的模型了！即使在调整大小后，此数据集中的图像仍然相当大。这意味着微调此模型将需要至少一个 GPU。

训练包括以下步骤：

使用与预处理中相同的检查点加载模型 AutoModelForObjectDetection。
在 TrainingArguments 中定义您的训练超参数。
将训练参数传递给 Trainer，以及模型、数据集、图像处理器和数据整理器。
调用 train()来微调您的模型。

在从用于预处理的相同检查点加载模型时，请记住传递您从数据集元数据中创建的label2id和id2label映射。此外，我们指定ignore_mismatched_sizes=True以用新的替换现有的分类头。

>>> from transformers import AutoModelForObjectDetection

>>> model = AutoModelForObjectDetection.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
...     ignore_mismatched_sizes=True,
... )

在 TrainingArguments 中使用output_dir指定保存模型的位置，然后根据需要配置超参数。重要的是不要删除未使用的列，因为这将删除图像列。没有图像列，您无法创建pixel_values。因此，将remove_unused_columns设置为False。如果希望通过将其推送到 Hub 来共享您的模型，请将push_to_hub设置为True（您必须登录到 Hugging Face 才能上传您的模型）。

>>> from transformers import TrainingArguments

>>> training_args = TrainingArguments(
...     output_dir="detr-resnet-50_finetuned_cppe5",
...     per_device_train_batch_size=8,
...     num_train_epochs=10,
...     fp16=True,
...     save_steps=200,
...     logging_steps=50,
...     learning_rate=1e-5,
...     weight_decay=1e-4,
...     save_total_limit=2,
...     remove_unused_columns=False,
...     push_to_hub=True,
... )

最后，将所有内容汇总，并调用 train()：

>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     data_collator=collate_fn,
...     train_dataset=cppe5["train"],
...     tokenizer=image_processor,
... )

>>> trainer.train()

如果在training_args中将push_to_hub设置为True，则训练检查点将被推送到 Hugging Face Hub。在训练完成后，通过调用 push_to_hub()方法将最终模型也推送到 Hub。

>>> trainer.push_to_hub()

评估

目标检测模型通常使用一组COCO 风格指标进行评估。您可以使用现有的指标实现之一，但在这里，您将使用来自torchvision的指标来评估推送到 Hub 的最终模型。

要使用torchvision评估器，您需要准备一个真实的 COCO 数据集。构建 COCO 数据集的 API 要求数据以特定格式存储，因此您需要首先将图像和注释保存到磁盘上。就像您为训练准备数据时一样，来自cppe5["test"]的注释需要进行格式化。但是，图像应保持原样。

评估步骤需要一些工作，但可以分为三个主要步骤。首先，准备cppe5["test"]集：格式化注释并将数据保存到磁盘上。

>>> import json

>>> # format annotations the same as for training, no need for data augmentation
>>> def val_formatted_anns(image_id, objects):
...     annotations = []
...     for i in range(0, len(objects["id"])):
...         new_ann = {
...             "id": objects["id"][i],
...             "category_id": objects["category"][i],
...             "iscrowd": 0,
...             "image_id": image_id,
...             "area": objects["area"][i],
...             "bbox": objects["bbox"][i],
...         }
...         annotations.append(new_ann)

...     return annotations

>>> # Save images and annotations into the files torchvision.datasets.CocoDetection expects
>>> def save_cppe5_annotation_file_images(cppe5):
...     output_json = {}
...     path_output_cppe5 = f"{os.getcwd()}/cppe5/"

...     if not os.path.exists(path_output_cppe5):
...         os.makedirs(path_output_cppe5)

...     path_anno = os.path.join(path_output_cppe5, "cppe5_ann.json")
...     categories_json = [{"supercategory": "none", "id": id, "name": id2label[id]} for id in id2label]
...     output_json["images"] = []
...     output_json["annotations"] = []
...     for example in cppe5:
...         ann = val_formatted_anns(example["image_id"], example["objects"])
...         output_json["images"].append(
...             {
...                 "id": example["image_id"],
...                 "width": example["image"].width,
...                 "height": example["image"].height,
...                 "file_name": f"{example['image_id']}.png",
...             }
...         )
...         output_json["annotations"].extend(ann)
...     output_json["categories"] = categories_json

...     with open(path_anno, "w") as file:
...         json.dump(output_json, file, ensure_ascii=False, indent=4)

...     for im, img_id in zip(cppe5["image"], cppe5["image_id"]):
...         path_img = os.path.join(path_output_cppe5, f"{img_id}.png")
...         im.save(path_img)

...     return path_output_cppe5, path_anno

接下来，准备一个可以与cocoevaluator一起使用的CocoDetection类的实例。

>>> import torchvision

>>> class CocoDetection(torchvision.datasets.CocoDetection):
...     def __init__(self, img_folder, image_processor, ann_file):
...         super().__init__(img_folder, ann_file)
...         self.image_processor = image_processor

...     def __getitem__(self, idx):
...         # read in PIL image and target in COCO format
...         img, target = super(CocoDetection, self).__getitem__(idx)

...         # preprocess image and target: converting target to DETR format,
...         # resizing + normalization of both image and target)
...         image_id = self.ids[idx]
...         target = {"image_id": image_id, "annotations": target}
...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
...         pixel_values = encoding["pixel_values"].squeeze()  # remove batch dimension
...         target = encoding["labels"][0]  # remove batch dimension

...         return {"pixel_values": pixel_values, "labels": target}

>>> im_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5["test"])
>>> test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)

最后，加载指标并运行评估。

>>> import evaluate
>>> from tqdm import tqdm

>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> module = evaluate.load("ybelkada/cocoevaluate", coco=test_ds_coco_format.coco)
>>> val_dataloader = torch.utils.data.DataLoader(
...     test_ds_coco_format, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn
... )

>>> with torch.no_grad():
...     for idx, batch in enumerate(tqdm(val_dataloader)):
...         pixel_values = batch["pixel_values"]
...         pixel_mask = batch["pixel_mask"]

...         labels = [
...             {k: v for k, v in t.items()} for t in batch["labels"]
...         ]  # these are in DETR format, resized + normalized

...         # forward pass
...         outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)

...         orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0)
...         results = im_processor.post_process(outputs, orig_target_sizes)  # convert outputs of model to Pascal VOC format (xmin, ymin, xmax, ymax)

...         module.add(prediction=results, reference=labels)
...         del batch

>>> results = module.compute()
>>> print(results)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.681
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.292
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.501
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590

通过调整 TrainingArguments 中的超参数，这些结果可以进一步改善。试一试吧！

推断

现在您已经微调了一个 DETR 模型，对其进行了评估，并将其上传到 Hugging Face Hub，您可以将其用于推断。尝试使用您微调的模型进行推断的最简单方法是在 Pipeline 中使用它。使用您的模型实例化一个用于目标检测的流水线，并将图像传递给它：

>>> from transformers import pipeline
>>> import requests

>>> url = "https://i.imgur.com/2lnWoly.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> obj_detector = pipeline("object-detection", model="devonho/detr-resnet-50_finetuned_cppe5")
>>> obj_detector(image)

如果您愿意，也可以手动复制流水线的结果：

>>> image_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> with torch.no_grad():
...     inputs = image_processor(images=image, return_tensors="pt")
...     outputs = model(**inputs)
...     target_sizes = torch.tensor([image.size[::-1]])
...     results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     print(
...         f"Detected {model.config.id2label[label.item()]} with confidence "
...         f"{round(score.item(), 3)} at location {box}"
...     )
Detected Coverall with confidence 0.566 at location [1215.32, 147.38, 4401.81, 3227.08]
Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.9]

让我们绘制结果：

>>> draw = ImageDraw.Draw(image)

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     x, y, x2, y2 = tuple(box)
...     draw.rectangle((x, y, x2, y2), outline="red", width=1)
...     draw.text((x, y), model.config.id2label[label.item()], fill="white")

>>> image

在一张新图片上的目标检测结果

零样本目标检测

原文链接：huggingface.co/docs/transformers/v4.37.2/en/tasks/zero_shot_object_detection

传统上，用于目标检测的模型需要标记的图像数据集进行训练，并且仅限于检测训练数据集中的类别集。

零样本目标检测由使用不同方法的 OWL-ViT 模型支持。OWL-ViT 是一个开放词汇的目标检测器。这意味着它可以基于自由文本查询在图像中检测对象，而无需在标记的数据集上对模型进行微调。

OWL-ViT 利用多模态表示执行开放词汇检测。它将 CLIP 与轻量级对象分类和定位头结合起来。通过将自由文本查询嵌入到 CLIP 的文本编码器中，并将其用作对象分类和定位头的输入，实现了开放词汇检测。关联图像及其相应的文本描述，ViT 将图像块作为输入进行处理。OWL-ViT 的作者首先从头开始训练 CLIP，然后使用二部匹配损失在标准目标检测数据集上端到端地微调 OWL-ViT。

通过这种方法，模型可以基于文本描述检测对象，而无需事先在标记的数据集上进行训练。

在本指南中，您将学习如何使用 OWL-ViT：

基于文本提示检测对象
用于批量目标检测
用于图像引导的目标检测

在开始之前，请确保已安装所有必要的库：

pip install -q transformers

零样本目标检测管道

尝试使用 OWL-ViT 进行推理的最简单方法是在 Hugging Face Hub 上的管道()中使用它。从Hugging Face Hub 上的检查点实例化一个零样本目标检测管道：

>>> from transformers import pipeline

>>> checkpoint = "google/owlvit-base-patch32"
>>> detector = pipeline(model=checkpoint, task="zero-shot-object-detection")

接下来，选择一个您想要检测对象的图像。这里我们将使用宇航员 Eileen Collins 的图像，该图像是NASA Great Images 数据集的一部分。

>>> import skimage
>>> import numpy as np
>>> from PIL import Image

>>> image = skimage.data.astronaut()
>>> image = Image.fromarray(np.uint8(image)).convert("RGB")

>>> image

将图像和要查找的候选对象标签传递给管道。这里我们直接传递图像；其他合适的选项包括图像的本地路径或图像 url。我们还传递了所有要查询图像的项目的文本描述。

>>> predictions = detector(
...     image,
...     candidate_labels=["human face", "rocket", "nasa badge", "star-spangled banner"],
... )
>>> predictions
[{'score': 0.3571370542049408,
  'label': 'human face',
  'box': {'xmin': 180, 'ymin': 71, 'xmax': 271, 'ymax': 178}},
 {'score': 0.28099656105041504,
  'label': 'nasa badge',
  'box': {'xmin': 129, 'ymin': 348, 'xmax': 206, 'ymax': 427}},
 {'score': 0.2110239565372467,
  'label': 'rocket',
  'box': {'xmin': 350, 'ymin': -1, 'xmax': 468, 'ymax': 288}},
 {'score': 0.13790413737297058,
  'label': 'star-spangled banner',
  'box': {'xmin': 1, 'ymin': 1, 'xmax': 105, 'ymax': 509}},
 {'score': 0.11950037628412247,
  'label': 'nasa badge',
  'box': {'xmin': 277, 'ymin': 338, 'xmax': 327, 'ymax': 380}},
 {'score': 0.10649408400058746,
  'label': 'rocket',
  'box': {'xmin': 358, 'ymin': 64, 'xmax': 424, 'ymax': 280}}]

让我们可视化预测：

>>> from PIL import ImageDraw

>>> draw = ImageDraw.Draw(image)

>>> for prediction in predictions:
...     box = prediction["box"]
...     label = prediction["label"]
...     score = prediction["score"]

...     xmin, ymin, xmax, ymax = box.values()
...     draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
...     draw.text((xmin, ymin), f"{label}: {round(score,2)}", fill="white")

>>> image

手动文本提示的零样本目标检测

现在您已经看到如何使用零样本目标检测管道，让我们手动复制相同的结果。

从Hugging Face Hub 上的检查点加载模型和相关处理器。这里我们将使用与之前相同的检查点：

>>> from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

>>> model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
>>> processor = AutoProcessor.from_pretrained(checkpoint)

让我们选择不同的图像来改变一下。

>>> import requests

>>> url = "https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&force=true&w=640"
>>> im = Image.open(requests.get(url, stream=True).raw)
>>> im

使用处理器为模型准备输入。处理器结合了一个图像处理器，通过调整大小和归一化来为模型准备图像，以及一个 CLIPTokenizer，负责处理文本输入。

>>> text_queries = ["hat", "book", "sunglasses", "camera"]
>>> inputs = processor(text=text_queries, images=im, return_tensors="pt")

通过模型传递输入，后处理并可视化结果。由于图像处理器在将图像馈送到模型之前调整了图像的大小，因此您需要使用 post_process_object_detection()方法，以确保预测的边界框相对于原始图像具有正确的坐标：

>>> import torch

>>> with torch.no_grad():
...     outputs = model(**inputs)
...     target_sizes = torch.tensor([im.size[::-1]])
...     results = processor.post_process_object_detection(outputs, threshold=0.1, target_sizes=target_sizes)[0]

>>> draw = ImageDraw.Draw(im)

>>> scores = results["scores"].tolist()
>>> labels = results["labels"].tolist()
>>> boxes = results["boxes"].tolist()

>>> for box, score, label in zip(boxes, scores, labels):
...     xmin, ymin, xmax, ymax = box
...     draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
...     draw.text((xmin, ymin), f"{text_queries[label]}: {round(score,2)}", fill="white")

>>> im

批处理

您可以传递多组图像和文本查询以在多个图像中搜索不同（或相同）的对象。让我们一起使用宇航员图像和海滩图像。对于批处理，您应该将文本查询作为嵌套列表传递给处理器，并将图像作为 PIL 图像、PyTorch 张量或 NumPy 数组的列表。

>>> images = [image, im]
>>> text_queries = [
...     ["human face", "rocket", "nasa badge", "star-spangled banner"],
...     ["hat", "book", "sunglasses", "camera"],
... ]
>>> inputs = processor(text=text_queries, images=images, return_tensors="pt")

以前，用于后处理的是将单个图像的大小作为张量传递，但您也可以传递一个元组，或者在有多个图像的情况下，传递一个元组列表。让我们为这两个示例创建预测，并可视化第二个示例（image_idx = 1）。

>>> with torch.no_grad():
...     outputs = model(**inputs)
...     target_sizes = [x.size[::-1] for x in images]
...     results = processor.post_process_object_detection(outputs, threshold=0.1, target_sizes=target_sizes)

>>> image_idx = 1
>>> draw = ImageDraw.Draw(images[image_idx])

>>> scores = results[image_idx]["scores"].tolist()
>>> labels = results[image_idx]["labels"].tolist()
>>> boxes = results[image_idx]["boxes"].tolist()

>>> for box, score, label in zip(boxes, scores, labels):
...     xmin, ymin, xmax, ymax = box
...     draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
...     draw.text((xmin, ymin), f"{text_queries[image_idx][label]}: {round(score,2)}", fill="white")

>>> images[image_idx]

图像引导的对象检测

除了使用文本查询进行零样本对象检测外，OWL-ViT 还提供了图像引导的对象检测。这意味着您可以使用图像查询在目标图像中找到相似的对象。与文本查询不同，只允许一个示例图像。

让我们以一张沙发上有两只猫的图像作为目标图像，以一张单猫图像作为查询：

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image_target = Image.open(requests.get(url, stream=True).raw)

>>> query_url = "http://images.cocodataset.org/val2017/000000524280.jpg"
>>> query_image = Image.open(requests.get(query_url, stream=True).raw)

让我们快速查看这些图像：

>>> import matplotlib.pyplot as plt

>>> fig, ax = plt.subplots(1, 2)
>>> ax[0].imshow(image_target)
>>> ax[1].imshow(query_image)

在预处理步骤中，现在需要使用query_images而不是文本查询：

>>> inputs = processor(images=image_target, query_images=query_image, return_tensors="pt")

对于预测，不要将输入传递给模型，而是将它们传递给 image_guided_detection()。除了现在没有标签之外，绘制预测与以前相同。

>>> with torch.no_grad():
...     outputs = model.image_guided_detection(**inputs)
...     target_sizes = torch.tensor([image_target.size[::-1]])
...     results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)[0]

>>> draw = ImageDraw.Draw(image_target)

>>> scores = results["scores"].tolist()
>>> boxes = results["boxes"].tolist()

>>> for box, score, label in zip(boxes, scores, labels):
...     xmin, ymin, xmax, ymax = box
...     draw.rectangle((xmin, ymin, xmax, ymax), outline="white", width=4)

>>> image_target

posted @ 2024-06-22 14:10 绝不原创的飞龙阅读(114) 评论(0) 收藏举报

刷新页面返回顶部

龙哥盟

掠夺·扩张·投机·博弈

Transformers--4-37-中文文档-二-

Transformers 4.37 中文文档（二）

因果语言建模

加载 ELI5 数据集

预处理

训练

推理

遮蔽语言建模

加载 ELI5 数据集

预处理

训练

推理

翻译

加载 OPUS Books 数据集

预处理

评估

训练

推理

摘要

加载 BillSum 数据集

预处理

评估

训练

推理

多项选择

加载 SWAG 数据集

预处理

评估

训练

推理

音频

音频分类

加载 MInDS-14 数据集

预处理

评估

训练

推理

自动语音识别

加载 MInDS-14 数据集

预处理

评估

训练

推理

计算机视觉

图像分类

加载 Food-101 数据集

预处理

评估

训练

推理

图像分割

分割类型

为分割微调模型

加载 SceneParse150 数据集

自定义数据集

预处理

评估

训练

推理

视频分类

加载 UCF101 数据集

加载一个模型进行微调

为训练准备数据集

可视化预处理后的视频以进行更好的调试

训练模型

推断

目标检测

加载 CPPE-5 数据集

预处理数据

训练 DETR 模型

评估

推断

零样本目标检测

零样本目标检测管道

手动文本提示的零样本目标检测

批处理

图像引导的对象检测

公告