Transformers 4.37 Chinese Documentation (Part 15)

Original: huggingface.co/docs/transformers

DeBERTa-v2

Original text: huggingface.co/docs/transformers/v4.37.2/en/model_doc/deberta-v2

Overview

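The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's BERT model, released in 2018, and Facebook's RoBERTa model, released in 2019.

It builds on RoBERTa with disentangled attention and enhanced mask decoder training, using half of the data used for RoBERTa.

The abstract from the paper is the following: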

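Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pre-training and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at github.com/microsoft/DeBERTa.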
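The disentangled attention described in the abstract can be illustrated with a small pure-Python sketch. It assumes the attention score between tokens i and j is the sum of a content-to-content, a content-to-position, and a position-to-content term, scaled by 1/sqrt(3d), with relative distances clipped to a window of size k. All helper names (`relative_index`, `disentangled_attention`, `random_vectors`) are hypothetical and the vectors are random; this is not the library's implementation.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def relative_index(i, j, k):
    """Clip the relative distance i - j into the index range [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(q_c, k_c, q_r, k_r, k):
    """Return an n x n matrix of attention weights.

    q_c, k_c: content queries/keys, one d-dimensional vector per token.
    q_r, k_r: relative-position queries/keys, 2k vectors of dimension d.
    """
    n, d = len(q_c), len(q_c[0])
    scale = 1.0 / math.sqrt(3 * d)  # three score terms are summed
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                        # content-to-content
            c2p = dot(q_c[i], k_r[relative_index(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[relative_index(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        weights.append(softmax(row))
    return weights


def random_vectors(count, dim, rng):
    """Hypothetical stand-in for learned projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
n_tokens, dim, k = 5, 8, 2
q_c = random_vectors(n_tokens, dim, rng)
k_c = random_vectors(n_tokens, dim, rng)
q_r = random_vectors(2 * k, dim, rng)
k_r = random_vectors(2 * k, dim, rng)
weights = disentangled_attention(q_c, k_c, q_r, k_r, k)
print([round(sum(row), 6) for row in weights])
```

Each token contributes one content vector, while position information enters only through the 2k shared relative-position vectors, which is what keeps the two representations separate.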

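The following information is visible directly in the original implementation repository. DeBERTa v2 is the second version of the DeBERTa model. It includes the 1.5B model used for the SuperGLUE single-model submission, which achieved a score of 89.9 versus the human baseline of 89.8. You can find more details about this submission in the authors' blog.

New in v2: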

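Vocabulary: In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data. Instead of a GPT2-based tokenizer, the tokenizer is now a sentencepiece-based tokenizer.
nGiE (nGram Induced Input Encoding): The DeBERTa-v2 model uses an additional convolution layer alongside the first transformer layer to better learn the local dependency of input tokens.
Sharing the position projection matrix with the content projection matrix in the attention layer: Based on previous experiments, this can save parameters without affecting performance.
Applying buckets to encode relative positions: The DeBERTa-v2 model uses log buckets to encode relative positions, similar to T5.
900M model and 1.5B model: Two additional model sizes are available, 900M and 1.5B, which significantly improve the performance of downstream tasks.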
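To make the log-bucket idea from the list above concrete, here is a minimal pure-Python sketch. It assumes that small relative distances keep their exact value while larger ones are compressed logarithmically into a fixed number of buckets. The function name `log_bucket_position` and the values chosen for `bucket_size` and `max_position` are illustrative assumptions, and the exact formula may differ from what the library uses.

```python
import math


def log_bucket_position(relative_pos: int, bucket_size: int, max_position: int) -> int:
    """Map a signed relative distance to a signed bucket index.

    Distances with magnitude below bucket_size // 2 are kept exactly;
    larger magnitudes are placed on a logarithmic scale so that
    max_position - 1 lands in the last bucket.
    """
    mid = bucket_size // 2
    if -mid < relative_pos < mid:
        return relative_pos  # exact region: one bucket per distance
    sign = 1 if relative_pos > 0 else -1
    abs_pos = abs(relative_pos)
    # Logarithmic region: scale log(abs_pos / mid) into [0, mid - 1].
    log_pos = math.ceil(
        math.log(abs_pos / mid) / math.log((max_position - 1) / mid) * (mid - 1)
    ) + mid
    return sign * log_pos


# Illustrative values only (assumptions, not taken from a released checkpoint).
bucket_size, max_position = 256, 512
buckets = [
    log_bucket_position(d, bucket_size, max_position)
    for d in range(-(max_position - 1), max_position)
]
print(min(buckets), max(buckets), len(set(buckets)))
```

The practical effect is that nearby tokens are distinguished precisely, while distant tokens share buckets, so the relative-position embedding table stays small even for long sequences.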

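This model was contributed by DeBERTa. The TF 2.0 implementation of this model was contributed by kamalkraj. The original code can be found here.

Resources

Text classification task guide
Token classification task guide
Question answering task guide
Masked language modeling task guide
Multiple choice task guide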

DebertaV2Config

`class transformers.DebertaV2Config`

( vocab_size = 128100, hidden_size = 1536, num_hidden_layers = 24, num_attention_heads = 24, intermediate_size = 6144, hidden_act = 'gelu', hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, max_position_embeddings = 512, type_vocab_size = 0, initializer_range = 0.02, layer_norm_eps = 1e-07, relative_attention = False, max_relative_positions = -1, pad_token_id = 0, position_biased_input = True, pos_att_type = None, pooler_dropout = 0, pooler_hidden_act = 'gelu', **kwargs )

Parameters

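vocab_size (int, optional, defaults to 128100): Vocabulary size of the DeBERTa-v2 model. Defines the number of different tokens that can be represented by the input_ids passed when calling DebertaV2Model.
hidden_size (int, optional, defaults to 1536): Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 24): Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 24): Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (int, optional, defaults to 6144): Dimensionality of the "intermediate" (often called feed-forward) layer in the Transformer encoder.
hidden_act (str or Callable, optional, defaults to "gelu"): The non-linear activation function (function or string) in the encoder and pooler. If a string, "gelu", "relu", "silu", "tanh", "gelu_fast", "mish", "linear", "sigmoid", and "gelu_new" are supported.
hidden_dropout_prob (float, optional, defaults to 0.1): The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (float, optional, defaults to 0.1): The dropout ratio for the attention probabilities.
max_position_embeddings (int, optional, defaults to 512): The maximum sequence length that this model might ever be used with. It is typically set to something large just in case (e.g., 512, 1024, or 2048).
type_vocab_size (int, optional, defaults to 0): The vocabulary size of the token_type_ids passed when calling DebertaV2Model or TFDebertaV2Model.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated normal initializer used to initialize all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-7): The epsilon used by the layer normalization layers.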
relative_attention (bool, optional, defaults to False): Whether to use relative position encoding.
max_relative_positions (int, optional, defaults to -1): The range of relative positions, [-max_position_embeddings, max_position_embeddings]. Use the same value as max_position_embeddings.
pad_token_id (int, optional, defaults to 0): The value used to pad input_ids.
position_biased_input (bool, optional, defaults to True): Whether to add absolute position embeddings to the content embeddings.
pos_att_type (List[str], optional): The type of relative position attention. It can be a combination of ["p2c", "c2p"], e.g. ["p2c"] or ["p2c", "c2p"].

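This is the configuration class to store the configuration of a DebertaV2Model. It is used to instantiate a DeBERTa-v2 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the DeBERTa microsoft/deberta-v2-xlarge architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation of PretrainedConfig for more information.

Example: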

>>> from transformers import DebertaV2Config, DebertaV2Model

>>> # Initializing a DeBERTa-v2 microsoft/deberta-v2-xlarge style configuration
>>> configuration = DebertaV2Config()

>>> # Initializing a model (with random weights) from the microsoft/deberta-v2-xlarge style configuration
>>> model = DebertaV2Model(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
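
The defaults correspond to a large architecture, so it can be convenient to override a few parameters when experimenting. The sketch below builds a much smaller configuration with relative attention enabled; the specific sizes are arbitrary values picked for illustration, not a released checkpoint, and only the configuration object is created (no model weights are allocated).

```python
from transformers import DebertaV2Config

# A small, illustrative configuration; all sizes here are arbitrary assumptions.
small_config = DebertaV2Config(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
    relative_attention=True,       # turn on relative position encoding
    position_biased_input=False,   # do not add absolute position embeddings
    pos_att_type=["p2c", "c2p"],   # use both relative attention terms
)

# Parameters that were not overridden keep their documented defaults.
print(small_config.vocab_size, small_config.max_position_embeddings)

# The configuration can be turned into a plain dictionary for inspection.
as_dict = small_config.to_dict()
print(as_dict["hidden_size"], as_dict["pos_att_type"])
```

Passing such a configuration to DebertaV2Model, as in the example above, would create a randomly initialized model with this smaller architecture.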
