Transformers 4.37 Chinese Documentation (Part 35)

Original: huggingface.co/docs/transformers

MobileNet V1

Original link: huggingface.co/docs/transformers/v4.37.2/en/model_doc/mobilenet_v1

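Overview

The MobileNet model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam.

The abstract from the paper is the following:

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.

This model was contributed by matthijs. The original code and weights can be found here.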

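Usage tips

The checkpoints are named mobilenet_v1_depth_size, for example mobilenet_v1_1.0_224, where 1.0 is the depth multiplier (sometimes also called "alpha" or the width multiplier) and 224 is the resolution of the input images the model was trained on.
Even though a checkpoint is trained on images of a specific size, the model works on images of any size. The smallest supported image size is 32x32.
You can use MobileNetV1ImageProcessor to prepare images for the model.
The available image classification checkpoints are pre-trained on ImageNet-1k (also known as ILSVRC 2012, a collection of 1.3 million images and 1000 classes). However, the model predicts 1001 classes: the 1000 classes from ImageNet plus an extra "background" class (index 0).
The original TensorFlow checkpoints use different padding rules than PyTorch, so the model has to determine the padding amount at inference time, because it depends on the input image size. To use native PyTorch padding behavior, create a MobileNetV1Config with tf_padding = False.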
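
The last tip says the padding amount depends on the input size. The sketch below illustrates why, using the standard TensorFlow "SAME" padding rule along a single axis. The function name tf_same_padding is hypothetical and is not part of the Transformers API; it only reproduces the arithmetic.

```python
from typing import Tuple


def tf_same_padding(in_size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Return (pad_before, pad_after) along one axis under TensorFlow "SAME" rules."""
    if in_size % stride == 0:
        total = max(kernel - stride, 0)
    else:
        total = max(kernel - (in_size % stride), 0)
    # TensorFlow puts the extra pixel, if any, at the end of the axis.
    before = total // 2
    return before, total - before


# A 3x3 convolution with stride 2, as used by the downsampling layers.
for size in (224, 225):
    print(size, tf_same_padding(size, kernel=3, stride=2))
# 224 (0, 1)
# 225 (1, 1)
```

A PyTorch convolution with padding=1 always pads one pixel on both sides, so for an even input size such as 224 the two conventions place the kernel differently. That is why the TensorFlow-style padding has to be computed per input.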

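Unsupported features:

MobileNetV1Model outputs a globally pooled version of the last hidden state. In the original model it is possible to use a 7x7 average pooling layer with stride 2 instead of global pooling. For larger inputs this gives a pooled output that is larger than 1x1 pixel. The HuggingFace implementation does not support this.
It is currently not possible to specify an output_stride. For smaller output strides, the original model uses dilated convolution to prevent the spatial resolution from being reduced further. The output stride of the HuggingFace model is always 32.
The original TensorFlow checkpoints include quantized models. We do not support these models because they include additional "FakeQuantization" operations to unquantize the weights.
It is common to extract the outputs of the pointwise layers at indices 5, 11, 12, 13 for downstream purposes. Using output_hidden_states=True returns the outputs of all intermediate layers. There is currently no way to limit this to specific layers.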
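
To make the first two points concrete, here is a small arithmetic sketch of the spatial sizes involved. It assumes "SAME"-style padding in the convolutions (each stride-2 layer yields ceil(n / 2)) and unpadded pooling with the kernel clamped to the feature map size. Both are assumptions for illustration, and the helper names are hypothetical.

```python
def final_feature_size(image_size: int, output_stride: int = 32) -> int:
    # Ceiling division: five stride-2 layers with "SAME" padding give ceil(n / 32).
    return -(-image_size // output_stride)


def pooled_size(feature_size: int, kernel: int = 7, stride: int = 2) -> int:
    # Unpadded average pooling; the kernel is clamped for small maps (assumption).
    kernel = min(kernel, feature_size)
    return (feature_size - kernel) // stride + 1


for image_size in (32, 224, 448):
    fmap = final_feature_size(image_size)
    print(image_size, fmap, pooled_size(fmap))
# 32 1 1
# 224 7 1
# 448 14 4
```

With a 448x448 input, the 7x7 pooling would leave a 4x4 map, whereas global pooling always reduces it to 1x1. A 32x32 input already shrinks to a single pixel, which is consistent with 32x32 being the smallest supported size.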

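Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with MobileNetV1.

Image classification

MobileNetV1ForImageClassification is supported by this example script and notebook.
See also: Image classification task guide

If you are interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.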

MobileNetV1Config

`class transformers.MobileNetV1Config`

( num_channels = 3 image_size = 224 depth_multiplier = 1.0 min_depth = 8 hidden_act = 'relu6' tf_padding = True classifier_dropout_prob = 0.999 initializer_range = 0.02 layer_norm_eps = 0.001 **kwargs )

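Parameters

num_channels (int, optional, defaults to 3): The number of input channels.
image_size (int, optional, defaults to 224): The size (resolution) of each image.
depth_multiplier (float, optional, defaults to 1.0): Shrinks or expands the number of channels in each layer. The default of 1.0 starts the network with 32 channels. Sometimes also called "alpha" or the "width multiplier".
min_depth (int, optional, defaults to 8): All layers will have at least this many channels.
hidden_act (str or function, optional, defaults to "relu6"): The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
tf_padding (bool, optional, defaults to True): Whether to use TensorFlow padding rules on the convolution layers.
classifier_dropout_prob (float, optional, defaults to 0.999): The dropout ratio for attached classifiers.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float, optional, defaults to 0.001): The epsilon used by the layer normalization layers.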
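
As an illustration of how depth_multiplier and min_depth interact, the sketch below computes per-layer channel counts. The stride pattern of the 13 depthwise-separable blocks follows the layer table in the MobileNet paper, and the doubling rule is an assumption about how widths grow. The names BLOCK_STRIDES, scaled, and channel_widths are hypothetical and not part of the library.

```python
from typing import List

# Stride of each of the 13 depthwise-separable blocks (assumed from the paper).
BLOCK_STRIDES = [1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def scaled(channels: int, depth_multiplier: float, min_depth: int) -> int:
    # Scale the channel count, but never go below min_depth.
    return max(int(channels * depth_multiplier), min_depth)


def channel_widths(depth_multiplier: float = 1.0, min_depth: int = 8) -> List[int]:
    base = 32  # the network starts with 32 channels at multiplier 1.0
    widths = [scaled(base, depth_multiplier, min_depth)]  # stem convolution
    for index, stride in enumerate(BLOCK_STRIDES):
        # The base width doubles in the first block and at every downsampling block.
        if stride == 2 or index == 0:
            base *= 2
        widths.append(scaled(base, depth_multiplier, min_depth))
    return widths


print(channel_widths(1.0))
# [32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]
print(channel_widths(0.25)[:3], channel_widths(0.25)[-1])
# [8, 16, 32] 256
```

With a very small multiplier such as 0.1, the early layers would fall below 8 channels, so min_depth takes over and keeps them at 8.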

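This is the configuration class to store the configuration of a MobileNetV1Model. It is used to instantiate a MobileNetV1 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of the MobileNetV1 google/mobilenet_v1_1.0_224 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation of PretrainedConfig for more information.

Example: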

>>> from transformers import MobileNetV1Config, MobileNetV1Model

>>> # Initializing a "mobilenet_v1_1.0_224" style configuration
>>> configuration = MobileNetV1Config()

>>> # Initializing a model from the "mobilenet_v1_1.0_224" style configuration
>>> model = MobileNetV1Model(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

MobileNetV1FeatureExtractor

`class transformers.MobileNetV1FeatureExtractor`

( *args **kwargs )

`preprocess`

( images: Union do_resize: Optional = None size: Dict = None resample: Resampling = None do_center_crop: bool = None crop_size: Dict = None do_rescale: Optional = None rescale_factor: Optional = None do_normalize: Optional = None image_mean: Union = None image_std: Union = None return_tensors: Union = None data_format: Union = <ChannelDimension.FIRST: 'channels_first'> input_data_format: Union = None **kwargs )

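Parameters

images (ImageInput): Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image.
size (Dict[str, int], optional, defaults to self.size): Size of the image after resizing. The shortest edge of the image is resized to size["shortest_edge"], and the longest edge is resized to keep the input aspect ratio.
resample (PILImageResampling filter, optional, defaults to self.resample): PILImageResampling filter to use if resizing the image, e.g. PILImageResampling.BILINEAR. Only has an effect if do_resize is set to True.
do_center_crop (bool, optional, defaults to self.do_center_crop): Whether to center crop the image.
crop_size (Dict[str, int], optional, defaults to self.crop_size): Size of the center crop. Only has an effect if do_center_crop is set to True.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor): Rescale factor to rescale the image by if do_rescale is set to True.
do_normalize (bool, optional, defaults to self.do_normalize): Whether to normalize the image.
image_mean (float or List[float], optional, defaults to self.image_mean): Image mean to use if do_normalize is set to True.
image_std (float or List[float], optional, defaults to self.image_std): Image standard deviation to use if do_normalize is set to True.
return_tensors (str or TensorType, optional): The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST): The channel dimension format of the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional): The channel dimension format of the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or a batch of images.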
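
The do_rescale and do_normalize steps amount to simple per-pixel arithmetic. The sketch below applies them to a single channel value. It assumes a rescale factor of 1/255 and a mean and standard deviation of 0.5 per channel, which to my knowledge is what IMAGENET_STANDARD_MEAN and IMAGENET_STANDARD_STD hold. The function name is hypothetical.

```python
def rescale_and_normalize(
    pixel: float,
    do_rescale: bool = True,
    rescale_factor: float = 1 / 255,
    mean: float = 0.5,  # assumed value of IMAGENET_STANDARD_MEAN per channel
    std: float = 0.5,   # assumed value of IMAGENET_STANDARD_STD per channel
) -> float:
    # Step 1 (optional): bring 0..255 pixel values into the 0..1 range.
    if do_rescale:
        pixel = pixel * rescale_factor
    # Step 2: subtract the mean and divide by the standard deviation.
    return (pixel - mean) / std


# 8-bit input: 0 maps to about -1 and 255 maps to about +1.
print(round(rescale_and_normalize(0), 3), round(rescale_and_normalize(255), 3))

# Input already in the 0..1 range: skip rescaling, as the images parameter advises.
print(round(rescale_and_normalize(1.0, do_rescale=False), 3))
```

Leaving do_rescale enabled for an image that is already in the 0..1 range would squash every value to nearly 0 before normalization, which is why the flag has to match the input range.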

MobileNetV1ImageProcessor

`class transformers.MobileNetV1ImageProcessor`

( do_resize: bool = True size: Optional = None resample: Resampling = <Resampling.BILINEAR: 2> do_center_crop: bool = True crop_size: Dict = None do_rescale: bool = True rescale_factor: Union = 0.00392156862745098 do_normalize: bool = True image_mean: Union = None image_std: Union = None **kwargs )

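Parameters

do_resize (bool, optional, defaults to True): Whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (Dict[str, int], optional, defaults to {"shortest_edge": 256}): Size of the image after resizing. The shortest edge of the image is resized to size["shortest_edge"], and the longest edge is resized to keep the input aspect ratio. Can be overridden by size in the preprocess method.
resample (PILImageResampling, optional, defaults to PILImageResampling.BILINEAR): Resampling filter to use if resizing the image. Can be overridden by the resample parameter in the preprocess method.
do_center_crop (bool, optional, defaults to True): Whether to center crop the image. If the input size is smaller than crop_size along any edge, the image is padded with zeros and then center cropped. Can be overridden by the do_center_crop parameter in the preprocess method.
crop_size (Dict[str, int], optional, defaults to {"height": 224, "width": 224}): Desired output size when applying center cropping. Only has an effect if do_center_crop is set to True. Can be overridden by the crop_size parameter in the preprocess method.
do_rescale (bool, optional, defaults to True): Whether to rescale the image by the specified scale rescale_factor. Can be overridden by the do_rescale parameter in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255): Scale factor to use if rescaling the image. Can be overridden by the rescale_factor parameter in the preprocess method.
do_normalize (bool, optional, defaults to True): Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.
image_mean (float or List[float], optional, defaults to IMAGENET_STANDARD_MEAN): Mean to use if normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.
image_std (float or List[float], optional, defaults to IMAGENET_STANDARD_STD): Standard deviation to use if normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.

Constructs a MobileNetV1 image processor.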
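
With the defaults above, an image is first resized so that its shortest edge is 256 and then center cropped to 224x224. The following sketch works through that geometry in plain Python. The helper names are hypothetical, and the rounding (truncation toward zero) is an assumption that may differ from the library by a pixel.

```python
from typing import Tuple


def resize_shortest_edge(height: int, width: int, shortest_edge: int = 256) -> Tuple[int, int]:
    """Return (new_height, new_width) keeping the aspect ratio."""
    short_side, long_side = (height, width) if height <= width else (width, height)
    new_long = int(shortest_edge * long_side / short_side)
    return (shortest_edge, new_long) if height <= width else (new_long, shortest_edge)


def center_crop_box(
    height: int, width: int, crop_height: int = 224, crop_width: int = 224
) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered crop window."""
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width


new_height, new_width = resize_shortest_edge(480, 640)
print(new_height, new_width)                    # 256 341
print(center_crop_box(new_height, new_width))   # (16, 58, 240, 282)
```

This sketch covers only the case where the resized image is at least as large as the crop. The zero padding described for do_center_crop is not modeled here.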

Model variant	Depths	Hidden sizes	Params (M)	ImageNet-1k Top 1
s12	[2, 2, 6, 2]	[64, 128, 320, 512]	12	77.2
s24	[4, 4, 12, 4]	[64, 128, 320, 512]	21	80.3
s36	[6, 6, 18, 6]	[64, 128, 320, 512]	31	81.4
m36	[6, 6, 18, 6]	[96, 192, 384, 768]	56	82.1
m48	[8, 8, 24, 8]	[96, 192, 384, 768]	73	82.5

Model variant	Size	Acc@1	Params (M)
PVT-Tiny	224	75.1	13.2
PVT-Small	224	79.8	24.5
PVT-Medium	224	81.2	44.2
PVT-Large	224	81.7	61.4

Model variant	Depths	Hidden sizes	Decoder hidden size	Params (M)	ImageNet-1k Top 1
MiT-b0	[2, 2, 2, 2]	[32, 64, 160, 256]	256	3.7	70.5
MiT-b1	[2, 2, 2, 2]	[64, 128, 320, 512]	256	14.0	78.7
MiT-b2	[3, 4, 6, 3]	[64, 128, 320, 512]	768	25.4	81.6
MiT-b3	[3, 4, 18, 3]	[64, 128, 320, 512]	768	45.2	83.1
MiT-b4	[3, 8, 27, 3]	[64, 128, 320, 512]	768	62.6	83.6
MiT-b5	[3, 6, 40, 3]	[64, 128, 320, 512]	768	82.0	83.8
