Computer vision models on PyTorch

This is a collection of image classification, segmentation, detection, and pose estimation models. Many of them are pretrained on the ImageNet-1K, CIFAR-10/100, SVHN, CUB-200-2011, Pascal VOC2012, ADE20K, Cityscapes, and COCO datasets, and their pretrained weights are loaded automatically on use. All pretrained models expect the same standard input normalization. Scripts for training, evaluating, and converting models are in the imgclsmob repo.
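The text does not give the normalization constants. As a minimal sketch, assuming the normalization meant here is the widely used ImageNet per-channel mean/std standardization, the preprocessing of an 8-bit RGB image could look like the following. The constants and the helper names `normalize_pixel` and `normalize_image` are illustrative assumptions, not part of the package.

```python
# Assumed values: the commonly used ImageNet per-channel statistics (RGB order).
# The source does not state them; verify against the package documentation.
ASSUMED_MEAN = (0.485, 0.456, 0.406)
ASSUMED_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


def normalize_image(image, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Normalize an image given as rows of (R, G, B) tuples with values in 0..255."""
    return [[normalize_pixel(px, mean, std) for px in row] for row in image]


if __name__ == "__main__":
    # A 1x2 image: one black pixel and one white pixel.
    tiny = [[(0, 0, 0), (255, 255, 255)]]
    for px in normalize_image(tiny)[0]:
        print(tuple(round(v, 3) for v in px))
```

In a PyTorch pipeline the same step is usually expressed with `torchvision.transforms.Normalize(mean, std)` applied after converting the image to a float tensor in [0, 1].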


List of implemented models

AlexNet ('One weird trick for parallelizing convolutional neural networks')
ZFNet ('Visualizing and Understanding Convolutional Networks')
VGG/BN-VGG ('Very Deep Convolutional Networks for Large-Scale Image Recognition')
BN-Inception ('Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift')
ResNet ('Deep Residual Learning for Image Recognition')
PreResNet ('Identity Mappings in Deep Residual Networks')
ResNeXt ('Aggregated Residual Transformations for Deep Neural Networks')
SENet/SE-ResNet/SE-PreResNet/SE-ResNeXt ('Squeeze-and-Excitation Networks')
ResNeSt(A) ('ResNeSt: Split-Attention Networks')
IBN-ResNet/IBN-ResNeXt/IBN-DenseNet ('Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net')
AirNet/AirNeXt ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations')
BAM-ResNet ('BAM: Bottleneck Attention Module')
CBAM-ResNet ('CBAM: Convolutional Block Attention Module')
ResAttNet ('Residual Attention Network for Image Classification')
SKNet ('Selective Kernel Networks')
SCNet ('Improving Convolutional Networks with Self-Calibrated Convolutions')
RegNet ('Designing Network Design Spaces')
DIA-ResNet ('DIANet: Dense-and-Implicit Attention Network')
PyramidNet ('Deep Pyramidal Residual Networks')
DiracNetV2 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections')
ShaResNet ('ShaResNet: reducing residual network parameter number by sharing weights')
DenseNet ('Densely Connected Convolutional Networks')
CondenseNet ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions')
SparseNet ('Sparsely Aggregated Convolutional Networks')
PeleeNet ('Pelee: A Real-Time Object Detection System on Mobile Devices')
Oct-ResNet ('Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution')
WRN ('Wide Residual Networks')
WRN-1bit ('Training wide residual networks for deployment using a single bit for each weight')
DRN-C/DRN-D ('Dilated Residual Networks')
DPN ('Dual Path Networks')
DarkNet Ref/Tiny/19 ('Darknet: Open source neural networks in c')
DarkNet-53 ('YOLOv3: An Incremental Improvement')
ChannelNet ('ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions')
iSQRT-COV-ResNet ('Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization')
RevNet ('The Reversible Residual Network: Backpropagation Without Storing Activations')
i-RevNet ('i-RevNet: Deep Invertible Networks')
BagNet ('Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet')
DLA ('Deep Layer Aggregation')
MSDNet ('Multi-Scale Dense Networks for Resource Efficient Image Classification')
FishNet ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction')
ESPNetv2 ('ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network')
DiCENet ('DiCENet: Dimension-wise Convolutions for Efficient Networks')
HRNet ('Deep High-Resolution Representation Learning for Visual Recognition')
VoVNet ('An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection')
SelecSLS ('XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera')
HarDNet ('HarDNet: A Low Memory Traffic Network')
X-DenseNet ('Deep Expander Networks: Efficient Deep Networks from Graph Theory')
SqueezeNet/SqueezeResNet ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size')
SqueezeNext ('SqueezeNext: Hardware-Aware Neural Network Design')
ShuffleNet ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices')
ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design')
MENet ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications')
MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications')
FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy')
MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks')
MobileNetV3 ('Searching for MobileNetV3')
IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks')
GhostNet ('GhostNet: More Features from Cheap Operations')
MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile')
DARTS ('DARTS: Differentiable Architecture Search')
ProxylessNAS ('ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware')
FBNet ('FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search')
Xception ('Xception: Deep Learning with Depthwise Separable Convolutions')
InceptionV3 ('Rethinking the Inception Architecture for Computer Vision')
InceptionV4/InceptionResNetV1/InceptionResNetV2 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning')
PolyNet ('PolyNet: A Pursuit of Structural Diversity in Very Deep Networks')
NASNet ('Learning Transferable Architectures for Scalable Image Recognition')
PNASNet ('Progressive Neural Architecture Search')
SPNASNet ('Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours')
EfficientNet ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks')
MixNet ('MixConv: Mixed Depthwise Convolutional Kernels')
NIN ('Network In Network')
RoR-3 ('Residual Networks of Residual Networks: Multilevel Residual Networks')
RiR ('Resnet in Resnet: Generalizing Residual Architectures')
ResDrop-ResNet ('Deep Networks with Stochastic Depth')
Shake-Shake-ResNet ('Shake-Shake regularization')
ShakeDrop-ResNet ('ShakeDrop Regularization for Deep Residual Learning')
FractalNet ('FractalNet: Ultra-Deep Neural Networks without Residuals')
NTS-Net ('Learning to Navigate for Fine-grained Classification')
PSPNet ('Pyramid Scene Parsing Network')
DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation')
FCN-8s ('Fully Convolutional Networks for Semantic Segmentation')
ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images')
Fast-SCNN ('Fast-SCNN: Fast Semantic Segmentation Network')
CGNet ('CGNet: A Light-weight Context Guided Network for Semantic Segmentation')
DABNet ('DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation')
SINet ('SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder')
BiSeNet ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation')
DANet ('Dual Attention Network for Scene Segmentation')
FPENet ('Feature Pyramid Encoding Network for Real-time Semantic Segmentation')
ContextNet ('ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time')
LEDNet ('LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation')
ESNet ('ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation')
EDANet ('Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation')
ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation')
ERFNet ('ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation')
LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation')
SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation')
U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation')
SQNet ('Speeding up Semantic Segmentation for Autonomous Driving')
CenterNet ('Objects as Points')
LFFD ('LFFD: A Light and Fast Face Detector for Edge Devices')
AlphaPose ('RMPE: Regional Multi-person Pose Estimation')
SimplePose ('Simple Baselines for Human Pose Estimation and Tracking')
Lightweight OpenPose ('Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose')
IBPPose ('Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation')
PFPCNet ('Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks')
VOCA ('Capture, Learning, and Synthesis of 3D Speaking Styles')
Neural Voice Puppetry Audio-to-Expression net ('Neural Voice Puppetry: Audio-driven Facial Reenactment')
Jasper/JasperDR ('Jasper: An End-to-End Convolutional Neural Acoustic Model')
QuartzNet ('QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions')

posted @ 2022-05-12 15:35  Xu_Lin