Leaderboard of Object Detection Algorithms Evaluated on the COCO Dataset

AP50

Rank	Model	box AP	AP50	Paper	Code	Year	Tags
1	SwinV2-G (HTC++)	63.1		Swin Transformer V2: Scaling Up Capacity and Resolution	Link	2021	Swin-Transformer
2	Florence-CoSwin-H	62.4		Florence: A New Foundation Model for Computer Vision		2021	Swin-Transformer
3	GLIP (Swin-L, multi-scale)	61.5	79.5	Grounded Language-Image Pre-training		2021	multiscale; Vision Language; Dynamic Head; BERT-Base
4	Soft Teacher + Swin-L (HTC++, multi-scale)	61.3		End-to-End Semi-Supervised Object Detection with Soft Teacher		2021	multiscale; Swin-Transformer
5	DyHead (Swin-L, multi scale, self-training)	60.6	78.5	Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale; Swin-Transformer
6	Dual-Swin-L (HTC, multi-scale)	60.1		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	multiscale Swin-Transformer
7	Dual-Swin-L (HTC, single-scale)	59.4		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	Swin-Transformer
8	Focal-L (DyHead, multi-scale)	58.9		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale Focal-Transformer
9	DyHead (Swin-L, multi scale)	58.7	77.1	Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale Swin-Transformer
10	Swin-L (HTC++, multi scale)	58.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	multiscale Swin-Transformer
11	Focal-L (HTC++, multi-scale)	58.4		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale
12	Swin-L (HTC++, single scale)	57.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	single scale Swin-Transformer
13	YOLOR-D6 (1280, single-scale, 34 fps)	57.3	75.0	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
14	SOLQ (Swin-L, single)	56.5		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
15	YOLOR-E6 (1280, single-scale, 45 fps)	56.4	74.1	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
16	CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	56.4	74.0	Probabilistic two-stage detection		2021	single scale FPN DCN
17	QueryInst (single-scale)	56.1	75.9	Instances as Queries		2021
18	YOLOv4-P7 with TTA	55.8	73.2	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
19	DetectoRS (ResNeXt-101-64x4d, multi-scale)	55.7	74.2	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
20	YOLOR-W6 (1280, single-scale, 66 fps)	55.5	73.2	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
21	YOLOv4-P7 CSP-P7 (single-scale, 16 fps)	55.4	73.3	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
22	CSP-p6 + Mish (multi-scale)	55.2	72.9	Mish: A Self Regularized Non-Monotonic Activation Function		2019	multiscale
23	YOLOv4-P6 with TTA	54.9	72.6	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
24	Cascade Eff-B7 NAS-FPN (1280)	54.8		Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation		2020	single scale NAS-FPN
25	DetectoRS (ResNeXt-101-32x4d, multi-scale)	54.7	73.5	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
26	YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	54.3	72.3	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
27	SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	54.3		Rethinking Pre-training and Self-training		2020	single scale
28	UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	54.1	71.6	USB: Universal-Scale Object Detection Benchmark		2021	multiscale DCN
29	EfficientDet-D7 (single-scale)	53.7	72.4	EfficientDet: Scalable and Efficient Object Detection		2019	single scale
30	PAA (ResNeXt-152-32x8d + DCN, multi-scale)	53.5	71.6	Probabilistic Anchor Assignment with IoU Prediction for Object Detection		2020	ResNeXt multiscale DCN
31	LSNet (Res2Net-101 + DCN, multi-scale)	53.5	71.1	Location-Sensitive Visual Recognition with Cross-IOU Loss		2021	multiscale DCN
32	ResNeSt-200 (multi-scale)	53.3	72.0	ResNeSt: Split-Attention Networks		2020	multiscale
33	Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	53.3	71.9	CBNet: A Novel Composite Backbone Network Architecture for Object Detection		2019	multiscale
34	DetectoRS (ResNeXt-101-32x4d, single-scale)	53.3	71.6	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt single scale
35	GFLV2 (Res2Net-101, DCN, multiscale)	53.3	70.9	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	multiscale DCN
36	RelationNet++ (ResNeXt-64x4d-101-DCN)	52.7		RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder		2020	ResNeXt DCN
37	YOLOv4-P5 with TTA	52.5	70.3	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
38	Deformable DETR (ResNeXt-101+DCN)	52.3	71.9	Deformable DETR: Deformable Transformers for End-to-End Object Detection		2020	ResNeXt DCN
39	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	52.3	70.9	Global Context Networks		2020	ResNeXt DCN GCN
40	RetinaNet (SpineNet-190, 1280x1280)	52.1	71.8	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
41	RepPoints v2 (ResNeXt-101, DCN, multi-scale)	52.1	70.1	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt; multiscale DCN
42	AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	51.9	70.4	Attention-guided Context Feature Pyramid Network for Object Detection		2020	ResNeXt multiscale FPN
43	OTA (ResNeXt-101+DCN, multiscale)	51.5	68.6	OTA: Optimal Transport Assignment for Object Detection		2021
44	UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	51.3	70.0	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
45	TSD (SENet154-DCN, multi-scale)	51.2	71.9	Revisiting the Sibling Head in Object Detector		2020	multiscale DCN
46	YOLOX-X (Modified CSP v5)	51.2	69.6	YOLOX: Exceeding YOLO Series in 2021		2021	YOLO
47	RetinaNet (SpineNet-143, 1280x1280)	50.7	70.4	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
48	ATSS (ResNeXt-64x4d-101+DCN, multi-scale)	50.7	68.9	Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection		2019	ResNeXt multiscale DCN
49	NAS-FPN (AmoebaNet-D, learned aug)	50.7		Learning Data Augmentation Strategies for Object Detection		2019	FPN
50	GFLV2 (Res2Net-101, DCN)	50.6	69	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN
51	aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test)	50.2	70.3	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt multiscale DCN
52	FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d)	50.1	69.8	Scale-Equalizing Pyramid Convolution for Object Detection		2020	ResNeXt DCN
53	D2Det (ResNet-101-DCN, multi-scale test)	50.1	69.4	D2Det: Towards High Quality Object Detection and Instance Segmentation		2020	multiscale DCN ResNet
54	Dynamic R-CNN (ResNet-101-DCN, multi-scale)	50.1	68.3	Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training		2020	multiscale DCN ResNet
55	TSD (ResNet-101-Deformable, Image Pyramid)	49.4	69.6	Revisiting the Sibling Head in Object Detector		2020	ResNet
56	RepPoints v2 (ResNeXt-101, DCN)	49.4	68.9	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt DCN
57	CPNDet (Hourglass-104, multi-scale)	49.2	67.3	Corner Proposal Network for Anchor-free, Two-stage Object Detection		2020	multiscale
58	GFLV2 (ResNeXt-101, 32x4d, DCN)	49	67.6	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNeXt DCN
59	aLRP Loss (ResNeXt-101-64x4d, DCN, single scale)	48.9	69.3	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale DCN
60	UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	48.8	67.5	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
61	SOLQ (ResNet101, single scale)	48.7		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
62	RetinaNet (SpineNet-96, 1024x1024)	48.6	68.4	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
63	TridentNet (ResNet-101-Deformable, Image Pyramid)	48.4	69.7	Scale-Aware Trident Networks for Object Detection		2019	ResNet
64	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	48.4	67.6	GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond		2019	ResNeXt DCN GCN
65	GFLV2 (ResNet-101-DCN)	48.3	66.5	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN ResNet
66	GFL (X-101-32x4d-DCN, single-scale)	48.2	67.4	Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection		2020	ResNeXt single scale DCN
67	ISTR (ResNet101-FPN-3x, single-scale)	48.1		ISTR: End-to-End Instance Segmentation with Transformers		2021
68	aLRP Loss (ResNeXt-101-64x4d, single scale)	47.8	68.4	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale
69	MatrixNet Corners (ResNet-152, multi-scale)	47.8	66.2	Matrix Nets: A New Deep Architecture for Object Detection		2019	multiscale ResNet
70	SOLQ (ResNet50, single scale)	47.8		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
71	SAPD (ResNeXt-101, single-scale)	47.4	67.4	Soft Anchor-Point Object Detection		2019	ResNeXt single scale
72	PANet (ResNeXt-101, multi-scale)	47.4	67.2	Path Aggregation Network for Instance Segmentation		2018	ResNeXt multiscale
73	HTC (HRNetV2p-W48)	47.3	65.9	Deep High-Resolution Representation Learning for Visual Recognition		2019
74	HTC (ResNeXt-101-FPN)	47.1	63.9	Hybrid Task Cascade for Instance Segmentation		2019	ResNeXt FPN
75	CenterNet511 (Hourglass-104, multi-scale)	47.0	64.5	CenterNet: Keypoint Triplets for Object Detection		2019	multiscale
76	MAL (ResNeXt101, multi-scale)	47.0		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt multiscale
77	ISTR (ResNet50-FPN-3x)	46.8		ISTR: End-to-End Instance Segmentation with Transformers		2021	FPN ResNet
78	RetinaNet (SpineNet-49, 896x896)	46.7	66.3	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
79	RPDet (ResNet-101-DCN, multi-scale)	46.5	67.4	RepPoints: Point Set Representation for Object Detection		2019	multiscale DCN ResNet
80	HoughNet (MS)	46.4	65.1	HoughNet: Integrating near and long-range evidence for bottom-up object detection		2020	multiscale
81	PPDet (ResNeXt-101-FPN, multiscale)	46.3	64.8	Reducing Label Noise in Anchor-Free Object Detection		2020	ResNeXt multiscale FPN
82	GFLV2 (ResNet-101)	46.2	64.3	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
83	SNIPER (ResNet-101)	46.1	67.0	SNIPER: Efficient Multi-Scale Training		2018	ResNet
84	Mask R-CNN (HRNetV2p-W48 + cascade)	46.1	64.0	Deep High-Resolution Representation Learning for Visual Recognition		2019
85	DCNv2 (ResNet-101, multi-scale)	46.0	67.9	Deformable ConvNets v2: More Deformable, Better Results		2018	multiscale DCN ResNet
86	Gaussian-FCOS	46		Localization Uncertainty Estimation for Anchor-Free Object Detection		2020
87	Cascade R-CNN-FPN (ResNet-101, map-guided)	45.9	64.2	InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting		2019	FPN ResNet
88	MAL (ResNeXt101, single-scale)	45.9		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt single scale
89	CenterMask+VoVNetV2-99 (single-scale)	45.8	64.5	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
90	D-RFCN + SNIP (DPN-98 with flip, multi-scale)	45.7	67.3	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale
91	YOLOv4 (CD53)	45.5	64.1	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
92	PP-YOLO (608x608)	45.2	65.2	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
93	AC-FPN Cascade R-CNN (ResNet-101, single scale)	45	64.4	Attention-guided Context Feature Pyramid Network for Object Detection		2019	single scale FPN ResNet
94	FreeAnchor (ResNeXt-101)	44.8	64.3	FreeAnchor: Learning to Match Anchors for Visual Object Detection		2019	ResNeXt
95	FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	44.7	64.1	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
96	CenterMask+VoVNet2-57 (single-scale)	44.7	63.1	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
97	FSAF (ResNeXt-101, multi-scale)	44.6	65.2	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	ResNeXt multiscale
98	aLRP Loss (ResNeXt-101, DCN, 500 scale)	44.6	65.0	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt DCN
99	CenterMask + X-101-32x8d (single-scale)	44.6	63.4	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
100	RetinaNet (SpineNet-49, 640x640)	44.3	63.8	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
101	YOLOF-DC5	44.3	62.9	You Only Look One-level Feature		2021	YOLO
102	GFLV2 (ResNet-50)	44.3	62.3	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
103	InterNet (ResNet-101-FPN, multi-scale)	44.2	67.5	Feature Intertwiner for Object Detection		2019	multiscale FPN ResNet
104	M2Det (VGG-16, multi-scale)	44.2	64.6	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale
105	Faster R-CNN (LIP-ResNet-101-MD w FPN)	43.9	65.7	LIP: Local Importance-based Pooling		2019	FPN
106	M2Det (ResNet-101, multi-scale)	43.9	64.4	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale ResNet
107	YOLOv3 @800 + ASFF* (Darknet-53)	43.9	64.1	Learning Spatial Fusion for Single-Shot Object Detection		2019	YOLO
108	FoveaBox (ResNeXt-101)	43.9	63.5	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
109	ExtremeNet (Hourglass-104, multi-scale)	43.7	60.5	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	multiscale
110	YOLOv4-608	43.5	65.7	YOLOv4: Optimal Speed and Accuracy of Object Detection		2020	single scale YOLO
111	SNIPER (ResNet-50)	43.5	65.0	SNIPER: Efficient Multi-Scale Training		2018	ResNet
112	CenterNet (HRNetV2-W48)	43.5		Deep High-Resolution Representation Learning for Visual Recognition		2019
113	D-RFCN + SNIP (ResNet-101, multi-scale)	43.4	65.5	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale ResNet
114	Grid R-CNN (ResNeXt-101-FPN)	43.2	63.0	Grid R-CNN		2018	ResNeXt FPN
115	FCOS (ResNeXt-101-64x4d-FPN)	43.2	62.8	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
116	CornerNet-Saccade (Hourglass-104, multi-scale)	43.2		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019	multiscale
117	Libra R-CNN (ResNeXt-101-FPN)	43.0	64	Libra R-CNN: Towards Balanced Learning for Object Detection		2019	ResNeXt FPN
118	RPDet (ResNet-101-DCN)	42.8	65.0	RepPoints: Point Set Representation for Object Detection		2019	DCN ResNet
119	SpineNet-49 (640, RetinaNet, single-scale)	42.8	62.3	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019	single scale
120	Cascade R-CNN (ResNet-101-FPN+, cascade)	42.8	62.1	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
121	Cascade R-CNN	42.8	62.1	Cascade R-CNN: High Quality Object Detection and Instance Segmentation		2019
122	TridentNet (ResNet-101)	42.7	63.6	Scale-Aware Trident Networks for Object Detection		2019	ResNet
123	FCOS (ResNeXt-32x8d-101-FPN)	42.7	62.2	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
124	RetinaMask (ResNeXt-101-FPN-GN)	42.6	62.5	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	ResNeXt FPN
125	TAL + TAP	42.5	60.3	TOOD: Task-aligned One-stage Object Detection		2021
126	Faster R-CNN (HRNetV2p-W48)	42.4	63.6	Deep High-Resolution Representation Learning for Visual Recognition		2019
127	HSD (ResNet101, 768x768, single-scale test)	42.3	61.2	Hierarchical Shot Detector		2019	single scale
128	CornerNet511 (Hourglass-104, multi-scale)	42.1	57.8	CornerNet: Detecting Objects as Paired Keypoints		2018	multiscale
129	FoveaBox (ResNeXt-101)	42.1		FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
130	FCOS (HRNet-W32-5l)	42.0	60.4	FCOS: Fully Convolutional One-Stage Object Detection		2019
131	RefineDet512+ (ResNet-101)	41.8	62.9	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
132	GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	41.6	62.8	Gradient Harmonized Single-stage Detector		2018	FPN
133	CenterNet-DLA (DLA-34, multi-scale)	41.6		Objects as Points		2019	multiscale
134	RetinaNet (SpineNet-49S, 640x640)	41.5	60.5	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
135	RPDet (ResNet-101)	41	62.9	RepPoints: Point Set Representation for Object Detection		2019	ResNet
136	M2Det (VGG-16, single-scale)	41.0	59.7	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale
137	FSAF (ResNet-101, single-scale)	40.9	61.5	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	single scale ResNet
138	RetinaNet (ResNeXt-101-FPN)	40.8	61.1	Focal Loss for Dense Object Detection		2017	ResNeXt FPN
139	Cascade R-CNN (ResNet-50-FPN+, cascade)	40.6	59.9	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
140	Faster R-CNN (Cascade RPN)	40.6	58.9	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
141	ResNet-50-DW-DPN (Deformable Kernels)	40.6		Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation		2019	ResNet
142	IoU-Net	40.6		Acquisition of Localization Confidence for Accurate Object Detection		2018
143	FCOS (HRNetV2p-W48)	40.5	59.3	Deep High-Resolution Representation Learning for Visual Recognition		2019
144	ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	40.4		Bounding Box Regression with Uncertainty for Accurate Object Detection		2018	FPN ResNet
145	RDSNet (ResNet-101, RetinaNet, mask, MBRM)	40.3	60.1	RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation		2019	ResNet
146	ExtremeNet (Hourglass-104, single-scale)	40.2	55.5	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	single scale
147	Mask R-CNN (ResNet-101-FPN, CBN)	40.1	60.5	Cross-Iteration Batch Normalization		2020	FPN ResNet
148	Fast R-CNN (Cascade RPN)	40.1	59.4	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
149	Mask R-CNN (ResNeXt-101-FPN)	39.8	62.3	Mask R-CNN		2017	ResNeXt FPN
150	GA-Faster-RCNN	39.8	59.2	Region Proposal by Guided Anchoring		2019
151	FPN (ResNet101 backbone)	39.5		ChainerCV: a Library for Deep Learning in Computer Vision		2017	FPN ResNet
152	RetinaMask (ResNet-50-FPN)	39.4	58.6	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	FPN ResNet
153	PP-YOLO (320x320)	39.3	59.3	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
154	AA-ResNet-10 + RetinaNet	39.2		Attention Augmented Convolutional Networks		2019
155	MAL (ResNet50, single-scale)	39.2		Multiple Anchor Learning for Visual Object Detection		2019	single scale ResNet
156	RetinaNet (ResNet-101-FPN)	39.1	59.1	Focal Loss for Dense Object Detection		2017	FPN ResNet
157	Cascade R-CNN (ResNet-101-FPN+)	38.8	61.1	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
158	M2Det (ResNet-101, single-scale)	38.8	59.4	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale ResNet
159	SaccadeNet (DLA-34-DCN)	38.5	55.6	SaccadeNet: A Fast and Accurate Object Detector		2020	DCN
160	Mask R-CNN (ResNet-101-FPN)	38.2	60.3	Mask R-CNN		2017	FPN ResNet
161	WSMA-Seg	38.1		Segmentation is All You Need		2019
162	Faster R-CNN + FPN + CGD	37.9		Compact Global Descriptor for Neural Networks		2019	FPN
163	CornerNet511 (Hourglass-52, single-scale)	37.8	53.7	CornerNet: Detecting Objects as Paired Keypoints		2018	single scale
164	RefineDet512+ (VGG-16)	37.6	58.7	Single-Shot Refinement Neural Network for Object Detection		2017
165	DeformConv-R-FCN (Aligned-Inception-ResNet)	37.5	58.0	Deformable Convolutional Networks		2017
166	Faster R-CNN (ImageNet+300M)	37.4	58	Revisiting Unreasonable Effectiveness of Data in Deep Learning Era		2017
167	Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	36.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN ResNet
168	Faster R-CNN + TDM	36.8		Beyond Skip Connections: Top-Down Modulation for Object Detection		2016
169	Cascade R-CNN (ResNet-50-FPN+)	36.5	59	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN; ResNet
170	RefineDet512 (ResNet-101)	36.4	57.5	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
171	Faster R-CNN + FPN	36.2		Feature Pyramid Networks for Object Detection		2016	FPN
172	Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	35.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
173	Faster R-CNN (box refinement, context, multi-scale testing)	34.9		Deep Residual Learning for Image Recognition		2015	multiscale
174	Faster R-CNN	34.7		Speed/accuracy trade-offs for modern convolutional object detectors		2016
175	CornerNet-Squeeze	34.4		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019
176	MultiPath Network	33.2		A MultiPath Network for Object Detection		2016
177	ION	33.1	55.7	Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks		2015
178	RefineDet512 (VGG-16)	33	54.5	Single-Shot Refinement Neural Network for Object Detection		2017
179	YOLOv3 + Darknet-53	33.0		YOLOv3: An Incremental Improvement		2018	YOLO
180	SSD512	28.8	48.5	SSD: Single Shot MultiBox Detector		2015
181	MnasFPN (MobileNetV2)	26.1		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
182	ESPNetv2-512	26.0		ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network		2018
183	MnasFPN (MobileNetV3)	25.5		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
184	MnasFPN (MNASNet-B1)	24.6		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
185	MnasFPN x0.7 (MobileNetV2)	23.8		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
186	MobileNet-v1-SSD-300x300+CGD	21.4		Compact Global Descriptor for Neural Networks		2019
187	Fast-RCNN	19.7		Fast R-CNN		2015
188	MobileNet	19.3		MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications		2017
189	DAT-S (RetinaNet)		69.6	Vision Transformer with Deformable Attention		2022
190	CenterMask-VoVNet99 (multi-scale)		68.3	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	multiscale
191	Mask R-CNN (HRNetV2p-W32 + cascade)		62.5	Deep High-Resolution Representation Learning for Visual Recognition		2019
192	FoveaBox (ResNeXt-101)		61.9	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
193	VirTex Mask R-CNN (ResNet-50-FPN)		61.7	VirTex: Learning Visual Representations from Textual Annotations		2020	FPN; ResNet
194	Centermask + ResNet101		61.6	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	ResNet
195	PAFNet (ResNet50-vd)		59.8	PAFNet: An Efficient Anchor-Free Object Detector Guidance		2021	ResNet
196	IoU-Net+EnergyRegression		58.5	Energy-Based Models for Deep Probabilistic Regression		2019
197	Cascade R-CNN (HRNetV2p-W48)			Deep High-Resolution Representation Learning for Visual Recognition		2019
198	ISTR (ResNet50-FPN-3x, single-scale)			ISTR: End-to-End Instance Segmentation with Transformers		2021
199	FoveaBox (ResNeXt-101)			FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
200	EfficientDet-D7x (single-scale)			EfficientDet: Scalable and Efficient Object Detection		2019	single scale

AP75

Rank	Model	box AP	AP75	Paper	Code	Year	Tags
1	SwinV2-G (HTC++)	63.1		Swin Transformer V2: Scaling Up Capacity and Resolution	Link	2021	Swin-Transformer
2	Florence-CoSwin-H	62.4		Florence: A New Foundation Model for Computer Vision		2021	Swin-Transformer
3	GLIP (Swin-L, multi-scale)	61.5	67.7	Grounded Language-Image Pre-training		2021	multiscale; Vision Language; Dynamic Head; BERT-Base
4	Soft Teacher + Swin-L (HTC++, multi-scale)	61.3		End-to-End Semi-Supervised Object Detection with Soft Teacher		2021	multiscale; Swin-Transformer
5	DyHead (Swin-L, multi scale, self-training)	60.6	66.6	Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale; Swin-Transformer
6	Dual-Swin-L (HTC, multi-scale)	60.1		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	multiscale Swin-Transformer
7	Dual-Swin-L (HTC, single-scale)	59.4		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	Swin-Transformer
8	Focal-L (DyHead, multi-scale)	58.9		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale Focal-Transformer
9	DyHead (Swin-L, multi scale)	58.7	64.5	Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale Swin-Transformer
10	Swin-L (HTC++, multi scale)	58.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	multiscale Swin-Transformer
11	Focal-L (HTC++, multi-scale)	58.4		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale
12	Swin-L (HTC++, single scale)	57.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	single scale Swin-Transformer
13	YOLOR-D6 (1280, single-scale, 34 fps)	57.3	62.7	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
14	SOLQ (Swin-L, single)	56.5		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
15	YOLOR-E6 (1280, single-scale, 45 fps)	56.4	61.6	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
16	CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	56.4	61.6	Probabilistic two-stage detection		2021	single scale FPN DCN
17	QueryInst (single-scale)	56.1	61.9	Instances as Queries		2021
18	YOLOv4-P7 with TTA	55.8	61.2	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
19	DetectoRS (ResNeXt-101-64x4d, multi-scale)	55.7	61.1	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
20	YOLOR-W6 (1280, single-scale, 66 fps)	55.5	60.6	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
21	YOLOv4-P7 CSP-P7 (single-scale, 16 fps)	55.4	60.7	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
22	CSP-p6 + Mish (multi-scale)	55.2	60.5	Mish: A Self Regularized Non-Monotonic Activation Function		2019	multiscale
23	YOLOv4-P6 with TTA	54.9	60.2	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
24	Cascade Eff-B7 NAS-FPN (1280)	54.8		Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation		2020	single scale NAS-FPN
25	DetectoRS (ResNeXt-101-32x4d, multi-scale)	54.7	60.1	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
26	YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	54.3	59.5	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
27	SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	54.3		Rethinking Pre-training and Self-training		2020	single scale
28	UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	54.1	59.9	USB: Universal-Scale Object Detection Benchmark		2021	multiscale DCN
29	EfficientDet-D7 (single-scale)	53.7		EfficientDet: Scalable and Efficient Object Detection		2019	single scale
30	PAA (ResNeXt-152-32x8d + DCN, multi-scale)	53.5	59.1	Probabilistic Anchor Assignment with IoU Prediction for Object Detection		2020	ResNeXt multiscale DCN
31	LSNet (Res2Net-101 + DCN, multi-scale)	53.5	59.2	Location-Sensitive Visual Recognition with Cross-IOU Loss		2021	multiscale DCN
32	ResNeSt-200 (multi-scale)	53.3	58.0	ResNeSt: Split-Attention Networks		2020	multiscale
33	Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	53.3	58.5	CBNet: A Novel Composite Backbone Network Architecture for Object Detection		2019	multiscale
34	DetectoRS (ResNeXt-101-32x4d, single-scale)	53.3	58.5	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt single scale
35	GFLV2 (Res2Net-101, DCN, multiscale)	53.3	59.2	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	multiscale DCN
36	RelationNet++ (ResNeXt-64x4d-101-DCN)	52.7		RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder		2020	ResNeXt DCN
37	YOLOv4-P5 with TTA	52.5	58	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
38	Deformable DETR (ResNeXt-101+DCN)	52.3	58.1	Deformable DETR: Deformable Transformers for End-to-End Object Detection		2020	ResNeXt DCN
39	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	52.3	56.9	Global Context Networks		2020	ResNeXt DCN GCN
40	RetinaNet (SpineNet-190, 1280x1280)	52.1	56.5	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
41	RepPoints v2 (ResNeXt-101, DCN, multi-scale)	52.1	57.5	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt; multiscale DCN
42	AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	51.9	57	Attention-guided Context Feature Pyramid Network for Object Detection		2020	ResNeXt multiscale FPN
43	OTA (ResNeXt-101+DCN, multiscale)	51.5	57.1	OTA: Optimal Transport Assignment for Object Detection		2021
44	UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	51.3	55.8	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
45	TSD (SENet154-DCN, multi-scale)	51.2	56.0	Revisiting the Sibling Head in Object Detector		2020	multiscale DCN
46	YOLOX-X (Modified CSP v5)	51.2	55.7	YOLOX: Exceeding YOLO Series in 2021		2021	YOLO
47	RetinaNet (SpineNet-143, 1280x1280)	50.7	54.9	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
48	ATSS (ResNeXt-64x4d-101+DCN, multi-scale)	50.7	56.3	Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection		2019	ResNeXt multiscale DCN
49	NAS-FPN (AmoebaNet-D, learned aug)	50.7		Learning Data Augmentation Strategies for Object Detection		2019	FPN
50	GFLV2 (Res2Net-101, DCN)	50.6	55.3	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN
51	aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test)	50.2	53.9	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt multiscale DCN
52	FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d)	50.1	54.3	Scale-Equalizing Pyramid Convolution for Object Detection		2020	ResNeXt DCN
53	D2Det (ResNet-101-DCN, multi-scale test)	50.1	54.9	D2Det: Towards High Quality Object Detection and Instance Segmentation		2020	multiscale DCN ResNet
54	Dynamic R-CNN (ResNet-101-DCN, multi-scale)	50.1	55.6	Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training		2020	multiscale DCN ResNet
55	TSD (ResNet-101-Deformable, Image Pyramid)	49.4	54.4	Revisiting the Sibling Head in Object Detector		2020	ResNet
56	RepPoints v2 (ResNeXt-101, DCN)	49.4	53.4	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt DCN
57	CPNDet (Hourglass-104, multi-scale)	49.2	53.7	Corner Proposal Network for Anchor-free, Two-stage Object Detection		2020	multiscale
58	GFLV2 (ResNeXt-101, 32x4d, DCN)	49	53.5	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNeXt DCN
59	aLRP Loss (ResNext-101-64x4d, DCN, single scale)	48.9	52.5	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale DCN
60	UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	48.8	53.0	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
61	SOLQ (ResNet101, single scale)	48.7		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
62	RetinaNet (SpineNet-96, 1024x1024)	48.6	52.5	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
63	TridentNet (ResNet-101-Deformable, Image Pyramid)	48.4	53.5	Scale-Aware Trident Networks for Object Detection		2019	ResNet
64	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	48.4	52.7	GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond		2019	ResNeXt DCN GCN
65	GFLV2 (ResNet-101-DCN)	48.3	52.8	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN ResNet
66	GFL (X-101-32x4d-DCN, single-scale)	48.2	52.6	Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection		2020	ResNeXt single scale DCN
67	ISTR (ResNet101-FPN-3x, single-scale)	48.1		ISTR: End-to-End Instance Segmentation with Transformers		2021
68	aLRP Loss (ResNext-101-64x4d, single scale)	47.8	51.1	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale
69	MatrixNet Corners (ResNet-152, multi-scale)	47.8	52.3	Matrix Nets: A New Deep Architecture for Object Detection		2019	multiscale ResNet
70	SOLQ (ResNet50, single scale)	47.8		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
71	SAPD (ResNeXt-101, single-scale)	47.4	51.1	Soft Anchor-Point Object Detection		2019	ResNeXt single scale
72	PANet (ResNeXt-101, multi-scale)	47.4	51.8	Path Aggregation Network for Instance Segmentation		2018	ResNeXt multiscale
73	HTC (HRNetV2p-W48)	47.3	51.2	Deep High-Resolution Representation Learning for Visual Recognition		2019
74	HTC (ResNeXt-101-FPN)	47.1	44.7	Hybrid Task Cascade for Instance Segmentation		2019	ResNeXt FPN
75	CenterNet511 (Hourglass-104, multi-scale)	47.0	50.7	CenterNet: Keypoint Triplets for Object Detection		2019	multiscale
76	MAL (ResNeXt101, multi-scale)	47.0		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt multiscale
77	ISTR (ResNet50-FPN-3x)	46.8		ISTR: End-to-End Instance Segmentation with Transformers		2021	FPN ResNet
78	RetinaNet (SpineNet-49, 896x896)	46.7	50.6	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
79	RPDet (ResNet-101-DCN, multi-scale)	46.5	50.9	RepPoints: Point Set Representation for Object Detection		2019	multiscale DCN ResNet
80	HoughNet (MS)	46.4	50.7	HoughNet: Integrating near and long-range evidence for bottom-up object detection		2020	multiscale
81	PPDet (ResNeXt-101-FPN, multiscale)	46.3	51.6	Reducing Label Noise in Anchor-Free Object Detection		2020	ResNeXt multiscale FPN
82	GFLV2 (ResNet-101)	46.2	50.5	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
83	SNIPER (ResNet-101)	46.1	51.6	SNIPER: Efficient Multi-Scale Training		2018	ResNet
84	Mask R-CNN (HRNetV2p-W48 + cascade)	46.1	50.3	Deep High-Resolution Representation Learning for Visual Recognition		2019
85	DCNv2 (ResNet-101, multi-scale)	46.0	50.8	Deformable ConvNets v2: More Deformable, Better Results		2018	multiscale DCN ResNet
86	Gaussian-FCOS	46		Localization Uncertainty Estimation for Anchor-Free Object Detection		2020
87	Cascade R-CNN-FPN (ResNet-101, map-guided)	45.9	50	InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting		2019	FPN ResNet
88	MAL (ResNeXt101, single-scale)	45.9		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt single scale
89	CenterMask+VoVNetV2-99 (single-scale)	45.8		CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
90	D-RFCN + SNIP (DPN-98 with flip, multi-scale)	45.7	51.1	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale
91	YOLOv4 (CD53)	45.5	49.5	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
92	PP-YOLO (608x608)	45.2	49.9	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
93	AC-FPN Cascade R-CNN (ResNet-101, single scale)	45	49	Attention-guided Context Feature Pyramid Network for Object Detection		2019	single scale FPN ResNet
94	FreeAnchor (ResNeXt-101)	44.8	48.4	FreeAnchor: Learning to Match Anchors for Visual Object Detection		2019	ResNeXt
95	FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	44.7	48.4	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
96	CenterMask+VoVNetV2-57 (single-scale)	44.7	48.6	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
97	FSAF (ResNeXt-101, multi-scale)	44.6	48.6	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	ResNeXt multiscale
98	aLRP Loss (ResNext-101, DCN, 500 scale)	44.6	47.5	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt DCN
99	CenterMask + X-101-32x8d (single-scale)	44.6	48.4	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
100	RetinaNet (SpineNet-49, 640x640)	44.3	47.6	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
101	YOLOF-DC5	44.3	47.5	You Only Look One-level Feature		2021	YOLO
102	GFLV2 (ResNet-50)	44.3	48.5	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
103	InterNet (ResNet-101-FPN, multi-scale)	44.2	51.1	Feature Intertwiner for Object Detection		2019	multiscale FPN ResNet
104	M2Det (VGG-16, multi-scale)	44.2	49.3	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale
105	Faster R-CNN (LIP-ResNet-101-MD w FPN)	43.9	48.1	LIP: Local Importance-based Pooling		2019	FPN
106	M2Det (ResNet-101, multi-scale)	43.9	48	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale ResNet
107	YOLOv3 @800 + ASFF* (Darknet-53)	43.9	49.2	Learning Spatial Fusion for Single-Shot Object Detection		2019	YOLO
108	FoveaBox (ResNeXt-101)	43.9	47.7	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
109	ExtremeNet (Hourglass-104, multi-scale)	43.7	47.0	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	multiscale
110	YOLOv4-608	43.5	47.3	YOLOv4: Optimal Speed and Accuracy of Object Detection		2020	single scale YOLO
111	SNIPER (ResNet-50)	43.5	48.6	SNIPER: Efficient Multi-Scale Training		2018	ResNet
112	CenterNet (HRNetV2-W48)	43.5	46.5	Deep High-Resolution Representation Learning for Visual Recognition		2019
113	D-RFCN + SNIP (ResNet-101, multi-scale)	43.4	48.4	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale ResNet
114	Grid R-CNN (ResNeXt-101-FPN)	43.2	46.6	Grid R-CNN		2018	ResNeXt FPN
115	FCOS (ResNeXt-101-64x4d-FPN)	43.2	46.6	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
116	CornerNet-Saccade (Hourglass-104, multi-scale)	43.2		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019	multiscale
117	Libra R-CNN (ResNeXt-101-FPN)	43.0	47	Libra R-CNN: Towards Balanced Learning for Object Detection		2019	ResNeXt FPN
118	RPDet (ResNet-101-DCN)	42.8	46.3	RepPoints: Point Set Representation for Object Detection		2019	DCN ResNet
119	SpineNet-49 (640, RetinaNet, single-scale)	42.8	46.1	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019	single scale
120	Cascade R-CNN (ResNet-101-FPN+, cascade)	42.8	46.3	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
121	Cascade R-CNN	42.8	46.3	Cascade R-CNN: High Quality Object Detection and Instance Segmentation		2019
122	TridentNet (ResNet-101)	42.7	46.5	Scale-Aware Trident Networks for Object Detection		2019	ResNet
123	FCOS (ResNeXt-32x8d-101-FPN)	42.7	46.1	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
124	RetinaMask (ResNeXt-101-FPN-GN)	42.6	46.0	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	ResNeXt FPN
125	TAL + TAP	42.5	46.4	TOOD: Task-aligned One-stage Object Detection		2021
126	Faster R-CNN (HRNetV2p-W48)	42.4	46.4	Deep High-Resolution Representation Learning for Visual Recognition		2019
127	HSD (Rest101, 768x768, single-scale test)	42.3	46.9	Hierarchical Shot Detector		2019	single scale
128	CornerNet511 (Hourglass-104, multi-scale)	42.1	45.3	CornerNet: Detecting Objects as Paired Keypoints		2018	multiscale
129	FoveaBox (ResNeXt-101)	42.1		FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
130	FCOS (HRNet-W32-5l)	42.0	45.3	FCOS: Fully Convolutional One-Stage Object Detection		2019
131	RefineDet512+ (ResNet-101)	41.8	45.7	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
132	GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	41.6	44.2	Gradient Harmonized Single-stage Detector		2018	FPN
133	CenterNet-DLA (DLA-34, multi-scale)	41.6		Objects as Points		2019	multiscale
134	RetinaNet (SpineNet-49S, 640x640)	41.5	44.6	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
135	RPDet (ResNet-101)	41	44.3	RepPoints: Point Set Representation for Object Detection		2019	ResNet
136	M2Det (VGG-16, single-scale)	41.0	45	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale
137	FSAF (ResNet-101, single-scale)	40.9	44	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	single scale ResNet
138	RetinaNet (ResNeXt-101-FPN)	40.8	44.1	Focal Loss for Dense Object Detection		2017	ResNeXt FPN
139	Cascade R-CNN (ResNet-50-FPN+, cascade)	40.6	44	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
140	Faster R-CNN (Cascade RPN)	40.6	44.5	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
141	ResNet-50-DW-DPN (Deformable Kernels)	40.6		Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation		2019	ResNet
142	IoU-Net	40.6		Acquisition of Localization Confidence for Accurate Object Detection		2018
143	FCOS (HRNetV2p-W48)	40.5		Deep High-Resolution Representation Learning for Visual Recognition		2019
144	ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	40.4		Bounding Box Regression with Uncertainty for Accurate Object Detection		2018	FPN ResNet
145	RDSNet (ResNet-101, RetinaNet, mask, MBRM)	40.3	43	RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation		2019	ResNet
146	ExtremeNet (Hourglass-104, single-scale)	40.2	43.2	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	single scale
147	Mask R-CNN (ResNet-101-FPN, CBN)	40.1	44.1	Cross-Iteration Batch Normalization		2020	FPN ResNet
148	Fast R-CNN (Cascade RPN)	40.1	43.8	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
149	Mask R-CNN (ResNeXt-101-FPN)	39.8	43.4	Mask R-CNN		2017	ResNeXt FPN
150	GA-Faster-RCNN	39.8	43.5	Region Proposal by Guided Anchoring		2019
151	FPN (ResNet101 backbone)	39.5		ChainerCV: a Library for Deep Learning in Computer Vision		2017	FPN ResNet
152	RetinaMask (ResNet-50-FPN)	39.4	42.3	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	FPN ResNet
153	PP-YOLO (320x320)	39.3	42.7	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
154	AA-ResNet-10 + RetinaNet	39.2		Attention Augmented Convolutional Networks		2019
155	MAL (ResNet50, single-scale)	39.2		Multiple Anchor Learning for Visual Object Detection		2019	single scale ResNet
156	RetinaNet (ResNet-101-FPN)	39.1	42.3	Focal Loss for Dense Object Detection		2017	FPN ResNet
157	Cascade R-CNN (ResNet-101-FPN+)	38.8	41.9	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
158	M2Det (ResNet-101, single-scale)	38.8	41.7	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale ResNet
159	SaccadeNet (DLA-34-DCN)	38.5	41.4	SaccadeNet: A Fast and Accurate Object Detector		2020	DCN
160	Mask R-CNN (ResNet-101-FPN)	38.2	41.7	Mask R-CNN		2017	FPN ResNet
161	WSMA-Seg	38.1		Segmentation is All You Need		2019
162	Faster R-CNN + FPN + CGD	37.9		Compact Global Descriptor for Neural Networks		2019	FPN
163	CornerNet511 (Hourglass-52, single-scale)	37.8	40.1	CornerNet: Detecting Objects as Paired Keypoints		2018	single scale
164	RefineDet512+ (VGG-16)	37.6	40.8	Single-Shot Refinement Neural Network for Object Detection		2017
165	DeformConv-R-FCN (Aligned-Inception-ResNet)	37.5		Deformable Convolutional Networks		2017
166	Faster R-CNN (ImageNet+300M)	37.4	40.1	Revisiting Unreasonable Effectiveness of Data in Deep Learning Era		2017
167	Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	36.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
168	Faster R-CNN + TDM	36.8		Beyond Skip Connections: Top-Down Modulation for Object Detection		2016
169	Cascade R-CNN (ResNet-50-FPN+)	36.5	39.2	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN; ResNet
170	RefineDet512 (ResNet-101)	36.4	39.5	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
171	Faster R-CNN + FPN	36.2		Feature Pyramid Networks for Object Detection		2016	FPN
172	Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	35.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
173	Faster R-CNN (box refinement, context, multi-scale testing)	34.9		Deep Residual Learning for Image Recognition		2015	multiscale
174	Faster R-CNN	34.7		Speed/accuracy trade-offs for modern convolutional object detectors		2016
175	CornerNet-Squeeze	34.4		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019
176	MultiPath Network	33.2		A MultiPath Network for Object Detection		2016
177	ION	33.1	34.6	Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks		2015
178	RefineDet512 (VGG-16)	33	35.5	Single-Shot Refinement Neural Network for Object Detection		2017
179	YOLOv3 + Darknet-53	33.0		YOLOv3: An Incremental Improvement		2018	YOLO
180	SSD512	28.8	30.3	SSD: Single Shot MultiBox Detector		2015
181	MnasFPN (MobileNetV2)	26.1		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
182	ESPNetv2-512	26.0		ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network		2018
183	MnasFPN (MobileNetV3)	25.5		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
184	MnasFPN (MNASNet-B1)	24.6		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
185	MnasFPN x0.7 (MobileNetV2)	23.8		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
186	MobileNet-v1-SSD-300x300+CGD	21.4		Compact Global Descriptor for Neural Networks		2019
187	Fast-RCNN	19.7		Fast R-CNN		2015
188	MobileNet	19.3		MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications		2017
189	DAT-S (RetinaNet)		51.2	Vision Transformer with Deformable Attention		2022
190	CenterMask-VoVNet99 (multi-scale)		53.2	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	multiscale
191	Mask R-CNN (HRNetV2p-W32 + cascade)		48.6	Deep High-Resolution Representation Learning for Visual Recognition		2019
192	FoveaBox (ResNeXt-101)		45.2	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
193	VirTex Mask R-CNN (ResNet-50-FPN)		44.8	VirTex: Learning Visual Representations from Textual Annotations		2020	FPN; ResNet
194	CenterMask + ResNet101		46.9	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	ResNet
195	PAFNet (ResNet50-vd)		45.3	PAFNet: An Efficient Anchor-Free Object Detector Guidance		2021	ResNet
196	IoU-Net+EnergyRegression		41.8	Energy-Based Models for Deep Probabilistic Regression		2019
197	Cascade R-CNN (HRNetV2p-W48)		48.6	Deep High-Resolution Representation Learning for Visual Recognition		2019
198	ISTR (ResNet50-FPN-3x, single-scale)			ISTR: End-to-End Instance Segmentation with Transformers		2021
199	FoveaBox (ResNeXt-101)			FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
200	EfficientDet-D7x (single-scale)			EfficientDet: Scalable and Efficient Object Detection		2019	single scale

APS

Rank	Model	box AP	APS	Paper	Code	Year	Tags
1	SwinV2-G (HTC++)	63.1		Swin Transformer V2: Scaling Up Capacity and Resolution	Link	2021	Swin-Transformer
2	Florence-CoSwin-H	62.4		Florence: A New Foundation Model for Computer Vision		2021	Swin-Transformer
3	GLIP (Swin-L, multi-scale)	61.5	45.3	Grounded Language-Image Pre-training		2021	multiscale; Vision Language; Dynamic Head; BERT-Base
4	Soft Teacher + Swin-L (HTC++, multi-scale)	61.3		End-to-End Semi-Supervised Object Detection with Soft Teacher		2021	multiscale; Swin-Transformer
5	DyHead (Swin-L, multi scale, self-training)	60.6		Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale; Swin-Transformer
6	Dual-Swin-L (HTC, multi-scale)	60.1		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	multiscale Swin-Transformer
7	Dual-Swin-L (HTC, single-scale)	59.4		CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	Swin-Transformer
8	Focal-L (DyHead, multi-scale)	58.9		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale Focal-Transformer
9	DyHead (Swin-L, multi scale)	58.7	41.7	Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale Swin-Transformer
10	Swin-L (HTC++, multi scale)	58.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	multiscale Swin-Transformer
11	Focal-L (HTC++, multi-scale)	58.4		Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale
12	Swin-L (HTC++, single scale)	57.7		Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	single scale Swin-Transformer
13	YOLOR-D6 (1280, single-scale, 34 fps)	57.3	40.4	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
14	SOLQ (Swin-L, single)	56.5		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
15	YOLOR-E6 (1280, single-scale, 45 fps)	56.4	39.1	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
16	CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	56.4	38.7	Probabilistic two-stage detection		2021	single scale FPN DCN
17	QueryInst (single-scale)	56.1	37.4	Instances as Queries		2021
18	YOLOv4-P7 with TTA	55.8		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
19	DetectoRS (ResNeXt-101-64x4d, multi-scale)	55.7	37.7	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
20	YOLOR-W6 (1280, single-scale, 66 fps)	55.5	37.6	You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
21	YOLOv4-P7 CSP-P7 (single-scale, 16 fps)	55.4	38.1	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
22	CSP-p6 + Mish (multi-scale)	55.2	37.6	Mish: A Self Regularized Non-Monotonic Activation Function		2019	multiscale
23	YOLOv4-P6 with TTA	54.9		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
24	Cascade Eff-B7 NAS-FPN (1280)	54.8		Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation		2020	single scale NAS-FPN
25	DetectoRS (ResNeXt-101-32x4d, multi-scale)	54.7	37.4	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
26	YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	54.3	36.6	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
27	SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	54.3		Rethinking Pre-training and Self-training		2020	single scale
28	UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	54.1	35.8	USB: Universal-Scale Object Detection Benchmark		2021	multiscale DCN
29	EfficientDet-D7 (single-scale)	53.7		EfficientDet: Scalable and Efficient Object Detection		2019	single scale
30	PAA (ResNext-152-32x8d + DCN, multi-scale)	53.5	36.0	Probabilistic Anchor Assignment with IoU Prediction for Object Detection		2020	ResNeXt multiscale DCN
31	LSNet (Res2Net-101+ DCN, multi-scale)	53.5	35.2	Location-Sensitive Visual Recognition with Cross-IOU Loss		2021	multiscale DCN
32	ResNeSt-200 (multi-scale)	53.3	35.1	ResNeSt: Split-Attention Networks		2020	multiscale
33	Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	53.3	35.5	CBNet: A Novel Composite Backbone Network Architecture for Object Detection		2019	multiscale
34	DetectoRS (ResNeXt-101-32x4d, single-scale)	53.3	33.9	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt single scale
35	GFLV2 (Res2Net-101, DCN, multiscale)	53.3	35.7	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	multiscale DCN
36	RelationNet++ (ResNeXt-64x4d-101-DCN)	52.7		RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder		2020	ResNeXt DCN
37	YOLOv4-P5 with TTA	52.5		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
38	Deformable DETR (ResNeXt-101+DCN)	52.3	34.4	Deformable DETR: Deformable Transformers for End-to-End Object Detection		2020	ResNeXt DCN
39	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	52.3		Global Context Networks		2020	ResNeXt DCN GCN
40	RetinaNet (SpineNet-190, 1280x1280)	52.1	35.4	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
41	RepPoints v2 (ResNeXt-101, DCN, multi-scale)	52.1	34.5	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt; multiscale DCN
42	AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	51.9	34.2	Attention-guided Context Feature Pyramid Network for Object Detection		2020	ResNeXt multiscale FPN
43	OTA (ResNeXt-101+DCN, multiscale)	51.5	34.1	OTA: Optimal Transport Assignment for Object Detection		2021
44	UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	51.3	31.7	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
45	TSD (SENet154-DCN, multi-scale)	51.2	33.8	Revisiting the Sibling Head in Object Detector		2020	multiscale DCN
46	YOLOX-X (Modified CSP v5)	51.2	31.2	YOLOX: Exceeding YOLO Series in 2021		2021	YOLO
47	RetinaNet (SpineNet-143, 1280x1280)	50.7	33.6	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
48	ATSS (ResNeXt-64x4d-101+DCN, multi-scale)	50.7	33.2	Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection		2019	ResNeXt multiscale DCN
49	NAS-FPN (AmoebaNet-D, learned aug)	50.7	34.2	Learning Data Augmentation Strategies for Object Detection		2019	FPN
50	GFLV2 (Res2Net-101, DCN)	50.6	31.3	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN
51	aLRP Loss (ResNext-101-64x4d, DCN, multiscale test)	50.2	32.0	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt multiscale DCN
52	FreeAnchor + SEPC (DCN, ResNext-101-64x4d)	50.1	31.3	Scale-Equalizing Pyramid Convolution for Object Detection		2020	ResNeXt DCN
53	D2Det (ResNet-101-DCN, multi-scale test)	50.1	32.7	D2Det: Towards High Quality Object Detection and Instance Segmentation		2020	multiscale DCN ResNet
54	Dynamic R-CNN (ResNet-101-DCN, multi-scale)	50.1	32.8	Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training		2020	multiscale DCN ResNet
55	TSD (ResNet-101-Deformable, Image Pyramid)	49.4	32.7	Revisiting the Sibling Head in Object Detector		2020	ResNet
56	RepPoints v2 (ResNeXt-101, DCN)	49.4	30.3	RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt DCN
57	CPNDet (Hourglass-104, multi-scale)	49.2	31.0	Corner Proposal Network for Anchor-free, Two-stage Object Detection		2020	multiscale
58	GFLV2 (ResNeXt-101, 32x4d, DCN)	49	29.7	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNeXt DCN
59	aLRP Loss (ResNext-101-64x4d, DCN, single scale)	48.9	30.8	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale DCN
60	UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	48.8	30.1	USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
61	SOLQ (ResNet101, single scale)	48.7		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
62	RetinaNet (SpineNet-96, 1024x1024)	48.6	32	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
63	TridentNet (ResNet-101-Deformable, Image Pyramid)	48.4	31.8	Scale-Aware Trident Networks for Object Detection		2019	ResNet
64	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	48.4		GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond		2019	ResNeXt DCN GCN
65	GFLV2 (ResNet-101-DCN)	48.3	28.8	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN ResNet
66	GFL (X-101-32x4d-DCN, single-scale)	48.2	29.2	Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection		2020	ResNeXt single scale DCN
67	ISTR (ResNet101-FPN-3x, single-scale)	48.1	28.7	ISTR: End-to-End Instance Segmentation with Transformers		2021
68	aLRP Loss (ResNext-101-64x4d, single scale)	47.8	30.2	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale
69	MatrixNet Corners (ResNet-152, multi-scale)	47.8	29.7	Matrix Nets: A New Deep Architecture for Object Detection		2019	multiscale ResNet
70	SOLQ (ResNet50, single scale)	47.8		SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
71	SAPD (ResNeXt-101, single-scale)	47.4	28.1	Soft Anchor-Point Object Detection		2019	ResNeXt single scale
72	PANet (ResNeXt-101, multi-scale)	47.4	30.1	Path Aggregation Network for Instance Segmentation		2018	ResNeXt multiscale
73	HTC (HRNetV2p-W48)	47.3	28.0	Deep High-Resolution Representation Learning for Visual Recognition		2019
74	HTC (ResNeXt-101-FPN)	47.1	22.8	Hybrid Task Cascade for Instance Segmentation		2019	ResNeXt FPN
75	CenterNet511 (Hourglass-104, multi-scale)	47.0	28.9	CenterNet: Keypoint Triplets for Object Detection		2019	multiscale
76	MAL (ResNeXt101, multi-scale)	47.0		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt multiscale
77	ISTR (ResNet50-FPN-3x)	46.8		ISTR: End-to-End Instance Segmentation with Transformers		2021	FPN ResNet
78	RetinaNet (SpineNet-49, 896x896)	46.7	29.1	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
79	RPDet (ResNet-101-DCN, multi-scale)	46.5	30.3	RepPoints: Point Set Representation for Object Detection		2019	multiscale DCN ResNet
80	HoughNet (MS)	46.4	29.1	HoughNet: Integrating near and long-range evidence for bottom-up object detection		2020	multiscale
81	PPDet (ResNeXt-101-FPN, multiscale)	46.3	31.4	Reducing Label Noise in Anchor-Free Object Detection		2020	ResNeXt multiscale FPN
82	GFLV2 (ResNet-101)	46.2	27.8	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
83	SNIPER (ResNet-101)	46.1	29.6	SNIPER: Efficient Multi-Scale Training		2018	ResNet
84	Mask R-CNN (HRNetV2p-W48 + cascade)	46.1	27.1	Deep High-Resolution Representation Learning for Visual Recognition		2019
85	DCNv2 (ResNet-101, multi-scale)	46.0	27.8	Deformable ConvNets v2: More Deformable, Better Results		2018	multiscale DCN ResNet
86	Gaussian-FCOS	46		Localization Uncertainty Estimation for Anchor-Free Object Detection		2020
87	Cascade R-CNN-FPN (ResNet-101, map-guided)	45.9	26.3	InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting		2019	FPN ResNet
88	MAL (ResNeXt101, single-scale)	45.9		Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt single scale
89	CenterMask+VoVNetV2-99 (single-scale)	45.8	27.8	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
90	D-RFCN + SNIP (DPN-98 with flip, multi-scale)	45.7	29.3	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale
91	YOLOv4 (CD53)	45.5	27	Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
92	PP-YOLO (608x608)	45.2	26.3	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
93	AC-FPN Cascade R-CNN (ResNet-101, single scale)	45	26.9	Attention-guided Context Feature Pyramid Network for Object Detection		2019	single scale FPN ResNet
94	FreeAnchor (ResNeXt-101)	44.8	27	FreeAnchor: Learning to Match Anchors for Visual Object Detection		2019	ResNeXt
95	FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	44.7	27.6	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
96	CenterMask+VoVNetV2-57 (single-scale)	44.7	27.1	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
97	FSAF (ResNeXt-101, multi-scale)	44.6	29.7	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	ResNeXt multiscale
98	aLRP Loss (ResNext-101, DCN, 500 scale)	44.6	24.6	A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt DCN
99	CenterMask + X-101-32x8d (single-scale)	44.6		CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
100	RetinaNet (SpineNet-49, 640x640)	44.3	25.9	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
101	YOLOF-DC5	44.3	24.0	You Only Look One-level Feature		2021	YOLO
102	GFLV2 (ResNet-50)	44.3	26.8	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
103	InterNet (ResNet-101-FPN, multi-scale)	44.2	27.2	Feature Intertwiner for Object Detection		2019	multiscale FPN ResNet
104	M2Det (VGG-16, multi-scale)	44.2	29.2	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale
105	Faster R-CNN (LIP-ResNet-101-MD w FPN)	43.9	25.4	LIP: Local Importance-based Pooling		2019	FPN
106	M2Det (ResNet-101, multi-scale)	43.9	29.6	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale ResNet
107	YOLOv3 @800 + ASFF* (Darknet-53)	43.9	27.0	Learning Spatial Fusion for Single-Shot Object Detection		2019	YOLO
108	FoveaBox (ResNeXt-101)	43.9	26.8	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
109	ExtremeNet (Hourglass-104, multi-scale)	43.7	24.1	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	multiscale
110	YOLOv4-608	43.5	26.7	YOLOv4: Optimal Speed and Accuracy of Object Detection		2020	single scale YOLO
111	SNIPER (ResNet-50)	43.5	26.1	SNIPER: Efficient Multi-Scale Training		2018	ResNet
112	CenterNet (HRNetV2-W48)	43.5	22.2	Deep High-Resolution Representation Learning for Visual Recognition		2019
113	D-RFCN + SNIP (ResNet-101, multi-scale)	43.4	27.2	An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale ResNet
114	Grid R-CNN (ResNeXt-101-FPN)	43.2	25.1	Grid R-CNN		2018	ResNeXt FPN
115	FCOS (ResNeXt-101-64x4d-FPN)	43.2	26.5	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
116	CornerNet-Saccade (Hourglass-104, multi-scale)	43.2	24.4	CornerNet-Lite: Efficient Keypoint Based Object Detection		2019	multiscale
117	Libra R-CNN (ResNeXt-101-FPN)	43.0	25.3	Libra R-CNN: Towards Balanced Learning for Object Detection		2019	ResNeXt FPN
118	RPDet (ResNet-101-DCN)	42.8	24.9	RepPoints: Point Set Representation for Object Detection		2019	DCN ResNet
119	SpineNet-49 (640, RetinaNet, single-scale)	42.8	23.7	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019	single scale
120	Cascade R-CNN (ResNet-101-FPN+, cascade)	42.8	23.7	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
121	Cascade R-CNN	42.8	23.7	Cascade R-CNN: High Quality Object Detection and Instance Segmentation		2019
122	TridentNet (ResNet-101)	42.7	23.9	Scale-Aware Trident Networks for Object Detection		2019	ResNet
123	FCOS (ResNeXt-32x8d-101-FPN)	42.7	26.0	FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
124	RetinaMask (ResNeXt-101-FPN-GN)	42.6	24.8	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	ResNeXt FPN
125	TAL + TAP	42.5		TOOD: Task-aligned One-stage Object Detection		2021
126	Faster R-CNN (HRNetV2p-W48)	42.4	24.9	Deep High-Resolution Representation Learning for Visual Recognition		2019
127	HSD (ResNet101, 768x768, single-scale test)	42.3	22.8	Hierarchical Shot Detector		2019	single scale
128	CornerNet511 (Hourglass-104, multi-scale)	42.1	20.8	CornerNet: Detecting Objects as Paired Keypoints		2018	multiscale
129	FoveaBox (ResNeXt-101)	42.1		FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
130	FCOS (HRNet-W32-5l)	42.0	25.4	FCOS: Fully Convolutional One-Stage Object Detection		2019
131	RefineDet512+ (ResNet-101)	41.8	25.6	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
132	GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	41.6	22.3	Gradient Harmonized Single-stage Detector		2018	FPN
133	CenterNet-DLA (DLA-34, multi-scale)	41.6	21.5	Objects as Points		2019	multiscale
134	RetinaNet (SpineNet-49S, 640x640)	41.5	23.3	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
135	RPDet (ResNet-101)	41	23.6	RepPoints: Point Set Representation for Object Detection		2019	ResNet
136	M2Det (VGG-16, single-scale)	41.0	22.1	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale
137	FSAF (ResNet-101, single-scale)	40.9	24	Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	single scale ResNet
138	RetinaNet (ResNeXt-101-FPN)	40.8	24.1	Focal Loss for Dense Object Detection		2017	ResNeXt FPN
139	Cascade R-CNN (ResNet-50-FPN+, cascade)	40.6	22.6	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
140	Faster R-CNN (Cascade RPN)	40.6	22.0	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
141	ResNet-50-DW-DPN (Deformable Kernels)	40.6	24.6	Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation		2019	ResNet
142	IoU-Net	40.6		Acquisition of Localization Confidence for Accurate Object Detection		2018
143	FCOS (HRNetV2p-W48)	40.5	23.4	Deep High-Resolution Representation Learning for Visual Recognition		2019
144	ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	40.4		Bounding Box Regression with Uncertainty for Accurate Object Detection		2018	FPN ResNet
145	RDSNet (ResNet-101, RetinaNet, mask, MBRM)	40.3	22.1	RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation		2019	ResNet
146	ExtremeNet (Hourglass-104, single-scale)	40.2	20.4	Bottom-up Object Detection by Grouping Extreme and Center Points		2019	single scale
147	Mask R-CNN (ResNet-101-FPN, CBN)	40.1	35.8	Cross-Iteration Batch Normalization		2020	FPN ResNet
148	Fast R-CNN (Cascade RPN)	40.1	22.1	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
149	Mask R-CNN (ResNeXt-101-FPN)	39.8	22.1	Mask R-CNN		2017	ResNeXt FPN
150	GA-Faster-RCNN	39.8	21.8	Region Proposal by Guided Anchoring		2019
151	FPN (ResNet101 backbone)	39.5		ChainerCV: a Library for Deep Learning in Computer Vision		2017	FPN ResNet
152	RetinaMask (ResNet-50-FPN)	39.4	21.9	RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	FPN ResNet
153	PP-YOLO (320x320)	39.3	16.7	PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
154	AA-ResNet-10 + RetinaNet	39.2		Attention Augmented Convolutional Networks		2019
155	MAL (ResNet50, single-scale)	39.2		Multiple Anchor Learning for Visual Object Detection		2019	single scale ResNet
156	RetinaNet (ResNet-101-FPN)	39.1	21.8	Focal Loss for Dense Object Detection		2017	FPN ResNet
157	Cascade R-CNN (ResNet-101-FPN+)	38.8	21.3	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
158	M2Det (ResNet-101, single-scale)	38.8	20.5	M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale ResNet
159	SaccadeNet (DLA-34-DCN)	38.5	19.2	SaccadeNet: A Fast and Accurate Object Detector		2020	DCN
160	Mask R-CNN (ResNet-101-FPN)	38.2	20.1	Mask R-CNN		2017	FPN ResNet
161	WSMA-Seg	38.1		Segmentation is All You Need		2019
162	Faster R-CNN + FPN + CGD	37.9		Compact Global Descriptor for Neural Networks		2019	FPN
163	CornerNet511 (Hourglass-52, single-scale)	37.8	17.0	CornerNet: Detecting Objects as Paired Keypoints		2018	single scale
164	RefineDet512+ (VGG-16)	37.6	22.7	Single-Shot Refinement Neural Network for Object Detection		2017
165	DeformConv-R-FCN (Aligned-Inception-ResNet)	37.5	19.4	Deformable Convolutional Networks		2017
166	Faster R-CNN (ImageNet+300M)	37.4	17.5	Revisiting Unreasonable Effectiveness of Data in Deep Learning Era		2017
167	Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	36.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
168	Faster R-CNN + TDM	36.8		Beyond Skip Connections: Top-Down Modulation for Object Detection		2016
169	Cascade R-CNN (ResNet-50-FPN+)	36.5	20.3	Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN; ResNet
170	RefineDet512 (ResNet-101)	36.4	16.6	Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
171	Faster R-CNN + FPN	36.2		Feature Pyramid Networks for Object Detection		2016	FPN
172	Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	35.9		torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
173	Faster R-CNN (box refinement, context, multi-scale testing)	34.9		Deep Residual Learning for Image Recognition		2015	multiscale
174	Faster R-CNN	34.7		Speed/accuracy trade-offs for modern convolutional object detectors		2016
175	CornerNet-Squeeze	34.4		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019
176	MultiPath Network	33.2		A MultiPath Network for Object Detection		2016
177	ION	33.1	14.5	Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks		2015
178	RefineDet512 (VGG-16)	33	16.3	Single-Shot Refinement Neural Network for Object Detection		2017
179	YOLOv3 + Darknet-53	33.0		YOLOv3: An Incremental Improvement		2018	YOLO
180	SSD512	28.8		SSD: Single Shot MultiBox Detector		2015
181	MnasFPN (MobileNetV2)	26.1		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
182	ESPNetv2-512	26.0		ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network		2018
183	MnasFPN (MobileNetV3)	25.5		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
184	MnasFPN (MNASNet-B1)	24.6		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
185	MnasFPN x0.7 (MobileNetV2)	23.8		MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
186	MobileNet-v1-SSD-300x300+CGD	21.4		Compact Global Descriptor for Neural Networks		2019
187	Fast-RCNN	19.7		Fast R-CNN		2015
188	MobileNet	19.3		MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications		2017
189	DAT-S (RetinaNet)		32.3	Vision Transformer with Deformable Attention		2022
190	CenterMask-VoVNet99 (multi-scale)		32.4	CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	multiscale
191	Mask R-CNN (HRNetV2p-W32 + cascade)			Deep High-Resolution Representation Learning for Visual Recognition		2019
192	FoveaBox (ResNeXt-101)			FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
193	VirTex Mask R-CNN (ResNet-50-FPN)			VirTex: Learning Visual Representations from Textual Annotations		2020	FPN; ResNet
194	Centermask + ResNet101			CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	ResNet
195	PAFNet (ResNet50-vd)		22.8	PAFNet: An Efficient Anchor-Free Object Detector Guidance		2021	ResNet
196	IoU-Net+EnergyRegression			Energy-Based Models for Deep Probabilistic Regression		2019
197	Cascade R-CNN (HRNetV2p-W48)		26.0	Deep High-Resolution Representation Learning for Visual Recognition		2019
198	ISTR (ResNet50-FPN-3x, single-scale)		27.8	ISTR: End-to-End Instance Segmentation with Transformers		2021
199	FoveaBox (ResNeXt-101)		24.9	FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
200	EfficientDet-D7x (single-scale)			EfficientDet: Scalable and Efficient Object Detection		2019	single scale

Rank	Model	box AP	AP50	AP75	APS	APM	APL	AP	Paper	Code	Year	Tags
1	SwinV2-G (HTC++)	63.1							Swin Transformer V2: Scaling Up Capacity and Resolution	Link	2021	Swin-Transformer
2	Florence-CoSwin-H	62.4							Florence: A New Foundation Model for Computer Vision		2021	Swin-Transformer
3	GLIP (Swin-L, multi-scale)	61.5	79.5	67.7	45.3	64.9	75.0		Grounded Language-Image Pre-training		2021	multiscale; Vision Language; Dynamic Head; BERT-Base
4	Soft Teacher + Swin-L (HTC++, multi-scale)	61.3							End-to-End Semi-Supervised Object Detection with Soft Teacher		2021	multiscale; Swin-Transformer
5	DyHead (Swin-L, multi scale, self-training)	60.6	78.5	66.6		64.0	74.2		Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale; Swin-Transformer
6	Dual-Swin-L (HTC, multi-scale)	60.1							CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	multiscale Swin-Transformer
7	Dual-Swin-L (HTC, single-scale)	59.4							CBNetV2: A Composite Backbone Network Architecture for Object Detection		2021	Swin-Transformer
8	Focal-L (DyHead, multi-scale)	58.9							Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale Focal-Transformer
9	DyHead (Swin-L, multi scale)	58.7	77.1	64.5	41.7	62.0	72.8		Dynamic Head: Unifying Object Detection Heads with Attentions		2021	multiscale Swin-Transformer
10	Swin-L (HTC++, multi scale)	58.7							Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	multiscale Swin-Transformer
11	Focal-L (HTC++, multi-scale)	58.4							Focal Self-attention for Local-Global Interactions in Vision Transformers		2021	multiscale
12	Swin-L (HTC++, single scale)	57.7							Swin Transformer: Hierarchical Vision Transformer using Shifted Windows		2021	single scale Swin-Transformer
13	YOLOR-D6 (1280, single-scale, 34 fps)	57.3	75.0	62.7	40.4	61.2	69.2		You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
14	SOLQ (Swin-L, single)	56.5							SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
15	YOLOR-E6 (1280, single-scale, 45 fps)	56.4	74.1	61.6	39.1	60.1	68.2		You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
16	CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	56.4	74.0	61.6	38.7	59.7	68.6		Probabilistic two-stage detection		2021	single scale FPN DCN
17	QueryInst (single-scale)	56.1	75.9	61.9	37.4	58.9	70.3		Instances as Queries		2021
18	YOLOv4-P7 with TTA	55.8	73.2	61.2					Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
19	DetectoRS (ResNeXt-101-64x4d, multi-scale)	55.7	74.2	61.1	37.7	58.4	68.1		DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
20	YOLOR-W6 (1280, single-scale, 66 fps)	55.5	73.2	60.6	37.6	59.5	67.7		You Only Learn One Representation: Unified Network for Multiple Tasks		2021	single scale YOLO
21	YOLOv4-P7 CSP-P7 (single-scale, 16 fps)	55.4	73.3	60.7	38.1	59.5	67.4		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
22	CSP-p6 + Mish (multi-scale)	55.2	72.9	60.5	37.6	59.0	66.9		Mish: A Self Regularized Non-Monotonic Activation Function		2019	multiscale
23	YOLOv4-P6 with TTA	54.9	72.6	60.2					Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
24	Cascade Eff-B7 NAS-FPN (1280)	54.8							Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation		2020	single scale NAS-FPN
25	DetectoRS (ResNeXt-101-32x4d, multi-scale)	54.7	73.5	60.1	37.4	57.3	66.4		DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt multiscale
26	YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	54.3	72.3	59.5	36.6	58.2	65.5		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
27	SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	54.3							Rethinking Pre-training and Self-training		2020	single scale
28	UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	54.1	71.6	59.9	35.8	57.2	67.4		USB: Universal-Scale Object Detection Benchmark		2021	multiscale DCN
29	EfficientDet-D7 (single-scale)	53.7	72.4			57.0	66.3		EfficientDet: Scalable and Efficient Object Detection		2019	single scale
30	PAA (ResNext-152-32x8d + DCN, multi-scale)	53.5	71.6	59.1	36.0	56.3	66.9		Probabilistic Anchor Assignment with IoU Prediction for Object Detection		2020	ResNeXt multiscale DCN
31	LSNet (Res2Net-101+ DCN, multi-scale)	53.5	71.1	59.2	35.2	56.4	65.8		Location-Sensitive Visual Recognition with Cross-IOU Loss		2021	multiscale DCN
32	ResNeSt-200 (multi-scale)	53.3	72.0	58.0	35.1	56.2	66.8		ResNeSt: Split-Attention Networks		2020	multiscale
33	Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	53.3	71.9	58.5	35.5	55.8	66.7		CBNet: A Novel Composite Backbone Network Architecture for Object Detection		2019	multiscale
34	DetectoRS (ResNeXt-101-32x4d, single-scale)	53.3	71.6	58.5	33.9	56.5	66.9		DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution		2020	ResNeXt single scale
35	GFLV2 (Res2Net-101, DCN, multiscale)	53.3	70.9	59.2	35.7	56.1	65.6		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	multiscale DCN
36	RelationNet++ (ResNeXt-64x4d-101-DCN)	52.7							RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder		2020	ResNeXt DCN
37	YOLOv4-P5 with TTA	52.5	70.3	58					Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	multiscale YOLO
38	Deformable DETR (ResNeXt-101+DCN)	52.3	71.9	58.1	34.4	54.4	65.6		Deformable DETR: Deformable Transformers for End-to-End Object Detection		2020	ResNeXt DCN
39	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	52.3	70.9	56.9					Global Context Networks		2020	ResNeXt DCN GCN
40	RetinaNet (SpineNet-190, 1280x1280)	52.1	71.8	56.5	35.4	55	63.6		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
41	RepPoints v2 (ResNeXt-101, DCN, multi-scale)	52.1	70.1	57.5	34.5	54.6	63.6		RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt; multiscale DCN
42	AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	51.9	70.4	57	34.2	54.8	64.7		Attention-guided Context Feature Pyramid Network for Object Detection		2020	ResNeXt multiscale FPN
43	OTA (ResNeXt-101+DCN, multiscale)	51.5	68.6	57.1	34.1	53.7	64.1		OTA: Optimal Transport Assignment for Object Detection		2021
44	UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	51.3	70.0	55.8	31.7	55.3	64.9		USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
45	TSD (SENet154-DCN,multi-scale)	51.2	71.9	56.0	33.8	54.8	64.2		Revisiting the Sibling Head in Object Detector		2020	multiscale DCN
46	YOLOX-X (Modified CSP v5)	51.2	69.6	55.7	31.2	56.1	66.1		YOLOX: Exceeding YOLO Series in 2021		2021	YOLO
47	RetinaNet (SpineNet-143, 1280x1280)	50.7	70.4	54.9	33.6	53.9	62.1		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
48	ATSS (ResNeXt-64x4d-101+DCN, multi-scale)	50.7	68.9	56.3	33.2	52.9	62.4		Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection		2019	ResNeXt multiscale DCN
49	NAS-FPN (AmoebaNet-D, learned aug)	50.7			34.2	55.5	64.5		Learning Data Augmentation Strategies for Object Detection		2019	FPN
50	GFLV2 (Res2Net-101, DCN)	50.6	69	55.3	31.3	54.3	63.5		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN
51	aLRP Loss (ResNext-101-64x4d, DCN, multiscale test)	50.2	70.3	53.9	32.0	53.1	63.0		A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt multiscale DCN
52	FreeAnchor + SEPC (DCN, ResNext-101-64x4d)	50.1	69.8	54.3	31.3	53.3	63.7		Scale-Equalizing Pyramid Convolution for Object Detection		2020	ResNeXt DCN
53	D2Det (ResNet-101-DCN, multi-scale test)	50.1	69.4	54.9	32.7	52.7	62.1		D2Det: Towards High Quality Object Detection and Instance Segmentation		2020	multiscale DCN ResNet
54	Dynamic R-CNN (ResNet-101-DCN, multi-scale)	50.1	68.3	55.6	32.8	53.0	61.2		Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training		2020	multiscale DCN ResNet
55	TSD (ResNet-101-Deformable, Image Pyramid)	49.4	69.6	54.4	32.7	52.5	61.0		Revisiting the Sibling Head in Object Detector		2020	ResNet
56	RepPoints v2 (ResNeXt-101, DCN)	49.4	68.9	53.4	30.3	52.1	62.3		RepPoints V2: Verification Meets Regression for Object Detection		2020	ResNeXt DCN
57	CPNDet (Hourglass-104, multi-scale)	49.2	67.3	53.7	31.0	51.9	62.4		Corner Proposal Network for Anchor-free, Two-stage Object Detection		2020	multiscale
58	GFLV2 (ResNeXt-101, 32x4d, DCN)	49	67.6	53.5	29.7	52.4	61.4		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNeXt DCN
59	aLRP Loss (ResNext-101-64x4d, DCN, single scale)	48.9	69.3	52.5	30.8	51.5	62.1		A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale DCN
60	UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	48.8	67.5	53.0	30.1	52.3	61.1		USB: Universal-Scale Object Detection Benchmark		2021	single scale DCN
61	SOLQ (ResNet101, single scale)	48.7							SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
62	RetinaNet (SpineNet-96, 1024x1024)	48.6	68.4	52.5	32	52.3	62		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
63	TridentNet (ResNet-101-Deformable, Image Pyramid)	48.4	69.7	53.5	31.8	51.3	60.3		Scale-Aware Trident Networks for Object Detection		2019	ResNet
64	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	48.4	67.6	52.7					GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond		2019	ResNeXt DCN GCN
65	GFLV2 (ResNet-101-DCN)	48.3	66.5	52.8	28.8	51.9	60.7		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	DCN ResNet
66	GFL (X-101-32x4d-DCN, single-scale)	48.2	67.4	52.6	29.2	51.7	60.2		Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection		2020	ResNeXt single scale DCN
67	ISTR (ResNet101-FPN-3x, single-scale)	48.1			28.7	50.4	61.5		ISTR: End-to-End Instance Segmentation with Transformers		2021
68	aLRP Loss (ResNext-101-64x4d, single scale)	47.8	68.4	51.1	30.2	50.8	59.1		A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt single scale
69	MatrixNet Corners (ResNet-152, multi-scale)	47.8	66.2	52.3	29.7	50.4	60.7		Matrix Nets: A New Deep Architecture for Object Detection		2019	multiscale ResNet
70	SOLQ (ResNet50, single scale)	47.8							SOLQ: Segmenting Objects by Learning Queries		2021	Transformer single scale
71	SAPD (ResNeXt-101, single-scale)	47.4	67.4	51.1	28.1	50.3	61.5		Soft Anchor-Point Object Detection		2019	ResNeXt single scale
72	PANet (ResNeXt-101, multi-scale)	47.4	67.2	51.8	30.1	51.7	60.0		Path Aggregation Network for Instance Segmentation		2018	ResNeXt multiscale
73	HTC (HRNetV2p-W48)	47.3	65.9	51.2	28.0	49.7	59.8		Deep High-Resolution Representation Learning for Visual Recognition		2019
74	HTC (ResNeXt-101-FPN)	47.1	63.9	44.7	22.8	43.9	54.6		Hybrid Task Cascade for Instance Segmentation		2019	ResNeXt FPN
75	CenterNet511 (Hourglass-104, multi-scale)	47.0	64.5	50.7	28.9	49.9	58.9		CenterNet: Keypoint Triplets for Object Detection		2019	multiscale
76	MAL (ResNeXt101, multi-scale)	47.0							Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt multiscale
77	ISTR (ResNet50-FPN-3x)	46.8							ISTR: End-to-End Instance Segmentation with Transformers		2021	FPN ResNet
78	RetinaNet (SpineNet-49, 896x896)	46.7	66.3	50.6	29.1	50.1	61.7		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
79	RPDet (ResNet-101-DCN, multi-scale)	46.5	67.4	50.9	30.3	49.7	57.1		RepPoints: Point Set Representation for Object Detection		2019	multiscale DCN ResNet
80	HoughNet (MS)	46.4	65.1	50.7	29.1	48.5	58.1		HoughNet: Integrating near and long-range evidence for bottom-up object detection		2020	multiscale
81	PPDet (ResNeXt-101-FPN, multiscale)	46.3	64.8	51.6	31.4	49.9	56.4		Reducing Label Noise in Anchor-Free Object Detection		2020	ResNeXt multiscale FPN
82	GFLV2 (ResNet-101)	46.2	64.3	50.5	27.8	49.9	57		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
83	SNIPER (ResNet-101)	46.1	67.0	51.6	29.6	48.9	58.1		SNIPER: Efficient Multi-Scale Training		2018	ResNet
84	Mask R-CNN (HRNetV2p-W48 + cascade)	46.1	64.0	50.3	27.1	48.6	58.3		Deep High-Resolution Representation Learning for Visual Recognition		2019
85	DCNv2 (ResNet-101, multi-scale)	46.0	67.9	50.8	27.8	49.1	59.5		Deformable ConvNets v2: More Deformable, Better Results		2018	multiscale DCN ResNet
86	Gaussian-FCOS	46							Localization Uncertainty Estimation for Anchor-Free Object Detection		2020
87	Cascade R-CNN-FPN (ResNet-101, map-guided)	45.9	64.2	50	26.3	49	58.6		InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting		2019	FPN ResNet
88	MAL (ResNeXt101, single-scale)	45.9							Multiple Anchor Learning for Visual Object Detection		2019	ResNeXt single scale
89	CenterMask+VoVNetV2-99 (single-scale)	45.8	64.5		27.8	48.3	57.6		CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
90	D-RFCN + SNIP (DPN-98 with flip, multi-scale)	45.7	67.3	51.1	29.3	48.8	57.1		An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale
91	YOLOv4 (CD53)	45.5	64.1	49.5	27	49	56.7		Scaled-YOLOv4: Scaling Cross Stage Partial Network		2020	single scale YOLO
92	PP-YOLO (608x608)	45.2	65.2	49.9	26.3	47.8	57.2		PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
93	AC-FPN Cascade R-CNN (ResNet-101, single scale)	45	64.4	49	26.9	47.7	56.6		Attention-guided Context Feature Pyramid Network for Object Detection		2019	single scale FPN ResNet
94	FreeAnchor (ResNeXt-101)	44.8	64.3	48.4	27	47.9	56		FreeAnchor: Learning to Match Anchors for Visual Object Detection		2019	ResNeXt
95	FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	44.7	64.1	48.4	27.6	47.5	55.6		FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
96	CenterMask+VoVNet2-57 (single-scale)	44.7	63.1	48.6	27.1		55.9		CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
97	FSAF (ResNeXt-101, multi-scale)	44.6	65.2	48.6	29.7	47.1	54.6		Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	ResNeXt multiscale
98	aLRP Loss (ResNext-101, DCN, 500 scale)	44.6	65.0	47.5	24.6	48.1	58.3		A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection		2020	ResNeXt DCN
99	CenterMask + X-101-32x8d (single-scale)	44.6	63.4	48.4		47.2			CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	single scale
100	RetinaNet (SpineNet-49, 640x640)	44.3	63.8	47.6	25.9	47.7	61.1		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
101	YOLOF-DC5	44.3	62.9	47.5	24.0	48.5	60.4		You Only Look One-level Feature		2021	YOLO
102	GFLV2 (ResNet-50)	44.3	62.3	48.5	26.8	47.7	54.1		Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection		2020	ResNet
103	InterNet (ResNet-101-FPN, multi-scale)	44.2	67.5	51.1	27.2	50.3	57.7		Feature Intertwiner for Object Detection		2019	multiscale FPN ResNet
104	M2Det (VGG-16, multi-scale)	44.2	64.6	49.3	29.2	47.9	55.1		M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale
105	Faster R-CNN (LIP-ResNet-101-MD w FPN)	43.9	65.7	48.1	25.4	46.7	56.3		LIP: Local Importance-based Pooling		2019	FPN
106	M2Det (ResNet-101, multi-scale)	43.9	64.4	48	29.6	49.6	54.3		M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	multiscale ResNet
107	YOLOv3 @800 + ASFF* (Darknet-53)	43.9	64.1	49.2	27.0	46.6	53.4		Learning Spatial Fusion for Single-Shot Object Detection		2019	YOLO
108	FoveaBox (ResNeXt-101)	43.9	63.5	47.7	26.8	46.9	55.6		FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
109	ExtremeNet (Hourglass-104, multi-scale)	43.7	60.5	47.0	24.1	46.9	57.6		Bottom-up Object Detection by Grouping Extreme and Center Points		2019	multiscale
110	YOLOv4-608	43.5	65.7	47.3	26.7	46.7	53.3		YOLOv4: Optimal Speed and Accuracy of Object Detection		2020	single scale YOLO
111	SNIPER (ResNet-50)	43.5	65.0	48.6	26.1	46.3	56.0		SNIPER: Efficient Multi-Scale Training		2018	ResNet
112	CenterNet (HRNetV2-W48)	43.5		46.5	22.2		57.8		Deep High-Resolution Representation Learning for Visual Recognition		2019
113	D-RFCN + SNIP (ResNet-101, multi-scale)	43.4	65.5	48.4	27.2	46.5	54.9		An Analysis of Scale Invariance in Object Detection - SNIP		2017	multiscale ResNet
114	Grid R-CNN (ResNeXt-101-FPN)	43.2	63.0	46.6	25.1	46.5	55.2		Grid R-CNN		2018	ResNeXt FPN
115	FCOS (ResNeXt-101-64x4d-FPN)	43.2	62.8	46.6	26.5	46.2	53.3		FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
116	CornerNet-Saccade (Hourglass-104, multi-scale)	43.2			24.4	44.6	57.3		CornerNet-Lite: Efficient Keypoint Based Object Detection		2019	multiscale
117	Libra R-CNN (ResNeXt-101-FPN)	43.0	64	47	25.3	45.6	54.6		Libra R-CNN: Towards Balanced Learning for Object Detection		2019	ResNeXt FPN
118	RPDet (ResNet-101-DCN)	42.8	65.0	46.3	24.9	46.2	54.7		RepPoints: Point Set Representation for Object Detection		2019	DCN ResNet
119	SpineNet-49 (640, RetinaNet, single-scale)	42.8	62.3	46.1	23.7	45.2	57.3		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019	single scale
120	Cascade R-CNN (ResNet-101-FPN+, cascade)	42.8	62.1	46.3	23.7	45.5	55.2		Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
121	Cascade R-CNN	42.8	62.1	46.3	23.7	45.5	55.2		Cascade R-CNN: High Quality Object Detection and Instance Segmentation		2019
122	TridentNet (ResNet-101)	42.7	63.6	46.5	23.9	46.6	56.6		Scale-Aware Trident Networks for Object Detection		2019	ResNet
123	FCOS (ResNeXt-32x8d-101-FPN)	42.7	62.2	46.1	26.0	45.6	52.6		FCOS: Fully Convolutional One-Stage Object Detection		2019	ResNeXt FPN
124	RetinaMask (ResNeXt-101-FPN-GN)	42.6	62.5	46.0	24.8	45.6	53.8		RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	ResNeXt FPN
125	TAL + TAP	42.5	60.3	46.4					TOOD: Task-aligned One-stage Object Detection		2021
126	Faster R-CNN (HRNetV2p-W48)	42.4	63.6	46.4	24.9	44.6	53.0		Deep High-Resolution Representation Learning for Visual Recognition		2019
127	HSD (ResNet101, 768x768, single-scale test)	42.3	61.2	46.9	22.8	47.3	55.9		Hierarchical Shot Detector		2019	single scale
128	CornerNet511 (Hourglass-104, multi-scale)	42.1	57.8	45.3	20.8	44.8	56.7		CornerNet: Detecting Objects as Paired Keypoints		2018	multiscale
129	FoveaBox (ResNeXt-101)	42.1							FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
130	FCOS (HRNet-W32-5l)	42.0	60.4	45.3	25.4	45.0	51.0		FCOS: Fully Convolutional One-Stage Object Detection		2019
131	RefineDet512+ (ResNet-101)	41.8	62.9	45.7	25.6	45.1	54.1		Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
132	GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	41.6	62.8	44.2	22.3	45.1	55.3		Gradient Harmonized Single-stage Detector		2018	FPN
133	CenterNet-DLA (DLA-34, multi-scale)	41.6			21.5	43.9	56.0		Objects as Points		2019	multiscale
134	RetinaNet (SpineNet-49S, 640x640)	41.5	60.5	44.6	23.3	45	58		SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization		2019
135	RPDet (ResNet-101)	41	62.9	44.3	23.6	44.1	51.7		RepPoints: Point Set Representation for Object Detection		2019	ResNet
136	M2Det (VGG-16, single-scale)	41.0	59.7	45	22.1	46.5	53.8		M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale
137	FSAF (ResNet-101, single-scale)	40.9	61.5	44	24	44.2	51.3		Feature Selective Anchor-Free Module for Single-Shot Object Detection		2019	single scale ResNet
138	RetinaNet (ResNeXt-101-FPN)	40.8	61.1	44.1	24.1	44.2	51.2		Focal Loss for Dense Object Detection		2017	ResNeXt FPN
139	Cascade R-CNN (ResNet-50-FPN+, cascade)	40.6	59.9	44	22.6	42.7	52.1		Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
140	Faster R-CNN (Cascade RPN)	40.6	58.9	44.5	22.0	42.8	52.6		Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
141	ResNet-50-DW-DPN (Deformable Kernels)	40.6			24.6	43.9	53.3		Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation		2019	ResNet
142	IoU-Net	40.6							Acquisition of Localization Confidence for Accurate Object Detection		2018
143	FCOS (HRNetV2p-W48)	40.5	59.3		23.4	42.6	51.0		Deep High-Resolution Representation Learning for Visual Recognition		2019
144	ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	40.4							Bounding Box Regression with Uncertainty for Accurate Object Detection		2018	FPN ResNet
145	RDSNet (ResNet-101, RetinaNet, mask, MBRM)	40.3	60.1	43	22.1	43.5	51.5		RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation		2019	ResNet
146	ExtremeNet (Hourglass-104, single-scale)	40.2	55.5	43.2	20.4	43.2	53.1		Bottom-up Object Detection by Grouping Extreme and Center Points		2019	single scale
147	Mask R-CNN (ResNet-101-FPN, CBN)	40.1	60.5	44.1	35.8	57.3	38.5		Cross-Iteration Batch Normalization		2020	FPN ResNet
148	Fast R-CNN (Cascade RPN)	40.1	59.4	43.8	22.1	42.4	51.6		Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution		2019
149	Mask R-CNN (ResNeXt-101-FPN)	39.8	62.3	43.4	22.1	43.2	51.2		Mask R-CNN		2017	ResNeXt FPN
150	GA-Faster-RCNN	39.8	59.2	43.5	21.8	42.6	50.7		Region Proposal by Guided Anchoring		2019
151	FPN (ResNet101 backbone)	39.5							ChainerCV: a Library for Deep Learning in Computer Vision		2017	FPN ResNet
152	RetinaMask (ResNet-50-FPN)	39.4	58.6	42.3	21.9	42.0	51.0		RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free		2019	FPN ResNet
153	PP-YOLO (320x320)	39.3	59.3	42.7	16.7	41.4	57.8		PP-YOLO: An Effective and Efficient Implementation of Object Detector		2020	YOLO
154	AA-ResNet-10 + RetinaNet	39.2							Attention Augmented Convolutional Networks		2019
155	MAL (ResNet50, single-scale)	39.2							Multiple Anchor Learning for Visual Object Detection		2019	single scale ResNet
156	RetinaNet (ResNet-101-FPN)	39.1	59.1	42.3	21.8	42.7	50.2		Focal Loss for Dense Object Detection		2017	FPN ResNet
157	Cascade R-CNN (ResNet-101-FPN+)	38.8	61.1	41.9	21.3	41.8	49.8		Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN ResNet
158	M2Det (ResNet-101, single-scale)	38.8	59.4	41.7	20.5	43.9	53.4		M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network		2018	single scale ResNet
159	SaccadeNet (DLA-34-DCN)	38.5	55.6	41.4	19.2	42.1	50.6		SaccadeNet: A Fast and Accurate Object Detector		2020	DCN
160	Mask R-CNN (ResNet-101-FPN)	38.2	60.3	41.7	20.1	41.1	50.2		Mask R-CNN		2017	FPN ResNet
161	WSMA-Seg	38.1							Segmentation is All You Need		2019
162	Faster R-CNN + FPN + CGD	37.9							Compact Global Descriptor for Neural Networks		2019	FPN
163	CornerNet511 (Hourglass-52, single-scale)	37.8	53.7	40.1	17.0	39.0	50.5		CornerNet: Detecting Objects as Paired Keypoints		2018	single scale
164	RefineDet512+ (VGG-16)	37.6	58.7	40.8	22.7	40.3	48.3		Single-Shot Refinement Neural Network for Object Detection		2017
165	DeformConv-R-FCN (Aligned-Inception-ResNet)	37.5	58.0		19.4	40.1	52.5		Deformable Convolutional Networks		2017
166	Faster R-CNN (ImageNet+300M)	37.4	58	40.1	17.5	41.1	51.2		Revisiting Unreasonable Effectiveness of Data in Deep Learning Era		2017
167	Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	36.9							torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
168	Faster R-CNN + TDM	36.8							Beyond Skip Connections: Top-Down Modulation for Object Detection		2016
169	Cascade R-CNN (ResNet-50-FPN+)	36.5	59	39.2	20.3	38.8	46.4		Cascade R-CNN: Delving into High Quality Object Detection		2017	FPN; ResNet
170	RefineDet512 (ResNet-101)	36.4	57.5	39.5	16.6	39.9	51.4		Single-Shot Refinement Neural Network for Object Detection		2017	ResNet
171	Faster R-CNN + FPN	36.2							Feature Pyramid Networks for Object Detection		2016	FPN
172	Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	35.9							torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation		2020	FPN; ResNet
173	Faster R-CNN (box refinement, context, multi-scale testing)	34.9							Deep Residual Learning for Image Recognition		2015	multiscale
174	Faster R-CNN	34.7							Speed/accuracy trade-offs for modern convolutional object detectors		2016
175	CornerNet-Squeeze	34.4							CornerNet-Lite: Efficient Keypoint Based Object Detection		2019
176	MultiPath Network	33.2							A MultiPath Network for Object Detection		2016
177	ION	33.1	55.7	34.6	14.5	35.2	47.2		Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks		2015
178	RefineDet512 (VGG-16)	33	54.5	35.5	16.3	36.3	44.3		Single-Shot Refinement Neural Network for Object Detection		2017
179	YOLOv3 + Darknet-53	33.0							YOLOv3: An Incremental Improvement		2018	YOLO
180	SSD512	28.8	48.5	30.3					SSD: Single Shot MultiBox Detector		2015
181	MnasFPN (MobileNetV2)	26.1							MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
182	ESPNetv2-512	26.0							ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network		2018
183	MnasFPN (MobileNetV3)	25.5							MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
184	MnasFPN (MNASNet-B1)	24.6							MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
185	MnasFPN x0.7 (MobileNetV2)	23.8							MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices		2019	FPN
186	MobileNet-v1-SSD-300x300+CGD	21.4							Compact Global Descriptor for Neural Networks		2019
187	Fast-RCNN	19.7							Fast R-CNN		2015
188	MobileNet	19.3							MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications		2017
189	DAT-S (RetinaNet)		69.6	51.2	32.3	51.8	63.4	47.9	Vision Transformer with Deformable Attention		2022
190	CenterMask-VoVNet99 (multi-scale)		68.3	53.2	32.4		60.0		CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	multiscale
191	Mask R-CNN (HRNetV2p-W32 + cascade)		62.5	48.6			56.3		Deep High-Resolution Representation Learning for Visual Recognition		2019
192	FoveaBox (ResNeXt-101)		61.9	45.2		46.8			FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
193	VirTex Mask R-CNN (ResNet-50-FPN)		61.7	44.8					VirTex: Learning Visual Representations from Textual Annotations		2020	FPN; ResNet
194	Centermask + ResNet101		61.6	46.9					CenterMask : Real-Time Anchor-Free Instance Segmentation		2019	ResNet
195	PAFNet (ResNet50-vd)		59.8	45.3	22.8	45.8	59.2		PAFNet: An Efficient Anchor-Free Object Detector Guidance		2021	ResNet
196	IoU-Net+EnergyRegression		58.5	41.8					Energy-Based Models for Deep Probabilistic Regression		2019
197	Cascade R-CNN (HRNetV2p-W48)			48.6	26.0	47.3	56.3		Deep High-Resolution Representation Learning for Visual Recognition		2019
198	ISTR (ResNet50-FPN-3x, single-scale)				27.8	48.7	59.9		ISTR: End-to-End Instance Segmentation with Transformers		2021
199	FoveaBox (ResNeXt-101)				24.9				FoveaBox: Beyond Anchor-based Object Detector		2019	ResNeXt
200	EfficientDet-D7x (single-scale)					57.9			EfficientDet: Scalable and Efficient Object Detection		2019	single scale

Posted 2022-02-16 18:12 by Xu_Lin
