Document Recognition Papers

First Compilation

Title	Link	Summary	Code
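FOTS: Fast oriented text spotting with a unified network	link	From SenseTime; end-to-end detection and recognition of multi-oriented text.	https://github.com/xieyufei1993/FOTS
WordSup: Exploiting Word Annotations for Character Based Text Detection	link	Existing character-level annotations are scarce, so weak supervision is introduced to refine the localization of character coordinates.	none yet
Towards end-to-end license plate detection and recognition: A large dataset and baseline	link	Solves license plate recognition end to end with a CNN.	none yet
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks	link	Proposes a character-level evaluation metric (CLEval).	https://github.com/clovaai/CLEval
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network	link	ABCNet: real-time scene text spotting based on an adaptive Bezier-curve network.	https://github.com/Yuliang-Liu/bezier_curve_text_spotting
Towards Robust Curve Text Detection with Conditional Spatial Expansion	link	Proposes a conditional spatial expansion (CSE) mechanism for detecting curved text of irregular shape and scale.	none
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes	link	The proposed text detector (LOMO) mainly targets the detection of extremely long text and arbitrarily shaped text.	none
Rotation-Sensitive Regression for Oriented Scene Text Detection	link	In scene text detection, classification is insensitive to rotation while regression is sensitive to it, so the two tasks should use different features. Earlier methods shared features across both tasks, which hurt performance. The paper therefore uses separate features: the regression branch extracts rotation-sensitive features by rotating the convolutional filters, and the classification branch obtains rotation-invariant features by pooling the rotation-sensitive ones. This yields a substantial improvement.	https://github.com/MhLiao/RRD
Geometry-Aware Scene Text Detection with Instance Transformation Network	link	Geometry-aware scene text detection based on an instance transformation network.	https://github.com/LianaWang/itn
A new feature pyramid network for object detection	link	Proposes the Feature Pyramid Network (FPN) architecture, which fuses multi-level features without sacrificing speed so that the features at every level carry rich semantic information, improving the feature extraction ability of CNNs. In principle, FPN is a general-purpose method for CNN-based approaches.	none yet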
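Detecting Oriented Text in Natural Images by Linking Segments	https://arxiv.org/abs/1703.06520	Draws boxes around text lines; detects oriented text in natural images by linking segments.	https://github.com/dengdan/seglink
Canny Text Detector: Fast and Robust Scene Text Localization Algorithm	link	Addresses text localization in images. Each character is treated like an edge pixel in the Canny algorithm, and text is detected following the idea of Canny edge extraction.	none yet
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition	link	Proposes a geometry-aware domain adaptation network that models cross-domain shifts in both geometry space and appearance space at the same time and can realistically translate images across domains with different characteristics.	none yet
Symmetry-constrained Rectification Network for Scene Text Recognition	link	Introduces the symmetry constraint of text into the rectification network, significantly improving scene text recognition accuracy.	none yet
EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis	https://doi.org/10.1109/ICCV.2017.481	Proposes a novel application of automated texture synthesis combined with a texture loss that focuses on creating realistic textures rather than optimizing for pixel-accurate reproduction of ground-truth images during training.	https://github.com/msmsajjadi/EnhanceNet-Code
PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit	link	PlugNet, a recognition algorithm for low-quality, degraded text.	none yet
Scene Text Image Super-resolution in the wild	link	Proposes a dataset dedicated to text super-resolution, together with a network designed specifically for text super-resolution.	none yet
Acquisition of Localization Confidence for Accurate Object Detection	link	Duplicate-box suppression and non-linear regression of localization information.	https://github.com/vacancy/PreciseRoIPooling
Enhancing Place Recognition Using Joint Intensity - Depth Analysis and Synthetic Data	link	Enhances place recognition using joint intensity-depth analysis and synthetic data.	none yet
PixelLink: Detecting Scene Text via Instance Segmentation	link	Published at AAAI 2018 by Zhejiang University. The overall idea is to first obtain text regions through segmentation.	https://github.com/ZJULearning/pixel_link
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework	link	Overall pipeline: the input image first passes through a fully convolutional network based on YOLOv2, then through an RPN, which outputs NMS-filtered ROI boxes. Using each box, a fixed-height patch is mapped out of the last convolutional layer in an STN-like manner. Recognition is then performed with CTC, and a further NMS pass based on the recognition results gives the final output.	https://github.com/MichalBusta/DeepTextSpotter
ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification	link		https://github.com/cassie1728/ESIR-a-little-impove
What Machines See Is Not What They Get: Fooling Scene Text Recognition Models with Adversarial Text Images	link	Fools scene text recognition models with adversarial text images.	none
An accurate segmentation-based scene text detector with context attention and repulsive text border	link	Proposes an accurate segmentation-based detector with context attention and a repulsive text border.	none
Handwriting Recognition in Low-resource Scripts using Adversarial Learning	link	Proposes a few-sample handwriting recognition method based on generative adversarial learning.	https://github.com/AyanKumarBhunia/Handwriting_Recogition_using_Adversarial_Learning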
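EfficientNet: Rethinking model scaling for convolutional neural networks	link	From Google; a new state of the art on ImageNet.	https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation	link	Multi-oriented text detection.	none
Deep Matching Prior Network : Toward Tighter Multi-oriented Text Detection	link	Multi-oriented text detection.	none yet
Beyond Deep Residual Learning for Image Restoration	link	An image restoration method that goes beyond deep residual learning.	none yet
Inverse compositional spatial transformer networks	link	The spatial transformer network (STN) discussed in this paper gives the model spatial invariance.	https://github.com/chenhsuanlin/inverse-compositional-STN
Multi-oriented Text Detection with Fully Convolutional Networks	link	Multi-oriented text detection.	none yet
Deep Residual Learning for Image Recognition	link	Image recognition with deep residual networks.	https://github.com/KaimingHe/deep-residual-networks
Single shot text detector with regional attention	link	Uses an attention mechanism to strengthen text features and introduces Inception modules to make the detector more robust to text size.	https://github.com/BestSonny/SSTD
Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning	link	Introduces a new Chinese street-view dataset, C-SVT, and a new scene text localization and recognition network, End2End-PSL, with SOTA results.	none yet
Wetext: Scene text detection under weak supervision	link	Learns the character classifier with semi-supervised and unsupervised learning to address the scarcity of character-level annotations.	none yet
Sequential Deformation for Accurate Scene Text Detection	link	The range of geometric variance of text boxes that a CNN model can cover in text detection is limited.	none yet
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition	link	Supplies the network with positional information to address attention drift.	none yet
Using Object Information for Spotting Text	link	Uses object information to spot text.	none yet
Scene Text Detection and Recognition: The Deep Learning Era	link	Scene text detection and recognition in the deep learning era.
Detecting curve text in the wild: New dataset and new solution	link	The first work on curved text detection; it also introduces the CTW1500 dataset. Curved text is represented by 14-point polygons. Proposes a detection method that combines CNN-RPN and RNN specifically for curved text.	none yet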
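STN-OCR: A single Neural Network for Text Detection and Text Recognition	link	STN-OCR	https://github.com/Bartzi/stn-ocr
Towards accurate scene text recognition with semantic reasoning networks	link	Proposes the semantic reasoning network (SRN), an end-to-end trainable framework for accurate scene text recognition.	https://github.com/chenjun2hao/SRN.pytorch
SCATTER: Selective context attentional scene text recognizer	link	Introduces a new scene text recognition architecture, the Selective Context ATtentional Text Recognizer (SCATTER).	https://github.com/scottwedge/cvpr20-scatter-text-recognizer
Tightness-aware Evaluation Protocol for Scene Text Detection	link	A tightness-aware evaluation protocol for scene text detection.	https://github.com/Yuliang-Liu/TIoU-metric
Learning Shape-Aware Embedding for Scene Text Detection	link	A text detection method based on instance segmentation and embedding features.	none
Edit probability for scene text recognition	link	Edit probability for scene text recognition.	none
A-FaST-RCNN: Hard positive generation via adversary for object detection	link	Uses an adversary to generate occluded or deformed samples that are hard for the detector; training on these hard positives makes the detector more robust. Compared with Fast-RCNN, mAP improves by 2.3% on VOC2007 and by 2.6% on VOC2012.	none yet
Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild	link	A multi-scale FCN with cascaded instance-aware segmentation for arbitrarily oriented word spotting in the wild.	none yet
Unambiguous text localization and retrieval for cluttered scenes	link	Unambiguous text localization and retrieval for cluttered scenes.	none yet
Robust Scene Text Recognition with Automatic Rectification	link	Proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a deep neural network comprising a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN); the two networks are trained jointly with backpropagation.	https://github.com/guojm14/TPS-SRN-tensorflow
Recursive Recurrent Nets with Attention Modeling for OCR in the Wild	link	Recursive recurrent networks with attention modeling for OCR in the wild.	none yet
SPGNet: Semantic Prediction Guidance for Scene Parsing	link	SPGNet: semantic prediction guidance for scene parsing.	none yet
Visual Semantic Reasoning for Image-Text Matching	link	Image-text matching: maps both modalities into a shared embedding space to infer the similarity between a full sentence and an image.	https://github.com/KunpengLi1994/VSRN
Dynamic Context Correspondence Network for Semantic Alignment	link	Establishes semantic correspondences. Incorporates global semantic context in a flexible way to overcome the limitations of earlier work that relied on local semantic representations.	https://github.com/ShuaiyiHuang/DCCNet
Self-Organized Text Detection with Minimal Post-processing via Border Learning	https://doi.org/10.1109/ICCV.2017.535	The first to propose the concept of border learning. The authors argue that traditional text detection splits pixels into two classes, text box and background, and that applying segmentation directly easily causes neighboring text boxes to stick together.	https://github.com/saicoco/tf-sotd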
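AutoSTR: Efficient Backbone Search for Scene Text Recognition	link	Uses neural architecture search (NAS) to automatically design the feature sequence extractor of a text recognition network, improving recognition performance.	https://github.com/AutoML-4Paradigm/AutoSTR
Accurate Scene Text Detection Through Border Semantics Awareness and Bootstrapping	link	A direct-regression method. Besides the text/non-text classification task and the regression of distances to the four boundaries (similar to EAST), it adds a border learning task for the four borders. The final output is not the predicted bounding box itself; text lines are obtained from the text score map and the four border maps.	none yet
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes	link	Mainly uses synthetically generated data to assist training.	https://github.com/GodOfSmallThings/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes
Detecting Text in Natural Image with Connectionist Text Proposal Network	link	Detects small, fixed-width text segments one by one, then connects them in post-processing to obtain text lines.	https://github.com/tianzhi0549/CTPN
Feature Enhancement Network: A Refined Scene Text Detector
A method for detecting text of arbitrary shapes in natural scenes that improves text spotting	link	Presents a pipeline-based text spotting framework that detects and recognizes text of various fonts, shapes, and orientations in natural scene images with complex backgrounds.	https://github.com/wqtwjt1996/UHT
An OCR for Classical Indic Documents Containing Arbitrarily Long Words	link	OCR for classical Indic documents containing arbitrarily long words.	https://github.com/ihdia/sanskrit-ocr
Shape Robust Text Detection with Progressive Scale Expansion Network (text detection)	link	Shape-robust text detection based on a progressive scale expansion network.	https://github.com/WenmuZhou/PSENet.pytorch
Character Region Awareness for Text Detection	link	Detects text using character-level segmentation.	https://github.com/clovaai/CRAFT-pytorch
An End-to-End TextSpotter with Explicit Alignment and Attention	link	A simple and efficient framework that handles the two tasks sequentially within one unified architecture.	none
Single Shot TextSpotter with Explicit Alignment and Attention	link	A single-shot text spotter with explicit alignment and attention.	https://github.com/tonghe90/textspotter
Deep matching prior network: Toward tighter multi-oriented text detection	link	Uses quadrilateral sliding windows for coarse text detection, proposes a shared Monte-Carlo method for fast and accurate computation of polygon areas, then determines the order of the four points with a sequential protocol and regresses the final quadrilateral prediction.	none yet
EAST: An Efficient and Accurate Scene Text Detector	link	An efficient and accurate scene text detector.	https://github.com/songdejia/EAST
TextBoxes++: A single-shot oriented scene text detector	https://arxiv.org/pdf/1801.02765	Rotated text detection.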
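Towards Unconstrained End-to-End Text Spotting	link	Uses a convolutional recurrent neural network for end-to-end text recognition.	none yet
A Novel Integrated Framework for Learning both Text Detection and Recognition	link	Integrates detection and recognition in one framework.	none yet
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension	link	End-to-end OCR text re-organization sequence learning for the comprehension of rich-text detail images.	none yet
Adaptive Text Recognition through Visual Matching	https://arxiv.org/abs/2009.06610	Addresses the diversity and generalization problems of text recognition in document recognition by performing recognition through visual matching.	https://github.com/Chuhanxx/FontAdaptor
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting	https://arxiv.org/abs/2007.09482	A segmentation proposal network for scene text detection and recognition.	https://github.com/MhLiao/MaskTextSpotterV3
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting	https://arxiv.org/abs/2008.00714	Adds linguistic information and incorporates it into network training to complement the visual information.	none yet
Character Region Attention For Text Spotting	https://arxiv.org/abs/2007.09629	The proposed CRAFTS network (Character Region Attention For Text Spotting) consists of three stages: detection, sharing, and recognition.	none yet
Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes	link	Benefits from the simplicity and smoothness of an end-to-end learning procedure and achieves more accurate text detection and recognition through semantic segmentation.	https://github.com/lvpengyuan/masktextspotter.caffe2
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes	link	To tackle the detection of arbitrarily shaped text, the paper proposes TextSnake, a text detector built on a fully convolutional network (FCN); it is a segmentation-based detection method. A text instance is modeled as a sequence of ordered, overlapping disks, each with a radius (text width) and an orientation (text direction), so text regions of arbitrary shape can be detected freely.	https://github.com/princewang1994/TextSnake.pytorch
SSD: Single Shot MultiBox Detector	https://arxiv.org/pdf/1512.02325.pdf	To achieve both speed and accuracy, the paper proposes the SSD object detection model which, like other popular detectors, integrates the whole detection process into a single deep neural network. This eases training and optimization and improves detection speed.	https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection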
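
The TextSnake entry in the table above models a text instance as an ordered sequence of overlapping disks. The following is a minimal sketch of that idea only, not the authors' implementation: it rebuilds a binary text-region mask as the union of the disks. The `Disk` class, the `rasterize_disks` function, and the sample arc are hypothetical names and data chosen for illustration. The per-disk orientation is left out because the union of disks does not depend on it.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    cx: float      # center x, in pixels
    cy: float      # center y, in pixels
    radius: float  # disk radius, in pixels


def rasterize_disks(disks: List[Disk], width: int, height: int) -> List[List[bool]]:
    """Return a height x width mask that is True wherever any disk covers the pixel."""
    mask = [[False] * width for _ in range(height)]
    for d in disks:
        # Visit only the disk's bounding box, clipped to the image.
        x0 = max(0, math.floor(d.cx - d.radius))
        x1 = min(width - 1, math.ceil(d.cx + d.radius))
        y0 = max(0, math.floor(d.cy - d.radius))
        y1 = min(height - 1, math.ceil(d.cy + d.radius))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - d.cx) ** 2 + (y - d.cy) ** 2 <= d.radius ** 2:
                    mask[y][x] = True
    return mask


# Hypothetical curved text line: centers follow a gentle arc, radii grow slowly.
disks = [
    Disk(cx=5 + 4 * i, cy=10 + 3 * math.sin(i / 2), radius=3 + 0.2 * i)
    for i in range(8)
]
mask = rasterize_disks(disks, width=40, height=20)
covered = sum(cell for row in mask for cell in row)
print(f"{covered} of {40 * 20} pixels fall inside the text region")
```

Because neighboring centers are closer together than the sum of their radii, the disks overlap and the union forms one connected, curved region. That overlap is what lets a short list of (center, radius) pairs describe text of arbitrary shape.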

Some Links

https://github.com/hwalsuklee/awesome-deep-text-detection-recognition

Text Detection

Conf.	Date	Title	IC13	IC15	Resources
'14-ECCV	14/10/07	Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees
'15-CVPR	15/06/01	Symmetry-based text line detection in natural scenes	0.8043		`PRJ` `CODE`
'16-TIP	15/10/12	Text-Attentional Convolutional Neural Networks for Scene Text Detection	0.8165
'15-ICCV	15/12/13	Text Flow : A Unified Text Detection System in Natural Scene Images	0.8025
'16-arXiv	16/03/31	Accurate Text Localization in Natural Image with Cascaded Convolutional TextNetwork	0.86
'16-CVPR	16/04/14	Multi-Oriented Text Detection with Fully Convolutional Networks	0.83	0.54	`*TORCH(M)`
'16-CVPR	16/04/22	Synthetic Data for Text Localisation in Natural Images	0.847 (L)0.8359		`CODE` `DB`
'16-arXiv	16/06/29	Scene Text Detection Via Holistic, Multi-Channel Prediction	0.8433	0.6477
'16-ECCV	16/09/12	Detecting Text in Natural Image with Connectionist Text Proposal Network	0.8215	0.6085	`*CAFFE(M)` `CAFFE` `TF(M)` `TF` `DEMO` `BLOG(CH)`
'17-AAAI	16/11/21	TextBoxes: A fast text detector with a single deep neural network	0.85 (L)0.8767		`*CAFFE(M)` `TF` `BLOG(KR)`
'18-TM	17/03/03	Arbitrary-Oriented Scene Text Detection via Rotation Proposals	0.9125	0.8020	`*CAFFE`
'17-CVPR	17/03/04	Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection		0.7064
'17-CVPR	17/03/19	Detecting Oriented Text in Natural Images by Linking Segments	0.853	0.75 (L)0.7636	`*TF(M)` `TF(M)` `SLIDE` `VIDEO`
'17-arXiv	17/03/24	Deep Direct Regression for Multi-Oriented Scene Text Detection	0.86	0.81
'17-arXiv	17/04/03	Cascaded Segmentation-Detection Networks for Word-Level Text Spotting	0.86	0.71
'17-CVPR	17/04/11	EAST: An Efficient and Accurate Scene Text Detector		0.8072 (L)0.8038	`TF(M)` `TF` `PYTORCH(M)` `PYTORCH` `DEMO` `KERAS(M)` `VIDEO`
'17-ICIP	17/05/15	WordFence: Text Detection in Natural Images with Border Awareness	0.86
'17-arXiv	17/06/30	R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection	0.8773	0.8254	`TF(M)` `CAFFE(M)`
'17-CVPR	17/07/21	Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild	0.85	0.63
'17-arXiv	17/08/17	Deep Scene Text Detection with Connected Component Proposals	0.919
'17-ICCV	17/08/22	WordSup: Exploiting Word Annotations for Character based Text Detection	0.9064	0.7816
'17-ICCV	17/09/01	Single Shot Text Detector with Regional Attention	0.8704	0.7691	`*CAFFE(M)` `PYTORCH` `VIDEO`
'17-arXiv	17/09/11	Fused Text Segmentation Networks for Multi-oriented Scene Text Detection		0.8414
'17-ICCV	17/10/13	WeText: Scene Text Detection under Weak Supervision	0.869 (L)0.8313
'17-ICCV	17/10/22	Self-organized Text Detection with Minimal Post-processing via Border Learning	0.84		`*KERAS(M)`
'17-ICDAR	17/11/11	Deep Residual Text Detection Network for Scene Text	0.9117 (L)0.8925
'18-AAAI	17/11/12	Feature Enhancement Network: A Refined Scene Text Detector	0.9161
'17-arXiv	17/11/30	ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene		0.759
'18-AAAI	18/01/04	PixelLink: Detecting Scene Text via Instance Segmentation	0.881	0.8519	`*TF(M)` `TF`
'18-CVPR	18/01/05	FOTS: Fast Oriented Text Spotting with a Unified Network	0.925	0.8984	`PYTORCH` `PYTORCH` `VIDEO`
'18-TIP	18/01/09	TextBoxes++: A Single-Shot Oriented Scene Text Detector	0.88	0.829 (L)0.8475	`*CAFFE(M)`
'18-CVPR	18/02/27	Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation	0.88	0.843	`*PYTORCH(M)`
'18-CVPR	18/03/09	An end-to-end TextSpotter with Explicit Alignment and Attention	0.9	0.87	`*CAFFE(M)`
'18-CVPR	18/03/14	Rotation-Sensitive Regression for Oriented Scene Text Detection	0.89	0.838	`*CAFFE(M)`
'18-arXiv	18/04/08	Detecting Multi-Oriented Text with Corner-based Region Proposals	0.876	0.845	`*CAFFE(M)`
'18-arXiv	18/04/24	An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches	0.92	0.86
'18-IJCAI	18/05/03	IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection		0.9047
'18-arXiv	18/06/07	Shape Robust Text Detection with Progressive Scale Expansion Network		0.8721	`PRJ`
'18-ECCV	18/07/04	TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes		0.826	`PYTORCH`
'18-ECCV	18/07/06	Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes	0.917	0.86
'18-ECCV	18/07/10	Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping	0.892
'19-AAAI	18/11/21	Scene Text Detection with Supervised Pyramid Context Network	0.921	0.872
'19-TIP	18/12/04	TextField: Learning A Deep Direction Field for Irregular Scene Text Detection		0.824	`*CAFFE(M)`
'19-CVPR	19/03/21	Towards Robust Curve Text Detection with Conditional Spatial Expansion
'19-CVPR	19/03/28	Shape Robust Text Detection with Progressive Scale Expansion Network		0.857	`TF(M)`
'19-CVPR	19/04/03	Character Region Awareness for Text Detection	0.952	0.869	`*PYTORCH(M)` `VIDEO` `PYTORCH` `TF(M)` `KERAS` `BLOG_CH` `BLOG_KR` `BLOG_KR` `BLOG_KR`
'19-CVPR	19/04/13	Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes		0.877
'19-CVPR	19/06/16	Learning Shape-Aware Embedding for Scene Text Detection		0.877
'19-CVPR	19/06/16	Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation	0.917	0.876
'19-ICCV	19/08/16	Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network		0.829
'19-ICCV	19/09/02	Geometry Normalization Networks for Accurate Scene Text Detection		0.8852
'19-AAAI	19/11/20	Real-time Scene Text Detection with Differentiable Binarization		0.847

Text Recognition

Conf.	Date	Title	SVT	IIIT5k	IC03	IC13	Resources
'15-ICLR	14/12/18	Deep structured output learning for unconstrained text recognition	0.717		0.896	0.818	`TF` `SLIDE` `VIDEO`
'16-IJCV	15/05/07	Reading text in the wild with convolutional neural networks	0.807		0.933	0.908	`KERAS`
'16-AAAI	15/06/14	Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI	15/07/21	An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition	0.808	0.782	0.894	0.867	`TORCH(M)` `TF` `TF` `TF` `TF` `PYTORCH` `PYTORCH(M)` `BLOG(KR)`
'16-CVPR	16/03/09	Recursive Recurrent Nets with Attention Modeling for OCR in the Wild	0.807	0.784	0.887	0.9
'16-CVPR	16/03/12	Robust scene text recognition with automatic rectification	0.819	0.819	0.901	0.886	`PYTORCH` `PYTORCH`
'16-CVPR	16/06/27	CNN-N-Gram for Handwriting Word Recognition	0.8362				`VIDEO`
'16-BMVC	16/09/19	STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition	0.836	0.833	0.899	0.891
'17-arXiv	17/07/27	STN-OCR: A single Neural Network for Text Detection and Text Recognition	0.798	0.86		0.903	`*MXNET(M)` `PRJ` `BLOG`
'17-IJCAI	17/08/19	Learning to Read Irregular Text with Attention Mechanisms
'17-arXiv	17/09/06	Scene Text Recognition with Sliding Convolutional Character Models	0.765	0.816	0.845	0.852
'17-ICCV	17/09/07	Focusing Attention: Towards Accurate Text Recognition in Natural Images	0.859	0.874	0.942	0.933
'18-CVPR	17/11/12	AON: Towards Arbitrarily-Oriented Text Recognition	0.828	0.87	0.915		`TF`
'17-NIPS	17/12/04	Gated Recurrent Convolution Neural Network for OCR	0.815	0.808	0.978		`*TORCH(M)`
'18-AAAI	18/01/04	Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition	0.844	0.836	0.915	0.908
'18-AAAI	18/01/04	SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network		0.87	0.931	0.929
'18-CVPR	18/05/09	Edit Probability for Scene Text Recognition	0.875	0.883	0.946	0.944
'18-TPAMI	18/06/25	ASTER: An Attentional Scene Text Recognizer with Flexible Rectification	0.936	0.934	0.945	0.918	`*TF(M)` `PYTORCH`
'18-ECCV	18/09/08	Synthetically Supervised Feature Learning for Scene Text Recognition	0.871	0.894	0.947	0.94
'19-AAAI	18/09/18	Scene Text Recognition from Two-Dimensional Perspective	0.821	0.92		0.914
'19-AAAI	18/11/02	Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition	0.845	0.915		0.91	`*TORCH(M)`
'19-CVPR	18/12/14	ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification	0.902	0.933		0.913	`PRJ`
'19-PR	19/01/10	MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition	0.883	0.912	0.950	0.924	`*PYTORCH(M)`
'19-ICCV	19/04/03	What is wrong with scene text recognition model comparisons? dataset and model analysis	0.875		0.949	0.936	`*PYTORCH(M)` `BLOG_KR`
'19-CVPR	19/04/18	Aggregation Cross-Entropy for Sequence Recognition	0.826	0.823	0.921	0.897	`*PYTORCH`
'19-CVPR	19/06/16	Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition	0.845	0.838	0.921	0.918
'19-ICCV	19/08/06	Symmetry-constrained Rectification Network for Scene Text Recognition	0.889	0.944	0.95	0.939
'20-AAAI	19/12/28	TextScanner: Reading Characters in Order for Robust Scene Text Recognition	0.895	0.926		0.925
'20-AAAI	19/12/21	Decoupled Attention Network for Text Recognition	0.892	0.943	0.95	0.939	`*PYTORCH(M)`
'20-AAAI	20/02/04	GTC: Guided Training of CTC	0.929	0.955	0.952	0.943

End-to-End Text Recognition

Conf.	Date	Title	IC03	IC13	IC15	Resources
'12-ICPR	12/11/11	End-to-end text recognition with convolutional neural networks	0.67			`*CODE`
'14-ECCV	14/09/06	Deep Features for Text Spotting	0.75			`PRJ` `MATLAB`
'15-IJCV	15/05/07	Reading Text in the Wild with Convolutional Neural Networks	0.70	0.77		`KERAS`
'15-TPAMI	15/10/30	Real-time Lexicon-free Scene Text Localization and Recognition		0.542	0.156
'16-arXiv	16/04/10	TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild		0.6843	0.4718 (L)0.533	`*CAFFE(M)`
'17-AAAI	16/11/21	TextBoxes: A fast text detector with a single deep neural network		0.84		`TF` `*CAFFE(M)` `BLOG_KR`
'17-ICCV	17/07/13	Towards End-to-end Text Spotting with Convolution Recurrent Neural Network		0.8459		`VIDEO`
'17-ICCV	17/10/22	Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework		0.77	0.47	`VIDEO` `*CAFFE(M)`
'18-CVPR	18/01/05	FOTS: Fast Oriented Text Spotting with a Unified Network		0.8477	0.6533	`VIDEO` `TF(M)`
'18-TIP	18/01/09	TextBoxes++: A Single-Shot Oriented Scene Text Detector		0.8465	0.519	`*CAFFE(M)`
'18-CVPR	18/03/09	An end-to-end TextSpotter with Explicit Alignment and Attention		0.86	0.63	`*CAFFE(M)`
'18-TPAMI	18/06/25	ASTER: An Attentional Scene Text Recognizer with Flexible Rectification			0.64	`*TF(M)`
'18-ECCV	18/07/06	Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes		0.865	0.624
'19-ICCV	19/08/24	Towards Unconstrained End-to-End Text Spotting			0.6994	`BLOG_KR`
'19-ICCV	19/10/17	Convolutional Character Networks			0.7108	`*PYTORCH(M)`
'19-ICCV	19/10/27	TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting			0.6537
'20-AAAI	19/11/21	All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting		0.841	0.641
'20-AAAI	20/02/12	Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting		0.858	0.651

Others

Papers are sorted by published date.
`*CODE` means official code, and `CODE(M)` means that a trained model is provided.
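
The tag convention described above (a leading `*` for official code, a trailing `(M)` for a released trained model) is easy to parse mechanically. The sketch below is one possible way to do that for a Resources cell. `ResourceTag` and `parse_tags` are hypothetical helper names introduced for this illustration, and the sketch assumes that every tag in the cell is wrapped in backticks, as in most rows of these tables.

```python
import re
from typing import List, NamedTuple


class ResourceTag(NamedTuple):
    label: str       # e.g. "CAFFE", "TF", "BLOG(KR)"
    official: bool   # True when the tag starts with "*"
    has_model: bool  # True when the tag ends with "(M)"


def parse_tags(cell: str) -> List[ResourceTag]:
    """Split a Resources cell into tags and decode the * and (M) markers."""
    tags = []
    for raw in re.findall(r"`([^`]+)`", cell):
        official = raw.startswith("*")
        has_model = raw.endswith("(M)")
        label = raw.lstrip("*")
        if has_model:
            label = label[: -len("(M)")]
        tags.append(ResourceTag(label, official, has_model))
    return tags


# Sample cell written in the same style as the tables in this list.
tags = parse_tags("`*CAFFE(M)` `TF` `BLOG(KR)`")
for tag in tags:
    print(tag)
```

Suffixes such as `(KR)` or `(CH)` are left as part of the label, since only `*` and `(M)` are defined by the legend.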

Conf.	Date	Title	Description	Resources
'14-NIPS	14/06/09	Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition	Dataset	`PRJ`
'17-ECCV	17/02/13	End-to-End Interpretation of the French Street Name Signs Dataset	Dataset (FSNS)	`*TF(M)`
'17-arXiv	17/04/11	Attention-based Extraction of Structured Information from Street View Imagery	FSNS	`*TF(M)` `TF` `TF` `LUA` `BLOG_KR`
'17-CVPR	17/07/21	Unambiguous Text Localization and Retrieval for Cluttered Scenes	Text Retrieval
'17-AAAI	17/10/22	Detection and Recognition of Text Embedded in Online Images via Neural Context Models	Dataset	`PRJ`
'18-CVPR	17/11/17	Separating Style and Content for Generalized Style Transfer	Font Style
'17-arXiv	17/12/06	Detecting Curve Text in the Wild: New Dataset and New Solution	Dataset (CTW 1500)	`PRJ`
'18-AAAI	17/12/14	SEE: Towards Semi-Supervised End-to-End Scene Text Recognition	FSNS	`PRJ` `*CHAINER(M)`
'17-CVPR	17/06/07	Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks	Document Layout	`PRJ`
'18-CVPR	18/06/19	DocUNet: Document Image Unwarping via A Stacked U-Net	Document Dewarping	`PRJ`
'18-CVPR	18/06/19	Document Enhancement using Visibility Detection	Document Enhancement	`PRJ`
'18-IJCAI	18/06/22	Multi-Task Handwritten Document Layout Analysis	Document Layout
'18-ECCV	18/07/09	Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes	Dataset	`PRJ`
'19-AAAI	18/12/03	EnsNet: Ensconce Text in the Wild	Text Removal	`DB`
'19-CVPR	18/12/14	Spatial Fusion GAN for Image Synthesis	Dataset	`DB`
'19-AAAI	19/01/27	Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables	TableToText
'19-AAAI	19/01/27	A Radical-aware Attention-based Model for Chinese Text Classification	Chinese Character Classification
'19-CVPR	19/02/25	Handwriting Recognition in Low-resource Scripts using Adversarial Learning	Handwriting Recognition	`TF`
'19-CVPR	19/03/27	Tightness-aware Evaluation Protocol for Scene Text Detection	Evaluation	`CODE`
'19-ICCV	19/05/31	Scene Text Visual Question Answering	Dataset	`ICDAR_DB`
'19-CVPR	19/06/16	DynTypo: Example-based Dynamic Text Effects Transfer	Text Effects	`PRJ` `VIDEO`
'19-CVPR	19/06/16	Typography with Decor: Intelligent Text Style Transfer	Text Effects	`*PYTORCH(M)`
'19-CVPR	19/06/16	An Alternative Deep Feature Approach to Line Level Keyword Spotting	Keyword Spotting
'19-ICCV	19/07/23	GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition	Domain Adaptation
'19-ICCV	19/09/17	Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning	Dataset	`ICDAR_DB`
'19-ICCV	19/10/02	Large-scale Tag-based Font Retrieval with Generative Feature Learning	Font Retrieval
'19-ICCV	19/10/27	TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts	Place Recognition	`DB`
'19-ICCV	19/10/27	DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks	Document Dewarping	`*PYTORCH(M)`

Other lists

OCR Paper Curation

Tutorial Materials

Supplement

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections https://arxiv.org/pdf/1905.11739
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
Detecting Multi-Oriented Text with Corner-based Region Proposals
Detection and Recognition of Text Embedded in Online Images via Neural Context Models
DynTypo: Example-based Dynamic Text Effects Transfer
Towards End-to-end Text Spotting with Convolution Recurrent Neural Network
Semi-Synthetic Data Augmentation of Scanned Historical Documents
Attend, Copy, Parse – End-to-end information extraction from documents
A Spatio-Spectral Hybrid Convolutional Architecture for Hyperspectral Document Authentication
Discourse descriptor for document incremental classification, Comparison with Deep Learning
A Character Attention Generative Adversarial Network for Degraded Historical Document Restoration
A Robust Data Hiding Scheme using Generated Content for Securing Genuine Documents
Simultaneous Optimisation of Image Quality Improvement and Text Content Extraction from Scanned Documents
A New Document Image Quality Assessment Method Based on Hast Derivations
A meaningful information extraction system for interactive analysis of documents
An End-to-End trainable framework for joint optimization of document enhancement & recognition.
A Deep Transfer Learning Approach to Document Image Quality Assessment
Learning Free Document Image Binarization Based on Fast Fuzzy C-Means Clustering
A Robust Hybrid Approach for Textual Document Classification
Document Domain Adaptation with Generative Adversarial Networks
Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images
A Quality and Time Assessment of Binarization Algorithms for Scanned Documents
Blind Source Separation based Framework for Multispectral Document Image Binarization