Document Recognition Papers

First Compilation

Title Link Summary Code
FOTS: Fast oriented text spotting with a unified network link From SenseTime; performs end-to-end detection and recognition of multi-oriented text. https://github.com/xieyufei1993/FOTS
WordSup: Exploiting Word Annotations for Character Based Text Detection. link Existing character-level annotations are too scarce, so weak supervision is introduced to refine the localization of character coordinates. none yet
Towards end-to-end license plate detection and recognition: A large dataset and baseline link Uses a CNN to solve license plate recognition end to end. none yet
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks link Proposes a character-level evaluation metric (CLEval). https://github.com/clovaai/CLEval
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network link ABCNet: real-time scene text spotting based on an adaptive Bezier-curve network. https://github.com/Yuliang-Liu/bezier_curve_text_spotting
Towards Robust Curve Text Detection with Conditional Spatial Expansion link Proposes a conditional spatial expansion (CSE) mechanism to detect curved text of irregular shape and scale.
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes link The proposed text detector (LOMO) mainly targets extremely long text and arbitrarily shaped text.
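Rotation-Sensitive Regression for Oriented Scene Text Detection link In scene text detection, classification is insensitive to rotation while regression is sensitive to it, so the two tasks should use different features. Earlier methods shared features between the two tasks, which hurt performance. The paper therefore uses separate features: the regression branch extracts rotation-sensitive features by rotating the convolutional filters, and the classification branch obtains rotation-invariant features by pooling the rotation-sensitive ones. The improvement is substantial. https://github.com/MhLiao/RRD
Geometry-Aware Scene Text Detection with Instance Transformation Network link Geometry-aware scene text detection based on an instance transformation network. https://github.com/LianaWang/itn
A new feature pyramid network for object detection link Proposes the Feature Pyramid Network (FPN) architecture, which fuses features from multiple levels without hurting speed, so that the features at every level carry rich semantic information and the feature extraction ability of the CNN improves. In principle, FPN is a general technique for CNN-based methods. none yet
Detecting Oriented Text in Natural Images by Linking Segments https://arxiv.org/abs/1703.06520 Draws boxes around text lines by detecting oriented text in natural images through linking segments. https://github.com/dengdan/seglink
Canny Text Detector: Fast and Robust Scene Text Localization Algorithm link Mainly addresses text localization in images. Treats each character like an edge pixel in the Canny algorithm and detects text using the idea behind Canny edge extraction. none yet
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition link Proposes a geometry-aware domain adaptation network that models cross-domain shifts in both geometry space and appearance space at once, and realistically converts images across domains with different characteristics. none yet
Symmetry-constrained Rectification Network for Scene Text Recognition link Introduces a symmetry constraint on text into the rectification network, significantly improving scene text recognition accuracy. none yet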
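EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis https://doi.org/10.1109/ICCV.2017.481 Proposes a novel application of automated texture synthesis combined with a texture loss, which focuses on creating realistic textures rather than optimizing for pixel-accurate reproduction of the ground-truth image during training. https://github.com/msmsajjadi/EnhanceNet-Code
PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit link PlugNet, a recognition algorithm for low-quality, degraded text. none yet
Scene Text Image Super-resolution in the wild link Proposes a dataset dedicated to text super-resolution, together with a network designed specifically for text super-resolution. none yet
Acquisition of Localization Confidence for Accurate Object Detection link Duplicate-box suppression; non-linear regression of localization information. https://github.com/vacancy/PreciseRoIPooling
Enhancing Place Recognition Using Joint Intensity - Depth Analysis and Synthetic Data link Enhances place recognition using joint intensity-depth analysis and synthetic data. none yet
PixelLink: Detecting Scene Text via Instance Segmentation link Published by Zhejiang University at AAAI 2018. The overall idea is still to obtain text regions by segmentation first. https://github.com/ZJULearning/pixel_link
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework link Overall pipeline: the input image first passes through a fully convolutional network based on YOLOv2, then through an RPN, which outputs ROI boxes after NMS filtering. For each box, a fixed-height patch is mapped from the last convolutional layer in an STN-like manner. Recognition is then performed with CTC, and a further NMS pass based on the recognition results gives the final output. https://github.com/MichalBusta/DeepTextSpotter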
ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification link https://github.com/cassie1728/ESIR-a-little-impove
What Machines See Is Not What They Get: Fooling Scene Text Recognition Models with Adversarial Text Images link Fools scene text recognition models with adversarial text images.
An accurate segmentation-based scene text detector with context attention and repulsive text border link Proposes an accurate segmentation-based detector with context attention and a repulsive text border.
Handwriting Recognition in Low-resource Scripts using Adversarial Learning link Proposes a few-sample handwritten character recognition method based on generative adversarial learning. https://github.com/AyanKumarBhunia/Handwriting_Recogition_using_Adversarial_Learning
Efficientnet: Rethinking model scaling for convolutional neural networks link From Google; a new state of the art on ImageNet. https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation link Multi-oriented text.
Deep Matching Prior Network : Toward Tighter Multi-oriented Text Detection link Multi-oriented text. none yet
Beyond Deep Residual Learning for Image Restoration link An image restoration method that goes beyond deep residual learning. none yet
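Inverse compositional spatial transformer networks link The spatial transformer network (STN) proposed in this paper gives the model spatial invariance. https://github.com/chenhsuanlin/inverse-compositional-STN
Multi-oriented Text Detection with Fully Convolutional Networks link Multi-oriented text. none yet
Deep Residual Learning for Image Recognition link Image recognition with deep residual networks. https://github.com/KaimingHe/deep-residual-networks
Single shot text detector with regional attention link Uses an attention mechanism to strengthen text features and introduces Inception to make the detector more robust to text size. https://github.com/BestSonny/SSTD
Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning link Proposes a new Chinese street-view dataset, C-SVT, and a new scene text spotting network, End2End-PSL, which reaches SOTA performance. none yet
Wetext: Scene text detection under weak supervision link Learns a character classifier with semi-supervised and unsupervised learning to address the shortage of character-level annotations. none yet
Sequential Deformation for Accurate Scene Text Detection link Starts from the observation that CNN models cover only a limited range of geometric variance in text detection boxes. none yet
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition link Supplies positional information to the network to solve the attention drift problem. none yet
Using Object Information for Spotting Text link Uses object information to help spot text. none yet
Scene Text Detection and Recognition: The Deep Learning Era link Scene text detection and recognition in the deep learning era.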
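Detecting curve text in the wild: New dataset and new solution link The first work on curved text detection; it also introduces the CTW1500 dataset. Curved text is represented with a 14-point polygon, and a detection method combining CNN-RPN and RNN is proposed specifically for curved text. none yet
STN-OCR: A single Neural Network for Text Detection and Text Recognition link STN-OCR https://github.com/Bartzi/stn-ocr
Towards accurate scene text recognition with semantic reasoning networks link Proposes an end-to-end trainable framework for accurate scene text recognition, the semantic reasoning network (SRN). https://github.com/chenjun2hao/SRN.pytorch
Scatter: Selective context attentional scene text recognizer link Introduces a new scene text recognition architecture, the Selective Context ATtentional Text Recognizer (SCATTER). https://github.com/scottwedge/cvpr20-scatter-text-recognizer
Tightness-aware Evaluation Protocol for Scene Text Detection link A tightness-aware evaluation protocol for scene text detection. https://github.com/Yuliang-Liu/TIoU-metric
Learning Shape-Aware Embedding for Scene Text Detection link A text detection method based on instance segmentation and embedded features.
Edit probability for scene text recognition link Edit probability for scene text recognition.
A-Fast-RCNN: Hard positive generation via adversary for object detection link Proposes using an adversary to generate occluded or deformed samples, which are hard for the detector; training on these hard positives makes the detector more robust. Compared with Fast-RCNN, mAP increases by 2.3% on VOC2007 and by 2.6% on VOC2012. none yet
Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild link A multi-scale FCN with cascaded instance-aware segmentation for spotting arbitrarily oriented words in the wild. none yet
Unambiguous text localization and retrieval for cluttered scenes link Unambiguous text localization and retrieval for cluttered scenes. none yet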
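Robust Scene Text Recognition with Automatic Rectification link Proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a deep neural network consisting of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN); the two networks are trained jointly with backpropagation. https://github.com/guojm14/TPS-SRN-tensorflow
Recursive Recurrent Nets with Attention Modeling for OCR in the Wild link Recursive recurrent networks with attention modeling for OCR in the wild. none yet
SPGNet: Semantic Prediction Guidance for Scene Parsing link SPGNet: semantic prediction guidance for scene parsing. none yet
Visual Semantic Reasoning for Image-Text Matching link Image-text matching: maps both modalities into a shared embedding space to infer the similarity between a full sentence and an image. https://github.com/KunpengLi1994/VSRN
Dynamic Context Correspondence Network for Semantic Alignment link Establishes semantic correspondences. Incorporates global semantic context in a flexible way to overcome the limitations of earlier work that relied on local semantic representations. https://github.com/ShuaiyiHuang/DCCNet
Self-Organized Text Detection with Minimal Post-processing via Border Learning. https://doi.org/10.1109/ICCV.2017.535 Introduces the concept of border learning for the first time. The authors argue that when traditional text detection classifies pixels into only two classes, text box and background, applying segmentation directly easily makes neighboring text boxes stick together. https://github.com/saicoco/tf-sotd
AutoSTR: Efficient Backbone Search for Scene Text Recognition link Uses neural architecture search (NAS) to automatically design the feature sequence extractor of a text recognition network, improving recognition performance. https://github.com/AutoML-4Paradigm/AutoSTR
Accurate Scene Text Detection Through Border Semantics Awareness and Bootstrapping link A direct regression method. Besides the text/non-text classification task and the regression of the four point-to-boundary distances (similar to EAST), it adds a border learning task for the four borders. The final output is not taken directly from the predicted bounding box; text lines are obtained from the text score map and the four border maps. none yet
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes link Mainly uses synthetically generated data to assist training. https://github.com/GodOfSmallThings/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes
Detecting Text in Natural Image with Connectionist Text Proposal Network link Detects small, fixed-width text segments one by one, then links them in post-processing to obtain text lines. https://github.com/tianzhi0549/CTPN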
Feature Enhancement Network: A Refined Scene Text Detector
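A method for detecting text of arbitrary shapes in natural scenes that improves text spotting link Presents a pipeline-based text spotting framework that detects and recognizes text of various fonts, shapes, and orientations in natural scene images with complex backgrounds. https://github.com/wqtwjt1996/UHT
An OCR for Classical Indic Documents Containing Arbitrarily Long Words link OCR for classical Indic documents containing long words. https://github.com/ihdia/sanskrit-ocr
Shape Robust Text Detection with Progressive Scale Expansion Network (text detection) link Shape-robust text detection based on a progressive scale expansion network. https://github.com/WenmuZhou/PSENet.pytorch
Character Region Awareness for Text Detection link Uses character-level segmentation for text detection. https://github.com/clovaai/CRAFT-pytorch
An End-to-End TextSpotter with Explicit Alignment and Attention link A simple, efficient framework that handles the two tasks consecutively within one unified architecture.
Single Shot TextSpotter with Explicit Alignment and Attention link A single-shot text spotter with explicit alignment and attention. https://github.com/tonghe90/textspotter
Deep matching prior network: Toward tighter multi-oriented text detection link Uses quadrilateral sliding windows for coarse text detection, proposes a shared Monte Carlo method for fast and accurate computation of polygon areas, then determines the order of the four points with a sequential protocol and regresses the final quadrilateral prediction. none yet
EAST: An Efficient and Accurate Scene Text Detector link An efficient and accurate scene text detector. https://github.com/songdejia/EAST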
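TextBoxes++: A single-shot oriented scene text detector https://arxiv.org/pdf/1801.02765 Rotated text detection.
Towards Unconstrained End-to-End Text Spotting link Uses a convolutional recurrent neural network for end-to-end text spotting. none yet
A Novel Integrated Framework for Learning both Text Detection and Recognition link Integrates detection and recognition in one framework. none yet
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension link End-to-end OCR text re-organization sequence learning for comprehension of rich-text detail images. none yet
Adaptive Text Recognition through Visual Matching https://arxiv.org/abs/2009.06610 Addresses the diversity and generalization problems of text recognition in document recognition by performing recognition through visual matching. https://github.com/Chuhanxx/FontAdaptor
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting https://arxiv.org/abs/2007.09482 A segmentation proposal network for scene text detection and recognition. https://github.com/MhLiao/MaskTextSpotterV3
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting https://arxiv.org/abs/2008.00714 Adds linguistic information and incorporates it into network training to assist the visual information. none yet
Character Region Attention For Text Spotting https://arxiv.org/abs/2007.09629 The proposed CRAFTS network (Character Region Attention For Text Spotting) has three stages: detection, sharing, and recognition. none yet
Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes link Benefits from the simplicity and smoothness of an end-to-end learning procedure, and obtains more accurate text detection and recognition through semantic segmentation. https://github.com/lvpengyuan/masktextspotter.caffe2
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes link To tackle the detection of arbitrarily shaped text, the paper proposes TextSnake, a segmentation-based text detector built on a fully convolutional network (FCN). A text instance is represented as a sequence of ordered, overlapping disks, each with a radius (text width) and an orientation (text direction), so text regions of arbitrary shape can be detected. https://github.com/princewang1994/TextSnake.pytorch
SSD: Single Shot MultiBox Detector https://arxiv.org/pdf/1512.02325.pdf Proposes the SSD object detection model with the goal of keeping both speed and accuracy. Like other currently popular detectors, it folds the whole detection process into a single deep neural network, which makes training and optimization easier and speeds up detection. https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection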
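
The TextSnake entry above describes a text instance as an ordered sequence of overlapping disks. As a rough illustration of why that representation can cover arbitrary shapes, the sketch below rasterizes the union of a few disks into a binary mask. It is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `disks_to_mask` and the sample centers and radii are hypothetical, the disk orientation is ignored, and the network that would predict the disks is left out entirely.

```python
def disks_to_mask(centers, radii, height, width):
    """Rasterize the union of disks into a height x width boolean mask.

    centers: (x, y) disk centers, ordered along the text center line.
    radii:   one radius per center.
    """
    mask = [[False] * width for _ in range(height)]
    for (cx, cy), r in zip(centers, radii):
        # Only scan the bounding box of the disk, clipped to the image.
        for y in range(max(0, int(cy - r)), min(height, int(cy + r) + 1)):
            for x in range(max(0, int(cx - r)), min(width, int(cx + r) + 1)):
                # A pixel is text if it lies inside at least one disk.
                if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                    mask[y][x] = True
    return mask


# Hypothetical curved text instance: five disks along a shallow arc.
centers = [(10, 20), (15, 18), (20, 17), (25, 18), (30, 20)]
radii = [4, 4, 4, 4, 4]
mask = disks_to_mask(centers, radii, height=40, width=40)
covered = sum(row.count(True) for row in mask)
print(covered, "pixels covered")
```

Because neighboring disks overlap, the union forms one connected region that bends with the center line, which a single axis-aligned or rotated rectangle cannot do.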

Some Links

https://github.com/hwalsuklee/awesome-deep-text-detection-recognition

Text Detection

Conf. Date Title IC13 IC15 Resources
'14-ECCV 14/10/07 Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees
'15-CVPR 15/06/01 Symmetry-based text line detection in natural scenes 0.8043 PRJ CODE
'16-TIP 15/10/12 Text-Attentional Convolutional Neural Networks for Scene Text Detection 0.8165
'15-ICCV 15/12/13 Text Flow : A Unified Text Detection System in Natural Scene Images 0.8025
'16-arXiv 16/03/31 Accurate Text Localization in Natural Image with Cascaded Convolutional TextNetwork 0.86
'16-CVPR 16/04/14 Multi-Oriented Text Detection with Fully Convolutional Networks 0.83 0.54 *TORCH(M)
'16-CVPR 16/04/22 Synthetic Data for Text Localisation in Natural Images 0.847 (L)0.8359 CODE DB
'16-arXiv 16/06/29 Scene Text Detection Via Holistic, Multi-Channel Prediction 0.8433 0.6477
'16-ECCV 16/09/12 Detecting Text in Natural Image with Connectionist Text Proposal Network 0.8215 0.6085 *CAFFE(M) CAFFE TF(M) TF DEMO BLOG(CH)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.85 (L)0.8767 *CAFFE(M) TF BLOG(KR)
'18-TM 17/03/03 Arbitrary-Oriented Scene Text Detection via Rotation Proposals 0.9125 0.8020 *CAFFE
'17-CVPR 17/03/04 Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection 0.7064
'17-CVPR 17/03/19 Detecting Oriented Text in Natural Images by Linking Segments 0.853 0.75 (L)0.7636 *TF(M) TF(M) SLIDE VIDEO
'17-arXiv 17/03/24 Deep Direct Regression for Multi-Oriented Scene Text Detection 0.86 0.81
'17-arXiv 17/04/03 Cascaded Segmentation-Detection Networks for Word-Level Text Spotting 0.86 0.71
'17-CVPR 17/04/11 EAST: An Efficient and Accurate Scene Text Detector 0.8072 (L)0.8038 TF(M) TF PYTORCH(M) PYTORCH DEMO KERAS(M) VIDEO
'17-ICIP 17/05/15 WordFence: Text Detection in Natural Images with Border Awareness 0.86
'17-arXiv 17/06/30 R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection 0.8773 0.8254 TF(M) CAFFE(M)
'17-CVPR 17/07/21 Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild 0.85 0.63
'17-arXiv 17/08/17 Deep Scene Text Detection with Connected Component Proposals 0.919
'17-ICCV 17/08/22 WordSup: Exploiting Word Annotations for Character based Text Detection 0.9064 0.7816
'17-ICCV 17/09/01 Single Shot Text Detector with Regional Attention 0.8704 0.7691 *CAFFE(M) PYTORCH VIDEO
'17-arXiv 17/09/11 Fused Text Segmentation Networks for Multi-oriented Scene Text Detection 0.8414
'17-ICCV 17/10/13 WeText: Scene Text Detection under Weak Supervision 0.869 (L)0.8313
'17-ICCV 17/10/22 Self-organized Text Detection with Minimal Post-processing via Border Learning 0.84 *KERAS(M)
'17-ICDAR 17/11/11 Deep Residual Text Detection Network for Scene Text 0.9117 (L)0.8925
'18-AAAI 17/11/12 Feature Enhancement Network: A Refined Scene Text Detector 0.9161
'17-arXiv 17/11/30 ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene 0.759
'18-AAAI 18/01/04 PixelLink: Detecting Scene Text via Instance Segmentation 0.881 0.8519 *TF(M) TF
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.925 0.8984 PYTORCH PYTORCH VIDEO
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.88 0.829 (L)0.8475 *CAFFE(M)
'18-CVPR 18/02/27 Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation 0.88 0.843 *PYTORCH(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.9 0.87 *CAFFE(M)
'18-CVPR 18/03/14 Rotation-Sensitive Regression for Oriented Scene Text Detection 0.89 0.838 *CAFFE(M)
'18-arXiv 18/04/08 Detecting Multi-Oriented Text with Corner-based Region Proposals 0.876 0.845 *CAFFE(M)
'18-arXiv 18/04/24 An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches 0.92 0.86
'18-IJCAI 18/05/03 IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection 0.9047
'18-arXiv 18/06/07 Shape Robust Text Detection with Progressive Scale Expansion Network 0.8721 PRJ
'18-ECCV 18/07/04 TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes 0.826 PYTORCH
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.917 0.86
'18-ECCV 18/07/10 Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping 0.892
'19-AAAI 18/11/21 Scene Text Detection with Supervised Pyramid Context Network 0.921 0.872
'19-TIP 18/12/04 TextField: Learning A Deep Direction Field for Irregular Scene Text Detection 0.824 *CAFFE(M)
'19-CVPR 19/03/21 Towards Robust Curve Text Detection with Conditional Spatial Expansion
'19-CVPR 19/03/28 Shape Robust Text Detection with Progressive Scale Expansion Network 0.857 TF(M)
'19-CVPR 19/04/03 Character Region Awareness for Text Detection 0.952 0.869 *PYTORCH(M) VIDEO PYTORCH TF(M) KERAS BLOG_CH BLOG_KR BLOG_KR BLOG_KR
'19-CVPR 19/04/13 Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes 0.877
'19-CVPR 19/06/16 Learning Shape-Aware Embedding for Scene Text Detection 0.877
'19-CVPR 19/06/16 Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation 0.917 0.876
'19-ICCV 19/08/16 Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network 0.829
'19-ICCV 19/09/02 Geometry Normalization Networks for Accurate Scene Text Detection 0.8852
'20-AAAI 19/11/20 Real-time Scene Text Detection with Differentiable Binarization 0.847

Text Recognition

Conf. Date Title SVT IIIT5k IC03 IC13 Resources
'15-ICLR 14/12/18 Deep structured output learning for unconstrained text recognition 0.717 0.896 0.818 TF SLIDE VIDEO
'16-IJCV 15/05/07 Reading text in the wild with convolutional neural networks 0.807 0.933 0.908 KERAS
'16-AAAI 15/06/14 Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI 15/07/21 An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition 0.808 0.782 0.894 0.867 TORCH(M) TF TF TF TF PYTORCH PYTORCH(M) BLOG(KR)
'16-CVPR 16/03/09 Recursive Recurrent Nets with Attention Modeling for OCR in the Wild 0.807 0.784 0.887 0.9
'16-CVPR 16/03/12 Robust scene text recognition with automatic rectification 0.819 0.819 0.901 0.886 PYTORCH PYTORCH
'16-CVPR 16/06/27 CNN-N-Gram for Handwriting Word Recognition 0.8362 VIDEO
'16-BMVC 16/09/19 STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition 0.836 0.833 0.899 0.891
'17-arXiv 17/07/27 STN-OCR: A single Neural Network for Text Detection and Text Recognition 0.798 0.86 0.903 *MXNET(M) PRJ BLOG
'17-IJCAI 17/08/19 Learning to Read Irregular Text with Attention Mechanisms
'17-arXiv 17/09/06 Scene Text Recognition with Sliding Convolutional Character Models 0.765 0.816 0.845 0.852
'17-ICCV 17/09/07 Focusing Attention: Towards Accurate Text Recognition in Natural Images 0.859 0.874 0.942 0.933
'18-CVPR 17/11/12 AON: Towards Arbitrarily-Oriented Text Recognition 0.828 0.87 0.915 TF
'17-NIPS 17/12/04 Gated Recurrent Convolution Neural Network for OCR 0.815 0.808 0.978 *TORCH(M)
'18-AAAI 18/01/04 Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition 0.844 0.836 0.915 0.908
'18-AAAI 18/01/04 SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network 0.87 0.931 0.929
'18-CVPR 18/05/09 Edit Probability for Scene Text Recognition 0.875 0.883 0.946 0.944
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.936 0.934 0.945 0.918 *TF(M) PYTORCH
'18-ECCV 18/09/08 Synthetically Supervised Feature Learning for Scene Text Recognition 0.871 0.894 0.947 0.94
'19-AAAI 18/09/18 Scene Text Recognition from Two-Dimensional Perspective 0.821 0.92 0.914
'19-AAAI 18/11/02 Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition 0.845 0.915 0.91 *TORCH(M)
'19-CVPR 18/12/14 ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification 0.902 0.933 0.913 PRJ
'19-PR 19/01/10 MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition 0.883 0.912 0.950 0.924 *PYTORCH(M)
'19-ICCV 19/04/03 What is wrong with scene text recognition model comparisons? dataset and model analysis 0.875 0.949 0.936 *PYTORCH(M) BLOG_KR
'19-CVPR 19/04/18 Aggregation Cross-Entropy for Sequence Recognition 0.826 0.823 0.921 0.897 *PYTORCH
'19-CVPR 19/06/16 Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition 0.845 0.838 0.921 0.918
'19-ICCV 19/08/06 Symmetry-constrained Rectification Network for Scene Text Recognition 0.889 0.944 0.95 0.939
'20-AAAI 19/12/28 TextScanner: Reading Characters in Order for Robust Scene Text Recognition 0.895 0.926 0.925
'20-AAAI 19/12/21 Decoupled Attention Network for Text Recognition 0.892 0.943 0.95 0.939 *PYTORCH(M)
'20-AAAI 20/02/04 GTC: Guided Training of CTC 0.929 0.955 0.952 0.943

End-to-End Text Recognition

Conf. Date Title IC03 IC13 IC15 Resources
'12-ICPR 12/11/11 End-to-end text recognition with convolutional neural networks 0.67 *CODE
'14-ECCV 14/09/06 Deep Features for Text Spotting 0.75 PRJ MATLAB
'15-IJCV 15/05/07 Reading Text in the Wild with Convolutional Neural Networks 0.70 0.77 KERAS
'15-TPAMI 15/10/30 Real-time Lexicon-free Scene Text Localization and Recognition 0.542 0.156
'16-arXiv 16/04/10 TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild 0.6843 0.4718 (L)0.533 *CAFFE(M)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.84 TF *CAFFE(M) BLOG_KR
'17-ICCV 17/07/13 Towards End-to-end Text Spotting with Convolution Recurrent Neural Network 0.8459 VIDEO
'17-ICCV 17/10/22 Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework 0.77 0.47 VIDEO *CAFFE(M)
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.8477 0.6533 VIDEO TF(M)
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.8465 0.519 *CAFFE(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.86 0.63 *CAFFE(M)
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.64 *TF(M)
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.865 0.624
'19-ICCV 19/08/24 Towards Unconstrained End-to-End Text Spotting 0.6994 BLOG_KR
'19-ICCV 19/10/17 Convolutional Character Networks 0.7108 *PYTORCH(M)
'19-ICCV 19/10/27 TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting 0.6537
'20-AAAI 19/11/21 All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting 0.841 0.641
'20-AAAI 20/02/12 Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting 0.858 0.651

Others

  • Papers are sorted by published date.
  • *CODE means official code and CODE(M) means that trained model is provided.
Conf. Date Title Description Resources
'14-NIPS 14/06/09 Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition Dataset PRJ
'17-ECCV 17/02/13 End-to-End Interpretation of the French Street Name Signs Dataset Dataset (FSNS) *TF(M)
'17-arXiv 17/04/11 Attention-based Extraction of Structured Information from Street View Imagery FSNS *TF(M) TF TF LUA BLOG_KR
'17-CVPR 17/07/21 Unambiguous Text Localization and Retrieval for Cluttered Scenes Text Retrieval
'17-AAAI 17/10/22 Detection and Recognition of Text Embedded in Online Images via Neural Context Models Dataset PRJ
'18-CVPR 17/11/17 Separating Style and Content for Generalized Style Transfer Font Style
'17-arXiv 17/12/06 Detecting Curve Text in the Wild: New Dataset and New Solution Dataset (CTW 1500) PRJ
'18-AAAI 17/12/14 SEE: Towards Semi-Supervised End-to-End Scene Text Recognition FSNS PRJ *CHAINER(M)
'17-CVPR 18/06/07 Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks Document Layout PRJ
'18-CVPR 18/06/19 DocUNet: Document Image Unwarping via A Stacked U-Net Document Dewarping PRJ
'18-CVPR 18/06/19 Document Enhancement using Visibility Detection Document Enhancement PRJ
'18-IJCAI 18/06/22 Multi-Task Handwritten Document Layout Analysis Document Layout
'18-ECCV 18/07/09 Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Dataset PRJ
'19-AAAI 18/12/03 EnsNet: Ensconce Text in the Wild Text Removal DB
'19-CVPR 18/12/14 Spatial Fusion GAN for Image Synthesis Dataset DB
'19-AAAI 19/01/27 Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables TableToText
'19-AAAI 19/01/27 A Radical-aware Attention-based Model for Chinese Text Classification Chinese Character Classification
'19-CVPR 19/02/25 Handwriting Recognition in Low-resource Scripts using Adversarial Learning Handwriting Recognition TF
'19-CVPR 19/03/27 Tightness-aware Evaluation Protocol for Scene Text Detection Evaluation CODE
'19-ICCV 19/05/31 Scene Text Visual Question Answering Dataset ICDAR_DB
'19-CVPR 19/06/16 DynTypo: Example-based Dynamic Text Effects Transfer Text Effects PRJ VIDEO
'19-CVPR 19/06/16 Typography with Decor: Intelligent Text Style Transfer Text Effects *PYTORCH(M)
'19-CVPR 19/06/16 An Alternative Deep Feature Approach to Line Level Keyword Spotting Keyword Spotting
'19-ICCV 19/07/23 GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition Domain Adaptation
'19-ICCV 19/09/17 Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning Dataset ICDAR_DB
'19-ICCV 19/10/02 Large-scale Tag-based Font Retrieval with Generative Feature Learning Font Retrieval
'19-ICCV 19/10/27 TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts Place Recognition DB
'19-ICCV 19/10/27 DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks Document Dewarping *PYTORCH(M)

Other lists

Tutorial Materials

Additional Papers

  • A Cost Efficient Approach to Correct OCR Errors in Large Document Collections https://arxiv.org/pdf/1905.11739

  • Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

  • Detecting Multi-Oriented Text with Corner-based Region Proposals

  • Detection and Recognition of Text Embedded in Online Images via Neural Context Models

  • DynTypo: Example-based Dynamic Text Effects Transfer

  • Towards End-to-end Text Spotting with Convolution Recurrent Neural Network

  • Semi-Synthetic Data Augmentation of Scanned Historical Documents

  • Attend, Copy, Parse – End-to-end information extraction from documents

  • A Spatio-Spectral Hybrid Convolutional Architecture for Hyperspectral Document Authentication

  • Discourse descriptor for document incremental classification, Comparison with Deep Learning

  • A Character Attention Generative Adversarial Network for Degraded Historical Document Restoration

  • A Robust Data Hiding Scheme using Generated Content for Securing Genuine Documents

  • Simultaneous Optimisation of Image Quality Improvement and Text Content Extraction from Scanned Documents

  • A New Document Image Quality Assessment Method Based on Hast Derivations

  • A meaningful information extraction system for interactive analysis of documents

  • An End-to-End trainable framework for joint optimization of document enhancement & recognition.

  • A Deep Transfer Learning Approach to Document Image Quality Assessment

  • Learning Free Document Image Binarization Based on Fast Fuzzy C-Means Clustering

  • A Robust Hybrid Approach for Textual Document Classification

  • Document Domain Adaptation with Generative Adversarial Networks

  • Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images

  • A Quality and Time Assessment of Binarization Algorithms for Scanned Documents

  • Blind Source Separation based Framework for Multispectral Document Image Binarization

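Researchers

Xiang Bai (白翔) [K1] (Huazhong University of Science and Technology)

Cheng-Lin Liu (刘成林) [K2] (Institute of Automation, Chinese Academy of Sciences)

Yu Qiao (乔宇) [K3] (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)

Lianwen Jin (金连文) [K4] (South China University of Technology)

Xu-Cheng Yin (殷绪成) [K5] (University of Science and Technology Beijing)

Liangcai Gao (高良才) [K6] (Peking University)

Tonghua Su (苏统华) [K7] (Harbin Institute of Technology)

Xiaoqing Ding (丁晓青) [K8] (Tsinghua University)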


1. [K1] https://blog.csdn.net/xwukefr2tnh4/article/details/78139737

2. http://u-pat.org/ICDAR2017/keynotes/ICDAR2017_Keynote_Prof_Bai.pdf

3. CRNN

[K2] 文档图像识别技术回顾与展望.pdf (Document Image Recognition Technology: Review and Outlook)

1. [K3] http://school.freekaoyan.com/bj/gscas/daoshi/2016/05-10/1462864045596171.shtml

2. CTPN

[K4] https://baike.baidu.com/item/%E9%87%91%E8%BF%9E%E6%96%87/4003005?fr=aladdin#5

[K5] http://scce.ustb.edu.cn/shiziduiwu/jiaoshixinxi/2018-04-12/62.html

[K6] Digital publishing; layout analysis and understanding

https://www.icst.pku.edu.cn/szwdclyjs/kycgx11/index.htm

[K7] http://homepage.hit.edu.cn/tonghuasu

CUDA, handwriting recognition

[K8] http://www.opticsjournal.net/Experts/dingxiaoqing.htm

Graduated from university in 1962

posted on 2021-01-19 14:47  宋岳庭