Document Recognition Papers
First compilation
Title | Link | Summary | Code |
---|---|---|---|
FOTS: Fast oriented text spotting with a unified network | Link | From SenseTime. Performs end-to-end detection and recognition of multi-oriented text. | https://github.com/xieyufei1993/FOTS |
WordSup: Exploiting Word Annotations for Character Based Text Detection. | Link | Existing character-level annotations are scarce, so weak supervision is introduced to refine the localization of character coordinates. | None yet |
Towards end-to-end license plate detection and recognition: A large dataset and baseline | Link | Uses a CNN to solve license plate recognition end to end. | None yet |
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks | Link | Proposes a character-level evaluation metric (CLEval). | https://github.com/clovaai/CLEval |
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network | Link | ABCNet: real-time scene text spotting based on an adaptive Bezier-curve network. | https://github.com/Yuliang-Liu/bezier_curve_text_spotting |
Towards Robust Curve Text Detection with Conditional Spatial Expansion | Link | Proposes the conditional spatial expansion (CSE) mechanism for detecting curved text of irregular shape and scale. | None |
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes | Link | The proposed text detector (LOMO) mainly targets extremely long text and text of arbitrary shapes. | None |
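Rotation-Sensitive Regression for Oriented Scene Text Detection | Link | In scene text detection, classification is insensitive to rotation while regression is sensitive to it, so the two tasks should use different features. Earlier methods shared features between both tasks, which hurt performance. This paper uses separate features instead: the regression branch extracts rotation-sensitive features by rotating the convolutional filters, and the classification branch extracts rotation-invariant features by pooling the rotation-sensitive ones. Reports a substantial improvement. | https://github.com/MhLiao/RRD |
Geometry-Aware Scene Text Detection with Instance Transformation Network | Link | Geometry-aware scene text detection based on an instance transformation network. | https://github.com/LianaWang/itn |
A new feature pyramid network for object detection | Link | Proposes the Feature Pyramid Network (FPN) architecture, which fuses multi-level features without hurting speed so that every level carries rich semantic information, improving the feature extraction ability of CNNs. In principle, FPN is a general-purpose technique for CNN-based methods. | None yet |
Detecting Oriented Text in Natural Images by Linking Segments | https://arxiv.org/abs/1703.06520 | Outlines text lines by detecting oriented text in natural images through linking segments. | https://github.com/dengdan/seglink |
Canny Text Detector: Fast and Robust Scene Text Localization Algorithm | Link | Addresses text localization in images. Treats each character like an edge pixel in the Canny algorithm and detects text following the idea of Canny edge extraction. | None yet |
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition | Link | Proposes a geometry-aware domain adaptation network that models cross-domain shifts in both geometry space and appearance space and realistically converts images across domains with different characteristics. | None yet |
Symmetry-constrained Rectification Network for Scene Text Recognition | Link | Introduces the symmetry constraints of text into the rectification network, significantly improving scene text recognition accuracy. | None yet |
EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis | https://doi.org/10.1109/ICCV.2017.481 | Proposes a novel application of automated texture synthesis combined with a texture loss that focuses on creating realistic textures rather than optimizing for pixel-accurate reproduction of ground-truth images during training. | https://github.com/msmsajjadi/EnhanceNet-Code |
PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit | Link | PlugNet, a recognition algorithm for low-quality, degraded text. | None yet |
Scene Text Image Super-resolution in the wild | Link | Proposes a dataset dedicated to text super-resolution, together with a network designed specifically for text super-resolution. | None yet |
Acquisition of Localization Confidence for Accurate Object Detection | Link | Duplicate box suppression and non-linear regression of localization information. | https://github.com/vacancy/PreciseRoIPooling |
Enhancing Place Recognition Using Joint Intensity - Depth Analysis and Synthetic Data | Link | Enhances place recognition using joint intensity-depth analysis and synthetic data. | None yet |
PixelLink: Detecting Scene Text via Instance Segmentation | Link | Published by Zhejiang University at AAAI 2018. The overall idea is to first obtain text regions by segmentation. | https://github.com/ZJULearning/pixel_link |
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework | Link | Overall pipeline: the input image first passes through a YOLOv2-based fully convolutional network and then an RPN, which outputs ROI boxes filtered by NMS. Using these boxes, fixed-height patches are sampled from the last convolutional layer in an STN-like manner and recognized with CTC. A further NMS pass over the recognition results gives the final output. | https://github.com/MichalBusta/DeepTextSpotter |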
ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification | Link | | https://github.com/cassie1728/ESIR-a-little-impove |
What Machines See Is Not What They Get: Fooling Scene Text Recognition Models with Adversarial Text Images | Link | Fools scene text recognition models with adversarial text images. | None |
An accurate segmentation-based scene text detector with context attention and repulsive text border | Link | Proposes an accurate segmentation-based detector with context attention and a repulsive text border. | None |
Handwriting Recognition in Low-resource Scripts using Adversarial Learning | Link | Proposes a few-sample handwriting recognition method based on generative adversarial learning. | https://github.com/AyanKumarBhunia/Handwriting_Recogition_using_Adversarial_Learning |
Efficientnet: Rethinking model scaling for convolutional neural networks | Link | From Google; a new ImageNet state of the art at the time of publication. | https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet |
Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation | Link | Multi-oriented text detection. | None |
Deep Matching Prior Network : Toward Tighter Multi-oriented Text Detection | Link | Multi-oriented text detection. | None yet |
Beyond Deep Residual Learning for Image Restoration | Link | Image restoration that goes beyond deep residual learning. | None yet |
Inverse compositional spatial transformer networks | Link | Builds on Spatial Transformer Networks (STN), which can give a model spatial invariance. | https://github.com/chenhsuanlin/inverse-compositional-STN |
Multi-oriented Text Detection with Fully Convolutional Networks | Link | Multi-oriented text detection. | None yet |
Deep Residual Learning for Image Recognition | Link | Image recognition with deep residual networks. | https://github.com/KaimingHe/deep-residual-networks |
Single shot text detector with regional attention | Link | Uses an attention mechanism to strengthen text features and introduces Inception modules to make the detector more robust to text size. | https://github.com/BestSonny/SSTD |
Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning | Link | Introduces a new Chinese street-view dataset, C-SVT, and a new scene text spotting network, End2End-PSL, with reported SOTA results. | None yet |
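Wetext: Scene text detection under weak supervision | Link | Uses semi-supervised and weakly supervised learning to train the character model, addressing the scarcity of character-level annotations. | None yet |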
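Sequential Deformation for Accurate Scene Text Detection | Link | Observes that CNN models cover only a limited range of the geometric variance of text boxes in detection. | None yet |
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition | Link | Gives the network positional information to address attention drift. | None yet |
Using Object Information for Spotting Text | Link | Uses object information to help spot text. | None yet |
Scene Text Detection and Recognition: The Deep Learning Era | Link | Scene text detection and recognition in the deep learning era. | |
Detecting curve text in the wild: New dataset and new solution | Link | The first paper on curved text detection; it also introduces the CTW1500 dataset. Curved text is represented by 14-point polygons. Proposes a detection method combining CNN-RPN and RNN designed specifically for curved text. | None yet |
STN-OCR: A single Neural Network for Text Detection and Text Recognition | Link | STN-OCR | https://github.com/Bartzi/stn-ocr |
Towards accurate scene text recognition with semantic reasoning networks | Link | Proposes the semantic reasoning network (SRN), an end-to-end trainable framework for accurate scene text recognition. | https://github.com/chenjun2hao/SRN.pytorch |
Scatter: Selective context attentional scene text recognizer | Link | Introduces a new scene text recognition architecture, the Selective Context ATtentional Text Recognizer (SCATTER). | https://github.com/scottwedge/cvpr20-scatter-text-recognizer |
Tightness-aware Evaluation Protocol for Scene Text Detection | Link | A tightness-aware evaluation protocol for scene text detection. | https://github.com/Yuliang-Liu/TIoU-metric |
Learning Shape-Aware Embedding for Scene Text Detection | Link | A text detection method based on instance segmentation and embedding features. | None |
Edit probability for scene text recognition | Link | Edit probability for scene text recognition. | None |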
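A-FaST-RCNN: Hard positive generation via adversary for object detection | Link | Proposes adversarially generating occluded or deformed samples that are hard for the detector; training on these hard positives makes the detector more robust. Compared with Fast-RCNN, mAP rises by 2.3% on VOC2007 and by 2.6% on VOC2012. | None yet |
Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild | Link | A multi-scale FCN with cascaded instance-aware segmentation for spotting arbitrarily oriented words in the wild. | None yet |
Unambiguous text localization and retrieval for cluttered scenes | Link | Unambiguous text localization and retrieval for cluttered scenes. | None yet |
Robust Scene Text Recognition with Automatic Rectification | Link | Proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a deep neural network consisting of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN); the two networks are trained jointly with backpropagation. | https://github.com/guojm14/TPS-SRN-tensorflow |
Recursive Recurrent Nets with Attention Modeling for OCR in the Wild | Link | Recursive recurrent networks with attention modeling for OCR in the wild. | None yet |
SPGNet: Semantic Prediction Guidance for Scene Parsing | Link | SPGNet: semantic prediction guidance for scene parsing. | None yet |
Visual Semantic Reasoning for Image-Text Matching | Link | Image-text matching: maps both modalities into a common embedding space to infer the similarity between a full sentence and an image. | https://github.com/KunpengLi1994/VSRN |
Dynamic Context Correspondence Network for Semantic Alignment | Link | Establishes semantic correspondences, flexibly incorporating global semantic context to overcome the limitations of earlier work that relied on local semantic representations. | https://github.com/ShuaiyiHuang/DCCNet |
Self-Organized Text Detection with Minimal Post-processing via Border Learning. | https://doi.org/10.1109/ICCV.2017.535 | Introduces the concept of border learning for the first time. The authors argue that traditional text detection classifies pixels into only two classes, text box and background, and that applying segmentation directly easily causes neighboring text boxes to stick together. | https://github.com/saicoco/tf-sotd |
AutoSTR: Efficient Backbone Search for Scene Text Recognition | Link | Uses neural architecture search (NAS) to automatically design the feature sequence extractor of a text recognition network, improving recognition performance. | https://github.com/AutoML-4Paradigm/AutoSTR |
Accurate Scene Text Detection Through Border Semantics Awareness and Bootstrapping | Link | A direct regression method. Besides the text/non-text classification task and the regression of the four distances to the box boundaries (similar to EAST), it adds a border learning task for the four borders. The final output does not use the predicted bounding box directly; text lines are obtained from the text score map and the four border maps. | None yet |
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes | Link | Mainly uses synthetically generated data to assist training. | https://github.com/GodOfSmallThings/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes |
Detecting Text in Natural Image with Connectionist Text Proposal Network | Link | Detects small, fixed-width text segments one by one, then links these segments in post-processing to form text lines. | https://github.com/tianzhi0549/CTPN |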
Feature Enhancement Network: A Refined Scene Text Detector | |||
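A method for detecting text of arbitrary shapes in natural scenes that improves text spotting | Link | Presents a pipeline-based text spotting framework that detects and recognizes text of various fonts, shapes, and orientations in natural scene images with complex backgrounds. | https://github.com/wqtwjt1996/UHT |
An OCR for Classical Indic Documents Containing Arbitrarily Long Words | Link | OCR for classical Indic documents containing long text. | https://github.com/ihdia/sanskrit-ocr |
Shape Robust Text Detection with Progressive Scale Expansion Network (text detection) | Link | Shape-robust text detection based on a progressive scale expansion network. | https://github.com/WenmuZhou/PSENet.pytorch |
Character Region Awareness for Text Detection | Link | Text detection via character-level segmentation. | https://github.com/clovaai/CRAFT-pytorch |
An End-to-End TextSpotter with Explicit Alignment and Attention | Link | A simple and efficient framework that handles the two tasks within one unified architecture. | None |
Single Shot TextSpotter with Explicit Alignment and Attention | Link | A single-shot text spotter with explicit alignment and attention. | https://github.com/tonghe90/textspotter |
Deep matching prior network: Toward tighter multi-oriented text detection | Link | Uses quadrilateral sliding windows for coarse text detection, proposes a shared Monte Carlo method for fast and accurate computation of polygon areas, then determines the order of the four points with a sequential protocol and regresses the final quadrilateral prediction. | None yet |
East: An Efficient and Accurate Scene Text Detector | Link | An efficient and accurate scene text detector. | https://github.com/songdejia/EAST |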
Textboxes++: A single-shot oriented scene text detector | https://arxiv.org/pdf/1801.02765 | Rotated (oriented) text detection. | |
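Towards Unconstrained End-to-End Text Spotting | Link | End-to-end text spotting with a convolutional recurrent neural network. | None yet |
A Novel Integrated Framework for Learning both Text Detection and Recognition | Link | Integrates detection and recognition in a single framework. | None yet |
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | Link | End-to-end sequence learning for OCR text re-organization, aimed at understanding rich-text detail images. | None yet |
Adaptive Text Recognition through Visual Matching | https://arxiv.org/abs/2009.06610 | Addresses the diversity and generalization problems of text recognition in document recognition by performing text recognition through visual matching. | https://github.com/Chuhanxx/FontAdaptor |
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting | https://arxiv.org/abs/2007.09482 | A segmentation proposal network for scene text detection and recognition. | https://github.com/MhLiao/MaskTextSpotterV3 |
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting | https://arxiv.org/abs/2008.00714 | Adds linguistic information and incorporates it into network training to assist the visual information. | None yet |
Character Region Attention For Text Spotting | https://arxiv.org/abs/2007.09629 | The proposed CRAFTS network (Character Region Attention For Text Spotting) has three stages: detection, sharing, and recognition. | None yet |
Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes | Link | Benefits from the simplicity and smoothness of an end-to-end learning procedure and obtains more accurate text detection and recognition through semantic segmentation. | https://github.com/lvpengyuan/masktextspotter.caffe2 |
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes | Link | To handle text of arbitrary shapes, proposes TextSnake, a segmentation-based text detector built on a fully convolutional network (FCN). A text instance is represented as a sequence of ordered, overlapping disks, each with a radius (text width) and an orientation (text direction), so text regions of arbitrary shape can be detected. | https://github.com/princewang1994/TextSnake.pytorch |
SSD: Single Shot MultiBox Detector | https://arxiv.org/pdf/1512.02325.pdf | Proposes the SSD object detection model to maintain both speed and accuracy. Like other popular detectors, it folds the whole detection process into a single deep neural network, which simplifies training and optimization and speeds up detection. | https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection |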
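
The table entry for "Detecting Text in Natural Image with Connectionist Text Proposal Network" describes detecting small fixed-width text segments and then linking them into text lines in post-processing. The sketch below illustrates only that linking idea; it is not the CTPN implementation. The function names, the axis-aligned `(x1, y1, x2, y2)` box format, the greedy left-to-right grouping, and the `max_gap` and `min_overlap` thresholds are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2), axis-aligned.
Box = Tuple[float, float, float, float]


def vertical_overlap(a: Box, b: Box) -> float:
    """Vertical overlap of two boxes, as a fraction of the shorter box's height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def link_segments(
    segments: List[Box],
    max_gap: float = 20.0,      # assumed max horizontal gap between neighbours
    min_overlap: float = 0.7,   # assumed min vertical overlap ratio
) -> List[Box]:
    """Greedily chain segments, left to right, into text-line bounding boxes."""
    lines: List[List[Box]] = []
    for seg in sorted(segments, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            gap = seg[0] - last[2]
            # Attach to the first line whose last segment is close enough
            # horizontally and sufficiently aligned vertically.
            if gap <= max_gap and vertical_overlap(last, seg) >= min_overlap:
                line.append(seg)
                break
        else:
            # No existing line accepts this segment: start a new one.
            lines.append([seg])
    # Each text line is the bounding box of its member segments.
    return [
        (
            min(b[0] for b in line),
            min(b[1] for b in line),
            max(b[2] for b in line),
            max(b[3] for b in line),
        )
        for line in lines
    ]


if __name__ == "__main__":
    demo = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30), (0, 100, 16, 120)]
    for line_box in link_segments(demo):
        print(line_box)
```

A greedy pass keeps the sketch short; a real detector would typically also use segment scores and handle oriented boxes, which this example leaves out.
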
Some links
https://github.com/hwalsuklee/awesome-deep-text-detection-recognition
Text Detection
Text Recognition
End-to-End Text Recognition
Others
- Papers are sorted by published date.
- *CODE means official code, and CODE(M) means that a trained model is provided.
Other lists
- OCR Paper Curation
Tutorial Materials
- Lecture slides
- Survey Paper
Supplement
- A Cost Efficient Approach to Correct OCR Errors in Large Document Collections https://arxiv.org/pdf/1905.11739
- Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
- Detecting Multi-Oriented Text with Corner-based Region Proposals
- Detection and Recognition of Text Embedded in Online Images via Neural Context Models
- DynTypo: Example-based Dynamic Text Effects Transfer
- Towards End-to-end Text Spotting with Convolution Recurrent Neural Network
- Semi-Synthetic Data Augmentation of Scanned Historical Documents
- Attend, Copy, Parse – End-to-end information extraction from documents
- A Spatio-Spectral Hybrid Convolutional Architecture for Hyperspectral Document Authentication
- Discourse descriptor for document incremental classification, Comparison with Deep Learning
- A Character Attention Generative Adversarial Network for Degraded Historical Document Restoration
- A Robust Data Hiding Scheme using Generated Content for Securing Genuine Documents
- Simultaneous Optimisation of Image Quality Improvement and Text Content Extraction from Scanned Documents
- A New Document Image Quality Assessment Method Based on Hast Derivations
- A meaningful information extraction system for interactive analysis of documents
- An End-to-End trainable framework for joint optimization of document enhancement & recognition.
- A Deep Transfer Learning Approach to Document Image Quality Assessment
- Learning Free Document Image Binarization Based on Fast Fuzzy C-Means Clustering
- A Robust Hybrid Approach for Textual Document Classification
- Document Domain Adaptation with Generative Adversarial Networks
- Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images
- A Quality and Time Assessment of Binarization Algorithms for Scanned Documents
- Blind Source Separation based Framework for Multispectral Document Image Binarization
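Scholars
Xiang Bai (白翔) [K1] (Huazhong University of Science and Technology)
Cheng-Lin Liu (刘成林) [K2] (Institute of Automation, Chinese Academy of Sciences)
Yu Qiao (乔宇) [K3] (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Lianwen Jin (金连文) [K4] (South China University of Technology)
Xu-Cheng Yin (殷绪成) [K5] (University of Science and Technology Beijing)
Liangcai Gao (高良才) [K6] (Peking University)
Tonghua Su (苏统华) [K7] (Harbin Institute of Technology)
Xiaoqing Ding (丁晓青) [K8] (Tsinghua University)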
1. [K1] https://blog.csdn.net/xwukefr2tnh4/article/details/78139737
2. http://u-pat.org/ICDAR2017/keynotes/ICDAR2017_Keynote_Prof_Bai.pdf
3. CRNN
[K2] 文档图像识别技术回顾与展望.pdf (Review and Prospects of Document Image Recognition Technology)
1. [K3] http://school.freekaoyan.com/bj/gscas/daoshi/2016/05-10/1462864045596171.shtml
2. CTPN
[K4] https://baike.baidu.com/item/%E9%87%91%E8%BF%9E%E6%96%87/4003005?fr=aladdin#5
[K5] http://scce.ustb.edu.cn/shiziduiwu/jiaoshixinxi/2018-04-12/62.html
[K6] Digital publishing, layout analysis and understanding
https://www.icst.pku.edu.cn/szwdclyjs/kycgx11/index.htm
[K7] http://homepage.hit.edu.cn/tonghuasu
CUDA, handwriting recognition
[K8] http://www.opticsjournal.net/Experts/dingxiaoqing.htm
Graduated from university in 1962