OCR Survey

OCR Development Trends

Scene text detection
Scene text recognition
End-to-end scene text recognition

Scene Text Detection

Example methods:

Regression-based methods
- Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
- Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
- Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
Segmentation-based methods
- Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
- Wu et al, ICCV 2017; Deng et al, AAAI 2018;
- X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
Hybrid methods (segmentation + regression)
- He et al, ICCV 2017; Lyu et al, CVPR 2018;
- Liao et al, CVPR 2018; Long et al, ECCV 2018;
- Liu et al, IJCAI 2019 ...

Trends:

Horizontal rectangular boxes $\longrightarrow$ multi-oriented rectangles $\longrightarrow$ multi-oriented quadrilaterals $\longrightarrow$ curved text $\longrightarrow$ arbitrary shapes
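
To make the progression of output geometries concrete, here is a minimal sketch of how a rotated rectangle relates to a quadrilateral and to its enclosing horizontal box. The function names and the `(cx, cy, w, h, angle)` parameterization are illustrative assumptions, not taken from any of the cited papers.

```python
import math

def rotated_rect_to_quad(cx, cy, w, h, angle_deg):
    """Return the four corners of a rotated rectangle as (x, y) tuples."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the un-rotated box around its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]

def quad_to_horizontal_box(quad):
    """Smallest axis-aligned box (x1, y1, x2, y2) enclosing the quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# A 100x20 text line rotated by 45 degrees (hypothetical example values).
quad = rotated_rect_to_quad(0, 0, 100, 20, 45)
box = quad_to_horizontal_box(quad)
box_area = (box[2] - box[0]) * (box[3] - box[1])
print(polygon_area(quad), box_area)
```

For this example the enclosing horizontal box covers several times the area of the text itself, which is one way to see why detectors moved toward oriented and then arbitrary-shape outputs.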

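Notes:

Segmentation-based methods have difficulty accurately separating adjacent or overlapping text instances
Regression-based methods have difficulty detecting long text completely
- Bounding box regression methods require well-chosen anchor parameters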
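
The first note can be illustrated with a toy example: if a segmentation map is turned into instances by connected-component labeling (a common post-processing step, assumed here for illustration), two text lines whose masks touch collapse into a single instance. The masks below are made up.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                # Breadth-first flood fill over the current region.
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Two words separated by a one-pixel gap: two instances are recovered.
separated = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
# The same words with the gap predicted as text: they merge into one.
touching = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
print(count_components(separated), count_components(touching))  # 2 1
```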

Anchor & RPN tuning issues:

Examples of anchor-free regression methods:

Segmentation based methods
C.He et al, Direct Regression..., ICCV 2017, TIP 2018.
Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
Zhi Tian, Chunhua Shen, et al, FCOS, CVPR, 2019.
Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.
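
As a rough sketch of what anchor-free regression means in practice, the snippet below encodes a box as the distances from a location to its four sides, in the spirit of the per-pixel formulation used by FCOS. The helper names are hypothetical, and details such as feature-map strides, scale assignment, and center-ness are omitted.

```python
def ltrb_targets(px, py, box):
    """Distances from point (px, py) to the left, top, right, and bottom
    sides of box = (x1, y1, x2, y2). Returns None if the point is outside."""
    x1, y1, x2, y2 = box
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    if min(l, t, r, b) < 0:
        return None  # the point is not inside the box, so it is not a positive sample
    return (l, t, r, b)

def decode_box(px, py, ltrb):
    """Invert ltrb_targets: recover (x1, y1, x2, y2) from a point and distances."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)

text_box = (10, 20, 110, 40)            # hypothetical text line
targets = ltrb_targets(30, 25, text_box)
print(targets)                          # (20, 5, 80, 15)
print(decode_box(30, 25, targets))      # (10, 20, 110, 40)
```

No anchor scales or aspect ratios appear anywhere in this encoding, which is the property the anchor-free methods above rely on.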

Why anchor free?
Most RPN-based regression methods require well-chosen anchor parameters
E.g.: SSD $\longrightarrow$ TextBoxes (AAAI 2017)
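
A small sketch of why anchor parameters matter for text: with a generic set of aspect ratios, the best-matching anchor overlaps a long text line poorly, while adding elongated ratios fixes the match. The ratio sets, anchor area, and box below are illustrative assumptions, not the settings of SSD or any cited text detector.

```python
def make_anchor(cx, cy, area, ratio):
    """Anchor (x1, y1, x2, y2) with the given area and width/height ratio."""
    w = (area * ratio) ** 0.5
    h = (area / ratio) ** 0.5
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_iou(gt, ratios, area):
    """Best IoU between gt and anchors of each ratio placed at the center of gt."""
    cx, cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    return max(iou(make_anchor(cx, cy, area, r), gt) for r in ratios)

text_line = (0, 0, 200, 20)  # a 10:1 text line with area 4000
generic = best_iou(text_line, ratios=(0.5, 1.0, 2.0), area=4000)
elongated = best_iou(text_line, ratios=(1.0, 2.0, 5.0, 10.0), area=4000)
print(round(generic, 3), round(elongated, 3))
```

With the generic ratios the best anchor stays well below a typical 0.5 matching threshold even when perfectly centered, so the ground-truth box would get no positive anchor unless the ratios are tuned for text.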

Alternative anchor design?
Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene Text Recognition

Scene text recognition methods:

CTC-based methods
- P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
- B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
- F Yin, et al, arXiv 2017 (CNN+CTC)
- Y Wu, et al, arXiv 2018 (CNN+CTC)
- Y Liu et al, ECCV 2018 (GAN+CTC)
Attention-based methods
- C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
- X Yang et al, IJCAI 2017
- Bai et al, CVPR 2018; Liu et al, AAAI 2018
- Shi et al, TPAMI 2018 (ASTER)
- Luo et al, PR 2019 (MORAN)

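Trends:

Regular text $\longrightarrow$ irregular text recognition
CTC $\longrightarrow$ Attention (1D, 2D)
Detection + recognition $\longrightarrow$ end-to-end detection and recognition

Attention or CTC?

CTC works better for long text; attention works better for short text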

Limitation of Attention and CTC

CTC:

Can hardly be applied directly to 2D prediction
Heavy computation for long sequences
Performance degradation on repeated patterns
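
For readers unfamiliar with CTC decoding, here is a minimal sketch of the greedy (best-path) collapse rule: merge consecutive repeated labels, then drop blanks. The `-` blank symbol and the frame strings are illustrative; a real recognizer would take the per-frame argmax of its output distribution first.

```python
def ctc_greedy_collapse(frames, blank="-"):
    """Collapse a per-frame label sequence: merge consecutive repeats, drop blanks."""
    out = []
    prev = None
    for ch in frames:
        # Emit a label only when it differs from the previous frame and is not blank.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_greedy_collapse("hh-e-ll-lo"))  # hello
print(ctc_greedy_collapse("hhelllo"))     # helo
```

The second call shows why repeated characters are delicate: a doubled letter survives only if the model emits a blank between the two occurrences. This is one plausible reading of the repeated-pattern limitation listed above.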

Attention:

Misalignment problem (attention drift)
Requires more memory
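
A minimal sketch of a single attention decoding step may help here. It uses plain dot-product scoring as a simplifying assumption (the cited recognizers typically use learned scoring functions), and the vectors are made up.

```python
import math

def attention_step(query, encoder_states):
    """One decoding step: softmax-normalized dot-product scores over encoder
    states, followed by the weighted sum (context vector)."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Three encoder positions with 2-D features; the query matches position 0.
states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
weights, context = attention_step([4.0, 0.0], states)
print([round(w, 3) for w in weights])
```

Each output character needs one such weight vector over all encoder positions, so the weights grow with (output length) x (input length). Attention drift corresponds to the peak of `weights` landing on the wrong position.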

Why End2End?

Prevent errors from accumulating
- errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
Joint optimization helps improve overall performance
Easier to maintain and adapt to new domains
- maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
Faster, Smaller, Stronger

Some new techniques to bridge the detector and the recognizer

RoI Rotate (multi-oriented e2e)
- X Liu, et al, FOTS, CVPR 2018
Tailored RoI pooling (resampling that preserves the aspect ratio)
- H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")
RoI Masking (arbitrary-shape e2e)
- S Qin, A Bissacco, et al(Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
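
Two of these ideas can be sketched in a few lines. The functions below are simplified illustrations under stated assumptions, not the operators from the cited papers: `resampled_size` picks an output size with a fixed height and a width that keeps the aspect ratio (capped at a hypothetical `max_w`), and `apply_roi_mask` zeroes out features that fall outside a binary text mask. RoI Rotate, which additionally applies a rotation to the sampling grid, is not shown.

```python
def resampled_size(box_w, box_h, target_h, max_w):
    """Output (width, height) for RoI resampling: fixed height, width scaled
    to keep the box aspect ratio, clamped to the range [1, max_w]."""
    scale = target_h / box_h
    out_w = max(1, min(max_w, round(box_w * scale)))
    return out_w, target_h

def apply_roi_mask(features, mask):
    """Element-wise product of a 2-D feature map and a binary mask of the same size."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]

# A short word and a long text line, both pooled to height 8 (made-up sizes).
print(resampled_size(50, 20, target_h=8, max_w=64))   # (20, 8)
print(resampled_size(200, 20, target_h=8, max_w=64))  # (64, 8)

# Background features inside the box are suppressed by the text mask.
features = [[0.5, 0.9, 0.4], [0.7, 0.8, 0.6]]
mask = [[0, 1, 1], [1, 1, 0]]
print(apply_roi_mask(features, mask))
```

Keeping the aspect ratio avoids squeezing long words into a fixed-width feature map, and masking keeps neighboring text or background inside the box from reaching the recognizer.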

posted @ 2019-10-18 09:22 larkii
