OCR Survey

OCR Development Trends

  • Scene text detection
  • Scene text recognition
  • End-to-end scene text recognition

Scene Text Detection

Example methods:

  • Regression-based methods

    • Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
    • Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
    • Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
  • Segmentation-based methods

    • Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
    • Wu et al, ICCV 2017; Deng et al, AAAI 2018;
    • X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
  • Hybrid methods (segmentation + regression)

    • He et al, ICCV 2017; Lyu et al, CVPR 2018;
    • Liao et al, CVPR 2018; Long et al, ECCV 2018;
    • Liu et al, IJCAI 2019 ...

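Development trend:

Horizontal rectangular boxes \(\longrightarrow\) multi-oriented rectangles \(\longrightarrow\) multi-oriented quadrilaterals \(\longrightarrow\) curved text \(\longrightarrow\) arbitrary shapes

Notes:

  • Segmentation-based methods have difficulty accurately separating adjacent or overlapping text instances
  • Regression-based methods tend to miss parts of long text lines
    • Bounding-box regression methods require well-chosen anchor parameters

Anchor & RPN parameter-tuning issues:

Examples of anchor-free regression methods: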

  • Segmentation based methods
  • C.He et al, Direct Regression..., ICCV 2017, TIP 2018.
  • Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
  • Zhi Tian, Chunhua Shen, et al, FCOS, CVPR, 2019.
  • Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
  • Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.
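As a rough illustration of what "anchor free" means for methods such as FCOS, the sketch below regresses, for each location inside a box, the distances to the four box sides instead of offsets from a predefined anchor. The function names `ltrb_targets` and `decode` are hypothetical, and the sketch leaves out everything else a real detector needs (multi-level assignment, center-ness, losses).

```python
def ltrb_targets(px, py, box):
    """Distances from point (px, py) to the left, top, right, bottom sides of box.

    box is (x1, y1, x2, y2). Returns None when the point is not strictly
    inside the box, i.e. the location would not be a positive sample.
    """
    x1, y1, x2, y2 = box
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    if min(l, t, r, b) <= 0:
        return None
    return (l, t, r, b)


def decode(px, py, ltrb):
    """Invert ltrb_targets: recover (x1, y1, x2, y2) from a point and its distances."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)


# Hypothetical long text box and one location inside it.
box = (10, 20, 110, 40)
target = ltrb_targets(30, 25, box)
recovered = decode(30, 25, target)
```

No scale or aspect-ratio hyperparameters appear anywhere in this formulation, which is the property the list above is pointing at.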

Why anchor free?
Most RPN-based regression methods require well-chosen anchor parameters
E.g.: SSD \(\longrightarrow\) TextBoxes (AAAI 2017)
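To make the anchor-tuning issue concrete, here is a minimal sketch that compares how well two anchor sets cover a long, thin text line. The box coordinates, the scale of 64, and both aspect-ratio lists are illustrative assumptions, not values taken from the cited papers.

```python
import math


def make_anchors(cx, cy, scale, aspect_ratios):
    """Anchors (x1, y1, x2, y2) centered at (cx, cy).

    Each anchor has area scale**2 and width / height equal to the ratio.
    """
    anchors = []
    for ratio in aspect_ratios:
        w = scale * math.sqrt(ratio)
        h = scale / math.sqrt(ratio)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_iou(gt, anchors):
    """Highest IoU between a ground-truth box and any anchor in the set."""
    return max(iou(gt, a) for a in anchors)


# Hypothetical 200 x 20 text line centered at (100, 50).
gt = (0.0, 40.0, 200.0, 60.0)
generic = make_anchors(100, 50, scale=64, aspect_ratios=[0.5, 1, 2])
wide = make_anchors(100, 50, scale=64, aspect_ratios=[1, 2, 3, 5, 7, 10])

print("generic ratios:", round(best_iou(gt, generic), 3))
print("wide ratios:   ", round(best_iou(gt, wide), 3))
```

With the generic ratios no anchor overlaps the text line well enough to pass a typical matching threshold, while the set with wide ratios does. The right ratios depend on the data, which is the tuning burden anchor-free methods try to remove.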

Alternative anchor design?
Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene Text Recognition

Scene text recognition methods:

  • CTC-based methods

    • P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
    • B.Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
    • F Yin, et al, arXiv 2017 (CNN+CTC)
    • Y Wu, et al, arXiv 2018 (CNN+CTC)
    • Y Liu et al, ECCV 2018 (GAN+CTC)
  • Attention-based methods

    • C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
    • X Yang et al, IJCAI 2017
    • Bai et al, CVPR 2018; Liu et al, AAAI 2018
    • Shi et al, TPAMI 2018 (ASTER)
    • Luo et al, PR 2019 (MORAN)

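Development trends:

Regular text \(\longrightarrow\) irregular text recognition
CTC \(\longrightarrow\) Attention (1D, 2D)
Detection + recognition \(\longrightarrow\) end-to-end detection and recognition

Attention or CTC?

CTC works better for long text; attention works better for short text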

Limitations of Attention and CTC

CTC:

  • Can hardly be applied directly to 2D prediction
  • Heavy computation for long sequences
  • Performance degradation on repeated patterns
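One plausible reading of the repeated-pattern issue can be seen in how a CTC output path is collapsed into a label: consecutive duplicates are merged first, then blanks are removed. The sketch below shows only this greedy collapse step; the blank symbol `-` and the function name `ctc_collapse` are assumptions made for illustration.

```python
BLANK = "-"  # assumed blank symbol for this sketch


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-frame CTC path: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)


print(ctc_collapse("hh-e-ll-lo"))  # blank between the two l runs keeps both: "hello"
print(ctc_collapse("hheelllo"))    # no blank between them, so they merge: "helo"
```

A doubled character survives only if the model emits a blank frame between the two occurrences, so text with many repeats needs more frames and more precise per-frame predictions.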

Attention:

  • Misalignment problem (attention drift)
  • Requires more memory
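For context on both points, the following is a minimal sketch of a single attention decoding step over a 1D feature sequence. Dot-product scoring is an assumption chosen for brevity; the cited recognizers use various scoring functions, and the name `attention_step` is hypothetical.

```python
import math


def attention_step(query, features):
    """One decoding step: softmax over dot-product scores, then a weighted sum.

    query is the decoder state (list of floats); features is the encoder
    output, one vector per horizontal position.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Toy encoder output with three positions and a query that matches position 1.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention_step([0.0, 5.0], features)
```

The character predicted at each step depends on where `weights` peaks, so a peak on the wrong position yields a wrong character (attention drift). A full weight vector over all positions is computed at every output step, which gives a sense of the extra memory compared with CTC.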

Why End2End?

  • Prevents errors from accumulating
    • errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
  • Joint optimization helps improve overall performance
  • Easier to maintain and adapt to new domains
    • maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
  • Faster, Smaller, Stronger

Some new techniques to bridge the detector and the recognizer

  • RoI Rotate (multi-oriented e2e)
    • X Liu, et al, FOTS, CVPR 2018
  • Tailored RoI pooling (aspect-ratio-preserving resampling)
    • H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")
  • RoI Masking (arbitrary-shaped e2e)
    • S Qin, A Bissacco, et al (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
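The aspect-ratio-preserving idea can be sketched as a size computation: fix the output height of the pooled feature and scale the width with the region, so long words are not squeezed into a fixed square. The function name `roi_output_size` and the defaults `out_h=8` and `max_w=128` are assumptions for illustration, not values from the cited papers.

```python
def roi_output_size(roi_w, roi_h, out_h=8, max_w=128):
    """Output (height, width) for a text RoI, keeping its aspect ratio.

    The height is fixed to out_h; the width follows the RoI's aspect ratio
    and is clamped to the range [1, max_w].
    """
    out_w = int(round(out_h * roi_w / roi_h))
    return out_h, max(1, min(out_w, max_w))


print(roi_output_size(200, 20))   # long text line
print(roi_output_size(20, 20))    # single character
```

A recognizer fed these variable-width features sees characters at a roughly constant scale regardless of word length.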
posted @ 2019-10-18 09:22  larkii