ICCV2019 TEXT PAPERS
- 1. Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Proposes a high-speed text detection network.
Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is modeling arbitrary-shaped text instances. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low-computational-cost segmentation head and a learnable post-processing step. More specifically, the segmentation head is made up of a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module that introduces multi-level information to guide better segmentation. FFM gathers the features produced by FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which precisely aggregates text pixels via predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. It is worth noting that our method can achieve a competitive F-measure of 79.9% at 84.2 FPS on CTW1500.
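The abstract only describes FPEM at a high level; below is a minimal PyTorch sketch of what a cascadable, shape-preserving pyramid-enhancement pass could look like. The 4-level pyramid, the 128-channel width, and the depthwise-separable building block are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv(nn.Module):
    # depthwise 3x3 + pointwise 1x1 convolution with BN/ReLU (assumed building block)
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class FPEM(nn.Module):
    """One enhancement pass over a 4-level pyramid (f1 largest, f4 smallest).
    Inputs and outputs have identical shapes, so several FPEMs can be cascaded."""
    def __init__(self, channels=128):
        super().__init__()
        self.up_convs = nn.ModuleList(SeparableConv(channels) for _ in range(3))
        self.down_convs = nn.ModuleList(SeparableConv(channels) for _ in range(3))

    def forward(self, f1, f2, f3, f4):
        # resize x to the spatial size of ref (works for both up- and down-sampling)
        resize = lambda x, ref: F.interpolate(x, size=ref.shape[-2:],
                                              mode="bilinear", align_corners=False)
        # up-scale phase: propagate semantics from small maps to large maps
        f3 = self.up_convs[0](f3 + resize(f4, f3))
        f2 = self.up_convs[1](f2 + resize(f3, f2))
        f1 = self.up_convs[2](f1 + resize(f2, f1))
        # down-scale phase: feed the enhanced large maps back into the small ones
        f2 = self.down_convs[0](f2 + resize(f1, f2))
        f3 = self.down_convs[1](f3 + resize(f2, f3))
        f4 = self.down_convs[2](f4 + resize(f3, f4))
        return f1, f2, f3, f4

if __name__ == "__main__":
    # cascade two FPEMs over dummy FPN features (strides 4, 8, 16, 32 of a 640px image)
    feats = [torch.randn(1, 128, 160 // s, 160 // s) for s in (1, 2, 4, 8)]
    fpem1, fpem2 = FPEM(128), FPEM(128)
    out = fpem2(*fpem1(*feats))
    print([f.shape for f in out])
```

The paper's FFM would then fuse the outputs of the cascaded FPEMs into a single feature before the segmentation and PA heads; that part, and the PA clustering itself, are omitted here.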
- 2. GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
Recent adversarial learning research has achieved very impressive progress in modelling cross-domain data shifts in appearance space, but its counterpart in modelling cross-domain shifts in geometry space lags far behind. This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. In the proposed GA-DAN, a novel multi-modal spatial learning technique is designed which converts a source-domain image into multiple images of different spatial views as in the target domain. A new disentangled cycle-consistency loss is introduced which balances the cycle consistency in appearance and geometry spaces and greatly improves the learning of the whole network. The proposed GA-DAN has been evaluated on the classic scene text detection and recognition tasks, and experiments show that the domain-adapted images achieve superior scene text detection and recognition performance when used for network training.
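As a rough illustration of a cycle-consistency loss split into appearance and geometry terms, here is a hedged PyTorch sketch. It assumes the spatial modules predict 2x3 affine matrices and that the geometry term penalises the composed forward/backward transform for deviating from the identity; this decomposition and the weights are illustrative assumptions, not GA-DAN's actual formulation.

```python
import torch
import torch.nn.functional as F

def disentangled_cycle_loss(x, x_cycle, theta_fwd, theta_bwd,
                            w_app=1.0, w_geo=1.0):
    """Illustrative appearance + geometry cycle-consistency loss.

    x:          source-domain images, (N, C, H, W)
    x_cycle:    images after the source -> target -> source round trip
    theta_fwd:  (N, 2, 3) affine matrices predicted for the forward mapping
    theta_bwd:  (N, 2, 3) affine matrices predicted for the backward mapping
    """
    # appearance term: the round-trip image should reconstruct the input
    loss_app = F.l1_loss(x_cycle, x)

    # geometry term: composing the backward and forward affine transforms
    # should approximately give the identity transform
    n = theta_fwd.size(0)
    bottom = torch.tensor([0.0, 0.0, 1.0], device=x.device).expand(n, 1, 3)
    fwd_full = torch.cat([theta_fwd, bottom], dim=1)          # (N, 3, 3)
    composed = torch.bmm(theta_bwd, fwd_full)                 # (N, 2, 3)
    identity = torch.eye(2, 3, device=x.device).expand_as(composed)
    loss_geo = F.l1_loss(composed, identity)

    return w_app * loss_app + w_geo * loss_geo
```

Splitting the two terms this way makes it possible to weight appearance and geometry consistency separately, which is the balancing role the abstract attributes to the disentangled loss.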