Detecting and Recognizing Tables in Images: Beihang University and Microsoft Propose a New Dataset, TableBank

A purely academic article on table recognition:







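In this work, researchers from Beihang University and Microsoft Research Asia jointly built TableBank, a new image-based dataset for table detection and recognition, created through weak supervision from Word and LaTeX documents on the internet. The dataset contains 417K high-quality labeled tables. Using it, the authors built several strong baselines with state-of-the-art deep neural network models, to help more research apply deep learning to table detection and recognition tasks. TableBank is now open source.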















Because some of the data has copyright issues and should not be released, we filtered the full dataset and excluded it. We also retrained all the baseline models on the revised dataset and list the results on the leaderboard website.




Leaderboard: https://doc-analysis.github.io/

If you use the corpus in published work, please cite it:


  title={TableBank: Table Benchmark for Image-based Table Detection and Recognition},
  author={Li, Minghao and Cui, Lei and Huang, Shaohan and Wei, Furu and Zhou, Ming and Li, Zhoujun},
  journal={arXiv preprint arXiv:1903.01949},












Related Resources

  • [Gilani et al., 2017] A. Gilani, S. R. Qasim, I. Malik, and F. Shafait. Table detection using deep learning. In Proc. of ICDAR 2017, volume 01, pages 771–776, Nov 2017.


posted on 2019-04-02 13:09  Angry_Panda
