OCR 02: Tesseract-OCR

Project Host And Brief

Tesseract Windows Release

Link: https://github.com/UB-Mannheim/tesseract
Download: https://digi.bib.uni-mannheim.de/tesseract/

Installation

Windows

The installer is small.

  1. Download the 64-bit (w64) installer from https://github.com/UB-Mannheim/tesseract/wiki
  2. Run the installer and select the Chinese language traineddata during setup.

Ubuntu

TBD

Usage

"C:\Program Files\Tesseract-OCR\tesseract.exe" fp01.jpg result_fp01 -l chi_sim
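
The same call can be scripted. Below is a minimal Python sketch that wraps the command above with the standard `subprocess` module. The helper names `build_command` and `run_ocr` are made up for illustration, and the install path is simply the one used above, so adjust it if Tesseract is installed elsewhere. It relies on Tesseract writing its plain-text result to `<output base>.txt`.

```python
import subprocess
from pathlib import Path

# Install path copied from the command above; adjust for your machine.
TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def build_command(image, output_base, lang="chi_sim", exe=TESSERACT_EXE):
    """Return the argument list: tesseract <image> <output_base> -l <lang>."""
    return [exe, str(image), str(output_base), "-l", lang]


def run_ocr(image, output_base, lang="chi_sim", exe=TESSERACT_EXE):
    """Run Tesseract on one image and return the recognised text.

    Tesseract writes the plain-text result to <output_base>.txt,
    so the file is read back after the process finishes.
    """
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_command(image, output_base, lang, exe), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("fp01.jpg", "result_fp01")` would run the same command as above and return the contents of `result_fp01.txt`.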

Performance

  • Runs on the CPU and is much faster than EasyOCR.
  • Accuracy is slightly higher than EasyOCR's, but the overall picture is much the same: text in photos taken with a cellphone is still barely recognised.

Improving the quality of the output

Link: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

Image processing

  • Rescaling
  • Binarisation
  • Noise Removal
  • Dilation / Erosion
  • Rotation / Deskewing
  • Borders
  • Transparency / Alpha channel
  • Tools / Libraries
  • Examples
  • Tables recognitions
  • Page segmentation method
  • Dictionaries, word lists, and patterns
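
To make the binarisation item above concrete, here is a rough pure-Python sketch of Otsu thresholding on a grayscale image stored as a list of rows of 0-255 integers. This is only an illustration of the idea and is not taken from the linked page: the function names `otsu_threshold` and `binarise` are hypothetical, and in practice an image library would be used to load the photo and to do this step far faster.

```python
def otsu_threshold(pixels):
    """Return the threshold (0-255) that maximises between-class variance."""
    # Histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels with level <= t
    sum_bg = 0    # sum of grey levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarise(pixels, threshold):
    """Map levels above the threshold to white (255), the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


# Tiny example: dark "ink" (about 10) on a light "paper" (about 200).
image = [[10, 12, 200],
         [11, 205, 210]]
t = otsu_threshold(image)
clean = binarise(image, t)
```

A global threshold like this tends to struggle with uneven lighting, which is common in cellphone photos, so it is a starting point rather than a fix for the problem noted under Performance.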

posted on 2022-10-31 15:08  Milton