OCR 01: EasyOCR

Catalog

Related Links

Installation

Install python3 and pip3

sudo apt install python3-pip

Install EasyOCR, this will take a long time for downloading around 1GiB files

q3w:~$ pip install easyocr
Defaulting to user installation because normal site-packages is not writeable
Collecting easyocr
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f42ce149930>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/bc/7f/389e1a886ff219682b5a56ea84f91ed785999665ac9ec1f220c7fdcd150f/easyocr-1.6.2-py3-none-any.whl
  Downloading easyocr-1.6.2-py3-none-any.whl (2.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 481.8 kB/s eta 0:00:00
Collecting torch
  Downloading torch-1.12.1-cp310-cp310-manylinux1_x86_64.whl (776.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.3/776.3 MB 404.3 kB/s eta 0:00:00
Collecting scipy
  Downloading scipy-1.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (33.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33.7/33.7 MB 673.8 kB/s eta 0:00:00
Collecting pyclipper
  Downloading pyclipper-1.3.0.post3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (813 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 813.8/813.8 KB 650.2 kB/s eta 0:00:00
Collecting torchvision>=0.5
  Downloading torchvision-0.13.1-cp310-cp310-manylinux1_x86_64.whl (19.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 704.0 kB/s eta 0:00:00
Collecting ninja
  Downloading ninja-1.10.2.4-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (120 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.7/120.7 KB 667.3 kB/s eta 0:00:00
Collecting opencv-python-headless<=4.5.4.60
  Downloading opencv_python_headless-4.5.4.60-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 659.3 kB/s eta 0:00:00
Collecting numpy
  Downloading numpy-1.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 686.0 kB/s eta 0:00:00
Requirement already satisfied: PyYAML in /usr/lib/python3/dist-packages (from easyocr) (5.4.1)
Collecting Shapely
  Downloading Shapely-1.8.5.post1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 679.9 kB/s eta 0:00:00
Collecting scikit-image
  Downloading scikit_image-0.19.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.9/13.9 MB 708.9 kB/s eta 0:00:00
Requirement already satisfied: Pillow in /usr/lib/python3/dist-packages (from easyocr) (9.0.1)
Collecting python-bidi
  Downloading python_bidi-0.4.2-py2.py3-none-any.whl (30 kB)
Collecting typing-extensions
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from torchvision>=0.5->easyocr) (2.25.1)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from python-bidi->easyocr) (1.16.0)
Collecting tifffile>=2019.7.26
  Downloading tifffile-2022.10.10-py3-none-any.whl (210 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 210.3/210.3 KB 579.4 kB/s eta 0:00:00
Collecting PyWavelets>=1.1.1
  Downloading PyWavelets-1.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 699.7 kB/s eta 0:00:00
Collecting networkx>=2.2
  Downloading networkx-2.8.7-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 686.2 kB/s eta 0:00:00
Collecting imageio>=2.4.1
  Downloading imageio-2.22.2-py3-none-any.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 675.2 kB/s eta 0:00:00
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 KB 1.1 MB/s eta 0:00:00
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/lib/python3/dist-packages (from packaging>=20.0->scikit-image->easyocr) (2.4.7)
...

Usage

It will download the trained data in the first run

$ python3
Python 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import easyocr
>>> reader = easyocr.Reader(['ch_sim','en'])
CUDA not available - defaulting to CPU. Note: This module is much faster with a GPU.
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Progress: |██████████████████████████████████████████████████| 100.0% CompleteDownloading recognition model, please wait. This may take several minutes depending upon your network connection.
Progress: |██████████████████████████████████████████████████| 100.0% Complete>>> 

Recognize

# 带坐标
result = reader.readtext('Documents/fp01.png')

# 不带坐标, 合并相邻text box
result = reader.readtext('Documents/tu01.jpg', detail = 0, paragraph=True)
print(result)

Performance

  • Speed is slow when using CPU
  • The correct rate is good when extracting text from e-print or screenshot pictures
  • The correct rate drops a lot when handling the photos taken by a cellphone

Reference

posted on 2022-10-31 14:31  Milton  阅读(430)  评论(0编辑  收藏  举报

导航