OCR in Python -- the pytesseract library

pip install pytesseract

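Error: tesseract is not installed or it's not in your path

pytesseract is only a wrapper around the Tesseract-OCR engine, so the engine itself has to be installed separately.

Download and install Tesseract-OCR
- https://pan.baidu.com/s/1qXumxdltxOnb0geaE_1U-Q
Change the path in the pytesseract source
- File location: <Python install directory>\Lib\site-packages\pytesseract\pytesseract.py
- Set the value of tesseract_cmd to <Tesseract-OCR install path>\tesseract.exe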
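
Editing the installed package works, but the change is lost whenever pytesseract is reinstalled or upgraded. As an alternative, the same tesseract_cmd attribute can be set at runtime from your own script. The following is a minimal sketch of that idea; the helper names find_tesseract and configure_pytesseract are made up for illustration, and the candidate paths you pass in are whatever matches your own installation.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first usable tesseract executable path, or None.

    PATH is checked first, then each explicit candidate path in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates):
    """Point pytesseract at the located executable and return the path used."""
    # Imported here so find_tesseract can be used without pytesseract installed.
    import pytesseract

    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    # Same attribute the post edits in pytesseract.py, set at runtime instead.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

A call might look like `configure_pytesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])`, where the path is only an example of a typical Windows install location, not one taken from this post.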

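Recognizing Chinese requires an extra language data file

https://pan.baidu.com/s/1GfspC5uef73B2Oa8YudBgQ
Put the downloaded Chinese language file (chi_sim.traineddata for Simplified Chinese) in the tessdata folder under the Tesseract-OCR install directory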
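
To confirm the language file landed in the right place before running OCR, the tessdata folder can be inspected directly. This is a small sketch under the assumption that language data is stored as <lang>.traineddata files; has_language and installed_languages are hypothetical helper names, not part of pytesseract.

```python
from pathlib import Path


def has_language(tessdata_dir, lang):
    """Return True if <lang>.traineddata exists in the tessdata folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def installed_languages(tessdata_dir):
    """List the language codes found as *.traineddata files, sorted."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

For example, `has_language(tessdata_dir, "chi_sim")` should return True once the file is copied, where tessdata_dir is the tessdata folder of your own installation. Recent pytesseract versions also offer `pytesseract.get_languages()`, which asks the Tesseract executable itself for the list.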

Image: English.png

Image: Chinese.png

Recognition

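import pytesseract
from PIL import Image  # PIL is provided by the Pillow package

# Load the two sample images
im_en = Image.open('English.png')
im_ch = Image.open('Chinese.png')

# With no lang argument, the default English data is used
print('========Recognizing letters========')
print(pytesseract.image_to_string(im_en), '\n\n')

# lang='chi_sim' selects the Simplified Chinese data in tessdata
print('========Recognizing Chinese========')
print(pytesseract.image_to_string(im_ch, lang='chi_sim'))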
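
When an image mixes Chinese and English, Tesseract accepts several language codes joined with "+", such as chi_sim+eng. The sketch below wraps the calls above in a reusable function; build_lang and ocr_image are hypothetical names chosen for this example, and it assumes both language files are present in tessdata.

```python
def build_lang(languages):
    """Join language codes into Tesseract's 'a+b' form, e.g. 'chi_sim+eng'."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(path, languages=("eng",)):
    """Open an image file and return the text recognized in it."""
    # Imported here so build_lang can be used on its own.
    import pytesseract
    from PIL import Image

    with Image.open(path) as im:
        return pytesseract.image_to_string(im, lang=build_lang(languages))
```

With this in place, `ocr_image('Chinese.png', languages=('chi_sim', 'eng'))` would run the Chinese sample with both language models loaded.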

Result

posted @ 2020-01-14 13:17 by 三个零
