Offline OCR

1. Download the latest Tesseract installer from https://digi.bib.uni-mannheim.de/tesseract/

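2. After installing, add the Tesseract install directory to the PATH environment variable, then open a new terminal and run tesseract -v to confirm it works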
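
The same check can be scripted from Python. This is only a minimal sketch using the standard library; the helper name tesseract_version is made up for illustration, and it assumes the executable is named tesseract on PATH.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line printed by `tesseract -v`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # PATH was not updated, or the terminal was not restarted
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Some builds write the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version())
```

If this prints None, the install directory is most likely missing from PATH.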

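4. The English language pack, eng.traineddata, is bundled with the installer. Download any additional language packs you need, for example chi_sim.traineddata, from https://github.com/tesseract-ocr/tessdata and put them in the tessdata folder of the installation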
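
To confirm that the language packs ended up in the right place, something like the following could be used. It is a sketch, not part of Tesseract: missing_languages is a hypothetical helper, and the tessdata path shown is an assumed install location that should be changed to match your machine.

```python
from pathlib import Path


def missing_languages(tessdata_dir, langs):
    """Return the language codes that have no <lang>.traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [lang for lang in langs if not (tessdata / f"{lang}.traineddata").is_file()]


# Assumed install location; adjust to wherever Tesseract was installed.
tessdata_dir = r"C:\Program Files\Tesseract-OCR\tessdata"
print(missing_languages(tessdata_dir, ["eng", "chi_sim"]))
```

An empty list means every requested pack was found.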

5. Install the Python third-party library: pip install pytesseract
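
If pytesseract cannot find the Tesseract executable (for example because PATH was not updated), it can be pointed at the binary through its tesseract_cmd setting. The sketch below assumes a hypothetical helper configure_tesseract and a fallback path that is only a guess at a typical install location.

```python
import importlib.util
import shutil


def configure_tesseract(fallback_path):
    """Tell pytesseract which tesseract binary to use; return that path, or None."""
    if importlib.util.find_spec("pytesseract") is None:
        return None  # pytesseract is not installed yet
    import pytesseract

    # Prefer whatever is on PATH, otherwise use the caller-supplied location.
    cmd = shutil.which("tesseract") or fallback_path
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd


# The fallback path is an assumption; change it to your actual install location.
print(configure_tesseract(r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
```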

6. Recognition example

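from PIL import Image  # PIL is provided by the Pillow package (pip install Pillow)
import pytesseract

# '+' combines language packs: Simplified Chinese plus English
words = pytesseract.image_to_string(Image.open('...xxx/test.png'), lang='chi_sim+eng')
print(words)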

References:

https://blog.csdn.net/ad_yangang/article/details/121294009

https://zhuanlan.zhihu.com/p/122495884

https://zhuanlan.zhihu.com/p/35687577

posted @ 2023-02-09 13:52  lingwang3