Offline OCR

1. Download the latest version of Tesseract (Windows installers built by UB Mannheim): https://digi.bib.uni-mannheim.de/tesseract/

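2. After installing, add the Tesseract installation directory to the PATH environment variable, then verify the setup by running tesseract -v.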
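The same check can be scripted. This is a minimal sketch using only the Python standard library: it looks up tesseract on PATH with shutil.which, runs tesseract -v, and extracts the version number. The helper names find_tesseract_version and parse_tesseract_version are hypothetical, and the regex assumes the output contains a line like "tesseract 5.3.0" or "tesseract v5.3.0.20221222".

```python
import re
import shutil
import subprocess
from typing import Optional

# Hypothetical pattern: assumes output such as "tesseract 5.3.0"
# or "tesseract v5.3.0.20221222" (the optional "v" covers both forms).
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+(?:\.\d+)*)", re.IGNORECASE)


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version string from `tesseract -v` output, or None."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def find_tesseract_version() -> Optional[str]:
    """Return the installed Tesseract version, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # PATH is not set up yet: revisit step 2
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Some older releases print the version to stderr, so search both streams.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(find_tesseract_version() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", the environment variable from step 2 has not taken effect; opening a new terminal is often enough.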

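3. The English language pack (eng.traineddata) is bundled. Download any additional language packs you need from https://github.com/tesseract-ocr/tessdata, e.g. chi_sim.traineddata for Simplified Chinese, and place them in Tesseract's tessdata directory.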
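Before passing a lang string such as chi_sim+eng, it can help to confirm that the matching .traineddata files are actually present. A minimal sketch using only the standard library; the helper missing_languages is hypothetical, and the directory passed in is assumed to be Tesseract's tessdata folder (on a default Windows install this is typically the tessdata subfolder of the installation directory, but check your own setup).

```python
from pathlib import Path
from typing import List


def missing_languages(tessdata_dir: str, lang: str) -> List[str]:
    """Return the codes in `lang` (e.g. "chi_sim+eng") that have no
    matching <code>.traineddata file in `tessdata_dir`."""
    folder = Path(tessdata_dir)
    codes = [code for code in lang.split("+") if code]
    return [c for c in codes if not (folder / f"{c}.traineddata").is_file()]


if __name__ == "__main__":
    # Hypothetical path: adjust to your own installation.
    print(missing_languages(r"C:\Program Files\Tesseract-OCR\tessdata", "chi_sim+eng"))
```

An empty list means every requested language pack was found.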

4. Install the third-party Python library: pip install pytesseract

5. Recognition example:

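from PIL import Image
import pytesseract

# lang='chi_sim+eng' recognizes Simplified Chinese and English together;
# the matching .traineddata files must be in the tessdata directory
words = pytesseract.image_to_string(Image.open('...xxx/test.png'), lang='chi_sim+eng')
print(words)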
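image_to_string returns plain text only. When per-word confidence is also useful, pytesseract's image_to_data with output_type=pytesseract.Output.DICT returns parallel lists keyed by 'text', 'conf' and other fields. This is a minimal sketch that keeps only words above a confidence threshold; the helper names confident_words and ocr_with_confidence and the default threshold of 60 are illustrative assumptions, not part of the original post.

```python
from typing import Dict, List, Sequence, Tuple


def confident_words(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Keep (word, confidence) pairs from an image_to_data dict whose
    confidence is at least `min_conf`. Non-word rows have empty text."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        score = float(conf)  # conf may be int, float or str depending on version
        if text.strip() and score >= min_conf:
            words.append((text, score))
    return words


def ocr_with_confidence(path: str, lang: str = "chi_sim+eng", min_conf: float = 60.0):
    # Imported here so confident_words() works without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        data = pytesseract.image_to_data(img, lang=lang, output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)
```

For example, ocr_with_confidence('...xxx/test.png') returns a list of (word, confidence) tuples; lowering min_conf keeps more words at the cost of more misreads.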

References:

https://blog.csdn.net/ad_yangang/article/details/121294009

https://zhuanlan.zhihu.com/p/122495884

https://zhuanlan.zhihu.com/p/35687577

posted @ 2023-02-09 13:52 lingwang3