OCR Tools - Tesseract

Usage is not complicated; see the references at the end for the details.

Tesseract can be run from the command line or called from code. The Python version:

from PIL import Image
from pytesseract import pytesseract

# Path to the installed Tesseract executable
tesseract_path = r"D:/Program Files/Tesseract-OCR/tesseract.exe"

# Path of the image to recognize (swap in the commented line to use the original photo)
# path_img1 = r"E:\002_nlp\现场设备故障屏幕照片\2、串焊机\奥特维历史报警界面.jpg"
path_img1 = r"e:\2.jpg"

# Tell pytesseract where the Tesseract executable is
pytesseract.tesseract_cmd = tesseract_path

# Open the image with PIL
img1 = Image.open(path_img1)

# Extract the text from the image; 'chi_sim' selects Simplified Chinese
text1 = pytesseract.image_to_string(img1, lang='chi_sim')
print("Image 1 text:\n", text1)
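The script above goes through pytesseract. For the command-line route, here is a minimal sketch that calls the tesseract executable from Python with subprocess. It assumes tesseract is on PATH (otherwise pass the full path as exe); the helper names build_tesseract_command and run_tesseract are made up for this example.

```python
import subprocess


def build_tesseract_command(image_path, lang="chi_sim", exe="tesseract"):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    # "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    return [exe, image_path, "stdout", "-l", lang]


def run_tesseract(image_path, lang="chi_sim", exe="tesseract"):
    """Run tesseract on one image and return the recognized text."""
    cmd = build_tesseract_command(image_path, lang, exe)
    # check=True raises CalledProcessError if tesseract exits with an error
    result = subprocess.run(cmd, capture_output=True, check=True, encoding="utf-8")
    return result.stdout
```

For example, run_tesseract(r"e:\2.jpg") should return roughly the same text as the pytesseract call, since pytesseract is itself a wrapper around this executable.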

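Recognizing Chinese requires installing an extra language data file (chi_sim, the value passed as lang above); nothing else is needed.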
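A quick way to confirm the language data is in place is to look for the .traineddata file. This is only a sketch: it assumes the data lives in a tessdata folder next to the executable (typical for the Windows installer, but your layout may differ), and has_language is a helper name made up here.

```python
from pathlib import Path


def has_language(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in the given tessdata folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


# Assumed location, derived from the install path used in this post:
# has_language(r"D:/Program Files/Tesseract-OCR/tessdata")
```

If this returns False, download chi_sim.traineddata from the tessdata repository listed in the references and copy it into that folder.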

Recognition accuracy depends heavily on image quality.
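Simple preprocessing sometimes helps with photos of screens. The sketch below is not from the original post: it uses Pillow to convert to grayscale, stretch the contrast, upscale, and binarize. The scale factor and threshold are guesses to tune per image, and preprocess_for_ocr is a hypothetical helper name.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img, scale=2, threshold=128):
    """Return a grayscale, upscaled, black-and-white copy of img."""
    gray = ImageOps.grayscale(img)          # drop color information
    gray = ImageOps.autocontrast(gray)      # stretch the contrast range
    width, height = gray.size
    big = gray.resize((width * scale, height * scale), Image.LANCZOS)
    # Binarize: pixels above the threshold become white, the rest black
    return big.point(lambda p: 255 if p > threshold else 0)
```

The result can be passed to image_to_string in place of the original image. Whether it actually improves the output depends on the photo, so compare both.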

References:

https://mp.weixin.qq.com/s/jhR0eu2pjhQEndo-zngkMg  Reading text from images with Python's Tesseract library

https://www.jianshu.com/p/3326c7216696  Tesseract-OCR installation, Chinese recognition, and training language data

https://gitcode.net/mirrors/tesseract-ocr/tessdata?utm_source=csdn_github_accelerator  Source repository (tessdata mirror)


Posted 2022-12-06 20:03 by 努力的孔子
