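Text Recognition
Example 1
- Double-click the installer
- Accept the license agreement
- Install for all users
- Next
- Choose the install directory
- Start the installation
- Finish
- Configure the environment variable
- Configure it as follows
C:\Program Files (x86)\Tesseract-OCR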
- Verify
# Open cmd and test
C:\Users\ychen>tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
- Test
# Put one image in a folder, cd into that folder in cmd, and run the command below; the recognized text is saved as result.txt in the current directory
tesseract XXX.png result
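The verify and test steps above can also be scripted. The following is a minimal sketch, assuming `tesseract` is reachable through PATH; the helper names `tesseract_version`, `build_ocr_command`, and `ocr_to_file` are made up for illustration and are not part of Tesseract.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `tesseract -v`, or None if not found."""
    if shutil.which(executable) is None:
        return None  # not on PATH: recheck the environment variable step
    # Some builds print the version banner to stderr, so merge both streams.
    proc = subprocess.run(
        [executable, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = proc.stdout.splitlines()
    return lines[0] if lines else None


def build_ocr_command(image_path, output_base, executable="tesseract"):
    """Build the same command as above: tesseract <image> <output base>."""
    return [executable, image_path, output_base]


def ocr_to_file(image_path, output_base="result"):
    """Run the CLI and return the text Tesseract wrote to <output_base>.txt."""
    subprocess.run(build_ocr_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

Calling `ocr_to_file("XXX.png")` from the image folder should be equivalent to running `tesseract XXX.png result` and then opening result.txt.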
Example 2
- Install dependencies
C:\Users\ychen\Downloads>pip install pytesseract
Collecting pytesseract
  Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl
pytesseract requires Python '>=3.7' but the running Python is 3.6.3
You are using pip version 9.0.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\ychen\Downloads>python -m pip install --upgrade pip
Cache entry deserialization failed, entry ignored
Collecting pip
  Downloading https://mirrors.aliyun.com/pypi/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl (1.7MB)
    100% |████████████████████████████████| 1.7MB 690kB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-21.3.1
You are using pip version 21.3.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\ychen\Downloads>pip install pytesseract
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pytesseract
  Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl (14 kB)
Collecting packaging>=21.3
  Downloading https://mirrors.aliyun.com/pypi/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 523 kB/s
Collecting Pillow>=8.0.0
  Downloading https://mirrors.aliyun.com/pypi/packages/8f/10/c8dc9fff37b69b5962b7783ab4835611e83dada453cd9913d82ca2a1321b/Pillow-8.4.0-cp36-cp36m-win_amd64.whl (3.2 MB)
     |████████████████████████████████| 3.2 MB 731 kB/s
Collecting pytesseract
  Downloading https://mirrors.aliyun.com/pypi/packages/a3/c9/d6e8903482bd6fb994c32722831d15842dd8b614f94ad9ca735807252671/pytesseract-0.3.8.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from pytesseract) (4.2.1)
Requirement already satisfied: olefile in c:\programdata\anaconda3\lib\site-packages (from Pillow->pytesseract) (0.44)
Building wheels for collected packages: pytesseract
  Building wheel for pytesseract (setup.py) ... done
  Created wheel for pytesseract: filename=pytesseract-0.3.8-py2.py3-none-any.whl size=18780 sha256=b49587077ddccb20cbf67c10130b4c15f04fc585cbc36dcf53563d169d9df4de
  Stored in directory: c:\users\ychen\appdata\local\pip\cache\wheels\ab\76\70\c080b97e409de2fe41cf2d9ecb97f0629a66c7126eb7c9eb44
Successfully built pytesseract
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.8
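The log above shows why the first attempt failed: pytesseract 0.3.9 requires Python 3.7 or later, and after the pip upgrade the resolver fell back to 0.3.8 on Python 3.6.3. A small sketch of that decision follows, in case you prefer to pin the version explicitly. The function name `pytesseract_requirement` is hypothetical, and only the two cases visible in the log are covered.

```python
import sys


def pytesseract_requirement(version_info=None):
    """Pick a pip requirement string based on the interpreter version."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor >= (3, 7):
        # Python 3.7+ satisfies the 0.3.9 requirement quoted in the log.
        return "pytesseract"
    if major_minor == (3, 6):
        # The log shows 0.3.8 installing successfully on Python 3.6.3.
        return "pytesseract==0.3.8"
    # Older interpreters are not covered by the log, so do not guess.
    raise ValueError("no known-good pytesseract version for this Python")
```

On the Python 3.6.3 setup shown here, `pytesseract_requirement()` returns `"pytesseract==0.3.8"`, which can be passed straight to `pip install`.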
- Configure the path
# Open the following file in an editor
C:\ProgramData\Anaconda3\Lib\site-packages\pytesseract\pytesseract.py
# Set the path as follows
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
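Editing the installed pytesseract.py works, but the change is lost whenever the package is reinstalled. As an alternative sketch, the same `tesseract_cmd` attribute can be assigned at runtime from your own script. The helper names `resolve_tesseract_cmd` and `configure_pytesseract` are assumptions made for this example; the candidate path is the one used in this post.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates):
    """Return the first candidate path that exists, else whatever is on PATH."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates):
    """Point pytesseract at the resolved executable and return the path used."""
    import pytesseract  # imported here so the helper above works without it

    cmd = resolve_tesseract_cmd(candidates)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    # Same attribute that the manual edit changes, set without touching the package.
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Usage would look like `configure_pytesseract(["C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"])`, called once before any `image_to_string` call.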
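- Code
from PIL import Image
import pytesseract
import cv2
import os

# Preprocessing mode: 'blur' or 'thresh'
preprocess = 'blur'

# Load the image and convert it to grayscale
image = cv2.imread('scan.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu thresholding separates the text from the background
if preprocess == "thresh":
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Median blur suppresses salt-and-pepper noise
if preprocess == "blur":
    gray = cv2.medianBlur(gray, 3)

# Write the grayscale image to a temporary file named after the process ID
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# Run OCR on the temporary file, print the text, then delete the file
text = pytesseract.image_to_string(Image.open(filename))
print(text)
os.remove(filename)

# Show the original and preprocessed images until a key is pressed
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)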
- Output
we owe oak wk ome owe ow wo Sk we %o %o %K WHOLE FOODS MARKET - WESTPORT,.CT 06880 399 POST RD WEST - (203) 227-6858 64 365 365 365 BACULN LS BACON LS BACON LS BACON iS BRO TH CHIC FLOUR ALMUNU CHKN BRST BNLSS SK HEAVY CREAM BALSMC REDUCT BEEF GRND JUICE COF CRSHEW 85/15 L. DOCS PINT QORGAK IC HNY ALMOND Bui TR * x ## TAX . 00 BAL NP NP NP NP NP NP NP NP NP NP NP NP NP 4 99 4.99 4.99 1 39 2.19 1.99 . 80 . 39 . 49 tl & on 8.99 14.49 9.99 101.33 m "Ti m n m
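To see what the `cv2.medianBlur(gray, 3)` step in the script does before OCR, here is a pure-Python sketch of a 3x3 median filter on a small grid. It is an illustration only, not OpenCV's implementation; the function name `median_blur_3x3` is made up, and border pixels are handled by clamping coordinates, which to my understanding corresponds to the replicate-border behaviour OpenCV documents for its median filter.

```python
from statistics import median_low


def median_blur_3x3(gray):
    """Apply a 3x3 median filter to a 2D list of pixel values."""
    height, width = len(gray), len(gray[0])

    def pixel(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        return gray[y][x]

    return [
        [
            # Nine samples per window, so median_low picks the fifth smallest.
            median_low(pixel(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            for x in range(width)
        ]
        for y in range(height)
    ]


noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
print(median_blur_3x3(noisy))  # the single bright speck is replaced by 10
```

A lone outlier never reaches the middle of a sorted nine-pixel window, which is why this filter removes isolated specks while leaving larger strokes such as printed characters mostly intact.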