Text Recognition (OCR)

Example 1: Tesseract from the command line

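  • Download tesseract-ocr

  • Double-click the installer

  • Accept the license agreement

  • Install for all users

  • Click Next

  • Choose the installation directory

  • Start the installation

  • Finish

  • Configure the environment variable

  • Add the following directory to Path: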

C:\Program Files (x86)\Tesseract-OCR
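If you want to confirm from Python that the directory above actually made it into Path, a small check like the following can help. It is only a sketch: `is_on_path` is a hypothetical helper name, and a freshly edited Path is only visible to cmd windows and Python sessions started after the change.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize case and separators so that, on Windows, "C:\\Foo\\" and "c:/foo" compare equal
    target = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(os.pathsep):
        # PATH entries are sometimes wrapped in quotes; strip them before comparing
        entry = entry.strip().strip('"')
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    # Install directory used in the steps above; adjust it if you chose another one
    print(is_on_path(r"C:\Program Files (x86)\Tesseract-OCR"))
```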
  • Verify
# Open cmd and check the version
C:\Users\ychen>tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
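The same check can be scripted. The sketch below runs `tesseract -v` through `subprocess` and extracts the version token from the first line (for the output above, `4.00.00alpha`). The function names are hypothetical; stderr is merged into stdout because some older Tesseract builds may print the version there. Once pytesseract is installed (Example 2), it also offers its own `get_tesseract_version()`.

```python
import re
import subprocess


def parse_tesseract_version(output):
    """Extract the version token from a line starting with 'tesseract' in `tesseract -v` output."""
    match = re.search(r"^tesseract\s+(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def tesseract_version_from_cli(cmd="tesseract"):
    # Merge stderr into stdout in case the build writes the version to stderr;
    # universal_newlines=True returns str and also works on Python 3.6
    result = subprocess.run([cmd, "-v"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_tesseract_version(result.stdout)
```

Calling `tesseract_version_from_cli()` on the machine above should return the same string that `tesseract -v` prints on its first line.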
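  • Test
# Put an image in a folder, cd into that folder in cmd and run the command below. The recognized text is saved to result.txt in the current directory (Tesseract appends the .txt extension itself).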
tesseract XXX.png result
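To drive the same command from a script, a `subprocess`-based sketch could look like this. `build_tesseract_command` and `ocr_with_cli` are hypothetical names; the only Tesseract behavior assumed is the one used above, where `tesseract <image> <outputbase>` writes `<outputbase>.txt`. Reading the file as UTF-8 is also an assumption about the output encoding.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, cmd="tesseract"):
    """Return the CLI arguments and the .txt file Tesseract will create."""
    args = [cmd, str(image_path), str(output_base)]
    # Tesseract adds the .txt extension to the output base name on its own
    return args, Path(str(output_base) + ".txt")


def ocr_with_cli(image_path, output_base="result", cmd="tesseract"):
    args, txt_file = build_tesseract_command(image_path, output_base, cmd)
    subprocess.run(args, check=True)  # raises CalledProcessError if tesseract fails
    return txt_file.read_text(encoding="utf-8")
```

For example, `print(ocr_with_cli("XXX.png"))` mirrors the cmd call above and then prints the contents of result.txt.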

Example 2: pytesseract with OpenCV in Python

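  • Install dependencies (the first attempt fails because pytesseract 0.3.9 requires Python >= 3.7 while this machine runs Python 3.6.3; after upgrading pip, pip falls back to pytesseract 0.3.8)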
C:\Users\ychen\Downloads>pip install pytesseract
Collecting pytesseract
Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl
pytesseract requires Python '>=3.7' but the running Python is 3.6.3
You are using pip version 9.0.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
C:\Users\ychen\Downloads>python -m pip install --upgrade pip
Cache entry deserialization failed, entry ignored
Collecting pip
Downloading https://mirrors.aliyun.com/pypi/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl (1.7MB)
100% |████████████████████████████████| 1.7MB 690kB/s
Installing collected packages: pip
Found existing installation: pip 9.0.1
Uninstalling pip-9.0.1:
Successfully uninstalled pip-9.0.1
Successfully installed pip-21.3.1
You are using pip version 21.3.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
C:\Users\ychen\Downloads>pip install pytesseract
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pytesseract
Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl (14 kB)
Collecting packaging>=21.3
Downloading https://mirrors.aliyun.com/pypi/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl (40 kB)
|████████████████████████████████| 40 kB 523 kB/s
Collecting Pillow>=8.0.0
Downloading https://mirrors.aliyun.com/pypi/packages/8f/10/c8dc9fff37b69b5962b7783ab4835611e83dada453cd9913d82ca2a1321b/Pillow-8.4.0-cp36-cp36m-win_amd64.whl (3.2 MB)
|████████████████████████████████| 3.2 MB 731 kB/s
Collecting pytesseract
Downloading https://mirrors.aliyun.com/pypi/packages/a3/c9/d6e8903482bd6fb994c32722831d15842dd8b614f94ad9ca735807252671/pytesseract-0.3.8.tar.gz (14 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from pytesseract) (4.2.1)
Requirement already satisfied: olefile in c:\programdata\anaconda3\lib\site-packages (from Pillow->pytesseract) (0.44)
Building wheels for collected packages: pytesseract
Building wheel for pytesseract (setup.py) ... done
Created wheel for pytesseract: filename=pytesseract-0.3.8-py2.py3-none-any.whl size=18780 sha256=b49587077ddccb20cbf67c10130b4c15f04fc585cbc36dcf53563d169d9df4de
Stored in directory: c:\users\ychen\appdata\local\pip\cache\wheels\ab\76\70\c080b97e409de2fe41cf2d9ecb97f0629a66c7126eb7c9eb44
Successfully built pytesseract
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.8
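Since the failure above comes from the interpreter version, it can be worth checking that before installing. A minimal sketch, assuming the only requirement that matters is the `>=3.7` one reported in the log for pytesseract 0.3.9; `python_satisfies` is a hypothetical helper.

```python
import sys

# From the pip log above: pytesseract 0.3.9 requires Python >= 3.7
PYTESSERACT_039_MIN_PYTHON = (3, 7)


def python_satisfies(minimum, current=None):
    """Return True if the running (or given) interpreter version is >= `minimum`."""
    if current is None:
        current = sys.version_info[:2]
    # Tuples compare element by element: (3, 6) < (3, 7) < (3, 11)
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    if python_satisfies(PYTESSERACT_039_MIN_PYTHON):
        print("pytesseract 0.3.9 can be installed")
    else:
        print("Python < 3.7: the upgraded pip picked 0.3.8 in the log above")
```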
  • Configure the path
# Open the following file in an editor
C:\ProgramData\Anaconda3\Lib\site-packages\pytesseract\pytesseract.py
# Set the path as follows
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
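Editing a file inside site-packages is lost when pytesseract is reinstalled or upgraded. The pytesseract README also documents setting `pytesseract.pytesseract.tesseract_cmd` from your own script, which has the same effect. The sketch below does that, falling back to the library default `'tesseract'` (found via Path) when the explicit executable does not exist. The install path and the helper names are assumptions taken from this tutorial's setup.

```python
import os

# Install location chosen in Example 1 (an assumption; adjust to your directory)
DEFAULT_TESSERACT_EXE = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(candidate=DEFAULT_TESSERACT_EXE):
    """Use the explicit executable if it exists, else fall back to 'tesseract' on Path."""
    return candidate if os.path.isfile(candidate) else "tesseract"


def configure_pytesseract(candidate=DEFAULT_TESSERACT_EXE):
    # Setting the module attribute at runtime replaces editing tesseract_cmd
    # inside site-packages/pytesseract/pytesseract.py
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(candidate)
    return pytesseract.pytesseract.tesseract_cmd
```

Call `configure_pytesseract()` once at the top of the script, before the first `pytesseract.image_to_string(...)`.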
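  • Code
from PIL import Image
import pytesseract
import cv2
import os

# Preprocessing mode: "thresh" (Otsu binarization) or "blur" (3x3 median blur)
preprocess = "blur"

image = cv2.imread("scan.jpg")
if image is None:
    raise FileNotFoundError("Could not read scan.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

if preprocess == "thresh":
    # Otsu chooses the threshold automatically; index [1] is the binarized image
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif preprocess == "blur":
    # Median blur suppresses salt-and-pepper noise before OCR
    gray = cv2.medianBlur(gray, 3)

# Save the preprocessed image to a temporary file named after the process ID
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
with Image.open(filename) as img:
    text = pytesseract.image_to_string(img)
print(text)
os.remove(filename)

# Show the original and the preprocessed image; press any key to close
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()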
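The "thresh" branch relies on OpenCV's `cv2.THRESH_OTSU` flag, which picks the threshold for you. To show what Otsu's method computes, here is a NumPy sketch: it tries every threshold t in 0..255 and keeps the one that maximizes the between-class variance of the pixels <= t and > t. This is an illustration of the technique, not OpenCV's implementation, and `otsu_threshold` / `binarize` are hypothetical names.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t (0-255) of a uint8 image that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / gray.size
    levels = np.arange(256)
    omega = np.cumsum(prob)            # weight of the class with values <= t
    mu = np.cumsum(prob * levels)      # cumulative mean up to t
    mu_total = mu[-1]                  # mean of the whole image
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom == 0] = 0.0          # thresholds that leave one class empty
    return int(np.argmax(between))


def binarize(gray, t):
    # Same rule as cv2.THRESH_BINARY with maxval=255: pixels > t become 255, others 0
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

Applied to the `gray` image from the script, `binarize(gray, otsu_threshold(gray))` should give a result comparable to the "thresh" branch.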
  • Output
we owe oak wk ome owe ow wo Sk we %o %o %K
WHOLE FOODS MARKET - WESTPORT,.CT 06880
399 POST RD WEST - (203) 227-6858
64
365
365
365
BACULN LS
BACON LS
BACON LS
BACON iS
BRO TH CHIC
FLOUR ALMUNU
CHKN BRST BNLSS SK
HEAVY CREAM
BALSMC REDUCT
BEEF
GRND
JUICE COF CRSHEW
85/15
L.
DOCS PINT QORGAK IC
HNY ALMOND Bui TR
* x ## TAX
. 00
BAL
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
4 99
4.99
4.99
1 39
2.19
1.99
. 80
. 39
. 49
tl &
on
8.99
14.49
9.99
101.33
m
"Ti
m n m
posted @ DogLeftover