Problems installing pytesser in Python for CAPTCHA recognition
This problem had me stuck for a long time today. At first I simply ran
pip install pytesseract
to install pytesseract, but calling it produced the following error:
Traceback (most recent call last):
  File "E:\eclipse_workspace\web_scraping\src\web_page_interaction\test.py", line 7, in <module>
    print pytesseract.image_to_string(image) # Run tesseract.exe on image
  File "F:\Python\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "F:\Python\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "F:\Python\lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "F:\Python\lib\subprocess.py", line 959, in _execute_child
    startupinfo)
WindowsError: [Error 2]
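On Windows, error code 2 from subprocess means the program it was asked to launch could not be found, so a quick first check is whether the tesseract executable is reachable at all. Below is a minimal sketch of such a check. The helper name find_executable and its extra_dirs parameter are made up for illustration, and shutil.which needs Python 3.3 or later (under the Python 2 setup used in this post, distutils.spawn.find_executable is a comparable call).

```python
import shutil


def find_executable(name, extra_dirs=()):
    """Return the full path of `name`, or None if it cannot be found.

    PATH is searched first, then each folder in `extra_dirs`
    (hypothetical fallback locations supplied by the caller).
    """
    found = shutil.which(name)
    if found is None:
        for folder in extra_dirs:
            found = shutil.which(name, path=folder)
            if found is not None:
                break
    return found


if __name__ == "__main__":
    path = find_executable("tesseract")
    if path is None:
        print("tesseract is not on PATH; subprocess cannot launch it")
    else:
        print("tesseract found at", path)
```

If this reports that nothing was found, the Python wrapper is installed but the Tesseract program itself is either missing or not on PATH.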
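I searched online and tried various suggestions without success, so I decided to install pytesser instead. The installation steps (Windows) are described at: http://blog.csdn.net/evankaka/article/details/49533493
Once it was installed, I ran: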
from PIL import Image
from pytesser import *

image = Image.open('captcha3.png')
print(image_to_string(image))
This raises an ImportError. The fix is to open the file named in the error and change import Image to from PIL import Image.
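If more than one file in the pytesser folder uses the old-style import, the edit can be scripted instead of done by hand. This is a minimal sketch, assuming the statement sits on a line by itself, written exactly as import Image. The helper fix_legacy_pil_import is a made-up name; it only transforms the text passed to it, and reading and writing the actual file is left out.

```python
import re

# Matches a line consisting only of "import Image" (optionally indented),
# but not longer names such as "import ImageFilter".
_LEGACY_IMPORT = re.compile(r"^([ \t]*)import Image[ \t]*$", re.MULTILINE)


def fix_legacy_pil_import(source):
    """Rewrite bare `import Image` lines to `from PIL import Image`."""
    return _LEGACY_IMPORT.sub(r"\1from PIL import Image", source)


if __name__ == "__main__":
    sample = "import Image\nimport subprocess\n"
    print(fix_legacy_pil_import(sample))
```

Keeping a backup copy of the original file before overwriting it is a sensible precaution.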
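After that, the original error still appears. The cause is that tesseract_exe_name='tesseract' in 'F:\Python\Lib\site-packages\pytesser\pytesser.py' is a relative path, so the executable is only found when the script happens to run from that folder. Changing it to an absolute path fixes the problem: tesseract_exe_name='F:\\Python\\Lib\\site-packages\\pytesser\\tesseract'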
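Rather than hard-coding the drive and folder, the absolute path can also be derived from the location of pytesser.py itself. This is a minimal sketch, assuming the tesseract executable sits in the same folder as the module (as in the path above). The helper executable_beside is a made-up name and is not part of pytesser.

```python
import os


def executable_beside(module_file, exe_name="tesseract"):
    """Return an absolute path to `exe_name` in the folder of `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, exe_name)


# Inside pytesser.py this could replace the relative name
# (assumption: the executable lives next to the module):
# tesseract_exe_name = executable_beside(__file__)
```

This keeps working if the Python installation is moved to another drive. For the first error in this post, pytesseract has a comparable module-level setting, pytesseract.pytesseract.tesseract_cmd, which can be pointed at the full path of the Tesseract executable.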
With that change, the script finally runs and prints its recognition result.
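Finally, for image preprocessing topics such as grayscale conversion, see: http://www.cnblogs.com/apexchu/p/4231041.html
I have one question of my own here: why does an image img have to be converted to grayscale before it can be recognized?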
Test images: t.png (the original) and t1.png (the grayscale copy saved by the script below).
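import pytesser
from PIL import Image

# First attempt: recognize the original image as-is
img = Image.open('t.png')
print(pytesser.image_to_string(img))

# Second attempt: convert to 8-bit grayscale ('L' mode), save a copy, recognize again
img = img.convert('L')
img.save('t1.png')
print(pytesser.image_to_string(img))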
The output is:
X
Ariali Amazingly few discotheques
provide jukeboxes.
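As background to the question above, convert('L') collapses each RGB pixel into a single 8-bit brightness value. Pillow documents the ITU-R 601-2 luma transform for this conversion. Below is a pure-Python sketch of that formula. The helper name luma is made up, and Pillow's internal rounding may differ slightly from this integer version. It only shows what the conversion computes and does not by itself explain why the original image failed to be recognized.

```python
def luma(r, g, b):
    """Approximate grey level for one RGB pixel (ITU-R 601-2 weights)."""
    return (r * 299 + g * 587 + b * 114) // 1000


if __name__ == "__main__":
    # Pure white, pure black, and the three primaries (example pixels)
    for pixel in [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
        print(pixel, "->", luma(*pixel))
```

Green contributes the most to the grey level and blue the least, so coloured text on a coloured background can come out with quite different contrast after the conversion.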