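Talked things over with my supervisor today and the workflow is basically settled. What's left is to knock out a rough implementation and see whether anything is wrong with the overall direction.
First, I wrapped the OpenCV code I tested earlier into a function so it can be called from elsewhere.
Next, I needed to find a reliable OCR library.
And that is where I got stuck...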
I went with Google's tesseract library. Installation:
1. Download the installer: https://github.com/tesseract-ocr/tesseract/wiki/Downloads
2. Install the Python library: pip install pytesseract
After that you're ready to go.
Two things to watch out for:
First, you need one line in your Python code telling pytesseract where tesseract is installed:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
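Second, you need to add a TESSDATA_PREFIX entry to the system environment variables: C:\Program Files (x86)\Tesseract-OCR\tessdata
And finally, remember to restart PyCharm after changing the environment variables...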
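As a possible alternative to editing the system environment variables, both settings can be applied from inside the script. This is only a sketch: configure_tesseract is a hypothetical helper name, the install directory is the one used in this post, and it relies on child processes started from Python inheriting os.environ.

```python
import ntpath
import os


def configure_tesseract(install_dir):
    """Hypothetical helper: derive both tesseract paths from one install dir."""
    # ntpath joins with Windows separators regardless of the host OS.
    cmd = ntpath.join(install_dir, "tesseract.exe")
    tessdata = ntpath.join(install_dir, "tessdata")

    # This only changes the environment of the current Python process
    # (and of processes it launches), not the system-wide setting.
    os.environ["TESSDATA_PREFIX"] = tessdata

    try:
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = cmd
    except ImportError:
        # pytesseract is not installed; the paths are still returned.
        pass
    return cmd, tessdata


cmd, tessdata = configure_tesseract(r"C:\Program Files (x86)\Tesseract-OCR")
```

Because the variable is set at runtime, this should not depend on whether the IDE was restarted after the system setting changed.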
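One more point: OpenCV images are np.ndarray, and they have to be converted to the format pytesseract expects (this should be a PIL image).
Then you can try it out and see how it does.
The results were really bad...
Not happy...
Looks like I at least need a passable method to get by for now... I've decided to look at how a senior labmate implemented this in her code, probably...
Today's code: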
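A minimal sketch of that conversion, assuming a 3-channel uint8 image as returned by cv2.imread. OpenCV stores channels in BGR order while PIL expects RGB, so the channel axis is reversed first. bgr_to_pil is a hypothetical helper name, and the slicing used here should be equivalent to cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

```python
import numpy as np
from PIL import Image


def bgr_to_pil(bgr):
    """Hypothetical helper: convert an OpenCV-style BGR array to a PIL image."""
    # Reverse the last axis (B, G, R -> R, G, B) and make the result contiguous.
    rgb = np.ascontiguousarray(bgr[:, :, ::-1])
    return Image.fromarray(rgb)


# Tiny synthetic image: 2 rows, 3 columns, pure blue in BGR order.
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[:, :, 0] = 255
pil_img = bgr_to_pil(bgr)
```

The resulting object can be passed straight to pytesseract.image_to_string.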
import sys
import os
import cv2
import numpy as np
from PIL import Image
import pytesseract


def td_test(img):
    print("test")
    print('\ntextdetection.py')
    print('       A demo script of the Extremal Region Filter algorithm described in:')
    print('       Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012\n')

    # if (len(sys.argv) < 2):
    #     print(' (ERROR) You must call this script with an argument (path_to_image_to_be_processed)\n')
    #     quit()

    # pathname = os.path.dirname(sys.argv[0])
    # pathname = os.path.dirname('D:/MyProject/PyCharm/testcode')

    # img = cv2.imread(str(sys.argv[1]))
    # img = cv2.imread('test.jpg')
    # for visualization
    # vis = img.copy()

    # Extract channels to be processed individually
    # (list() because some OpenCV versions return a tuple, which has no append)
    channels = list(cv2.text.computeNMChannels(img))
    # Append negative channels to detect ER- (bright regions over dark background)
    cn = len(channels) - 1
    for c in range(0, cn):
        channels.append((255 - channels[c]))

    # Apply the default cascade classifier to each independent channel (could be done in parallel)
    print("Extracting Class Specific Extremal Regions from " + str(len(channels)) + " channels ...")
    print("    (...) this may take a while (...)")
    answer = []
    for channel in channels:
        # erc1 = cv2.text.loadClassifierNM1(pathname+'/trained_classifierNM1.xml')
        erc1 = cv2.text.loadClassifierNM1('trained_classifierNM1.xml')
        er1 = cv2.text.createERFilterNM1(erc1, 16, 0.00015, 0.13, 0.2, True, 0.1)

        # erc2 = cv2.text.loadClassifierNM2(pathname+'/trained_classifierNM2.xml')
        erc2 = cv2.text.loadClassifierNM2('trained_classifierNM2.xml')
        er2 = cv2.text.createERFilterNM2(erc2, 0.5)

        regions = cv2.text.detectRegions(channel, er1, er2)

        rects = cv2.text.erGrouping(img, channel, [r.tolist() for r in regions])
        # rects = cv2.text.erGrouping(img,channel,[x.tolist() for x in regions], cv2.text.ERGROUPING_ORIENTATION_ANY,'../../GSoC2014/opencv_contrib/modules/text/samples/trained_classifier_erGrouping.xml',0.5)

        # print(rects);
        # print(np.shape(rects)[0]);
        # Collect every grouped rectangle (x, y, w, h)
        for r in range(0, np.shape(rects)[0]):
            rect = rects[r]
            answer.append(rect)
            # print(rect)
            # cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 0), 2)
            # cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 255, 255), 1)

    # Visualization
    # cv2.imshow("Text detection result", vis)
    # cv2.waitKey(0)
    return answer


if __name__ == '__main__':
    img = cv2.imread('test.jpg')
    vis = img.copy()
    answer = td_test(img)
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    for rect in answer:
        print(rect)
        cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 0), 2)
        cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 255, 255), 1)
        # Crop from the untouched image; cropping from vis would include the boxes drawn above
        img1 = img[rect[1]:rect[1] + rect[3], rect[0]:rect[0] + rect[2]]
        # (thresh, img1) = cv2.threshold(img1, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # OpenCV stores pixels as BGR, PIL expects RGB
        img2 = Image.fromarray(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
        txt = pytesseract.image_to_string(img2)
        print(txt)
        # cv2.imshow("test", img1)
        # cv2.waitKey(0)
    cv2.imshow("Text detection result", vis)
    cv2.waitKey(0)
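When cropping detected rectangles for OCR, it may help to cut from a clean copy of the image and to clamp the box to the image bounds, optionally with a small margin around the text. The sketch below is one way to do that. crop_rect and the margin parameter are hypothetical and not part of the script above, and the (x, y, w, h) layout is assumed to match what erGrouping returns.

```python
import numpy as np


def crop_rect(img, rect, margin=0):
    """Hypothetical helper: crop an (x, y, w, h) box, clamped to the image."""
    x, y, w, h = (int(v) for v in rect)
    height, width = img.shape[:2]
    # Expand the box by the margin, but never past the image borders.
    x0 = max(x - margin, 0)
    y0 = max(y - margin, 0)
    x1 = min(x + w + margin, width)
    y1 = min(y + h + margin, height)
    # Copy so later drawing on the source image does not affect the crop.
    return img[y0:y1, x0:x1].copy()


# Synthetic 100x200 image with a box that overhangs the bottom-right corner.
img = np.zeros((100, 200, 3), dtype=np.uint8)
crop = crop_rect(img, (190, 90, 20, 20), margin=5)
```

Whether a margin actually improves recognition here is untested; it is just a cheap thing to try before switching detection methods.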
Today's work drew on the following pages:
https://pypi.python.org/pypi/pytesseract
https://stackoverflow.com/questions/30404756/how-to-pass-opencv-image-to-tesseract-in-python (how to convert OpenCV's np.ndarray into the image format tesseract needs)
https://testerhome.com/topics/4615
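https://github.com/upupnaway/digital-display-character-rec (text extraction built on OpenCV and tesseract, but it locates the text with erosion and dilation... I have a feeling its approach may well work better than what I'm using now. I'll try it tomorrow.)
http://www.cnblogs.com/syqlp/p/5462459.html (a tesseract example. I really don't think my way of locating text is reliable... but is there actually a reliable way? So frustrating.)
https://stackoverflow.com/questions/14800730/tesseract-running-error (a tesseract runtime error, though not the cause of the errors I hit)
http://blog.csdn.net/liqiancao/article/details/55670749 (I followed this for cropping images)
http://www.cnblogs.com/hupeng1234/p/7136442.html (covers tesseract errors in a lot of detail)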
That's about it.