锋行_THU_SJTU


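I talked things over with my junior advisor today and we have more or less settled on the pipeline. What remains is to put together a quick implementation and see whether the overall direction has any major problems.

The first step was to wrap the OpenCV code I tested earlier into a function so that it can be called from elsewhere.

The next step was to find a reliable OCR library.

And that is exactly where I got stuck...

I went with Google's tesseract library. Installation goes as follows:

1. Download the installer: https://github.com/tesseract-ocr/tesseract/wiki/Downloads

2. Install the Python package: pip install pytesseract

After that you are ready to go.

Two things to watch out for:

First, the Python code needs one line that tells pytesseract where tesseract is installed: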

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

Second, add a TESSDATA_PREFIX entry to the system environment variables: C:\Program Files (x86)\Tesseract-OCR\tessdata

Finally, remember to restart PyCharm after changing the environment variables...
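
Both settings are easy to get wrong, so a small sanity check can help before calling the OCR. The following is only a sketch: `check_tesseract_setup` is a hypothetical helper name of my own and not part of pytesseract. It only checks that the executable path exists and that TESSDATA_PREFIX points at a directory.

```python
import os


def check_tesseract_setup(tesseract_cmd, environ=None):
    """Return a list of problems; an empty list means the setup looks fine.

    Hypothetical helper for illustration, not part of pytesseract.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # pytesseract runs this executable, so it has to exist on disk.
    if not os.path.isfile(tesseract_cmd):
        problems.append("tesseract executable not found: " + tesseract_cmd)

    # The language data is looked up through TESSDATA_PREFIX.
    tessdata = environ.get("TESSDATA_PREFIX")
    if not tessdata:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(tessdata):
        problems.append("TESSDATA_PREFIX is not a directory: " + tessdata)

    return problems
```

Calling `check_tesseract_setup(r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe')` from inside PyCharm should return an empty list once the IDE has been restarted and has picked up the new environment variable.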

One more point: the image format OpenCV uses, np.ndarray, has to be converted into the format pytesseract expects (as far as I can tell this is a PIL image).
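
A detail that is easy to miss in this conversion is channel order: cv2.imread returns pixels as BGR, while PIL assumes RGB. The sketch below does the reordering with NumPy only. `to_rgb_array` is a hypothetical helper name, and the handling of grayscale and BGRA input is my own assumption about what might be passed in.

```python
import numpy as np


def to_rgb_array(img):
    """Reorder an OpenCV-style array (grayscale, BGR or BGRA) for PIL.

    Hypothetical helper for illustration.
    """
    if img.ndim == 2:
        # Single-channel images need no reordering.
        return img
    if img.ndim == 3 and img.shape[2] == 3:
        # Reverse the channel axis: BGR -> RGB.
        return np.ascontiguousarray(img[:, :, ::-1])
    if img.ndim == 3 and img.shape[2] == 4:
        # BGRA -> RGBA: swap the colour channels, keep alpha last.
        return np.ascontiguousarray(img[:, :, [2, 1, 0, 3]])
    raise ValueError("unsupported image shape: %r" % (img.shape,))
```

With Pillow installed, `Image.fromarray(to_rgb_array(img))` gives an image that can be passed to `pytesseract.image_to_string`. For three-channel input, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same reordering.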

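Then you can see how well it works.

The results were really bad...

Not happy...

It looks like I need to find at least a passable method to get by for now... I have decided to look at how my senior labmate's code does it, roughly...

Today's code: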

import sys
import os

import cv2
import numpy as np
from PIL import Image
import pytesseract


def td_test(img):
    print("test")
    print('\ntextdetection.py')
    print('       A demo script of the Extremal Region Filter algorithm described in:')
    print('       Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012\n')

    # if (len(sys.argv) < 2):
    #   print(' (ERROR) You must call this script with an argument (path_to_image_to_be_processed)\n')
    #   quit()

    # pathname = os.path.dirname(sys.argv[0])
    # pathname = os.path.dirname('D:/MyProject/PyCharm/testcode')

    # img      = cv2.imread(str(sys.argv[1]))
    # img = cv2.imread('test.jpg')
    # for visualization
    # vis = img.copy()

    # Extract channels to be processed individually
    # (list() because some OpenCV builds return a tuple, which has no append)
    channels = list(cv2.text.computeNMChannels(img))
    # Append negative channels to detect ER- (bright regions over dark background)
    cn = len(channels) - 1
    for c in range(0, cn):
        channels.append((255 - channels[c]))

    # Apply the default cascade classifier to each independent channel (could be done in parallel)
    print("Extracting Class Specific Extremal Regions from " + str(len(channels)) + " channels ...")
    print("    (...) this may take a while (...)")
    answer = []
    for channel in channels:
        # erc1 = cv2.text.loadClassifierNM1(pathname+'/trained_classifierNM1.xml')
        erc1 = cv2.text.loadClassifierNM1('trained_classifierNM1.xml')
        er1 = cv2.text.createERFilterNM1(erc1, 16, 0.00015, 0.13, 0.2, True, 0.1)

        # erc2 = cv2.text.loadClassifierNM2(pathname+'/trained_classifierNM2.xml')
        erc2 = cv2.text.loadClassifierNM2('trained_classifierNM2.xml')
        er2 = cv2.text.createERFilterNM2(erc2, 0.5)

        regions = cv2.text.detectRegions(channel, er1, er2)

        rects = cv2.text.erGrouping(img, channel, [r.tolist() for r in regions])
        # rects = cv2.text.erGrouping(img,channel,[x.tolist() for x in regions], cv2.text.ERGROUPING_ORIENTATION_ANY,'../../GSoC2014/opencv_contrib/modules/text/samples/trained_classifier_erGrouping.xml',0.5)

        # print(rects);
        # print(np.shape(rects)[0]);

        # Collect the bounding boxes (x, y, width, height) found on this channel
        for rect in rects:
            answer.append(rect)
            # print(rect)
            # cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 0), 2)
            # cv2.rectangle(vis, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 255, 255), 1)

    # Visualization
    # cv2.imshow("Text detection result", vis)
    # cv2.waitKey(0)

    return answer

if __name__ == '__main__':
    img = cv2.imread('test.jpg')
    vis = img.copy()
    answer = td_test(img)
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    for rect in answer:
        print(rect)
        x, y, w, h = (int(v) for v in rect)
        # Crop from the untouched image rather than from vis, so the rectangles
        # drawn for visualization do not end up inside the patch sent to tesseract
        img1 = img[y:y + h, x:x + w]
        cv2.rectangle(vis, (x, y), (x + w, y + h), (0, 0, 0), 2)
        cv2.rectangle(vis, (x, y), (x + w, y + h), (255, 255, 255), 1)
        # (thresh, img1) = cv2.threshold(img1, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # OpenCV stores pixels as BGR, PIL expects RGB
        img2 = Image.fromarray(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
        txt = pytesseract.image_to_string(img2)
        print(txt)
        # cv2.imshow("test", img1)
        # cv2.waitKey(0)
    cv2.imshow("Text detection result", vis)
    cv2.waitKey(0)
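
The cropping step in the loop above slices the array directly with the (x, y, width, height) values of each box. As a sketch only, the same step can be written as a helper that also adds a small margin around the box and clamps it to the image bounds. `crop_rect` and its `pad` parameter are hypothetical names of my own, and whether a margin actually improves the recognition is an assumption I have not tested.

```python
import numpy as np


def crop_rect(img, rect, pad=0):
    """Crop (x, y, w, h) from img, padded by `pad` pixels and clamped to the image.

    Hypothetical helper for illustration.
    """
    x, y, w, h = (int(v) for v in rect)
    height, width = img.shape[:2]
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, width)
    y1 = min(y + h + pad, height)
    # NumPy indexes rows (y) first, then columns (x).
    return img[y0:y1, x0:x1]


# A synthetic 10x10 colour image stands in for the result of cv2.imread.
demo = np.zeros((10, 10, 3), dtype=np.uint8)
patch = crop_rect(demo, (8, 8, 5, 5), pad=1)
print(patch.shape)  # prints (3, 3, 3): the box was clamped at the image border
```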

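Today's work drew on the following pages:

https://pypi.python.org/pypi/pytesseract

https://stackoverflow.com/questions/30404756/how-to-pass-opencv-image-to-tesseract-in-python (how to convert OpenCV's np.ndarray into the image format tesseract needs)

https://testerhome.com/topics/4615

https://github.com/upupnaway/digital-display-character-rec (a text extraction project built on OpenCV and tesseract, but it locates the text with erosion and dilation... I have a hunch its approach might work better than the one I am using now... I will try it tomorrow)

http://www.cnblogs.com/syqlp/p/5462459.html (a tesseract example. My method for locating text really does not feel reliable... but is there actually a reliable one? So frustrating)

https://stackoverflow.com/questions/14800730/tesseract-running-error (a tesseract runtime error, though not the cause of the errors I ran into)

http://blog.csdn.net/liqiancao/article/details/55670749 (I referred to this for cropping the image)

http://www.cnblogs.com/hupeng1234/p/7136442.html (this one covers tesseract errors in great detail)

That's about it.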

posted on 2017-10-31 22:44  锋行_THU_SJTU