Using Tencent Cloud OCR (General Print Recognition) with Python 2.7

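My company needs to recognize invoice information and enter it into an internal system, so I used Tencent Cloud's OCR general print recognition.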

Tencent Cloud's general print recognition includes 1,000 free calls per month, which is friendly for invoice recognition with modest volume.

API documentation:

https://cloud.tencent.com/document/product/866/17600

Reference article:

https://www.cnblogs.com/d-l-k/p/8758555.html

Below is my project structure

Below is the code

#!/usr/bin/python2.7
# _*_ coding: utf-8 _*_

"""
@Author: MarkLiu
"""

import requests
import hmac
import hashlib
import base64
import time
import random
import re

appid = ""  # APPID from https://console.cloud.tencent.com/cam/capi
bucket = ""  # any value will do
secret_id = ""  # SecretId from https://console.cloud.tencent.com/cam/capi
secret_key = ""  # SecretKey from https://console.cloud.tencent.com/cam/capi
current = int(time.time())  # current Unix timestamp in whole seconds
expired = current + 2592000  # signature expires after 30 days
onceExpired = 0
rdm = ''.join(random.choice("0123456789") for i in range(10))  # 10-digit random string
userid = "0"
fileid = "tencentyunSignTest"

# Plain-text string that gets signed
info = "a=" + appid + "&b=" + bucket + "&k=" + secret_id + "&e=" + str(expired) + "&t=" + str(current) + "&r=" + str(rdm) + "&u=0&f="

signindex = hmac.new(secret_key, info, hashlib.sha1).digest()  # HMAC-SHA1 digest of info
sign = base64.b64encode(signindex + info)  # Base64-encode digest + info

url = "http://recognition.image.myqcloud.com/ocr/general"
headers = {
    'Host': 'recognition.image.myqcloud.com',
    'Authorization': sign,
}
# Multipart form fields; (None, value) sends a plain field without a filename
files = {
    'appid': (None, appid),
    'bucket': (None, bucket),
    'image': ('new_scan_doc01075720181122112953_001.jpg', open('C:\\Users\\Administrator\\PycharmProjects\\untitled\\untitled\\mysite\\static\\new_scan_doc01075720181122112953_001.jpg', 'rb'), 'image/jpeg'),
}

r = requests.post(url, files=files, headers=headers)

responseinfo = r.content
responseinfo = responseinfo.decode('utf-8')  # decode as UTF-8 so Chinese text is not garbled

r_index = r'itemstring":"(.*?)"'  # regex that captures each recognized text item
result = re.findall(r_index, responseinfo)
for i in result:
    print i
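
The signing step in the script can also be isolated into a small function, which makes it easier to test without calling the service. The following is only a sketch: the name `build_sign` and its `ttl`, `now`, and `rdm` parameters are hypothetical helpers introduced here, it is written for Python 3 (hence the explicit byte encoding), and it assumes the same string layout as the `info` variable in the script.

```python
import base64
import hashlib
import hmac
import random
import time


def build_sign(appid, bucket, secret_id, secret_key, ttl=2592000, now=None, rdm=None):
    """Return base64(HMAC-SHA1(secret_key, info) + info), as in the script.

    `now` and `rdm` are hypothetical overrides that make the result reproducible.
    """
    current = int(time.time() if now is None else now)  # whole seconds
    expired = current + ttl
    if rdm is None:
        rdm = ''.join(random.choice("0123456789") for _ in range(10))
    # Same field order as the `info` string in the script
    info = "a={0}&b={1}&k={2}&e={3}&t={4}&r={5}&u=0&f=".format(
        appid, bucket, secret_id, expired, current, rdm)
    info_bytes = info.encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), info_bytes, hashlib.sha1).digest()
    return base64.b64encode(digest + info_bytes).decode("ascii")


# Placeholder credentials, for illustration only
sign = build_sign("123", "b", "id", "key", now=1000, rdm="0123456789")
```

Decoding `sign` with `base64.b64decode` gives the 20-byte SHA-1 digest followed by the plain-text string, which is a quick way to check what was actually signed.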

Save the script as aa.py and run it with python aa.py runserver to perform the recognition.

Nearly all of the text is recognized. For production use, simply expose this as an API endpoint.
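
The regular expression in the script stops at the first double quote, so a recognized string containing an escaped quote would be cut short. Since the response is JSON, one alternative is to parse it and collect the `itemstring` values. This is a sketch written for Python 3: the function name `collect_itemstrings` is hypothetical, and the sample below is made-up data that only assumes `itemstring` keys appear somewhere in the response, not the service's exact layout.

```python
import json


def collect_itemstrings(node):
    """Recursively gather every "itemstring" value from parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "itemstring":
                found.append(value)
            else:
                found.extend(collect_itemstrings(value))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_itemstrings(child))
    return found


# Made-up sample; in the script the text would come from responseinfo
sample = '{"data": {"items": [{"itemstring": "Invoice \\"A\\""}, {"itemstring": "Total: 100"}]}}'
lines = collect_itemstrings(json.loads(sample))
for line in lines:
    print(line)
```

Because the walk only relies on the key name, it keeps working if the items are nested differently than in the sample.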


posted on 2018-12-10 21:24 by 十五岁少年