Installing and using the pytesseract library with Python 3 on CentOS/Windows

Installing on CentOS:

1. Install the dependencies
yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel

2. Install Leptonica

wget http://www.leptonica.org/source/leptonica-1.76.0.tar.gz
tar -zxvf leptonica-1.76.0.tar.gz
cd leptonica-1.76.0
./configure
make && make install
# Configure the environment variables: append the following to the end of /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
. /etc/profile

3. Install Tesseract-OCR

wget -O tesseract-4.0.0-beta.3.tar.gz https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.3.tar.gz
tar -zxvf tesseract-4.0.0-beta.3.tar.gz
cd tesseract-4.0.0-beta.3
./autogen.sh
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && sudo make install
# Environment variable (location of the language data)
export TESSDATA_PREFIX=/usr/local/share/tessdata # linux
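
Once the build finishes, it can be worth confirming that the installed binary actually runs. The following is a minimal sketch, not part of the original steps: it assumes the binary ends up at /usr/local/bin/tesseract (the path used later in this post) and that the first line of `tesseract --version` has the form "tesseract <version>". The function names are made up for illustration.

```python
import subprocess


def parse_tesseract_version(output):
    """Return the version token from the first line of `tesseract --version` output."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    parts = lines[0].split()
    # Assumed shape of the first line: "tesseract <version>"
    if len(parts) >= 2 and parts[0].lower() == "tesseract":
        return parts[1]
    return None


def installed_tesseract_version(cmd="/usr/local/bin/tesseract"):
    """Run `<cmd> --version`; return the version string, or None if it cannot be run."""
    try:
        proc = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
        )
    except OSError:
        # The binary is missing or not executable
        return None
    return parse_tesseract_version(proc.stdout)


if __name__ == "__main__":
    print(installed_tesseract_version())
```

If this prints None, the binary is probably not where the sketch expects it, or the shared libraries (LD_LIBRARY_PATH above) are not being found.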

Installing on Windows:

1. Install Tesseract
https://digi.bib.uni-mannheim.de/tesseract/
2. Environment variable (location of the language data)
TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR\tessdata # windows
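
Since the language data location differs between the two systems, a script that has to run on both could look it up rather than hard-code it. This is only a sketch under stated assumptions: the two default paths are the ones given in this post, and `resolve_tessdata_dir` is a hypothetical helper, not something provided by Tesseract or pytesseract.

```python
import os
import platform

# Default locations taken from this post; adjust them if you installed elsewhere.
DEFAULT_TESSDATA = {
    "Linux": "/usr/local/share/tessdata",
    "Windows": r"C:\Program Files (x86)\Tesseract-OCR\tessdata",
}


def resolve_tessdata_dir(env=None, system=None):
    """Return TESSDATA_PREFIX if it is set, otherwise the default for this OS."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    value = env.get("TESSDATA_PREFIX")
    if value:
        return value
    # Returns None for systems this post does not cover
    return DEFAULT_TESSDATA.get(system)


if __name__ == "__main__":
    print(resolve_tessdata_dir())
```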

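Downloading the language data:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

Windows: put the files in the tessdata folder under the installation directory.
Linux: put the files in /usr/local/share/tessdata. Run /usr/local/bin/tesseract --list-langs to check which language packs have been picked up.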
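
The --list-langs check can also be done from Python, for example before starting a batch job. The sketch below assumes the output is a header line starting with "List of available languages" followed by one language code per line; the function names are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed header line
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return sorted(langs)


def list_langs(cmd="/usr/local/bin/tesseract"):
    """Run `<cmd> --list-langs` and return the language codes it reports."""
    proc = subprocess.run(
        [cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the list to stderr
        universal_newlines=True,
    )
    return parse_list_langs(proc.stdout)


if __name__ == "__main__":
    langs = list_langs()
    print("chi_sim installed:", "chi_sim" in langs)
```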

Installing the Python libraries:

pip3 install pillow  # dependency of pytesseract
pip3 install pytesseract
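
To confirm that both packages landed in the interpreter you intend to use, something like the following may help. It is a small sketch that assumes Python 3.8 or later (for importlib.metadata); `installed_version` and `report` are names invented here.

```python
from importlib import metadata  # available from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(names=("pillow", "pytesseract")):
    """Map each distribution name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report().items():
        print(name, version or "NOT INSTALLED")
```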

Usage:

import pytesseract
from PIL import Image

# pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'  # on Windows, point to tesseract.exe
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'  # on Linux, point to the tesseract binary
res = pytesseract.image_to_string(Image.open('xx.jpg'), lang='chi_sim')  # chi_sim: Simplified Chinese

print(res)
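
The snippet above needs the tesseract_cmd line edited per machine. A possible variation, sketched here with made-up helper names, searches PATH first and then falls back to the two locations mentioned in this post. It assumes one of those locations is right for your install; it is not how pytesseract itself locates the binary.

```python
import os
import shutil

# Fallback locations taken from this post; both are assumptions about your setup.
CANDIDATES = (
    "/usr/local/bin/tesseract",
    "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe",
)


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    found = shutil.which("tesseract")  # prefer whatever is on PATH
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ocr_image(image_path, lang="chi_sim"):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so find_tesseract() stays usable without the OCR packages
    import pytesseract
    from PIL import Image

    cmd = find_tesseract()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang).strip()
```

With this in place, print(ocr_image('xx.jpg')) would replace the last three lines of the original snippet.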
posted @ 2019-08-31 14:37  exception_d