How to Set Up GOT-OCR2.0 for Image-to-Text Conversion
Repository address
http://gitlab.xiaoxingcloud.com/ai/GOT-OCR2.0.git
- Introduction
GOT-OCR2.0 is an open-source tool for converting images to text.
- Environment check
System environment
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
# uname -a
Linux AiServer003187 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Software environment
# nvidia-smi
Thu Oct 24 09:34:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:18:00.0 Off | Off |
| 30% 34C P8 28W / 450W | 7MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:3B:00.0 Off | Off |
| 30% 33C P8 35W / 450W | 7MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 Off | 00000000:86:00.0 Off | Off |
| 30% 34C P8 19W / 450W | 7MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
# conda --version
conda 23.7.4
# python --version
Python 3.10.15
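The checks above can also be scripted. The following is a minimal sketch, not part of GOT-OCR2.0: the function name `check_environment`, the tool list, and the minimum Python version (taken from the `python=3.10` environment used in this guide) are assumptions made for illustration.

```python
import shutil
import sys

# Tools this guide invokes; adjust the list to your own setup (assumption).
REQUIRED_TOOLS = ("nvidia-smi", "conda", "git")


def check_environment(min_python=(3, 10), tools=REQUIRED_TOOLS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Compare only (major, minor) of the running interpreter.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            "Python %d.%d found, %d.%d or newer expected"
            % (sys.version_info[0], sys.version_info[1],
               min_python[0], min_python[1])
        )
    # shutil.which returns None when the executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks ready.")
```

This only confirms that the tools are reachable; it does not verify the driver or CUDA version, which still has to be read from the `nvidia-smi` output.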
- Setup
Clone the code
# git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
Create a virtual environment and install the dependencies
# cd GOT-OCR2.0/GOT-OCR-2.0-master
# conda create -n got python=3.10 -y
# conda activate got
# pip install -e .
Install Flash-Attention
# pip install ninja
# pip install flash-attn --no-build-isolation
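After the installs finish, it can help to confirm that the packages are importable from inside the `got` environment. The sketch below assumes the editable install pulled in PyTorch (import name `torch`) and that flash-attn is importable as `flash_attn`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above.
PACKAGES = ("torch", "flash_attn")


def missing_packages(names=PACKAGES):
    """Return the names that the import system cannot locate."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        # Only import torch once we know it is installed.
        import torch
        print("CUDA available:", torch.cuda.is_available())
```

`find_spec` only locates a package without importing it, so this check stays fast even when a package is large.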
Download the weights from one of these sources
# Hugging Face
# https://huggingface.co/stepfun-ai/GOT-OCR2_0/blob/main/model.safetensors
# Google Drive
# https://drive.google.com/drive/folders/1OdDtsJ8bFJYlNUzCQG4hRkUL6V-qBQaN
# Baidu Netdisk (extraction code: OCR2)
# https://pan.baidu.com/s/1G4aArpCOt6I_trHv_1SE2g#list/path=%2F
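Once the download is done, a quick look at the weights directory can catch an incomplete or misplaced download before running the demos. This is a sketch under assumptions: the helper `list_weight_files` is hypothetical, and the list of file extensions is a guess at common weight formats rather than something the project specifies.

```python
from pathlib import Path

# Assumption: weight files use one of these common extensions.
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pth")


def list_weight_files(weights_dir):
    """Return (file name, size in MiB) pairs for weight files in weights_dir."""
    root = Path(weights_dir)
    if not root.is_dir():
        raise FileNotFoundError("weights directory not found: %s" % root)
    found = []
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix in WEIGHT_SUFFIXES:
            size_mib = path.stat().st_size / (1024 * 1024)
            found.append((path.name, size_mib))
    return found
```

For example, `list_weight_files("GOT_weights/")` should return at least one entry; an empty list or a file of only a few KiB suggests the download did not complete.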
- Demo
- plain texts OCR
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type ocr
Explanation
# python3 GOT/demo/run_ocr_2.0.py    the demo script
# --model-name GOT_weights/    location of the weights
# --image-file file.png    the image to convert
# --type ocr    output type (ocr produces plain text)
Output as follows (screenshot)
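Because a mistyped `--type` value is easy to miss, the command can be assembled and validated in a small wrapper before it is run. This is a sketch only: `build_ocr_command` is a hypothetical helper, and the accepted values are limited to the two that appear in this guide (`ocr` and `format`).

```python
import shlex

# Values of --type that appear in this guide.
VALID_TYPES = ("ocr", "format")


def build_ocr_command(image_file, ocr_type="ocr", weights="GOT_weights/",
                      script="GOT/demo/run_ocr_2.0.py"):
    """Return the argv list for one run of the demo script."""
    if ocr_type not in VALID_TYPES:
        raise ValueError(
            "--type must be one of %s, got %r" % (VALID_TYPES, ocr_type)
        )
    return ["python3", script, "--model-name", weights,
            "--image-file", image_file, "--type", ocr_type]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_ocr_command("file.png")))
```

To actually run the command, pass the returned list to `subprocess.run(..., check=True)` from the `GOT-OCR-2.0-master` directory.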
- format texts OCR
Conversion that preserves formatting
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type format
Output as follows (screenshot)
- fine-grained OCR
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type format/ocr --box [x1,y1,x2,y2]
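Here --type takes either format or ocr. As I understand it, x1, y1, x2, y2 are the coordinates of a box (presumably its top-left and bottom-right corners) that marks the region of the image to convert.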
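The `--box` value can be built from numbers rather than typed by hand. The sketch below assumes pixel coordinates with (x1, y1) as the top-left corner and (x2, y2) as the bottom-right corner, which matches the reading above but is not confirmed by the project; `format_box` is a hypothetical helper.

```python
def format_box(x1, y1, x2, y2):
    """Return the --box value "[x1,y1,x2,y2]" after basic sanity checks."""
    coords = (x1, y1, x2, y2)
    if any(not isinstance(c, int) or c < 0 for c in coords):
        raise ValueError("coordinates must be non-negative integers")
    # Assumption: the first corner is top-left, the second bottom-right.
    if x1 >= x2 or y1 >= y2:
        raise ValueError("expected x1 < x2 and y1 < y2")
    return "[%d,%d,%d,%d]" % coords
```

For example, `format_box(10, 20, 300, 400)` gives `[10,20,300,400]`. When typing the value directly in a shell, quoting it (`--box '[10,20,300,400]'`) is safer, since some shells treat square brackets as a glob pattern.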
- multi-crop OCR
# python3 GOT/demo/run_ocr_2.0_crop.py --model-name GOT_weights/ --image-file file.png
Output as follows (screenshot)
- multi-page OCR (the image path contains multiple .png files)
Convert all the images in a folder
# python GOT/demo/run_ocr_2.0_crop.py --model-name GOT_weights/ --image-file /root/GOT-OCR2.0/GOT-OCR-2.0-master/ --multi-page
Output as follows (screenshot)
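Before a multi-page run, it can be useful to see which files in the folder would count as pages. The sketch below lists the `.png` files directly inside a folder, following the note in the heading above; `list_png_pages` is a hypothetical helper, and sorting by name is a choice made here for readability, not necessarily the order the script uses.

```python
from pathlib import Path


def list_png_pages(folder):
    """Return the .png files directly inside folder, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError("not a directory: %s" % root)
    # Subdirectories are not searched; the suffix match ignores case.
    return sorted(p.name for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() == ".png")
```

An empty result means the `--image-file` path points at a folder with no `.png` pages, which is worth fixing before loading the model.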
- render the formatted OCR results:
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type format --render
Output as follows (screenshot)