How to set up GOT-OCR2.0 for image-to-text conversion

Repository address
http://gitlab.xiaoxingcloud.com/ai/GOT-OCR2.0.git

  1. Introduction
    GOT-OCR2.0 is open-source software for converting images to text.
  2. Checking the environment
    System environment
# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy
# uname -a
Linux AiServer003187 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Software environment

# nvidia-smi
Thu Oct 24 09:34:48 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:18:00.0 Off |                  Off |
| 30%   34C    P8             28W /  450W |       7MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:3B:00.0 Off |                  Off |
| 30%   33C    P8             35W /  450W |       7MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:86:00.0 Off |                  Off |
| 30%   34C    P8             19W /  450W |       7MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

# conda --version
conda 23.7.4

# python --version
Python 3.10.15
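
The checks above can also be scripted so they are easy to repeat on another machine. This is only a minimal sketch: the tool list and the helper names `find_tools` and `python_is_supported` are my own, and it only checks that the tools are on PATH and that the interpreter is at least 3.10. It does not verify driver or CUDA versions.

```python
import shutil
import sys

# Tools this walkthrough calls from the shell; adjust the list to your setup.
REQUIRED_TOOLS = ["nvidia-smi", "conda", "git"]


def find_tools(names):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the interpreter is at least `minimum` (3.10 is used here)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name}: {path or 'NOT FOUND'}")
    print("python >= 3.10:", python_is_supported())
```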

  3. Setup
    Clone the code
# git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git

Create a virtual environment and install the dependencies

# cd GOT-OCR2.0/GOT-OCR-2.0-master
# conda create -n got python=3.10 -y
# conda activate got
# pip install -e .

Install Flash-Attention

# pip install ninja
# pip install flash-attn --no-build-isolation

Download the weights

# Hugging Face
# https://huggingface.co/stepfun-ai/GOT-OCR2_0/blob/main/model.safetensors
# Google Drive
# https://drive.google.com/drive/folders/1OdDtsJ8bFJYlNUzCQG4hRkUL6V-qBQaN
# Baidu Netdisk (extraction code: OCR2)
# https://pan.baidu.com/s/1G4aArpCOt6I_trHv_1SE2g#list/path=%2F
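
For the Hugging Face option, the download can be done from Python instead of the browser. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and that the machine has network access. The repository id is read off the URL above, and the target folder name `GOT_weights` is taken from the demo commands in this post. Whether the files in that repository match the layout the demo script expects under `--model-name` is something I have not confirmed, so treat it as an assumption.

```python
from pathlib import Path

# Repository id taken from the Hugging Face URL listed in this post.
REPO_ID = "stepfun-ai/GOT-OCR2_0"


def download_weights(target_dir="GOT_weights"):
    """Download a snapshot of the model repository into `target_dir`.

    Requires the huggingface_hub package and network access.
    """
    # Imported lazily so this file can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_weights()` from the directory where the demo commands are run would place the files in `GOT_weights/`.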
  4. Demo
  • plain texts OCR
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type ocr

Explanation of the arguments

# python3 GOT/demo/run_ocr_2.0.py the demo script
# --model-name GOT_weights/ specifies the location of the weights
# --image-file file.png the image to convert
# --type ocr the conversion type

The output is as follows:
[image: output screenshot]
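
When the plain-text OCR demo has to be called many times, the command line can be assembled from Python. This is a sketch under a few assumptions: the helper names `build_ocr_command` and `run_ocr` are hypothetical, the script path and weights folder are the ones used in this post, and only the two `--type` values that appear here (`ocr` and `format`) are accepted.

```python
import subprocess
import sys


def build_ocr_command(image_file, ocr_type="ocr", weights="GOT_weights/",
                      script="GOT/demo/run_ocr_2.0.py"):
    """Return the argument list for the demo script used in this post."""
    if ocr_type not in ("ocr", "format"):
        raise ValueError(f"unsupported --type value: {ocr_type!r}")
    return [sys.executable, script,
            "--model-name", weights,
            "--image-file", image_file,
            "--type", ocr_type]


def run_ocr(image_file, ocr_type="ocr"):
    """Run the demo script and return whatever it printed to stdout."""
    result = subprocess.run(build_ocr_command(image_file, ocr_type),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Validating `--type` up front means a misspelling such as `orc` fails immediately instead of after the model has loaded.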

  • format texts OCR
    Conversion that preserves formatting
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type format

The output is as follows:
[image: output screenshot]

  • fine-grained OCR
# python3 GOT/demo/run_ocr_2.0.py  --model-name  GOT_weights/  --image-file  file.png  --type format/ocr --box [x1,y1,x2,y2]

As I understand it, the parameters x1, y1, x2, y2 are coordinates that mark the region of the image to convert.
[image]
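
To avoid typing the box by hand, the `--box` value can be generated and sanity-checked first. This sketch assumes, as in my reading above, that (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner of the region. The helper name `format_box` is hypothetical, and the exact coordinate convention should be checked against the demo script.

```python
def format_box(x1, y1, x2, y2):
    """Return the `[x1,y1,x2,y2]` string passed to --box.

    Assumes (x1, y1) is the top-left and (x2, y2) the bottom-right corner.
    """
    coords = [int(v) for v in (x1, y1, x2, y2)]
    if min(coords) < 0:
        raise ValueError("coordinates must be non-negative")
    if coords[0] >= coords[2] or coords[1] >= coords[3]:
        raise ValueError("expected x1 < x2 and y1 < y2")
    # No spaces, so the whole box stays a single command-line argument.
    return "[" + ",".join(str(v) for v in coords) + "]"


print(format_box(10, 20, 300, 400))  # [10,20,300,400]
```

When the value is typed directly in a shell, quoting it (for example `--box '[10,20,300,400]'`) avoids the brackets being treated as a glob pattern.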

  • multi-crop OCR
# python3 GOT/demo/run_ocr_2.0_crop.py --model-name GOT_weights/ --image-file file.png

The output is as follows:
[image: output screenshot]

  • multi-page OCR (the image path contains multiple .png files)
    Convert the images in a folder
# python GOT/demo/run_ocr_2.0_crop.py --model-name GOT_weights/ --image-file /root/GOT-OCR2.0/GOT-OCR-2.0-master/ --multi-page

The output is as follows:
[image: output screenshot]
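
Because the multi-page mode is pointed at a folder, it can help to list which .png files that folder actually contains before running the model. The sketch below uses only the standard library. The helper name `list_pages` is hypothetical, and sorting by file name is my own choice, since I have not checked in which order the demo script reads the pages.

```python
from pathlib import Path


def list_pages(folder, suffix=".png"):
    """Return the files in `folder` ending with `suffix`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() == suffix)
```

For example, `list_pages("/root/GOT-OCR2.0/GOT-OCR-2.0-master/")` shows the candidate pages for the folder used in the multi-page command.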

  • render the formatted OCR results:
# python3 GOT/demo/run_ocr_2.0.py --model-name GOT_weights/ --image-file file.png --type format --render

The output is as follows:
[image: output screenshot]

posted @ 2024-10-24 10:30  minseo