ConvertLOREToONNX: A Conversion Library for the Lineless Table Recognition Model LORE
Introduction
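People keep asking how Alibaba's lineless table recognition model was converted to ONNX format. I'm a little embarrassed to admit that the existing ONNX model was converted a long time ago: the conversion environment has since been lost, and I took no notes at the time.
Today I made up my mind to try the conversion again, and fortunately it succeeded. The result is this set of conversion notes: ConvertLOREToONNX.
Having learned my lesson, this time I exported the environment file with Anaconda, so the conversion environment is recorded in more detail. Below is the README of the conversion repository. If you're interested, click "Read the original" at the end of this post to jump to the repository and try it yourself.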
1. Clone the source code.
git clone https://github.com/SWHL/ConvertLOREToONNX.git
2. Install env.
conda install --yes --file requirements.txt
3. Run the demo. The converted models are written to the models directory.
python main.py
4. Install lineless_table_rec
pip install lineless_table_rec
5. Use
from pathlib import Path

from lineless_table_rec import LinelessTableRecognition

# Paths to the two converted ONNX models produced in step 3
detect_path = "models/lore_detect.onnx"
process_path = "models/lore_process.onnx"
engine = LinelessTableRecognition(
    detect_model_path=detect_path, process_model_path=process_path
)

# Run recognition on one image; the engine returns the table string and the elapsed time
img_path = "images/lineless_table_recognition.jpg"
table_str, elapse = engine(img_path)
print(table_str)
print(elapse)

# Save the result as an HTML file named after the image
with open(f"{Path(img_path).stem}.html", "w", encoding="utf-8") as f:
    f.write(table_str)
print("ok")
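The usage above handles a single image. As a rough sketch of how it might be extended, the helpers below first check that both converted model files exist and then write one HTML file per image. The helper names missing_models and save_tables are hypothetical and not part of lineless_table_rec. The only assumption about the engine is the behavior shown above: called with an image path, it returns (table_str, elapse).

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# File names taken from the usage snippet above.
MODEL_NAMES = ("lore_detect.onnx", "lore_process.onnx")


def missing_models(model_dir: str) -> List[str]:
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in MODEL_NAMES if not (root / name).is_file()]


def save_tables(
    engine: Callable[[str], Tuple[str, float]],
    image_paths: Iterable[str],
    out_dir: str,
) -> List[Path]:
    """Run engine on each image and write one HTML file per image.

    engine is assumed to behave like the LinelessTableRecognition instance
    in the snippet above: engine(img_path) -> (table_str, elapse).
    """
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for img_path in image_paths:
        table_str, _elapse = engine(str(img_path))
        # Name the output after the image, as in the single-image example.
        out_path = out_root / f"{Path(img_path).stem}.html"
        out_path.write_text(table_str, encoding="utf-8")
        written.append(out_path)
    return written
```

With the engine from step 5, a call could look like save_tables(engine, [img_path], "outputs"), where "outputs" is just an example directory name. Calling missing_models("models") before building the engine gives a clearer error than a failed model load when step 3 has not been run yet.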
-----------------------------------------
You linger in the spring scenery, in that one-of-a-kind spring scenery.