Python Resume Parsing Solution

Posted on 2023-08-28 16:47 by Showker

https://cloud.baidu.com/doc/NLP/s/Xkahvfeqa

Installing in a new environment

Mainly based on https://aistudio.baidu.com/projectdetail/5420328?sUid=90149&shared=1&ts=1674895551833

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple paddlepaddle

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade paddlenlp

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple PyMuPDF

This raised ImportError: libGL.so.1: cannot open shared object file: No such file or dir... (the same error that import cv2 produces in Python). The fix:

apt-get update && apt-get install libgl1

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pdfplumber

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple openpyxl

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple paddleocr

# Installing paddleocr seems to uninstall this package, so reinstall it if needed:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple PyMuPDF

# Installing paddleocr seems to uninstall this one as well:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-python
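
Since one install can silently remove another package, a quick check after the last pip command helps. The following is a minimal sketch using only the standard library; the distribution names are copied from the pip commands above, and the helper names are my own.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
# For a GPU setup, replace "paddlepaddle" with "paddlepaddle-gpu".
REQUIRED = [
    "paddlepaddle", "paddlenlp", "PyMuPDF", "pdfplumber",
    "openpyxl", "paddleocr", "opencv-python",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing(names):
    """Return the names that are not installed in the current environment."""
    return [n for n, v in installed_versions(names).items() if v is None]


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED can be reinstalled with the matching pip command above.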

Later I found that the GPU was not being used.

Installing the GPU version

python3 -m pip install paddlepaddle-gpu==2.5.0-rc1.post118 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html

ImportError: libssl.so.1.1: cannot open shared object file: No such file or directory

https://www.cnblogs.com/hcxss/p/17635592.html

How to install the GPU version of paddleocr

https://www.chinaoss.net/article/paddleocr-how-to-set-gpu-or-cpu-mode-and-how-to-code.html

NVIDIA GPU commands

root@mergpu001:/usr/src# nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Built on Tue_Aug_15_22:02:13_PDT_2023

Cuda compilation tools, release 12.2, V12.2.140

Build cuda_12.2.r12.2/compiler.33191640_0

root@mergpu001:/usr/src# nvidia-smi
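
The CUDA release reported by nvcc has to be compared against the wheel that gets installed. Below is a small sketch that extracts both versions. It assumes that a wheel suffix such as post118 or post120 encodes CUDA 11.8 or 12.0; that reading is my interpretation of the naming, and the function names are made up for this example.

```python
import re

# Output of `nvcc -V` as shown above.
SAMPLE_NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0"""


def cuda_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of nvcc output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda_version(wheel_version):
    """Return (major, minor) from a '.postXYZ' suffix, or None if there is none.

    Assumption: 'post118' means CUDA 11.8 and 'post120' means CUDA 12.0.
    """
    match = re.search(r"\.post(\d{2})(\d)$", wheel_version)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    print("toolkit:", cuda_release(SAMPLE_NVCC_OUTPUT))
    print("wheel:  ", wheel_cuda_version("2.5.0-rc1.post118"))
```

On this machine the toolkit is 12.2, while the wheels tried in these notes target 11.8 and 12.0, so the mismatch is worth keeping in mind when results look wrong.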

Problem: the latest paddle release, 2.5.1, produced no inference results. Someone on the forum suggested using 2.5.0rc, and that did work.

pip install paddlepaddle==2.5.0rc1  -i https://pypi.tuna.tsinghua.edu.cn/simple

I also installed the matching 2.5.0 GPU build to see whether the GPU could be used.

python3 -m pip install paddlepaddle-gpu==2.5.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html

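After installation, run the following code to check whether the GPU can be used; if it prints True, you are set. Note that is_compiled_with_cuda() reports whether the installed build was compiled with CUDA support.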

import paddle
gpu_available  = paddle.device.is_compiled_with_cuda()
print("GPU available:", gpu_available)

import paddle

print(paddle.__version__)

# Slight change: the version is now 2.5.0; previously it was 2.5.0-rc1

2.5.0
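
A CUDA-enabled build does not by itself prove that a GPU is visible at runtime, so a slightly fuller check can be useful. The sketch below gathers a few facts in one place; gpu_report is a name I made up, and the function returns early if paddle cannot be imported.

```python
def gpu_report():
    """Collect GPU-related facts about the installed paddle build."""
    try:
        import paddle
    except ImportError:
        # Covers a missing package as well as missing shared libraries.
        return {"paddle_installed": False}

    compiled_with_cuda = paddle.device.is_compiled_with_cuda()
    return {
        "paddle_installed": True,
        "version": paddle.__version__,
        "compiled_with_cuda": compiled_with_cuda,
        # Device paddle currently uses, e.g. "cpu" or "gpu:0".
        "current_device": paddle.device.get_device(),
        # Number of GPUs paddle can see; only meaningful for CUDA builds.
        "visible_gpus": paddle.device.cuda.device_count() if compiled_with_cuda else 0,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

For an end-to-end check, paddle.utils.run_check() runs a small computation on the selected device and reports whether the installation works.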

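Here is the speed change with the GPU. For 13 documents, it used to take 3 minutes, about 13 seconds each on average; now it takes 47 seconds, under 4 seconds each on average. A big improvement.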

100%|█████████████████████████████████████████████████████████████████████| 13/13 [00:47<00:00, 3.68s/it]
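
The averages can be worked out directly from the figures above (13 documents, 3 minutes before, 47 seconds after):

```python
docs = 13
cpu_total_s = 3 * 60   # 3 minutes on CPU
gpu_total_s = 47       # elapsed time shown by the progress bar

cpu_per_doc = cpu_total_s / docs       # about 13.8 s per document
gpu_per_doc = gpu_total_s / docs       # about 3.6 s per document
speedup = cpu_total_s / gpu_total_s    # about 3.8x

print(f"CPU: {cpu_per_doc:.1f} s/doc, GPU: {gpu_per_doc:.1f} s/doc, speedup: {speedup:.1f}x")
```

The progress bar reports 3.68s/it rather than 3.6, presumably because it displays elapsed time in whole seconds while computing the rate from the exact time.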

However, when I opened the parsing output, the results were empty! Reinstalling the 2.5.0-rc1 build to see:

python3 -m pip install paddlepaddle-gpu==2.5.0-rc1.post118 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html

Switching CUDA to version 12.0

https://blog.csdn.net/Netceor/article/details/129391904

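Try intelligence_document again and see how much the speed improves.

About education history and work experience

# Sample resume text; the section headers (姓名, 联系方式, 教育经历, 工作经历) serve as split markers
resume_text = "姓名：张三\n联系方式：123456789\n教育经历：\n- 学校：ABC大学\n 专业：计算机科学\n 时间：2010-2014\n工作经历：\n- 公司：XYZ科技\n 职位：软件工程师\n 时间：2014-2018"

# Single-line fields: take the text after the header, up to the next line break
name = resume_text.split("姓名：")[1].split("\n")[0]
contact = resume_text.split("联系方式：")[1].split("\n")[0]

# Multi-line sections: take the text between this header and the next one
education = resume_text.split("教育经历：")[1].split("工作经历：")[0].strip()
work_experience = resume_text.split("工作经历：")[1].strip()

print("姓名：", name)
print("联系方式：", contact)
print("教育经历：", education)
print("工作经历：", work_experience)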
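
The split-based approach raises IndexError as soon as a header is missing, and it leaves each section as one raw string. As a rough sketch of a more tolerant variant, the helpers below (names are my own) use str.partition and turn each "- key：value" block into a dict. It assumes the same layout as the sample text and nothing more.

```python
SAMPLE = (
    "姓名：张三\n联系方式：123456789\n"
    "教育经历：\n- 学校：ABC大学\n 专业：计算机科学\n 时间：2010-2014\n"
    "工作经历：\n- 公司：XYZ科技\n 职位：软件工程师\n 时间：2014-2018"
)


def section(text, header, next_header=None):
    """Return the text after header, up to next_header; '' if header is missing."""
    _, found, rest = text.partition(header)
    if not found:
        return ""
    if next_header:
        rest = rest.partition(next_header)[0]
    return rest.strip()


def parse_entries(section_text):
    """Turn '- key：value' lines into one dict per entry (a new entry starts at '-')."""
    entries = []
    for raw_line in section_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        starts_new_entry = line.startswith("-")
        key, sep, value = line.lstrip("- ").partition("：")
        if not sep:
            continue  # not a key：value line
        if starts_new_entry or not entries:
            entries.append({})
        entries[-1][key.strip()] = value.strip()
    return entries


if __name__ == "__main__":
    print(parse_entries(section(SAMPLE, "教育经历：", "工作经历：")))
    print(parse_entries(section(SAMPLE, "工作经历：")))
```

For the sample text this yields one education entry and one work entry, each with three fields. Real resumes rarely follow such a fixed layout, which is why the rest of these notes rely on a model instead.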

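It still did not work. Just as I was about to give up, I tried the official develop build, and to my surprise it worked and produced results.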

python -m pip install paddlepaddle-gpu==0.0.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

Building a web service with Python

https://zhuanlan.zhihu.com/p/609054422?utm_id=0

Pair this with nginx for request forwarding

https://www.cnblogs.com/songzhixue/p/11353943.html

2023-09-03

In the end this is the approach to use. Combined with x-base? Combined with task?

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/information_extraction/taskflow_doc.md
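
As a sketch of what that approach might look like for resumes: the linked guide describes calling Taskflow("information_extraction", ...) with a schema and passing documents as {"doc": path}. The schema fields and helper names below are my own assumptions, not something taken from the guide or from a working setup.

```python
# Fields to extract. These names are illustrative assumptions.
SCHEMA = ["姓名", "联系方式", "教育经历", "工作经历"]


def extract_resume(doc_path, schema=SCHEMA, model="uie-x-base"):
    """Run PaddleNLP's information_extraction Taskflow on one document file."""
    # Imported lazily: paddlenlp is heavy and not needed by the helper below.
    from paddlenlp import Taskflow

    ie = Taskflow("information_extraction", schema=schema, model=model)
    return ie({"doc": doc_path})


def best_text_per_field(result):
    """Keep, for each field, the candidate text with the highest probability.

    Expects a list of dicts that map a field name to a list of candidates,
    each candidate being a dict with at least 'text' and 'probability'.
    """
    best = {}
    for item in result:
        for field, candidates in item.items():
            if not candidates:
                continue
            top = max(candidates, key=lambda c: c["probability"])
            if field not in best or top["probability"] > best[field]["probability"]:
                best[field] = top
    return {field: cand["text"] for field, cand in best.items()}
```

An empty dict from best_text_per_field is a quick way to spot the "results are empty" problem described earlier, before writing anything to a spreadsheet.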

======

Installing the uWSGI service

[uwsgi]

http=0.0.0.0:81

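# Adjust to your actual directory (comments go on their own line; uWSGI does not strip trailing comments from values)
chdir=/home/wwwroot/resume/src

# Adjust to your actual directory and file name
wsgi-file=/home/wwwroot/resume/src/uploadresume.py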

callable=app

master=true

processes=4

threads=10

daemonize=uwsgi.log

disable-logging=false

pidfile=uwsgi.pid

buffer-size=65536

harakiri=60

vacuum=True
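
For reference, here is a minimal stand-in for the file that wsgi-file points to. It is not the real uploadresume.py, just a sketch of the smallest WSGI module that callable=app could load; the /health route is made up for illustration.

```python
import json


def app(environ, start_response):
    """WSGI callable; the name matches callable=app in the config above."""
    if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/health":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

A Flask application object works the same way from uWSGI's point of view, since it is also a WSGI callable.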

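After uWSGI is started from xxx.ini, it creates an xxx.pid file in the same directory (per the pidfile setting). The file contains a single line: the process ID of the uWSGI master process.

// Start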


uwsgi --ini uwsgi.ini

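// Start in the background (--daemonize takes a log file; this is redundant when uwsgi.ini already sets daemonize)

uwsgi --ini uwsgi.ini --daemonize uwsgi.log

// Stop the service

uwsgi --stop uwsgi.pid

Check the service status: ps aux|grep uwsgi

// Reload the service gracefully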


uwsgi --reload uwsgi.pid
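
The same status check can be done from Python by reading the pid file. This is a small POSIX-only sketch; the function names are my own.

```python
import os


def read_pid(pidfile):
    """Return the PID stored on the first line of a uWSGI pid file."""
    with open(pidfile, encoding="ascii") as handle:
        return int(handle.read().strip())


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    pidfile = "uwsgi.pid"  # matches pidfile=uwsgi.pid in the config above
    if os.path.exists(pidfile):
        pid = read_pid(pidfile)
        print(f"master pid {pid} running: {is_running(pid)}")
    else:
        print("no pid file found; uWSGI is probably not running")
```

A stale pid file left behind after a crash shows up here as a PID that is no longer running.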

Triggering a reload from code

https://uwsgi-docs-zh.readthedocs.io/zh_CN/latest/Management.html#id3

uwsgi.reload()
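
The uwsgi module only exists inside a process started by uWSGI, so importing it at the top of a file breaks local runs. One way around this, sketched here with a made-up function name, is to guard the import:

```python
def request_reload():
    """Ask uWSGI for a graceful reload; return False when not running under uWSGI."""
    try:
        import uwsgi  # provided by the uWSGI server itself, not installable separately
    except ImportError:
        return False
    uwsgi.reload()
    return True
```

This could be called from an admin-only endpoint, for example after a new model checkpoint has been copied into place.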


Document extraction labeling guide

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/information_extraction/label_studio_doc.md

Model extraction and training

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/information_extraction/document/README.md

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md

The label-studio command was not found; calling it by its full path works:

/Users/showker/Library/Python/3.9/bin/label-studio start
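
This usually means the per-user script directory is not on PATH. The sketch below looks there when a command is not found; locate_command is a name I made up, and the bin subdirectory is a POSIX assumption (it matches the path above).

```python
import os
import shutil
import site


def locate_command(name):
    """Return the full path of a command, checking PATH and then the user script dir."""
    found = shutil.which(name)
    if found:
        return found
    # pip install --user puts scripts under <user base>/bin on macOS and Linux.
    candidate = os.path.join(site.getuserbase(), "bin", name)
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    print(locate_command("label-studio") or "label-studio not found")
```

Adding that directory to PATH makes the short command work as well.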

python finetune.py  \
    --device cpu \
    --logging_steps 5 \
    --save_steps 25 \
    --eval_steps 25 \
    --seed 42 \
    --model_name_or_path uie-x-base \
    --output_dir ./checkpoint/model_best \
    --train_path data/train.txt \
    --dev_path data/dev.txt  \
    --max_seq_len 512  \
    --per_device_train_batch_size  8 \
    --per_device_eval_batch_size 8 \
    --num_train_epochs 10 \
    --learning_rate 1e-5 \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir ./checkpoint/model_best \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end  True \
    --save_total_limit 1

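————————————————
Copyright notice: this is an original article by CSDN blogger 「Hi~晴天大圣」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/baidu_24752135/article/details/123726280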
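
Once the finetune.py command above has finished, the fine-tuned model can be loaded by pointing Taskflow at the checkpoint directory through its task_path argument. This is a sketch under that assumption; the directory check and the function names are my own additions.

```python
import os

# Same directory as --output_dir and --export_model_dir in the command above.
CHECKPOINT_DIR = "./checkpoint/model_best"


def checkpoint_ready(path=CHECKPOINT_DIR):
    """Return True if the checkpoint directory exists and is not empty."""
    return os.path.isdir(path) and bool(os.listdir(path))


def load_finetuned(schema, path=CHECKPOINT_DIR):
    """Build an information_extraction Taskflow from a fine-tuned checkpoint."""
    if not checkpoint_ready(path):
        raise FileNotFoundError(f"no checkpoint found in {path}")
    # Imported lazily so the directory check works without paddlenlp installed.
    from paddlenlp import Taskflow

    return Taskflow("information_extraction", schema=schema, task_path=path)
```

The schema passed here should use the same labels as the training data exported from label-studio.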
