Converting PDF to TXT with Python

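import PyPDF2

# Input PDF and output text file (file names as given in the post)
pdffile = ").pdf"
txtfile = "(1).txt"

# Read every page and concatenate the extracted text
with open(pdffile, "rb") as pdf:
    reader = PyPDF2.PdfReader(pdf)
    text = "".join(page.extract_text() for page in reader.pages)

# Write the result as UTF-8
with open(txtfile, "w", encoding="utf-8") as txt:
    txt.write(text)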
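
Joining the pages with an empty string runs the last line of one page straight into the first line of the next. Below is a minimal sketch of one way to separate pages, assuming that a page with no extractable text (for example, a scanned image) yields an empty string or None. The helper names `join_pages` and `pdf_to_text` are hypothetical and not part of PyPDF2.

```python
def join_pages(page_texts, separator="\n"):
    """Join per-page strings, skipping pages that yielded no text."""
    cleaned = [(t or "").strip() for t in page_texts]
    return separator.join(t for t in cleaned if t)


def pdf_to_text(pdffile):
    """Extract the text of every page of a PDF, one page per block."""
    import PyPDF2  # imported here so join_pages can be used without PyPDF2

    with open(pdffile, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        return join_pages(page.extract_text() for page in reader.pages)
```

Calling `pdf_to_text(pdffile)` would then return the string that gets written to the TXT file.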

Batch conversion

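import os
import PyPDF2

# Source and destination folders (the TXT folder must already exist)
pdf_path = os.path.join('.', '数据PDF')
txt_path = os.path.join('.', '数据TXT')

for pdflist in os.listdir(pdf_path):
    name, ext = os.path.splitext(pdflist)
    # Skip anything that is not a PDF
    if ext.lower() != '.pdf':
        continue
    pdffile = os.path.join(pdf_path, pdflist)
    txtfile = os.path.join(txt_path, name + '.txt')
    print(txtfile)
    # Read every page and concatenate the extracted text
    with open(pdffile, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        text = "".join(page.extract_text() for page in reader.pages)
    # Write the result as UTF-8
    with open(txtfile, 'w', encoding='utf-8') as txt:
        txt.write(text)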
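
The same batch loop can also be written with `pathlib`. The sketch below is one possible arrangement, not the method used in the post: it creates the output folder if it is missing and keeps going when a single PDF cannot be read. The names `convert_folder`, `extract_with_pypdf2`, and the `extract` parameter are hypothetical.

```python
from pathlib import Path


def extract_with_pypdf2(pdf_file):
    """Default extractor: concatenate the text of every page."""
    import PyPDF2  # lazy import so the folder logic works without PyPDF2

    with open(pdf_file, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        return "".join(page.extract_text() or "" for page in reader.pages)


def convert_folder(pdf_dir, txt_dir, extract=extract_with_pypdf2):
    """Convert every .pdf file in pdf_dir to a same-named .txt in txt_dir."""
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)  # create output folder if needed
    written = []
    for pdf_file in sorted(pdf_dir.iterdir()):
        if pdf_file.suffix.lower() != ".pdf":
            continue  # ignore non-PDF files
        txt_file = txt_dir / (pdf_file.stem + ".txt")
        try:
            txt_file.write_text(extract(pdf_file), encoding="utf-8")
        except Exception as exc:  # one unreadable PDF should not stop the batch
            print(f"skipped {pdf_file.name}: {exc}")
            continue
        written.append(txt_file)
    return written
```

With the folders from the post, the call would be `convert_folder("数据PDF", "数据TXT")`. Passing the extractor as a parameter is a design choice that lets the folder handling be tried out with a stand-in function before any real PDFs are involved.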
posted @ kuanleung