
Converting PDF to TXT with Python

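```python
import PyPDF2

pdffile = ").pdf"      # input PDF path
txtfile = "(1).txt"    # output text file path

with open(pdffile, "rb") as pdf:
    reader = PyPDF2.PdfReader(pdf)
    # Extract the text of every page and concatenate it
    text = "".join(page.extract_text() for page in reader.pages)

# Write the extracted text out as UTF-8
with open(txtfile, "w", encoding="utf-8") as txt:
    txt.write(text)
```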
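Joining pages with `""` glues the last line of one page directly onto the first line of the next. A minimal sketch, assuming you would rather keep a separator (a newline by default) between pages; `join_page_text` is a hypothetical helper name, and it relies only on each page object having an `extract_text()` method, as the items of PyPDF2's `PdfReader.pages` do.

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with an extract_text() method, e.g. a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def join_page_text(pages: Iterable[HasText], sep: str = "\n") -> str:
    """Concatenate the text of each page, inserting `sep` between pages.

    `or ""` is a defensive assumption: any falsy result (such as None)
    from extract_text() is treated as an empty page.
    """
    return sep.join((page.extract_text() or "") for page in pages)


if __name__ == "__main__":
    # Stand-in page objects so the sketch runs without a real PDF
    class FakePage:
        def __init__(self, text: str) -> None:
            self.text = text

        def extract_text(self) -> str:
            return self.text

    pages = [FakePage("end of page one"), FakePage("start of page two")]
    print(join_page_text(pages))
```

With PyPDF2 you would call `join_page_text(reader.pages)` in place of the `"".join(...)` line; pass `sep=""` to reproduce the original behavior exactly.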

Batch conversion

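```python
import os
import PyPDF2

# Input folder of PDFs ("数据PDF") and output folder for the TXT files ("数据TXT")
pdf_path = os.path.join('.', '数据PDF')
txt_path = os.path.join('.', '数据TXT')
os.makedirs(txt_path, exist_ok=True)  # create the output folder if it is missing

for pdflist in os.listdir(pdf_path):
    # Skip anything that is not a PDF (the old regex raised IndexError on these)
    if not pdflist.lower().endswith('.pdf'):
        continue
    pdffile = os.path.join(pdf_path, pdflist)
    # Same base name as the PDF, with a .txt extension
    txtfile = os.path.join(txt_path, os.path.splitext(pdflist)[0] + '.txt')
    print(txtfile)
    with open(pdffile, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        text = "".join(page.extract_text() for page in reader.pages)
    with open(txtfile, 'w', encoding='utf-8') as txt:
        txt.write(text)
```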
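The batch loop can also be wrapped in a reusable function. A minimal sketch, assuming the extraction step is passed in as a callable so the folder-walking and naming logic can be tried without PyPDF2; `convert_folder` and its `extract` parameter are hypothetical names. Output names come from `os.path.splitext`, and files whose extension is not `.pdf` (case-insensitive) are skipped rather than raising an error.

```python
import os
import tempfile
from typing import Callable, List


def convert_folder(pdf_dir: str, txt_dir: str,
                   extract: Callable[[str], str]) -> List[str]:
    """Convert every *.pdf in pdf_dir to a same-named *.txt in txt_dir.

    `extract` takes a PDF path and returns its text (hypothetical hook:
    plug in a PyPDF2-based function for real use). Returns the written
    TXT paths, in sorted order of the input file names.
    """
    os.makedirs(txt_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(pdf_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".pdf":
            continue  # ignore non-PDF files instead of crashing
        txtfile = os.path.join(txt_dir, stem + ".txt")
        with open(txtfile, "w", encoding="utf-8") as txt:
            txt.write(extract(os.path.join(pdf_dir, name)))
        written.append(txtfile)
    return written


if __name__ == "__main__":
    # Demo with a fake extractor and a throwaway folder
    with tempfile.TemporaryDirectory() as root:
        pdf_dir = os.path.join(root, "pdf")
        os.makedirs(pdf_dir)
        for name in ("a.pdf", "B.PDF", "notes.txt"):
            open(os.path.join(pdf_dir, name), "w").close()
        out = convert_folder(pdf_dir, os.path.join(root, "txt"),
                             lambda path: "text of " + os.path.basename(path))
        print([os.path.basename(p) for p in out])
```

For real PDFs, `extract` could open the file, build a `PyPDF2.PdfReader`, and return the joined `extract_text()` of its pages, just as the loop above does.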

posted @ 2023-10-14 18:24 kuanleung
