Working with PDFs in Python

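1. Installing and introducing the PyPDF2 and pdfplumber libraries

PyPDF2 is well suited to reading, writing, splitting, and merging PDF files.
pdfplumber is better at reading the content of a PDF file and extracting the tables in it.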
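
Both libraries are published on PyPI, so `pip install PyPDF2 pdfplumber` is the usual way to get them (openpyxl, used in section 3, is installed the same way). As a small sketch, the helper below reports which of these libraries cannot be found in the current environment. `missing_packages` is a hypothetical name for illustration, not part of either library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["PyPDF2", "pdfplumber", "openpyxl"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All libraries are available.")
```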

2. Extracting text with pdfplumber

import pdfplumber

with pdfplumber.open("python.pdf") as f:
    page = f.pages[0]  # choose which page to read (index 0 is the first page)
    print(page.extract_text())  # extract the text on that page
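
The example above reads only the first page. To collect the text of a whole document, one option is to loop over `pdf.pages`. This is a sketch: `join_page_texts` and `extract_all_text` are hypothetical helper names, and it assumes that `extract_text()` may return `None` or an empty string for a page with no extractable text (a scanned image, for instance), so such pages are skipped.

```python
def join_page_texts(texts, separator="\n\n"):
    """Join per-page strings, skipping pages that produced no text."""
    return separator.join(text for text in texts if text)


def extract_all_text(pdf_path):
    """Return the text of every page of the PDF at `pdf_path`."""
    import pdfplumber  # imported here so join_page_texts works without pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # The generator is consumed while the file is still open.
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Calling `print(extract_all_text("python.pdf"))` would then print all pages separated by blank lines.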

3. Extracting a table with pdfplumber and writing it to Excel

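# extract_table(): use when the page has one table
# extract_tables(): use when the page has several tables
import pdfplumber
from openpyxl import Workbook

with pdfplumber.open("python.pdf") as f:
    page = f.pages[0]
    table = page.extract_table()  # a list of rows, each row a list of cell values
    workbook = Workbook()
    sheet = workbook.active  # the active sheet of this workbook instance
    for row in table:
        sheet.append(row)  # one table row becomes one spreadsheet row
    workbook.save(filename="new_pdf.xlsx")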
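
When a page holds several tables, `extract_tables()` returns a list with one entry per table. The sketch below writes each one to its own worksheet. `clean_table` and `tables_to_workbook` are hypothetical helper names, and the sheet names are assumptions chosen for illustration. Empty cells come back from pdfplumber as `None`, so they are replaced with empty strings before writing.

```python
def clean_table(table):
    """Replace None cells with "" and drop rows that are entirely empty."""
    rows = [["" if cell is None else cell for cell in row] for row in table or []]
    return [row for row in rows if any(cell != "" for cell in row)]


def tables_to_workbook(pdf_path, xlsx_path, page_index=0):
    """Write every table on one page to its own sheet; return the table count."""
    import pdfplumber  # imported lazily so clean_table has no dependencies
    from openpyxl import Workbook

    with pdfplumber.open(pdf_path) as pdf:
        tables = pdf.pages[page_index].extract_tables()

    workbook = Workbook()
    workbook.remove(workbook.active)  # drop the default empty sheet
    for number, table in enumerate(tables, start=1):
        sheet = workbook.create_sheet(title=f"table_{number}")
        for row in clean_table(table):
            sheet.append(row)
    if not tables:
        # A workbook must contain at least one sheet to be saved.
        workbook.create_sheet(title="no_tables")
    workbook.save(xlsx_path)
    return len(tables)
```

For example, `tables_to_workbook("python.pdf", "new_pdf.xlsx")` would produce one sheet per table found on the first page.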

4. Merging PDFs, and reordering and rotating pages
4.1 Merging PDFs

import os

# These camelCase names are the API of PyPDF2 releases before 3.0.
from PyPDF2 import PdfFileReader, PdfFileWriter

folder = r"G:\6Tipdm\7python 办公自动化\concat_pdf"
pdf_writer = PdfFileWriter()
# The input files are named by page range, e.g. 51-100.pdf for i = 1.
for i in range(1, len(os.listdir(folder)) + 1):
    print(i * 50 + 1, (i + 1) * 50)  # show which page range is being appended
    pdf_reader = PdfFileReader(
        os.path.join(folder, "{}-{}.pdf".format(i * 50 + 1, (i + 1) * 50))
    )
    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))
with open(os.path.join(folder, "merge.pdf"), "wb") as out:
    pdf_writer.write(out)
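
The merge above uses the camelCase API of older PyPDF2 releases. PyPDF2 3.0 removed those names in favor of `PdfReader`, `PdfWriter`, `reader.pages`, and `add_page`. A sketch of the same merge with the newer names follows, assuming PyPDF2 3.x is installed and that the inputs are named `<first page>-<last page>.pdf` as in the code above. `start_page`, `sort_by_start_page`, and `merge_pdfs` are hypothetical helpers. Sorting by the leading number avoids depending on the order that `os.listdir` happens to return, and the filter keeps an existing `merge.pdf` out of the input list.

```python
import os


def start_page(filename):
    """Return the leading page number of a name such as '51-100.pdf'."""
    return int(filename.split("-", 1)[0])


def sort_by_start_page(filenames):
    """Order chunk files by their first page number, not alphabetically."""
    return sorted(filenames, key=start_page)


def merge_pdfs(folder, output_name="merge.pdf"):
    """Merge every '<start>-<end>.pdf' file in `folder` into one PDF."""
    from PyPDF2 import PdfReader, PdfWriter  # PyPDF2 >= 3.0 names (assumption)

    chunks = [
        name for name in os.listdir(folder)
        if name.lower().endswith(".pdf") and name.split("-", 1)[0].isdigit()
    ]
    writer = PdfWriter()
    for name in sort_by_start_page(chunks):
        reader = PdfReader(os.path.join(folder, name))
        for page in reader.pages:
            writer.add_page(page)
    with open(os.path.join(folder, output_name), "wb") as out:
        writer.write(out)
    return len(chunks)
```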

4.2 Splitting a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_reader = PdfFileReader(r"G:\6Tipdm\7python 办公自动化\concat_pdf\时间序列.pdf")
for page in range(pdf_reader.getNumPages()):
    pdf_writer = PdfFileWriter()  # a fresh writer for each page
    pdf_writer.addPage(pdf_reader.getPage(page))
    # Write each page to its own file: 0.pdf, 1.pdf, ...
    with open(f"G:\\6Tipdm\\7python 办公自动化\\concat_pdf\\{page}.pdf", "wb") as out:
        pdf_writer.write(out)
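
The title of section 4 also mentions reordering and rotating pages, which the code above does not show. The following is only a sketch, assuming PyPDF2 3.x (`PdfReader`, `PdfWriter`, and `page.rotate()`, which takes a clockwise angle in multiples of 90 degrees). `normalize_rotation` and `reorder_and_rotate` are hypothetical helper names.

```python
def normalize_rotation(angle):
    """Map an angle to 0, 90, 180, or 270; reject anything else."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return angle % 360


def reorder_and_rotate(src_path, dst_path, order, rotations=None):
    """Copy pages of `src_path` in a new order, rotating some of them.

    order: 0-based source page indices in their new order, e.g. [2, 0, 1];
        each index should appear at most once.
    rotations: optional {source page index: clockwise degrees}.
    """
    from PyPDF2 import PdfReader, PdfWriter  # PyPDF2 >= 3.0 names (assumption)

    rotations = rotations or {}
    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in order:
        page = reader.pages[index]
        angle = normalize_rotation(rotations.get(index, 0))
        if angle:
            page.rotate(angle)  # rotates the page object in place
        writer.add_page(page)
    with open(dst_path, "wb") as out:
        writer.write(out)
```

For a PDF with at least two pages, `reorder_and_rotate("python.pdf", "reordered.pdf", [1, 0], {0: 90})` would write the first two pages in swapped order and turn the original first page 90 degrees clockwise; `reordered.pdf` is a placeholder output name.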
posted @ 2020-09-10 16:18  P-Z-W