[1000] Extract specific pages, split PDF files, add pages from different PDF files

PyPDF2 is a pure-Python library for working with PDF files. It can read, split, merge, rotate, watermark, and encrypt existing PDFs.

Here are some of the useful objects and methods in PyPDF2:

PdfReader (called PdfFileReader in older releases; the old name raises an error in PyPDF2 3.x):
- Represents a PDF file reader.
- Allows you to open and read an existing PDF file.
- len(reader.pages) gives the total number of pages in the PDF (formerly getNumPages()).
PdfWriter (formerly PdfFileWriter):
- Represents a PDF file writer.
- Allows you to create a new PDF or write a modified copy of an existing one.
- Provides methods like add_page(page) (formerly addPage(page)) to add pages to the output PDF.
Reading PDF Files:
- Open a PDF file using PdfReader.
- Access individual pages using reader.pages[page_number] (formerly getPage(page_number)); page numbers are 0-based.
- Extract text from a page using page.extract_text() (formerly extractText()).
Extracting PDF Metadata:
- Retrieve metadata (such as author, title, creation date) using reader.metadata (formerly getDocumentInfo()).
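Splitting and Merging PDF Files:
- Split a PDF by copying selected pages into one or more new PdfWriter objects.
- Merge multiple PDFs into a single file by adding the pages of each one to the same PdfWriter with add_page(), or by using PdfMerger.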
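Adding Watermarks to PDF Files:
- Overlay text or images on existing pages by merging a watermark page (taken from another PDF) onto them with page.merge_page() (formerly mergePage()).
- Adjust position and rotation by applying a Transformation to the watermark page (for example with add_transformation()) before merging; transparency has to be built into the watermark PDF itself, because PyPDF2 draws it as-is.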
Encrypting and Decrypting PDF Files:
- Encrypt a PDF with a password using writer.encrypt(password).
- Decrypt an encrypted PDF with reader.decrypt(password) before accessing its pages.
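Rotating PDF Pages:
- Rotate a page clockwise with page.rotate(90), or counterclockwise with a negative angle such as page.rotate(-90); the angle must be a multiple of 90. rotateClockwise() and rotateCounterClockwise() are the deprecated older names.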

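PyPDF2 is lightweight and easy to use. Note that current releases (2.x and 3.x) require Python 3; only the older 1.x releases also supported Python 2. The project is no longer developed under the PyPDF2 name and continues as pypdf. Explore these methods to perform various tasks on PDF files! 😊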

For more details, you can refer to the official PyPDF2 documentation.¹ ² ³

To split a PDF file into two separate PDF files with the PyPDF2 library in Python, use the following script. It relies on the PyPDF2 3.x API (PdfReader, PdfWriter, add_page); install the library with pip install PyPDF2 if it is not already available:

```python
import PyPDF2


def split_pdf(input_pdf_path, split_page_number, output_pdf_path1, output_pdf_path2):
    # Open the input PDF file in binary read mode
    with open(input_pdf_path, 'rb') as input_pdf:
        # Create a PDF reader object
        reader = PyPDF2.PdfReader(input_pdf)
        num_pages = len(reader.pages)

        # Both output files must receive at least one page
        if not 0 < split_page_number < num_pages:
            raise ValueError(
                f"split_page_number must be between 1 and {num_pages - 1}, "
                f"got {split_page_number}"
            )

        # Create two PDF writer objects
        writer1 = PyPDF2.PdfWriter()
        writer2 = PyPDF2.PdfWriter()

        # First part: pages 0 .. split_page_number - 1
        for page_num in range(split_page_number):
            writer1.add_page(reader.pages[page_num])

        # Second part: pages split_page_number .. last page
        for page_num in range(split_page_number, num_pages):
            writer2.add_page(reader.pages[page_num])

        # Write both parts while the input file is still open,
        # because the reader fetches page data from it lazily
        with open(output_pdf_path1, 'wb') as output_pdf1:
            writer1.write(output_pdf1)
        with open(output_pdf_path2, 'wb') as output_pdf2:
            writer2.write(output_pdf2)


# Example usage
input_pdf_path = 'input.pdf'           # Path to the input PDF file
split_page_number = 10                 # Pages 0-9 go to part 1, page 10 onward to part 2
output_pdf_path1 = 'output_part1.pdf'  # Path to the first output PDF file
output_pdf_path2 = 'output_part2.pdf'  # Path to the second output PDF file

split_pdf(input_pdf_path, split_page_number, output_pdf_path1, output_pdf_path2)
```

Explanation:

Opening the input PDF: The script opens the input PDF file in read-binary mode.
Creating PDF reader and writer objects: PyPDF2.PdfReader reads the input PDF, and PyPDF2.PdfWriter objects are created to write the split PDF files.
Splitting the PDF: The script iterates through the pages of the input PDF. Pages with indices 0 to split_page_number - 1 are added to the first PDF writer, and pages from index split_page_number to the last page are added to the second PDF writer.
Writing the output PDFs: The script writes the pages collected by each PDF writer to separate output PDF files.

Notes:

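Keep split_page_number between 1 and the total number of pages minus 1, so that each output file gets at least one page.
split_page_number is the number of pages that go into the first file, which is also the 0-based index of the first page of the second file. Setting it to 10 puts the first 10 pages (indices 0 to 9) in output_part1.pdf and the rest in output_part2.pdf.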

This script effectively splits an input PDF into two separate PDFs at the specified page number. Adjust the file paths and split_page_number as needed for your specific use case.

posted on 2024-05-24 11:56 McDelfino
