PyPDF2 is a pure-Python library for working with PDF files. It can read, split, merge, rotate, watermark, and encrypt PDFs.
Here are some of the useful objects and methods in PyPDF2:
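PdfFileReader (renamed PdfReader in later releases; the old name was removed in PyPDF2 3.0):
- Represents a PDF file reader.
- Opens and reads an existing PDF file.
- Provides methods like getNumPages() to get the total number of pages in the PDF (len(reader.pages) in the newer API).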
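PdfFileWriter (renamed PdfWriter in later releases):
- Represents a PDF file writer.
- Creates a new PDF, or builds a modified copy of an existing one from pages taken from a reader.
- Provides methods like addPage(page) (add_page(page) in the newer API) to add pages to the output PDF.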
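Reading PDF Files:
- Open a PDF file using PdfFileReader.
- Access individual pages using getPage(page_number), where page_number is 0-based (reader.pages[page_number] in the newer API).
- Extract text from pages using extractText() (extract_text() in the newer API).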
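The reading steps above can be sketched as follows. This is a minimal sketch assuming PyPDF2 3.x, where the reader class is `PdfReader`, pages are reached through `reader.pages`, and text comes from `extract_text()`. The helper names `page_texts` and `read_pdf_text` are made up for illustration.

```python
def page_texts(reader):
    """Return the extracted text of every page, in page order.

    `reader` is expected to be a PyPDF2 PdfReader; any object exposing
    `.pages` whose items have `extract_text()` works as well.
    """
    # Fall back to "" when a page yields no text (e.g. scanned pages)
    return [page.extract_text() or "" for page in reader.pages]


def read_pdf_text(path):
    """Open the PDF at `path` and return (page_count, list_of_page_texts)."""
    # Imported lazily so page_texts() stays usable without PyPDF2 installed
    from PyPDF2 import PdfReader

    with open(path, "rb") as fh:
        reader = PdfReader(fh)
        texts = page_texts(reader)  # read while the file is still open
    return len(texts), texts
```

Text extraction quality depends on how the PDF was produced, so treat the returned strings as best-effort output.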
Extracting PDF Metadata:
- Retrieve metadata (such as author, title, creation date) using getDocumentInfo() (the reader.metadata property in the newer API).
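As an illustration of metadata retrieval, here is a hedged sketch assuming PyPDF2 3.x, where `PdfReader.metadata` returns a dict-like object with `author` and `title` attributes, or `None` when the file carries no metadata. The function names `summarize_metadata` and `read_metadata` are hypothetical.

```python
def summarize_metadata(info):
    """Pick a few common fields out of a PDF document-information object.

    `info` is expected to be what PdfReader.metadata returns: a dict-like
    object keyed by PDF names such as "/CreationDate", with convenience
    attributes like `.author` and `.title`. It may also be None.
    """
    if info is None:
        return {"author": None, "title": None, "created": None}
    # The creation date is kept as the raw PDF date string (e.g. "D:2024...")
    created = info["/CreationDate"] if "/CreationDate" in info else None
    return {"author": info.author, "title": info.title, "created": created}


def read_metadata(path):
    """Open the PDF at `path` and return its summarized metadata."""
    # Imported lazily so summarize_metadata() works without PyPDF2 installed
    from PyPDF2 import PdfReader

    return summarize_metadata(PdfReader(path).metadata)
```

Any of these fields can be missing in a real file, so callers should expect `None` values.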
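Splitting and Merging PDF Files:
- Split a PDF into separate files by adding the pages you want to a fresh PdfFileWriter and writing each writer out.
- Merge multiple PDFs into a single file by calling addPage() for every page of every input, in order.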
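Merging can be sketched as a loop that copies every page of every input into one writer. The sketch below assumes the PyPDF2 3.x names `PdfReader`, `PdfWriter`, and `add_page`; `merge_into` and `merge_files` are illustrative helper names, not part of the library.

```python
def merge_into(writer, readers):
    """Append every page of each reader to `writer`, preserving order.

    Returns the number of pages added.
    """
    count = 0
    for reader in readers:
        for page in reader.pages:
            writer.add_page(page)
            count += 1
    return count


def merge_files(input_paths, output_path):
    """Merge the PDFs at `input_paths` into a single file at `output_path`."""
    # Imported lazily so merge_into() stays usable without PyPDF2 installed
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    merge_into(writer, [PdfReader(path) for path in input_paths])
    with open(output_path, "wb") as out:
        writer.write(out)
```

The output follows the order of `input_paths`, so sort or arrange that list before calling `merge_files`.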
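Adding Watermarks to PDF Files:
- Overlay text or images on existing pages by merging a watermark page onto each page with mergePage() (merge_page() in the newer API), then adding the result to a PdfFileWriter.
- Transparency is a property of the watermark PDF itself; position and rotation can be prepared there as well, or applied as a transformation when merging.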
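A watermark overlay might look like the sketch below. It assumes PyPDF2 3.x, where a page object offers `merge_page()`, and it assumes the watermark is the first page of a separate PDF that already has the desired transparency, position, and rotation. `stamp_pages` and `watermark_file` are hypothetical names.

```python
def stamp_pages(content_pages, watermark_page, writer):
    """Draw `watermark_page` on top of each content page and add it to `writer`.

    Returns the number of pages stamped.
    """
    count = 0
    for page in content_pages:
        page.merge_page(watermark_page)  # modifies `page` in place
        writer.add_page(page)
        count += 1
    return count


def watermark_file(input_path, watermark_path, output_path):
    """Stamp every page of `input_path` with page 1 of `watermark_path`."""
    # Imported lazily so stamp_pages() stays usable without PyPDF2 installed
    from PyPDF2 import PdfReader, PdfWriter

    watermark_page = PdfReader(watermark_path).pages[0]
    writer = PdfWriter()
    stamp_pages(PdfReader(input_path).pages, watermark_page, writer)
    with open(output_path, "wb") as out:
        writer.write(out)
```

Because the watermark is drawn after the page content, an opaque watermark would hide what lies beneath it.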
Encrypting and Decrypting PDF Files:
- Encrypt a PDF with a password by calling encrypt(password) on the writer before writing the output.
- Decrypt an encrypted PDF by calling decrypt(password) on the reader before accessing its pages.
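The two password operations can be sketched as below, assuming the PyPDF2 3.x API: `writer.encrypt(password)`, `reader.is_encrypted`, and `reader.decrypt(password)`, which returns a falsy value when the password does not match. The helper names are illustrative only.

```python
def copy_encrypted(reader, writer, password):
    """Copy all pages from `reader` to `writer` and password-protect the result."""
    for page in reader.pages:
        writer.add_page(page)
    writer.encrypt(password)  # takes effect when the writer is written out


def unlock(reader, password):
    """Decrypt `reader` in place if needed; raise ValueError on a wrong password."""
    if reader.is_encrypted and not reader.decrypt(password):
        raise ValueError("password does not open this PDF")
    return reader


def encrypt_file(input_path, output_path, password):
    """Write a password-protected copy of `input_path` to `output_path`."""
    # Imported lazily so the helpers above work without PyPDF2 installed
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    copy_encrypted(PdfReader(input_path), writer, password)
    with open(output_path, "wb") as out:
        writer.write(out)
```

Calling `unlock` on an unencrypted reader is a no-op, so it is safe to use on any input.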
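Rotating PDF Pages:
- Rotate pages clockwise or counterclockwise in multiples of 90 degrees using rotateClockwise(degrees) or rotateCounterClockwise(degrees) (page.rotate(angle) in the newer API, with a negative angle for counterclockwise).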
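Rotation in either direction can be wrapped in one helper. The sketch assumes the PyPDF2 3.x method `page.rotate(angle)`, which rotates clockwise for positive angles. `rotate_page` is a hypothetical helper name.

```python
def rotate_page(page, degrees, clockwise=True):
    """Rotate `page` in place by `degrees`, which must be a multiple of 90."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # A negative angle turns the page counterclockwise
    angle = degrees if clockwise else -degrees
    page.rotate(angle)
    return page
```

For example, `rotate_page(reader.pages[0], 90, clockwise=False)` would turn the first page a quarter turn to the left before it is added to a writer.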
Remember that PyPDF2 is lightweight and easy to use. Early 1.x releases ran on both Python 2.x and 3.x; releases from 2.0 onward require Python 3. Explore these methods to perform various tasks on PDF files! 😊
For more details, you can refer to the official PyPDF2 documentation.
Some examples:
To split a PDF file into two separate PDF files using the PyPDF2 library in Python, use the following script:
import PyPDF2

def split_pdf(input_pdf_path, split_page_number, output_pdf_path1, output_pdf_path2):
    # Open the input PDF file
    with open(input_pdf_path, 'rb') as input_pdf:
        # Create a PDF reader object
        reader = PyPDF2.PdfReader(input_pdf)

        # Create two PDF writer objects
        writer1 = PyPDF2.PdfWriter()
        writer2 = PyPDF2.PdfWriter()

        # Add pages to the first PDF writer (from the beginning to the split page number)
        for page_num in range(split_page_number):
            writer1.add_page(reader.pages[page_num])

        # Add pages to the second PDF writer (from the split page number to the end)
        for page_num in range(split_page_number, len(reader.pages)):
            writer2.add_page(reader.pages[page_num])

        # Write the first PDF to a file
        with open(output_pdf_path1, 'wb') as output_pdf1:
            writer1.write(output_pdf1)

        # Write the second PDF to a file
        with open(output_pdf_path2, 'wb') as output_pdf2:
            writer2.write(output_pdf2)

# Example usage
input_pdf_path = 'input.pdf'  # Path to the input PDF file
split_page_number = 10  # Page number to split at (0-based index)
output_pdf_path1 = 'output_part1.pdf'  # Path to the first output PDF file
output_pdf_path2 = 'output_part2.pdf'  # Path to the second output PDF file
split_pdf(input_pdf_path, split_page_number, output_pdf_path1, output_pdf_path2)
Explanation:
- Opening the input PDF: The script opens the input PDF file in read-binary mode.
- Creating PDF reader and writer objects: PyPDF2.PdfReader reads the input PDF, and two PyPDF2.PdfWriter objects are created to write the split PDF files.
- Splitting the PDF: The script iterates through the pages of the input PDF. Pages from the beginning to the split page number are added to the first PDF writer. Pages from the split page number to the end are added to the second PDF writer.
- Writing the output PDFs: The script writes the pages collected by each PDF writer to separate output PDF files.
Notes:
- Ensure split_page_number is within the range of the number of pages in the input PDF.
- split_page_number is the 0-based index of the first page written to the second file, so setting it to 10 sends the first 10 pages (indices 0 to 9) to output_part1.pdf and the rest to output_part2.pdf.
This script splits an input PDF into two separate PDFs at the specified page number. Adjust the file paths and split_page_number as needed for your specific use case.
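The range check mentioned in the notes can be made explicit before any pages are copied. The helper below is a sketch with a made-up name, `split_ranges`; it assumes both output files should contain at least one page, which the original script does not enforce.

```python
def split_ranges(total_pages, split_page_number):
    """Return the 0-based page index ranges for the two output files.

    Raises ValueError unless 0 < split_page_number < total_pages, so that
    neither output would be empty (an assumption of this sketch).
    """
    if not 0 < split_page_number < total_pages:
        raise ValueError(
            f"split_page_number must be between 1 and {total_pages - 1}, "
            f"got {split_page_number}"
        )
    first = range(0, split_page_number)             # pages for the first file
    second = range(split_page_number, total_pages)  # pages for the second file
    return first, second
```

Inside split_pdf, a call such as `split_ranges(len(reader.pages), split_page_number)` could supply the two ranges that the `for` loops iterate over.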