Processing Word, Excel, and PDF Files with Python

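Working with Excel Spreadsheets Using openpyxl

Installation

The following command uses the Python interpreter of a virtual environment on the D: drive to install the openpyxl module from the Douban PyPI mirror: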

D:\PycharmProjects\Study\venv\Scripts\python.exe -m pip install openpyxl -i http://pypi.douban.com/simple --trusted-host=pypi.douban.com

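Basic Concepts

openpyxl is built around three main objects: Workbook, Worksheet, and Cell.

Workbook: a workbook; one Excel file contains one workbook (Workbook)
Worksheet: a worksheet; one workbook (Workbook) contains one or more worksheets (Worksheet), each made up of many cells
Cell: a cell; one worksheet (Worksheet) contains many cells (Cell). A cell most often holds a number or a string, but openpyxl also handles other value types such as dates, booleans, and formulas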
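
To make the hierarchy concrete, here is a minimal sketch that builds a workbook in memory and walks from the workbook to a worksheet to a single cell. The sheet title `demo` and the sample values are assumptions chosen for illustration; they do not come from the original post.

```python
from openpyxl import Workbook

wb = Workbook()        # a new workbook starts with one worksheet
ws = wb.active         # the currently active worksheet
ws.title = "demo"      # hypothetical sheet name, for illustration only

ws["A1"] = 42          # stored as a number
ws["B1"] = "hello"     # stored as a string
cell = ws["A1"]        # a Cell object

print(wb.sheetnames)   # titles of all worksheets in the workbook
print(cell.coordinate, cell.value, type(cell.value).__name__)
print(ws["B1"].value, type(ws["B1"].value).__name__)
```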

Usage

Creating a Workbook (Excel)

from openpyxl import Workbook

new_excel = Workbook()  # create a new workbook
new_excel.save('new.xlsx')   # save it in the current directory
new_excel.save('C:/Users/Administrator/Desktop/new.xlsx')   # save it to the desktop

Working with a Workbook (Excel)

load_workbook() returns a Workbook object, which can be thought of as the Excel file itself.
```
import openpyxl
testexcel = openpyxl.load_workbook('test.xlsx')   # load the Excel file
testexcel.close()   # close the workbook
```
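- testexcel.active
  Gets the currently active worksheet
- testexcel.encoding
  Gets the workbook's encoding
- testexcel.properties
  Gets document properties such as the creation time and the last-modified time
- testexcel.worksheets
  Gets a list of all worksheet objects in the workbook
- testexcel.sheetnames
  Gets the names of the worksheets
- testexcel.create_sheet('test')
  Creates a new worksheet in the workbook; must be saved
- testexcel.copy_worksheet(testexcel['test'])
  Copies a worksheet; must be saved
- testexcel.save('test.xlsx')
  Writes the workbook to disk after creating or copying sheets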
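
The workbook-level calls above can be combined into one round trip. This is only a sketch under stated assumptions: the file is written to a temporary directory (a hypothetical location, so no real `test.xlsx` is overwritten), and the cell content `original` is sample data.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

# Hypothetical location: a temporary directory, so no real file is overwritten.
path = str(Path(tempfile.mkdtemp()) / "test.xlsx")

wb = Workbook()                          # starts with one default worksheet
wb.create_sheet("test")                  # appended after the existing sheets
wb["test"]["A1"] = "original"
copied = wb.copy_worksheet(wb["test"])   # openpyxl chooses a new title for the copy
copied_title = copied.title
wb.save(path)                            # changes reach the disk only on save()
wb.close()

reloaded = load_workbook(path)
names = reloaded.sheetnames              # list of worksheet titles, in order
active_title = reloaded.active.title     # active is a worksheet object, not a flag
copied_value = reloaded[copied_title]["A1"].value
reloaded.close()

print(names)
print(active_title)
print(copied_value)
```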

Working with Worksheets (sheet)

Getting a worksheet object
```
import openpyxl

testexcel = openpyxl.load_workbook('test.xlsx')
# get the worksheet by its position in the worksheet list (recommended)
testsheet = testexcel.worksheets[2]
# or get it directly by name
# testsheet = testexcel['2022']
testexcel.close()   # close the workbook
```
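- testsheet.title
  Gets the worksheet title
- testsheet.dimensions
  Gets the cell range that contains data
- testsheet.max_row
  Largest row number in use (min_row is the smallest)
- testsheet.min_column
  Smallest column number in use (max_column is the largest)
- testsheet.rows
  Gets the cell objects row by row (testsheet.columns gets them column by column)
- testsheet.freeze_panes = 'A2'
  Freezes the rows above and the columns to the left of A2 (in effect, the first row); must be saved
- testsheet.values
  Yields the cell values row by row; a generator
- testsheet.iter_rows()
  Iterates row by row; also a generator, and each row is a tuple of cell objects (iter_cols() iterates column by column)
- testsheet.merge_cells('A1:D4')
  Merges cells A1 through D4; must be saved
- testsheet.unmerge_cells('A1:D4')
  Unmerges cells A1 through D4; must be saved
- testsheet.append(['数据1','数据2'])
  Appends a new row of data to the worksheet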
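
As a rough illustration of several of these worksheet attributes together, the sketch below fills an in-memory sheet and reads it back. The header and the two records are hypothetical sample data, and the merged range `D1:E2` is picked only so that it does not overlap the data.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical sample data: a header row followed by two records.
ws.append(["name", "score"])
ws.append(["alice", 90])
ws.append(["bob", 85])

dims = ws.dimensions                        # range covered by the data
last_row, last_col = ws.max_row, ws.max_column
print(dims, last_row, last_col)

# values_only=True yields plain tuples of values instead of Cell objects.
rows = list(ws.iter_rows(min_row=2, values_only=True))
print(rows)

# iter_cols() is the column-wise counterpart of iter_rows().
first_column = [cell.value for col in ws.iter_cols(max_col=1) for cell in col]
print(first_column)

ws.freeze_panes = "A2"                      # keep the header row visible while scrolling
ws.merge_cells("D1:E2")                     # merged ranges are tracked on the worksheet
merged = [r.coord for r in ws.merged_cells.ranges]
print(merged)
```

None of these changes touch a file until the workbook is saved with `wb.save(...)`.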

Working with Cell Data (cell)

import openpyxl

testexcel = openpyxl.load_workbook('test.xlsx')
testsheet = testexcel.worksheets[3]
testcell = testsheet['A5']
print(testcell.value)   # the data stored in the cell
print(testcell.row)   # the row number
print(testcell.column)   # the column number
testsheet['A2'].value = 'test'   # change the data in cell A2 to 'test'
testexcel.save('test.xlsx')   # save
testexcel.close()
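
Cells can also be addressed by row and column number instead of a coordinate string. The following is a small in-memory sketch, assuming a recent openpyxl release in which `cell.column` is a 1-based integer; the values written are arbitrary sample data.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active

ws["A5"] = 3.14                            # write by coordinate string
ws.cell(row=2, column=1, value="test")     # write by 1-based row and column numbers

cell = ws["A5"]
print(cell.coordinate, cell.row, cell.column)
print(get_column_letter(cell.column))      # convert the column number back to a letter
print(ws["A2"].value)
```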

Working with Word Documents Using python-docx

Installation

D:\PycharmProjects\Study\venv\Scripts\python.exe -m pip install python-docx -i http://pypi.douban.com/simple --trusted-host=pypi.douban.com

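Basic Concepts

Document: the document
Paragraph: a paragraph
Tables: tables
Sections: sections, a collection
Styles: styles
Inline_shapes: inline shapes
Run: a run of text; text that differs in color, font, weight, or italics belongs to a different run

Usage

Creating a Document (doc)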

from docx import Document

new_doc = Document()
new_doc.save('test.docx')

Working with a Document

from docx import Document

testdoc = Document('test.docx')
testdoc.save('test.docx')    # save

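Working with Paragraphs
- testdoc.paragraphs
  Gets all paragraphs as a list of paragraph objects
- testdoc.add_paragraph('新增第一个段落')
  Adds a new paragraph
- testdoc.add_heading(text='新增标题1',level=0)
  Appends a heading paragraph to the end of the document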

Working with Runs
- testdoc.paragraphs[2].runs
  Lists the runs in the paragraph
- testdoc.paragraphs[2].add_run('加粗字体').bold=True
  Adds a bold run
- testdoc.paragraphs[2].add_run('，普通字体')
  Adds a run with regular formatting
- testdoc.paragraphs[2].add_run('，斜体').italic=True
  Adds an italic run

Working with PDF Files Using pdfplumber

Installation

D:\PycharmProjects\Study\venv\Scripts\python.exe -m pip install pdfplumber -i http://pypi.douban.com/simple --trusted-host=pypi.douban.com

Usage

Opening a PDF document

import pdfplumber
# No password required
with pdfplumber.open('C:/Users/Administrator/Desktop/tes1.pdf') as pdf:
    print(pdf.pages)
# Password required to open
pdf_path = 'D:/Tencent/WeChat/WeChat Files/wxid_36x0toqnut7i22/FileStorage/MsgAttach/9e20f478899dc29eb19741386f9343c8/File/2023-12/'
with pdfplumber.open(pdf_path+'tes1.pdf',password='Qianfan1025') as pdf :
    print(pdf.pages)

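Viewing the PDF's metadata
- pdf.metadata
  A dictionary of metadata key/value pairs read from the PDF

Viewing all pages of the PDF

pdf.pages
A list containing a page object (pdfplumber.Page) for every page of the PDF

Extracting the text of a PDF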

import pdfplumber

with pdfplumber.open('C:/Users/Administrator/Desktop/tes1.pdf') as pdf:
    for page in pdf.pages:   # iterate over the page objects directly
        text = page.extract_text()   # text content of the current page
        print(text)

posted @ 2024-02-22 17:01 水开白