Working with files in Python

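Python has many modules for working with files. Commonly used ones include json, xlrd, xlsxwriter, pandas, openpyxl, pdfminer and docx, and different file types each have their own dedicated modules.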

1. Working with files using built-in functions:

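For reading files, the basic methods are read([size]), readline() and readlines(). I won't dwell on the differences, since each has its pros and cons. In practice I mostly use read. For a large file I pass a size argument so that each call reads at most that much: size characters in text mode, or size bytes in binary mode.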
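To make the difference concrete, here is a minimal sketch (not from the original post) that reads the same small file with all three methods and then in fixed-size chunks. It assumes a throwaway UTF-8 file in a temporary directory. The file name demo.txt, the helper read_in_chunks and the chunk size of 4 are hypothetical choices made only for the demo.

```python
import os
import tempfile


def read_in_chunks(path, size=4):
    """Hypothetical helper: yield the file's text `size` characters at a time."""
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(size)   # at most `size` characters
            if not chunk:          # empty string means end of file
                break
            yield chunk


# Create a throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\n")

with open(path, encoding="utf-8") as f:
    whole = f.read()        # the entire file as one str
with open(path, encoding="utf-8") as f:
    first = f.readline()    # one line, including its "\n"
with open(path, encoding="utf-8") as f:
    lines = f.readlines()   # a list with every line

chunks = list(read_in_chunks(path))
print(repr(first))  # 'line 1\n'
print(lines)        # ['line 1\n', 'line 2\n', 'line 3\n']
```

Because the file is opened in text mode, size counts characters. Open it with "rb" if you want chunks of a fixed number of bytes.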

Taking read as an example, here are two ways to write it:

# Without with: the file has to be closed explicitly
f = open("E:\\a.txt")
try:
    data = f.read()
finally:
    f.close()  # runs even if read() raises an exception

# With with: the file is closed automatically when the block ends
with open("E:\\a.txt") as file:
    data = file.read()
    print(data)

Advantages of using with:

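When the with statement runs, open(path) is called and the object it returns serves as a context manager. Python calls that object's __enter__() method and, if there is an as clause, binds the return value to the variable after as. For a file, __enter__() returns the file object itself, which you then work with. When the block finishes, whether normally or because of an exception, Python calls __exit__() to clean up the resource, which for a file means closing it.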
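The order of those calls can be observed with a small hypothetical class, Resource, that implements __enter__ and __exit__ itself. It stands in for a file object and simply records each step. This sketch illustrates the protocol; it is not how the io module implements files. The last lines check that a real file opened with with is closed once the block ends.

```python
import os
import tempfile


class Resource:
    """Hypothetical stand-in for a file object that logs the with protocol."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self              # this is the value bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")  # a real file object closes itself here
        return False             # False: let any exception propagate


log = []
try:
    with Resource(log) as r:
        log.append("body")
        raise ValueError("boom")  # __exit__ still runs
except ValueError:
    log.append("handled")
print(log)  # ['enter', 'body', 'exit', 'handled']

# The same guarantee with a real file
path = os.path.join(tempfile.mkdtemp(), "a.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")
print(f.closed)  # True
```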

The object that open returns, <class '_io.TextIOWrapper'> in text mode, is itself iterable. In other words, you don't need to call read at all; you can loop over the object directly:
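To see what "iterable" means here, this sketch uses a made-up temporary file to show that the file object is its own iterator. next() returns one line at a time, and a for loop or list() simply continues from wherever the previous call stopped.

```python
import os
import tempfile

# Throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("a\nb\nc\n")

with open(path, encoding="utf-8") as f:
    print(type(f))        # <class '_io.TextIOWrapper'>
    same = iter(f) is f   # the file object is its own iterator
    first = next(f)       # pulls one line, like readline()
    rest = list(f)        # iteration continues from line 2

print(same)         # True
print(repr(first))  # 'a\n'
print(rest)         # ['b\n', 'c\n']
```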

file = open("D:\\Community.txt", encoding="utf-8")
for f in file:
    print(f)
    print(type(f))
file.close()  # opened without with, so close it explicitly

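Each f is one line of the file and its type is str, so the output comes line by line, much like calling readline repeatedly. Every line keeps its trailing newline, so print(f) shows an extra blank line after each one; print(f, end="") avoids that. The advantage of this approach is that Python handles the I/O buffering for you and never loads the whole file into memory, so it works well even for large files.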
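Since every line still ends with "\n", stripping it is usually the first step when processing lines. Here is a small sketch of line-by-line processing. grep is a hypothetical helper and the sample data is invented for the demo. enumerate supplies line numbers starting from 1 without reading the whole file first.

```python
import os
import tempfile


def grep(path, keyword):
    """Hypothetical helper: return (line number, text) for lines containing keyword."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):  # one line at a time
            text = line.rstrip("\n")                 # drop the trailing newline
            if keyword in text:
                hits.append((lineno, text))
    return hits


path = os.path.join(tempfile.mkdtemp(), "fruit.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("apple\nbanana\ncherry\napple pie\n")

print(grep(path, "apple"))  # [(1, 'apple'), (4, 'apple pie')]
```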

# Reading with a with statement
with open("E:\\a.txt") as file:
    for f in file:
        print(f)  # one line per iteration; f is a str

That reminds me of another module, one for reading a specific line of a file: linecache, which is part of Python's standard library.

import linecache
data = linecache.getline("D:\\LICENSE-Community.txt", 1)
print(data)

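The first argument of getline is the file path and the second is the line number, counting from 1. data is a str that includes the line's trailing newline. If the line doesn't exist, getline returns an empty string.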
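Here is a quick check of these return values on a throwaway file whose contents are made up for the demo. Keep in mind that linecache reads the whole file into an in-memory cache the first time you ask for one of its lines. If the file changes afterwards, call linecache.checkcache() (or linecache.clearcache()) before reading it again.

```python
import linecache
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

second = linecache.getline(path, 2)   # 'second\n' (newline kept)
missing = linecache.getline(path, 99) # '' (no such line)
zero = linecache.getline(path, 0)     # '' (numbering starts at 1)

# The file's lines are now cached; refresh the cache after the file changes
with open(path, "w", encoding="utf-8") as f:
    f.write("changed\n")
linecache.checkcache(path)            # drops the stale entry (size differs)
refreshed = linecache.getline(path, 1)
print(repr(refreshed))                # 'changed\n'
```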

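2. Reading information from docx files in Python

The docx module used here comes from the third-party python-docx package (install it with pip install python-docx).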

import os
import docx  # installed as the python-docx package


def docx_to_txt(path):
    '''
    func: convert a .docx file into a .txt file with the same base name
    '''
    newpath = os.path.splitext(path)[0] + ".txt"
    if os.path.exists(newpath):
        os.remove(newpath)
    if os.path.getsize(path) == 0:
        return

    # Load the document once, under a name the output file won't shadow
    doc = docx.Document(path)
    # Open the output file once instead of reopening it for every paragraph and cell
    with open(newpath, "w", encoding="utf-8") as out:
        # Body text: one non-empty paragraph per line
        for paragraph in doc.paragraphs:
            if paragraph.text:
                out.write(paragraph.text.strip() + "\n")
                # print(paragraph.text)

        # Tables: one table row per line, cells separated by tabs
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    out.write(cell.text.strip().replace("\n", " ") + "\t")
                out.write("\n")

posted @ 2019-09-09 22:37 今日店休
