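Python File Operations
Basic file operations are reading, writing, modifying, and a few others. They are carried out in the following modes:
1. Read mode (r)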
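f = open(file='file.txt', mode='r', encoding='utf-8')
f.read()      # an optional argument limits how many characters are read; the next read() continues from the cursor position
f.readline()  # reads one line; the returned string normally ends with "\n"
f.close()
# encoding: read the file with the same encoding it was saved in (if the encoding is unknown, the third-party module chardet can try to detect it)
# mode: changing 'r' to 'rb' opens the file in binary mode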
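f.read() returns the entire file as a single string. To process the file line by line, use a for loop:
for line in f:
    print(line)  # print appends its own '\n', so each line is followed by a blank line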
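The reading calls above can be tried together in one small sketch. The function name read_demo, the variable sample_path, and the sample text are made up for illustration; the with statement is used here so the file is closed automatically, which is an alternative to calling f.close() by hand.

```python
import os
import tempfile


def read_demo(path):
    """Return (first 3 characters, rest of the first line, remaining lines)."""
    with open(path, mode="r", encoding="utf-8") as f:  # closed automatically on exit
        head = f.read(3)       # read three characters
        rest = f.readline()    # continues from the cursor up to the end of the line
        remaining = [line.rstrip("\n") for line in f]  # iterate over the lines that are left
    return head, rest, remaining


# Build a small sample file so the example runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(sample_path, mode="w", encoding="utf-8") as f:
    f.write("hello world\nsecond line\nthird line\n")

print(read_demo(sample_path))
```

Each call picks up where the previous one left the cursor, which is why the loop only sees the second and third lines.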
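2. Write mode (w)
f = open(file='file.txt', mode='w', encoding='utf-8')
f.write('A')  # the argument must be a single string; use string formatting to combine several values
f.close()
# In w mode the existing content of the file is cleared first, then the file is rewritten by the code
# In w mode a file that does not exist is created automatically under the given name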
3. Append mode (a)
f = open(file='file.txt', mode='a')
f.write('A')  # the string is always appended at the end of the file
f.close()
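To make the difference between w and a concrete, here is a minimal sketch that writes to a temporary file three times. The path demo_path and the strings are placeholders chosen for this example only.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "file.txt")

# First write: the file does not exist yet, so w mode creates it.
with open(demo_path, mode="w", encoding="utf-8") as f:
    f.write("first\n")

# Second write in w mode: the previous content is cleared before writing.
with open(demo_path, mode="w", encoding="utf-8") as f:
    f.write("%s has %d items\n" % ("list", 3))  # formatting builds the single string

# Append mode: the existing content is kept and the new text goes at the end.
with open(demo_path, mode="a", encoding="utf-8") as f:
    f.write("appended\n")

with open(demo_path, mode="r", encoding="utf-8") as f:
    content = f.read()

print(content)
```

Only the second and third writes survive, because the second open in w mode discarded "first".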
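4. Mixed modes
a. Read-write mode (r+)
f = open(file='file.txt', mode='r+')
f.read()
f.write('A')
# Content already returned by read() is not read again (think of a cursor that sits at the end once everything has been read), so the write lands at the end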
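b. Write-read mode (w+)
f = open(file='file.txt', mode='w+')
f.read()
f.write('A')
f.seek(0)
f.read()
# The first read() returns nothing, because this mode clears the file on open and then writes it anew.
# Before the second read(), the cursor has to be moved back to the start; only then does it return what write() wrote.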
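c. Append-read mode (a+)
f = open(file='file.txt', mode='a+')
f.seek(0)  # a file opened in a+ mode starts with the cursor at the end
f.read()
f.seek(0)
f.write('A')  # in a+ mode, writes always go to the end, no matter where the cursor is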
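The three mixed modes can be compared side by side in a short sketch. It runs against a temporary file; mixed_path and the result variable names are illustrative only.

```python
import os
import tempfile

mixed_path = os.path.join(tempfile.mkdtemp(), "file.txt")

# w+ : the file is cleared (or created); reading needs a seek back to the start.
with open(mixed_path, mode="w+", encoding="utf-8") as f:
    f.write("abc\n")
    at_end = f.read()   # the cursor is at the end, so nothing is returned
    f.seek(0)
    w_plus = f.read()   # now the written text is returned

# r+ : the cursor starts at the beginning, so a write overwrites in place.
with open(mixed_path, mode="r+", encoding="utf-8") as f:
    f.write("X")        # replaces the first character, does not insert
    f.seek(0)
    r_plus = f.read()

# a+ : writes go to the end even after seek(0).
with open(mixed_path, mode="a+", encoding="utf-8") as f:
    f.seek(0)
    f.write("tail\n")
    f.seek(0)
    a_plus = f.read()

print(repr(at_end), repr(w_plus), repr(r_plus), repr(a_plus))
```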
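File operations
All file operations take place in one of the modes above. The previous sections covered reading, writing, and appending; the remaining methods are introduced below with the help of the source stub of the file class. Note that this stub comes from Python 2's built-in file type, so next() and xreadlines() are not available on Python 3 file objects: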
class file(object):

    def close(self):  # real signature unknown; restored from __doc__
        # Close the file
        """close() -> None or (perhaps) an integer.  Close the file.

        Sets data attribute .closed to True.  A closed file cannot be used for
        further I/O operations.  close() may be called more than once without
        error.  Some kinds of file objects (for example, opened by popen())
        may return an exit status upon closing.
        """

    def fileno(self):  # real signature unknown; restored from __doc__
        # File descriptor
        """fileno() -> integer "file descriptor".

        This is needed for lower-level file interfaces, such os.read(). """
        return 0

    def flush(self):  # real signature unknown; restored from __doc__
        # Flush the file's internal buffer
        """ flush() -> None.  Flush the internal I/O buffer. """
        pass

    def isatty(self):  # real signature unknown; restored from __doc__
        # Check whether the file is connected to a tty device
        """ isatty() -> true or false.  True if the file is connected to a tty device. """
        return False

    def next(self):  # real signature unknown; restored from __doc__
        # Get the next line; raises StopIteration if there is none
        """ x.next() -> the next value, or raise StopIteration """
        pass

    def read(self, size=None):  # real signature unknown; restored from __doc__
        # Read the given number of bytes
        """read([size]) -> read at most size bytes, returned as a string.

        If the size argument is negative or omitted, read until EOF is reached.
        Notice that when in non-blocking mode, less data than what was requested
        may be returned, even if no size parameter was given."""
        pass

    def readinto(self):  # real signature unknown; restored from __doc__
        # Read into a buffer; do not use, it is slated for removal
        """ readinto() -> Undocumented.  Don't use this; it may go away. """
        pass

    def readline(self, size=None):  # real signature unknown; restored from __doc__
        # Read a single line only
        """readline([size]) -> next line from the file, as a string.

        Retain newline.  A non-negative size argument limits the maximum
        number of bytes to return (an incomplete line may be returned then).
        Return an empty string at EOF. """
        pass

    def readlines(self, size=None):  # real signature unknown; restored from __doc__
        # Read all data and return it as a list split on line breaks
        """readlines([size]) -> list of strings, each a line from the file.

        Call readline() repeatedly and return a list of the lines so read.
        The optional size argument, if given, is an approximate bound on the
        total number of bytes in the lines returned. """
        return []

    def seek(self, offset, whence=None):  # real signature unknown; restored from __doc__
        # Set the position of the file pointer
        """seek(offset[, whence]) -> None.  Move to new file position.

        Argument offset is a byte count.  Optional argument whence defaults to
        0 (offset from start of file, offset should be >= 0); other values are 1
        (move relative to current position, positive or negative), and 2 (move
        relative to end of file, usually negative, although many platforms allow
        seeking beyond the end of a file).  If the file is opened in text mode,
        only offsets returned by tell() are legal.  Use of other offsets causes
        undefined behavior.
        Note that not all file objects are seekable. """
        pass

    def tell(self):  # real signature unknown; restored from __doc__
        # Get the current pointer position
        """ tell() -> current file position, an integer (may be a long integer). """
        pass

    def truncate(self, size=None):  # real signature unknown; restored from __doc__
        # Truncate the data, keeping only what comes before the given position
        """ truncate([size]) -> None.  Truncate the file to at most size bytes.

        Size defaults to the current file position, as returned by tell()."""
        pass

    def write(self, p_str):  # real signature unknown; restored from __doc__
        # Write content
        """write(str) -> None.  Write string str to file.

        Note that due to buffering, flush() or close() may be needed before
        the file on disk reflects the data written."""
        pass

    def writelines(self, sequence_of_strings):  # real signature unknown; restored from __doc__
        # Write a list of strings to the file
        """writelines(sequence_of_strings) -> None.  Write the strings to the file.

        Note that newlines are not added.  The sequence can be any iterable object
        producing strings. This is equivalent to calling write() for each string. """
        pass

    def xreadlines(self):  # real signature unknown; restored from __doc__
        # Can be used to read the file line by line instead of all at once
        """xreadlines() -> returns self.

        For backward compatibility. File objects now include the performance
        optimizations previously implemented in the xreadlines module. """
        pass
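Common operations:
1. fileno: returns the file descriptor, the integer index the kernel uses for the open file; it is needed for I/O multiplexing
2. flush: forces data held in the in-memory buffer out to the operating system; call os.fsync(f.fileno()) afterwards if the data must actually reach the disk
3. readable: checks whether the file can be read
4. readline: reads a single line, stopping at the line ending (\r or \n)
5. tell: returns the position of the pointer (cursor), in bytes
6. seek:
With one argument, moves the cursor to the given position (the number is a byte count) # by contrast, in a text-mode read() the number counts characters
With two arguments:
seek(0, 1): moves the cursor to the current position
seek(0, 0): moves the cursor to the start of the file
seek(0, 2): moves the cursor to the end of the file
7. truncate: cuts the file off at the cursor and deletes everything after it # if truncate() is given a value, it keeps that many bytes counted from the start of the file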
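The byte-versus-character distinction in the list above is easiest to see with multi-byte text. The sketch below writes two Chinese characters followed by ASCII text, then inspects the file in binary mode so that every offset is an exact byte count. The path seek_path and the sample string are assumptions made for this example.

```python
import os
import tempfile

seek_path = os.path.join(tempfile.mkdtemp(), "file.txt")

with open(seek_path, mode="w", encoding="utf-8") as f:
    f.write("你好world")     # two 3-byte characters plus five 1-byte characters
    f.flush()                # hand Python's buffer over to the operating system
    os.fsync(f.fileno())     # ask the operating system to commit the data to disk

with open(seek_path, mode="r", encoding="utf-8") as f:
    two_chars = f.read(2)    # text mode: the argument counts characters

with open(seek_path, mode="rb+") as f:
    f.seek(0, 2)             # jump to the end of the file
    size = f.tell()          # position in bytes, which here equals the file size
    f.seek(6)                # byte offset 6 sits right after the two 3-byte characters
    tail = f.read()
    f.seek(6)
    f.truncate()             # cut the file at the cursor, dropping "world"
    f.seek(0)
    kept = f.read().decode("utf-8")

print(two_chars, size, tail, kept)
```

Seeking to a byte offset that falls inside a multi-byte character and then reading in text mode would fail to decode, which is why the offset 6 is chosen on a character boundary.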
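*Modifying a file
Because of the way files are stored, changing content inside a file is a special case. The idea is as follows:
First use seek() to move the cursor to the position to be changed, then call write() with the new content. However, this can only overwrite the existing content; it cannot insert anything. To insert, the only option is to open two files and use readline() to copy the content into a new file, adding the new material along the way.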
As an example, the following changes everyone in the contact list from 兰州 (Lanzhou) to 北京 (Beijing):
[Screenshot: contents of the original file]
# Modify at the cost of disk space
name = '联系方式'  # "contacts"
new_name = '%s_new' % name
f = open('%s.txt' % name, 'r', encoding='utf-8')
f_new = open('%s.txt' % new_name, 'w', encoding='utf-8')
old_str = '兰州'  # Lanzhou
new_str = '北京'  # Beijing
for line in f:  # loop over every line and check whether it contains text to replace
    if old_str in line:
        line = line.replace(old_str, new_str)
    f_new.write(line)
f.close()
f_new.close()
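This approach scans the file line by line, replacing where needed, and writes each line to a new file (联系方式_new) as soon as it has been checked, until the whole file has been processed.
[Screenshot: result of the run]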
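The line-by-line approach can also be wrapped in a function that swaps the new file in place of the original once copying is finished. This is only a sketch under a few assumptions: the function name replace_in_file, the temporary-file handling, and the sample contact lines are not from the original post, and os.replace is used here as one possible way to rename the new file over the old one.

```python
import os
import tempfile


def replace_in_file(path, old_str, new_str, encoding="utf-8"):
    """Replace old_str with new_str line by line; return the number of changed lines."""
    changed = 0
    dir_name = os.path.dirname(os.path.abspath(path))
    # Create the new file next to the original so the final rename stays on one disk.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    os.close(fd)
    try:
        with open(path, "r", encoding=encoding) as src, \
                open(tmp_path, "w", encoding=encoding) as dst:
            for line in src:  # only one line is held in memory at a time
                if old_str in line:
                    line = line.replace(old_str, new_str)
                    changed += 1
                dst.write(line)
        os.replace(tmp_path, path)  # the new file takes the original's name
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a half-written copy behind
        raise
    return changed


# Made-up sample data, since the original contact list is not shown.
contacts_path = os.path.join(tempfile.mkdtemp(), "contacts.txt")
with open(contacts_path, "w", encoding="utf-8") as f:
    f.write("Alice 兰州\nBob 上海\nCarol 兰州\n")

changed_lines = replace_in_file(contacts_path, "兰州", "北京")
with open(contacts_path, encoding="utf-8") as f:
    updated = f.read()

print(changed_lines)
print(updated)
```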
Another approach is to hold the entire content in memory and write it back once the modification is done:
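# Modify at the cost of memory
old_str = '兰州'  # Lanzhou
new_str = '北京'  # Beijing
f = open('联系方式.txt', 'r+', encoding='utf-8')
data = f.read()  # keep the entire file content in data as one string
data = data.replace(old_str, new_str)
f.seek(0)  # after replacing, move the cursor to the start of the file to overwrite the original content
f.write(data)
f.close()
[Screenshot: result of the run]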
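Compared with the first approach, this one has two problems:
1. When new_str takes up fewer bytes than old_str, the rewritten content is shorter than the original, and the tail of the old content is left over at the end of the file. This happens because write() only overwrites bytes in place and never shortens the file; calling f.truncate() after f.write(data) removes the leftover.
2. When the file is very large, reading all of it into memory slows the machine down and may even exhaust the available memory.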
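The first problem can be avoided by truncating the file after the write. The sketch below is one way to do that; the function name replace_in_memory and the sample data are made up for this example, and the replacement "BJ" is deliberately shorter than the text it replaces so that the leftover tail would show up if truncate() were missing.

```python
import os
import tempfile


def replace_in_memory(path, old_str, new_str, encoding="utf-8"):
    """Replace old_str with new_str by loading the whole file into memory."""
    with open(path, "r+", encoding=encoding) as f:
        data = f.read()                  # the whole file as one string
        f.seek(0)                        # back to the start to overwrite
        f.write(data.replace(old_str, new_str))
        f.truncate()                     # drop whatever remains of the old, longer content


demo_file = os.path.join(tempfile.mkdtemp(), "contacts.txt")
with open(demo_file, "w", encoding="utf-8") as f:
    f.write("Alice 兰州\nBob 兰州\n")

replace_in_memory(demo_file, "兰州", "BJ")  # "BJ" is 2 bytes, "兰州" is 6 bytes in UTF-8
with open(demo_file, encoding="utf-8") as f:
    result = f.read()

print(result)
```

The second problem remains: this version still reads the whole file at once, so the line-by-line approach is the safer choice for large files.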