Reading Large Files in Python
For reading files we generally use three methods: .read(), .readline(), and .readlines(). Used carelessly on a large file, they can cause an out-of-memory error.
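As a rough illustration of why, here is a sketch of the memory behavior of each call; the filename "big.log" is a hypothetical stand-in:

# "big.log" is a hypothetical large file used for illustration.
with open("big.log") as f:
    data = f.read()        # loads the ENTIRE file into one string: risky on large files
with open("big.log") as f:
    first = f.readline()   # reads just one line: safe
with open("big.log") as f:
    lines = f.readlines()  # loads ALL lines into a list: risky on large files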
In Python, the with statement opens and closes the file for you, closing it even if the inner block raises an exception. Moreover, `for line in f` treats the file object f as an iterator that automatically uses buffered I/O and memory management, so there is no need to worry about large files. Letting the interpreter and the system handle it is the simplest approach: hand the work over and you are done.
# If the file is line-based:
with open('...') as f:
    for line in f:
        process(line)
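To make the pattern concrete, here is a minimal runnable sketch that counts lines and bytes in constant memory; the filename "sample.txt" is a placeholder:

line_count = 0
byte_count = 0
with open("sample.txt", "rb") as f:  # binary mode avoids decoding overhead
    for line in f:                   # the file object yields one buffered line at a time
        line_count += 1
        byte_count += len(line)
print(line_count, byte_count)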
Two more approaches are introduced below.
Method 1: read batches of lines with readlines()
with open("sample.txt") as file:  # with ensures the file is closed
    while True:
        lines = file.readlines(1000)  # read the next batch of lines
        if not lines:
            break
        for line in lines:
            pass  # do something with each line
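Note that the 1000 passed to readlines() is a size hint, not a line count: Python keeps reading whole lines until roughly that many bytes (characters in text mode) have been consumed, so each batch stays small no matter how long the file is.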
Method 2: read fixed-size chunks with a generator
def read_in_chunks(filePath, chunk_size=1024 * 1024):
    """
    Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1 MB; you can set your own chunk size.
    """
    with open(filePath) as file_object:  # with ensures the file is closed
        while True:
            chunk_data = file_object.read(chunk_size)
            if not chunk_data:  # empty string means end of file
                break
            yield chunk_data

if __name__ == "__main__":
    filePath = './path/filename'
    for chunk in read_in_chunks(filePath):
        process(chunk)  # process() is the caller's own handler
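One caveat with fixed-size chunks is that a line or record can be split across a chunk boundary. Below is a minimal sketch of how chunks could be re-assembled into newline-delimited lines, building on the read_in_chunks() generator above; read_lines_in_chunks is a hypothetical helper name:

def read_lines_in_chunks(filePath, chunk_size=1024 * 1024):
    remainder = ''
    for chunk in read_in_chunks(filePath, chunk_size):
        buffer = remainder + chunk
        lines = buffer.split('\n')
        remainder = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line
    if remainder:  # emit the final line if the file does not end with a newline
        yield remainder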