Reading Large Files in Python

Three methods are commonly used for reading files:

.read(), .readline(), and .readlines(). Used carelessly on a large file, they can lead to out-of-memory errors.
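As a quick illustration of the difference (a minimal sketch; 'sample.txt' stands in for any file):

# .read() loads the entire file into one string -- risky for huge files
with open('sample.txt') as f:
    data = f.read()

# .readline() reads a single line per call -- memory-safe, but you loop yourself
with open('sample.txt') as f:
    first_line = f.readline()

# .readlines() loads every line into a list -- also risky for huge files
with open('sample.txt') as f:
    lines = f.readlines()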

In Python, the with statement opens and closes the file for you, even when an exception is raised inside the block. Moreover, for line in f treats the file object f as an iterator and automatically uses buffered I/O and memory management, so there is no need to worry about large files. Letting the interpreter handle it is the simplest approach.

# If the file is line-based; '...' stands for your file path
with open('...') as f:
    for line in f:
        process(line)  # process() is a placeholder for your per-line logic
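Iterating over the file object reads one line at a time through an internal buffer, so peak memory stays at roughly one line's worth no matter how large the file is.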


Two more approaches are described below.
1. Read lines in batches with readlines() and a size hint:

with open("sample.txt") as f:
    while True:
        lines = f.readlines(1000)  # argument is a size hint, not a line count
        if not lines:
            break
        for line in lines:
            pass  # do something with each line
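Note that readlines() with an argument keeps reading whole lines until roughly that many bytes have been collected, so each batch here holds about 1000 bytes' worth of lines rather than 1000 lines.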





2. Read in fixed-size chunks with a generator:

def read_in_chunks(filePath, chunk_size=1024*1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1M. You can set your own chunk size."""
    with open(filePath) as file_object:  # with ensures the file gets closed
        while True:
            chunk_data = file_object.read(chunk_size)
            if not chunk_data:
                break
            yield chunk_data

if __name__ == "__main__":
    filePath = './path/filename'
    for chunk in read_in_chunks(filePath):
        process(chunk)  # process() is a placeholder for your chunk logic
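An equivalent, more compact idiom uses iter() with a sentinel (a sketch under the same assumptions; for binary files, open with 'rb' and use b'' as the sentinel):

from functools import partial

with open('./path/filename') as f:
    # iter(callable, sentinel) calls f.read(1024*1024) repeatedly and
    # stops when it returns the sentinel '' (empty string at end of file)
    for chunk in iter(partial(f.read, 1024 * 1024), ''):
        pass  # do something with the chunk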

 
