Reading Large Files in Python
For reading files we generally use three methods: .read(), .readline(), and .readlines(). Used carelessly on a large file, they can cause an out-of-memory error.
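As a rough illustration of why, here is a sketch of the memory behavior of each call; the filename "big.log" is a hypothetical stand-in:

# "big.log" is a hypothetical large file used for illustration.
with open("big.log") as f:
    data = f.read()        # loads the ENTIRE file into one string: risky on large files
with open("big.log") as f:
    first = f.readline()   # reads just one line: safe
with open("big.log") as f:
    lines = f.readlines()  # loads ALL lines into a list: risky on large files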
In Python, the with statement opens and closes the file for you, closing it even if the inner block raises an exception. Moreover, `for line in f` treats the file object f as an iterator that automatically uses buffered I/O and memory management, so there is no need to worry about large files. Letting the interpreter and the system handle it is the simplest approach: hand the work over and you are done.
# If the file is line-based:
with open('...') as f:
    for line in f:
        process(line)
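To make the pattern concrete, here is a minimal runnable sketch that counts lines and bytes in constant memory; the filename "sample.txt" is a placeholder:

line_count = 0
byte_count = 0
with open("sample.txt", "rb") as f:  # binary mode avoids decoding overhead
    for line in f:                   # the file object yields one buffered line at a time
        line_count += 1
        byte_count += len(line)
print(line_count, byte_count)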
Two more approaches are introduced below.
Method 1: read batches of lines with readlines()
with open("sample.txt") as file:  # with ensures the file is closed
    while True:
        lines = file.readlines(1000)  # read the next batch of lines
        if not lines:
            break
        for line in lines:
            pass  # do something with each line
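Note that the 1000 passed to readlines() is a size hint, not a line count: Python keeps reading whole lines until roughly that many bytes (characters in text mode) have been consumed, so each batch stays small no matter how long the file is.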
Method 2: read fixed-size chunks with a generator
def read_in_chunks(filePath, chunk_size=1024 * 1024):
    """
    Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1 MB; you can set your own chunk size.
    """
    with open(filePath) as file_object:  # with ensures the file is closed
        while True:
            chunk_data = file_object.read(chunk_size)
            if not chunk_data:  # empty string means end of file
                break
            yield chunk_data

if __name__ == "__main__":
    filePath = './path/filename'
    for chunk in read_in_chunks(filePath):
        process(chunk)  # process() is the caller's own handler
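One caveat with fixed-size chunks is that a line or record can be split across a chunk boundary. Below is a minimal sketch of how chunks could be re-assembled into newline-delimited lines, building on the read_in_chunks() generator above; read_lines_in_chunks is a hypothetical helper name:

def read_lines_in_chunks(filePath, chunk_size=1024 * 1024):
    remainder = ''
    for chunk in read_in_chunks(filePath, chunk_size):
        buffer = remainder + chunk
        lines = buffer.split('\n')
        remainder = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line
    if remainder:  # emit the final line if the file does not end with a newline
        yield remainder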