Python: Extracting Rows of Interest from a Large CSV File

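When working with a CSV file of 2-odd GB, the file is too large to read into memory all at once. Instead, iterate over the file object returned by open, which reads one line at a time.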

with open(filename, 'r') as file:
    # read line by line
    for line in file:
        process(line)  # placeholder: whatever needs to happen for each line

Or, more simply (though this form leaves closing the file to the garbage collector):

for line in open('myfile.txt','r'):
     pass
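
As a quick check that iterating over a file object really does work through the file one line at a time, here is a minimal sketch. The helper name count_lines and the throwaway temp file are made up for illustration and are not part of the original script.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(path, 'r') as file:
        for _ in file:
            count += 1
    return count


# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("a,1\nb,2\nc,3\n")
    tmp_path = tmp.name

line_count = count_lines(tmp_path)
print(line_count)
os.remove(tmp_path)
```

The same loop should behave the same way on a multi-gigabyte file, since only the current line (plus a read buffer) is kept in memory.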

The requirement: extract the records whose time falls within a given time window and save them to a separate file.

The full code is as follows:

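def is_between_time(line, start, end):
    """
    :param line: a line in the data file, for example: 8684496663,粤BC5948,2016-01-01 22:01:56,114.083448,22.531582,225,0,0,0,114075022530,114070022530,114.078316,22.534267,1463910,2016-01-01 22:25:59.772000
    :param start: start of the time window (exclusive), for example: 21:57:00
    :param end: end of the time window (exclusive), for example: 22:03:00
    :return: True if the time of day in the third field lies strictly between start and end
    """
    fields = line.split(',')
    timestamp = fields[2]  # e.g. "2016-01-01 22:01:56"
    time_of_day = timestamp.split(' ')[1]
    # Zero-padded HH:MM:SS strings sort in chronological order,
    # so a plain string comparison is enough here.
    return start < time_of_day < end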


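file_to_read_path = "E:/P_CZCGPS_20160101.csv"
file_to_write_path = "E:/result.csv"

# read the input line by line and copy the matching lines to the output;
# both files are closed automatically when the with block ends
with open(file_to_read_path, 'r') as file, open(file_to_write_path, 'w') as file_to_write:
    for line in file:
        if is_between_time(line, "21:57:00", "22:03:00"):
            print(line, end='')  # the line already ends with a newline
            file_to_write.write(line)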
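
The script above assumes every line has a well-formed timestamp in its third field. As a possible variation (not what the original script does), here is a rough sketch that uses the standard csv and datetime modules and skips rows it cannot parse. The names row_in_window and filter_rows, the time_column parameter, and the in-memory sample data are all invented for illustration; the first sample row is a shortened copy of the example line from the docstring.

```python
import csv
import io
from datetime import datetime, time


def row_in_window(row, start, end, time_column=2):
    """Return True if the time of day in row[time_column] lies strictly between start and end."""
    try:
        stamp = datetime.strptime(row[time_column], "%Y-%m-%d %H:%M:%S")
    except (IndexError, ValueError):
        # Header rows, blank lines, and malformed timestamps are skipped.
        return False
    return start < stamp.time() < end


def filter_rows(source, sink, start, end):
    """Copy matching rows from source to sink one row at a time; return how many were written."""
    reader = csv.reader(source)
    writer = csv.writer(sink, lineterminator="\n")
    written = 0
    for row in reader:
        if row_in_window(row, start, end):
            writer.writerow(row)
            written += 1
    return written


# Tiny in-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO(
    "8684496663,粤BC5948,2016-01-01 22:01:56,114.083448,22.531582\n"
    "1,X,2016-01-01 21:00:00,0,0\n"
    "bad line\n"
)
out = io.StringIO()
kept = filter_rows(sample, out, time(21, 57, 0), time(22, 3, 0))
print(kept)
```

For a real file, source and sink would be file objects opened with newline='' (as the csv module documentation recommends), and the rows are still processed one at a time, so memory use stays small.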

Happy 1024 Programmers' Day!

posted @ 2019-10-24 10:23  行者孙