mongodb ISODate问题(大量数据update优化)
问题描述:
上周有个需求,把mongodb中birthday (ISO日期格式) 转换成北京时间,并保存成string类型。
最初思路:
遍历查找出的结果,逐个加8小时,然后通过_id逐个去update_one。但是发现这种方式效率太低了,一分钟才能更新一千五百条数据。
# -*- coding: utf-8 -*- from pymongo import MongoClient import time import datetime def init(): myclient = MongoClient(host='*.*.*.*', port=27117) # admin为登陆校验库 auth_db = myclient.admin auth_db.authenticate('admin', '1234') # 连接库 mydb = myclient['SR_123'] mycol = mydb['Data_170724145755025001_0305_copy4'] print(time.time()) count = 0 # 只取有birthday的数据,且只返回_id和birthday for x in mycol.find({'birthday':{'$exists': True}},{'_id','birthday'}): count = count + 1 print(count) _time = x.get('birthday') _time += datetime.timedelta(hours=8) timeStr = _time.strftime("%Y%m%d") _id = x.get('_id') query1 = {"_id": _id} newvalues = {"$set": {"birthdayStr": timeStr}} mycol.update_one(query1, newvalues) print(time.time()) init()
新的思路:
通过birthday去重,然后通过birthday去update_many。
for x in mycol.find().distinct('birthday'): _time1 = x + datetime.timedelta(hours=8) timeStr = _time1.strftime("%Y%m%d") query1 = {"birthday": x} newvalues = {"$set": {"birthdayStr": timeStr}} mycol.update_many(query1, newvalues)
最终思路:
利用mongodb 的批量操作bulk_write,并且控制每次批量的数量,避免拼接过长报错。最终发现这种方式,当数据多的时候,效率远比上面两种方式快。
# -*- coding: utf-8 -*- from pymongo import MongoClient from pymongo import UpdateOne import time import datetime def init(): myclient = MongoClient(host='*.*.*.*', port=27117) # admin为登陆校验库 auth_db = myclient.admin auth_db.authenticate('admin', '1234') # 连接库 mydb = myclient['SR_123'] mycol = mydb['Data_170724145755025001_0305_copy4'] print(time.time()) testList = [] i = 0 aa=mycol.find({'birthday': {'$exists': True}},{'_id','birthday'}) for x in aa: i = i + 1; _time = x.get('birthday') _time += datetime.timedelta(hours=8) try: timeStr = _time.strftime("%Y%m%d") except: print(x) _id = x.get('_id') query1 = {"_id": _id} newvalues = {"$set": {"birthdayStr": r''+timeStr}} op = UpdateOne(query1, newvalues, upsert=True) testList.append(op) if( i % 50 == 0): print(i) mycol.bulk_write(testList, ordered=False, bypass_document_validation=True) testList.clear() init()