mongodb ISODate问题(大量数据update优化)

问题描述:

  上周有个需求,把mongodb中birthday (ISO日期格式) 转换成北京时间,并保存成string类型。

最初思路:

  遍历查找出的结果,逐个加8小时,然后通过_id逐个去update_one。但是发现这种方式效率太低了,一分钟才能更新一千五百条数据。

  

# -*- coding: utf-8 -*-
from pymongo import MongoClient
import time
import datetime


def init():
    myclient = MongoClient(host='*.*.*.*', port=27117)
    # admin为登陆校验库
    auth_db = myclient.admin
    auth_db.authenticate('admin', '1234')
    # 连接库
    mydb = myclient['SR_123']
    mycol = mydb['Data_170724145755025001_0305_copy4']
    print(time.time())
    count = 0
    # 只取有birthday的数据,且只返回_id和birthday
    for x in mycol.find({'birthday':{'$exists': True}},{'_id','birthday'}):
        count = count + 1
        print(count)
        _time = x.get('birthday')
        _time += datetime.timedelta(hours=8)
        timeStr = _time.strftime("%Y%m%d")
        _id = x.get('_id')
        query1 = {"_id": _id}
        newvalues = {"$set": {"birthdayStr": timeStr}}
        mycol.update_one(query1, newvalues)
    print(time.time())

init()
View Code

 

 

 

新的思路:  

  通过birthday去重,然后通过birthday去update_many

 for x in mycol.find().distinct('birthday'):
        _time1 = x + datetime.timedelta(hours=8)
        timeStr = _time1.strftime("%Y%m%d")
        query1 = {"birthday": x}
        newvalues = {"$set": {"birthdayStr": timeStr}}
        mycol.update_many(query1, newvalues)

  

最终思路:

  利用mongodb 的批量操作bulk_write,并且控制每次批量的数量,避免拼接过长报错。最终发现这种方式,当数据多的时候,效率远比上面两种方式快

# -*- coding: utf-8 -*-
from pymongo import MongoClient
from pymongo import UpdateOne
import time
import datetime



def init():
    myclient = MongoClient(host='*.*.*.*', port=27117)
    # admin为登陆校验库
    auth_db = myclient.admin
    auth_db.authenticate('admin', '1234')
    # 连接库
    mydb = myclient['SR_123']
    mycol = mydb['Data_170724145755025001_0305_copy4']
    print(time.time())
    testList = []
    i = 0
    aa=mycol.find({'birthday': {'$exists': True}},{'_id','birthday'})
    for x in aa:
        i = i + 1;
        _time = x.get('birthday')
        _time += datetime.timedelta(hours=8)
        try:
            timeStr = _time.strftime("%Y%m%d")
        except:
            print(x)
        _id = x.get('_id')
        query1 = {"_id": _id}
        newvalues = {"$set": {"birthdayStr": r''+timeStr}}
        op = UpdateOne(query1, newvalues, upsert=True)
        testList.append(op)
        if( i % 50 == 0):
            print(i)
            mycol.bulk_write(testList, ordered=False, bypass_document_validation=True)
            testList.clear()


init()

  

 

posted @ 2019-04-22 18:16  无敌小坑笔  阅读(1444)  评论(0编辑  收藏  举报