MongoDB Source Code Analysis: Query Handling in mongod
Source version: the MongoDB 2.6 branch
Query handling in mongod
As mentioned in the article on mongod initialization, once the server receives a message from a client it calls MyMessageHandler::process to handle it.
class MyMessageHandler : public MessageHandler {
public:
    ...
    virtual void process( Message& m , AbstractMessagingPort* port , LastError * le) {
        while ( true ) {
            ...
            DbResponse dbresponse;
            try {
                assembleResponse( m, dbresponse, port->remote() );
            }
            catch ( const ClockSkewException & ) {
                log() << "ClockSkewException - shutting down" << endl;
                exitCleanly( EXIT_CLOCK_SKEW );
            }
            ...
        }
    }
};
DbResponse dbresponse; wraps the response the server will send back after handling the message. Before diving into the request handling, take a look at the Operations enum, which lists every MongoDB operation type:
enum Operations {
    opReply = 1,         /* reply. responseTo is set. */
    dbMsg = 1000,        /* generic msg command followed by a string */
    dbUpdate = 2001,     /* update object */
    dbInsert = 2002,     // insert documents
    //dbGetByOID = 2003,
    dbQuery = 2004,      // query
    dbGetMore = 2005,    // fetch more results from an existing cursor
    dbDelete = 2006,     // delete documents
    dbKillCursors = 2007 // close cursors
};
The Message object carries the operation type of the current message. From here on this article only covers the dbQuery path; the other operations will be covered in separate articles.
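For context, the operation code lives in the standard header that every wire-protocol message starts with; Message and DbMessage are wrappers around that raw buffer. Below is a minimal sketch of the header layout, with field names taken from the wire-protocol documentation rather than the exact MsgData class in the 2.6 tree:

#include <cstdint>

// Sketch of the MongoDB wire-protocol message header. m.operation() above
// ultimately reads the opCode field, which holds one of the Operations values.
#pragma pack(push, 1)
struct MsgHeader {
    int32_t messageLength; // total message size in bytes, including this header
    int32_t requestID;     // identifier assigned by the sender
    int32_t responseTo;    // requestID this message answers (used by opReply)
    int32_t opCode;        // e.g. dbQuery == 2004
};
#pragma pack(pop)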
As shown above, process calls assembleResponse to handle the message and fill in the response object (DbResponse dbresponse). The next section walks through assembleResponse:
int op = m.operation();
bool isCommand = false;
DbMessage dbmsg(m);
if ( op == dbQuery ) {
    const char *ns = dbmsg.getns();
    if (strstr(ns, ".$cmd")) {
        isCommand = true;
        opwrite(m);
        if( strstr(ns, ".$cmd.sys.") ) {
            if( strstr(ns, "$cmd.sys.inprog") ) {
                inProgCmd(m, dbresponse);
                return;
            }
            if( strstr(ns, "$cmd.sys.killop") ) {
                killOp(m, dbresponse);
                return;
            }
            if( strstr(ns, "$cmd.sys.unlock") ) {
                unlockFsync(ns, m, dbresponse);
                return;
            }
        }
    }
    else {
        opread(m); // not a command: record the read op
    }
}
...
Before reading the code above, you need one concept from the MongoDB source: the namespace (ns). A namespace identifies a collection within a database and is normally written as "db name" + "." + "collection name". If the ns contains ".$cmd", the operation is a database command. So the code above first checks whether the request is a command; if it is, the command is handled and the function returns.
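To make the namespace convention concrete, here is a small standalone helper (not from the MongoDB tree; ParsedNs and parseNs are made up for illustration) that splits an ns string of the form "db.collection" and detects the "$cmd" marker:

#include <iostream>
#include <string>

// Hypothetical helper illustrating the ns convention described above.
struct ParsedNs {
    std::string db;
    std::string collection;
    bool isCommand;
};

ParsedNs parseNs(const std::string& ns) {
    ParsedNs out;
    std::string::size_type dot = ns.find('.');
    out.db = ns.substr(0, dot);
    out.collection = (dot == std::string::npos) ? "" : ns.substr(dot + 1);
    // "$cmd" (or a "$cmd.sys.*" pseudo-collection) marks a command request
    out.isCommand = (out.collection == "$cmd" ||
                     out.collection.compare(0, 5, "$cmd.") == 0);
    return out;
}

int main() {
    ParsedNs p = parseNs("test.$cmd");
    std::cout << p.db << " | " << p.collection
              << " | isCommand=" << p.isCommand << std::endl; // test | $cmd | isCommand=1
    return 0;
}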
// Increment op counters.
switch (op) {
case dbQuery:
    if (!isCommand) {
        // bump the query counter (surfaced in the serverStatus opcounters section)
        globalOpCounters.gotQuery();
    }
    else {
        // Command counting is deferred, since it is not known yet whether the command
        // needs counting.
    }
    break;
...
}
...
// On to the real work: run the query
if ( op == dbQuery ) {
    if ( handlePossibleShardedMessage( m , &dbresponse ) )
        return;
    receivedQuery(c , dbresponse, m );
}
Everything so far is just dispatch: each kind of request is routed to the function that handles it, and query requests end up in receivedQuery.
static bool receivedQuery(Client& c, DbResponse& dbresponse, Message& m ) {
    ...
    DbMessage d(m);
    QueryMessage q(d);
    auto_ptr< Message > resp( new Message() );
    CurOp& op = *(c.curop());
    try {
        NamespaceString ns(d.getns());
        cout << "receivedQuery NamespaceString : " << d.getns() << endl;
        if (!ns.isCommand()) {
            // authorization check for the query
            // Auth checking for Commands happens later.
            Client* client = &cc();
            Status status = client->getAuthorizationSession()->checkAuthForQuery(ns, q.query);
            audit::logQueryAuthzCheck(client, ns, q.query, status.code());
            uassertStatusOK(status);
        }
        dbresponse.exhaustNS = newRunQuery(m, q, op, *resp);
        verify( !resp->empty() );
    }
    catch (...)
    {
        ...
    }
    ...
    return ok;
}
receivedQuery has two parts: running the query, and post-processing the result (the latter is omitted here). Before the query runs, an authorization check is performed; if the current user has no permission on this collection, an exception is thrown. If the check passes, newRunQuery does the actual query.
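The uassertStatusOK call is what turns a failed permission check into an exception that aborts the request. Roughly, the pattern looks like the sketch below (simplified stand-ins; the real Status and uassertStatusOK live in mongo/base/status.h and util/assert_util.h and throw a MongoDB-specific exception rather than std::runtime_error):

#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for mongo::Status: an error code plus a reason string.
class Status {
public:
    Status(int code, std::string reason) : _code(code), _reason(std::move(reason)) {}
    static Status OK() { return Status(0, ""); }
    bool isOK() const { return _code == 0; }
    int code() const { return _code; }
    const std::string& reason() const { return _reason; }
private:
    int _code;
    std::string _reason;
};

// Sketch of the check-and-throw helper: a non-OK status becomes an exception,
// so an unauthorized query never reaches newRunQuery.
inline void uassertStatusOK(const Status& status) {
    if (!status.isOK()) {
        throw std::runtime_error(status.reason());
    }
}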
/**
* Run the query 'q' and place the result in 'result'.
*/
std::string newRunQuery(Message& m, QueryMessage& q, CurOp& curop, Message &result);
Now we get to the heart of the query path: loading the data, parsing the query, and scanning and matching the collection. Since I am not yet very familiar with MongoDB internals, some of the scan-and-match details are beyond me for now, so I will skip them and focus on the overall flow; the details can wait until I have dug deeper.
const NamespaceString nsString(ns);
uassert(16256, str::stream() << "Invalid ns [" << ns << "]", nsString.isValid());
// Set curop information.
curop.debug().ns = ns;
curop.debug().ntoreturn = q.ntoreturn;
curop.debug().query = q.query;
curop.setQuery(q.query);
// If the query is really a command, run it.
if (nsString.isCommand()) {
    int nToReturn = q.ntoreturn;
    uassert(16979, str::stream() << "bad numberToReturn (" << nToReturn
                                 << ") for $cmd type ns - can only be 1 or -1",
            nToReturn == 1 || nToReturn == -1);
    curop.markCommand();
    BufBuilder bb;
    bb.skip(sizeof(QueryResult));
    BSONObjBuilder cmdResBuf;
    if (!runCommands(ns, q.query, curop, bb, cmdResBuf, false, q.queryOptions)) {
        uasserted(13530, "bad or malformed command request?");
    }
    curop.debug().iscommand = true;
    // TODO: Does this get overwritten/do we really need to set this twice?
    curop.debug().query = q.query;
    QueryResult* qr = reinterpret_cast<QueryResult*>(bb.buf());
    bb.decouple();
    qr->setResultFlagsToOk();
    qr->len = bb.len();
    curop.debug().responseLength = bb.len();
    qr->setOperation(opReply);
    qr->cursorId = 0;
    qr->startingFrom = 0;
    qr->nReturned = 1;
    result.setData(qr, true);
    return "";
}
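The QueryResult fields filled in at the end of this command path map directly onto the OP_REPLY wire format that goes back to the client. A sketch of that layout (field names follow the wire-protocol documentation; the real QueryResult class overlays this memory rather than declaring a plain struct like this):

#include <cstdint>

// Sketch of the OP_REPLY layout that the command path writes into bb.
#pragma pack(push, 1)
struct OpReplyLayout {
    // standard message header
    int32_t messageLength;
    int32_t requestID;
    int32_t responseTo;
    int32_t opCode;         // opReply == 1, set via qr->setOperation(opReply)
    // reply-specific fields
    int32_t responseFlags;  // qr->setResultFlagsToOk() clears the error bits
    int64_t cursorID;       // 0 here: a command reply leaves no cursor open
    int32_t startingFrom;   // offset of the first returned document
    int32_t numberReturned; // 1 for a command reply
    // ...followed by numberReturned BSON documents
};
#pragma pack(pop)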
Some commands (killop, unlock, etc.) were already handled earlier; any command not caught there is run here via runCommands and the function returns right away. If the request is not a command, execution continues into the core of the query path:
// This is a read lock. We require this because if we're parsing a $where, the
// where-specific parsing code assumes we have a lock and creates execution machinery that
// requires it.
Client::ReadContext ctx(q.ns);
Collection* collection = ctx.ctx().db()->getCollection( ns );
// Parse the qm into a CanonicalQuery.
CanonicalQuery* cq;
Status canonStatus = CanonicalQuery::canonicalize(q, &cq);
if (!canonStatus.isOK()) {
    uasserted(17287, str::stream() << "Can't canonicalize query: " << canonStatus.toString());
}
verify(cq);
QLOG() << "Running query:\n" << cq->toString();
LOG(2) << "Running query: " << cq->toStringShort();
// Parse, canonicalize, plan, transcribe, and get a runner.
Runner* rawRunner = NULL;
// We use this a lot below.
const LiteParsedQuery& pq = cq->getParsed();
// We'll now try to get the query runner that will execute this query for us. There
// are a few cases in which we know upfront which runner we should get and, therefore,
// we shortcut the selection process here.
//
// (a) If the query is over a collection that doesn't exist, we get a special runner
//     that's is so (a runner) which doesn't return results, the EOFRunner.
//
// (b) if the query is a replication's initial sync one, we get a SingleSolutinRunner
//     that uses a specifically designed stage that skips extents faster (see details in
//     exec/oplogstart.h)
//
// Otherwise we go through the selection of which runner is most suited to the
// query + run-time context at hand.
Status status = Status::OK();
if (collection == NULL) {
    rawRunner = new EOFRunner(cq, cq->ns());
}
else if (pq.hasOption(QueryOption_OplogReplay)) {
    status = getOplogStartHack(collection, cq, &rawRunner);
}
else {
    // Takes ownership of cq.
    size_t options = QueryPlannerParams::DEFAULT;
    if (shardingState.needCollectionMetadata(pq.ns())) {
        options |= QueryPlannerParams::INCLUDE_SHARD_FILTER;
    }
    status = getRunner(cq, &rawRunner, options);
}
if (!status.isOK()) {
    // NOTE: Do not access cq as getRunner has deleted it.
    uasserted(17007, "Unable to execute query: " + status.reason());
}
The block above covers loading the data, parsing the query, and selecting the query execution strategy. Let's go through it in a bit more detail.
// This is a read lock. We require this because if we're parsing a $where, the
// where-specific parsing code assumes we have a lock and creates execution machinery that
// requires it.
Client::ReadContext ctx(q.ns);
The comment calls it a "read lock", but it actually does more than that.
/** "read lock, and set my context, all in one operation"
* This handles (if not recursively locked) opening an unopened database.
*/
class ReadContext : boost::noncopyable {
public:
ReadContext(const std::string& ns, const std::string& path=storageGlobalParams.dbpath);
Context& ctx() { return *c.get(); }
private:
scoped_ptr<Lock::DBRead> lk;
scoped_ptr<Context> c;
};
ReadContext acts a bit like a proxy or adapter: it owns a Context object and pairs it with a Lock::DBRead that provides the read locking.
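Seen as a pattern, ReadContext is RAII: the lock is acquired in the constructor and released when the object goes out of scope, with the per-operation Context bundled next to it. A stripped-down illustration of that idea (DbReadLock and DbContext are hypothetical stand-ins for Lock::DBRead and Client::Context):

#include <memory>
#include <string>

// Hypothetical illustration of the "lock + context in one object" idea.
class DbReadLock {
public:
    explicit DbReadLock(const std::string& ns) { /* acquire a shared lock for the db in ns */ }
    ~DbReadLock() { /* release the lock */ }
};

class DbContext {
public:
    explicit DbContext(const std::string& ns) { /* resolve (and if needed open) the database */ }
};

class ReadContextSketch {
public:
    explicit ReadContextSketch(const std::string& ns)
        : _lock(new DbReadLock(ns)),   // take the lock first...
          _ctx(new DbContext(ns)) {}   // ...then bind the context under it
    DbContext& ctx() { return *_ctx; }
private:
    std::unique_ptr<DbReadLock> _lock; // members are destroyed in reverse order,
    std::unique_ptr<DbContext> _ctx;   // so the context goes away before the lock is released
};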
/** "read lock, and set my context, all in one operation"
* This handles (if not recursively locked) opening an unopened database.
*/
Client::ReadContext::ReadContext(const string& ns, const std::string& path) {
{
lk.reset( new Lock::DBRead(ns) );
Database *db = dbHolder().get(ns, path);
if( db ) {
c.reset( new Context(path, ns, db) );
return;
}
}
// we usually don't get here, so doesn't matter how fast this part is
{
if( Lock::isW() ) {
// write locked already
DEV RARELY log() << "write locked on ReadContext construction " << ns << endl;
c.reset(new Context(ns, path));
}
else if( !Lock::nested() ) {
lk.reset(0);
{
Lock::GlobalWrite w;
Context c(ns, path);
}
// db could be closed at this interim point -- that is ok, we will throw, and don't mind throwing.
lk.reset( new Lock::DBRead(ns) );
c.reset(new Context(ns, path));
}
else {
uasserted(15928, str::stream() << "can't open a database from a nested read lock " << ns);
}
}
}
In the constructor, ReadContext first takes a read lock on the database named in ns (as noted earlier, an ns carries both the database and collection names), then looks the Database object up by ns and dbpath. A Database object represents a single database (loading its data files is not analyzed here). If the db object is found, the context is set up and the constructor returns.
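dbHolder() can be thought of as an in-memory registry of databases that are already open, keyed by database name (and dbpath). A conceptual sketch of that lookup (simplified; the real DatabaseHolder also keys by path and is protected by the locking shown above):

#include <cstddef>
#include <map>
#include <string>

class Database; // opaque here; owns the data files and catalog of one database

// Conceptual stand-in for the holder behind dbHolder().get(ns, path).
class DatabaseRegistrySketch {
public:
    // Returns the already-open Database for the db part of 'ns', or NULL if it
    // has not been opened yet (ReadContext then falls back to the slow path
    // that opens it under a global write lock).
    Database* get(const std::string& ns) const {
        std::string dbName = ns.substr(0, ns.find('.'));
        std::map<std::string, Database*>::const_iterator it = _dbs.find(dbName);
        return it == _dbs.end() ? NULL : it->second;
    }
    void put(const std::string& dbName, Database* db) { _dbs[dbName] = db; }
private:
    std::map<std::string, Database*> _dbs;
};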
If no db object is found, execution falls through to:
lk.reset( new Lock::DBRead(ns) );
c.reset(new Context(ns, path));
Context has several constructors; the one used here creates the Database object and loads its data (as the code above shows, in the non-nested branch the database is actually opened under a temporary Lock::GlobalWrite before the read lock is re-taken). With the database locked, we move on to the core query logic.
// Parse the qm into a CanonicalQuery.
CanonicalQuery* cq;
Status canonStatus = CanonicalQuery::canonicalize(q, &cq);
The query message is first canonicalized into a CanonicalQuery; the main work is turning the BSON query predicate into a MatchExpression tree, which is much easier to analyze and match against than raw BSON.
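As an illustration, a filter such as { age: { $gt: 21 }, user_id: 7 } is canonicalized into a tree of predicate nodes, roughly AND( GT(age, 21), EQ(user_id, 7) ). The sketch below mimics that shape with a made-up MatchNode type; the real MatchExpression hierarchy (db/matcher/expression.h) is far richer and carries BSONElements rather than plain ints:

#include <memory>
#include <string>
#include <vector>

// Made-up node type standing in for MatchExpression, for illustration only.
struct MatchNode {
    enum Type { AND, EQ, GT };
    Type type;
    std::string path;  // field path for leaf nodes, empty for AND
    int value;         // real nodes hold BSONElements, not ints
    std::vector<std::shared_ptr<MatchNode>> children;
};

// { age: { $gt: 21 }, user_id: 7 }  ->  AND( GT(age, 21), EQ(user_id, 7) )
std::shared_ptr<MatchNode> exampleTree() {
    std::shared_ptr<MatchNode> gt(new MatchNode{MatchNode::GT, "age", 21, {}});
    std::shared_ptr<MatchNode> eq(new MatchNode{MatchNode::EQ, "user_id", 7, {}});
    std::shared_ptr<MatchNode> root(new MatchNode{MatchNode::AND, "", 0, {gt, eq}});
    return root;
}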
Next, a Runner object is obtained to execute the query:
Runner* rawRunner = NULL;
// We use this a lot below.
const LiteParsedQuery& pq = cq->getParsed();
// We'll now try to get the query runner that will execute this query for us. There
// are a few cases in which we know upfront which runner we should get and, therefore,
// we shortcut the selection process here.
//
// (a) If the query is over a collection that doesn't exist, we get a special runner
//     that's is so (a runner) which doesn't return results, the EOFRunner.
//
// (b) if the query is a replication's initial sync one, we get a SingleSolutinRunner
//     that uses a specifically designed stage that skips extents faster (see details in
//     exec/oplogstart.h)
//
// Otherwise we go through the selection of which runner is most suited to the
// query + run-time context at hand.
Status status = Status::OK();
if (collection == NULL) {
    rawRunner = new EOFRunner(cq, cq->ns());
}
else if (pq.hasOption(QueryOption_OplogReplay)) {
    status = getOplogStartHack(collection, cq, &rawRunner);
}
else {
    // Takes ownership of cq.
    size_t options = QueryPlannerParams::DEFAULT;
    if (shardingState.needCollectionMetadata(pq.ns())) {
        options |= QueryPlannerParams::INCLUDE_SHARD_FILTER;
    }
    status = getRunner(cq, &rawRunner, options);
}
In the common case, the code above calls getRunner to obtain a Runner; the Runner then walks the collection (or an index) and returns the documents that satisfy the query.
Each Runner represents one strategy for executing a query, and mongod chooses among them based on the canonicalized query, much like the strategy pattern (a minimal sketch of the shared interface follows the list below).
IDHackRunner: used when the query is keyed on "_id", so the result can be fetched straight through the _id index.
CachedPlanRunner: used when a cached plan already exists for this query shape.
MultiPlanRunner: used when the QueryPlanner produces more than one candidate QuerySolution; it trials the candidates and keeps the best one.
SingleSolutionRunner: the counterpart of the multi-plan case, used when planning yields exactly one solution (typical for simple queries).
SubPlanRunner: I have not worked this one out yet (it appears to handle rooted $or queries by planning each clause separately).
All of these Runners are fairly involved; each would take considerable time to analyze in detail, since they contain the collection scan algorithms, the staged handling of the query, and so on. The core of mongod's query execution lives here, so I will not dig deeper for now.
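To make the strategy analogy concrete, here is a minimal sketch of the interface these Runners share. The real Runner (db/query/runner.h) has more states (RUNNER_ERROR, RUNNER_DEAD, ...) and also reports the record's DiskLoc; this trims it down to what the result loop below actually uses:

class BSONObj; // opaque here

// Minimal sketch of the strategy-style Runner interface.
class RunnerSketch {
public:
    enum RunnerState { RUNNER_ADVANCED, RUNNER_EOF };

    virtual ~RunnerSketch() {}

    // Fills *objOut with the next matching document, or returns RUNNER_EOF.
    virtual RunnerState getNext(BSONObj* objOut) = 0;
};

// Each concrete strategy (IDHackRunner, SingleSolutionRunner, MultiPlanRunner, ...)
// implements getNext() with its own access path; newRunQuery only ever drives the
// loop "while (getNext(&obj) == RUNNER_ADVANCED) { append obj to the reply; }".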
// Run the query.
// bb is used to hold query results
// this buffer should contain either requested documents per query or
// explain information, but not both
BufBuilder bb(32768);
bb.skip(sizeof(QueryResult));
...
while (Runner::RUNNER_ADVANCED == (state = runner->getNext(&obj, NULL))) {
    // Add result to output buffer. This is unnecessary if explain info is requested
    if (!isExplain) {
        bb.appendBuf((void*)obj.objdata(), obj.objsize());
    }
    // Count the result.
    ++numResults;
    ...
}
Once we have the Runner, we use it to produce the results: getNext returns the next matching document, each document is appended to the output buffer, and the buffer is finally packaged into result and sent back to the client.
That completes the overall picture of the query path. I have only skimmed over data loading and the query planning and execution algorithms: my understanding there is still limited, and anything I wrote now would likely be wrong. MongoDB's code changes substantially from version to version; I leaned on several earlier write-ups of other versions, which are admirably thorough, but mapping them onto this version's source still left me with plenty of open questions. Data loading and query execution will each get their own article once I have worked through them.