MongoDB exception: MongoCursorNotFoundException

Yesterday, while testing a data export, I found that running the export several times in a row throws the following exception:

com.mongodb.MongoCursorNotFoundException: Query failed with error code -5

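The message says that the query failed because the Mongo cursor could not be found.
Most of the fixes suggested online come down to the following: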

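noCursorTimeout: removes the idle timeout from the cursor

With this option the cursor has to be cleaned up manually once the query completes. If an exception or a network failure gets in the way, the cursor stays open on the server indefinitely, so this approach is not recommended.

batchSize: specifies the number of documents to return in each batch of the response from the MongoDB instance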

https://www.docs4dev.com/docs/zh/mongodb/v3.6/reference/reference-method-cursor.batchSize.html

Official documentation:

Notes on cursors
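To make the trade-off behind noCursorTimeout concrete, here is a minimal sketch in plain Java. It is a toy model, not driver code: FakeServer, CursorState and reapIdle are hypothetical names invented for illustration. The only fact taken from MongoDB is that idle cursors are closed after 10 minutes by default.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of server-side cursor bookkeeping; nothing here is a MongoDB driver class.
class CursorTimeoutSketch {

    // MongoDB's default idle timeout for cursors is 10 minutes.
    static final long IDLE_TIMEOUT_MS = 10 * 60 * 1000L;

    /** Hypothetical per-cursor state kept by the toy server. */
    static final class CursorState {
        final long lastUsedMs;
        final boolean noTimeout;

        CursorState(long lastUsedMs, boolean noTimeout) {
            this.lastUsedMs = lastUsedMs;
            this.noTimeout = noTimeout;
        }
    }

    /** Hypothetical stand-in for a server that tracks open cursors by ID. */
    static final class FakeServer {
        private final Map<Long, CursorState> cursors = new HashMap<>();
        private long nextId = 1;

        long open(long nowMs, boolean noCursorTimeout) {
            long id = nextId++;
            cursors.put(id, new CursorState(nowMs, noCursorTimeout));
            return id;
        }

        /** Periodic cleanup: drops cursors that have been idle too long, unless they opted out. */
        void reapIdle(long nowMs) {
            cursors.values().removeIf(c -> !c.noTimeout && nowMs - c.lastUsedMs > IDLE_TIMEOUT_MS);
        }

        /** Explicit cleanup, which is what the client must do itself for a noTimeout cursor. */
        void kill(long cursorId) {
            cursors.remove(cursorId);
        }

        boolean isOpen(long cursorId) {
            return cursors.containsKey(cursorId);
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        long normal = server.open(0, false);
        long pinned = server.open(0, true);

        // Eleven minutes pass and the client never comes back (crash, network failure, ...).
        server.reapIdle(11 * 60 * 1000L);

        System.out.println("default cursor still open: " + server.isOpen(normal));         // false
        System.out.println("noCursorTimeout cursor still open: " + server.isOpen(pinned)); // true
    }
}
```

In this model the second cursor only goes away when someone calls kill, which is the leak the text warns about.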

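After reading the fixes above you may feel the mystery is solved. But do not overlook the detail in the opening sentence: the exception appears when exporting several times in a row.
That tells us our exception is not caused by the cursor expiring. So why is the cursor not found?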

Let's first look at part of the query source code behind find:

	/**
	 * Internal method using callback to do queries against the datastore that requires reading a collection of objects.
	 * It will take the following steps
	 * <ol>
	 * <li>Execute the given {@link ConnectionCallback} for a {@link DBCursor}.</li>
	 * <li>Prepare that {@link DBCursor} with the given {@link CursorPreparer} (will be skipped if {@link CursorPreparer}
	 * is {@literal null}</li>
	 * <li>Iterate over the {@link DBCursor} and applies the given {@link DocumentCallback} to each of the
	 * {@link Document}s collecting the actual result {@link List}.</li>
	 * <ol>
	 *
	 * @param <T>
	 * @param collectionCallback the callback to retrieve the {@link DBCursor} with
	 * @param preparer the {@link CursorPreparer} to potentially modify the {@link DBCursor} before iterating over it
	 * @param objectCallback the {@link DocumentCallback} to transform {@link Document}s into the actual domain type
	 * @param collectionName the collection to be queried
	 * @return
	 */
	private <T> List<T> executeFindMultiInternal(CollectionCallback<FindIterable<Document>> collectionCallback,
			@Nullable CursorPreparer preparer, DocumentCallback<T> objectCallback, String collectionName) {

		try {

			MongoCursor<Document> cursor = null;

			try {

				FindIterable<Document> iterable = collectionCallback
						.doInCollection(getAndPrepareCollection(doGetDatabase(), collectionName));

				if (preparer != null) {
					iterable = preparer.prepare(iterable);
				}

				cursor = iterable.iterator();

				List<T> result = new ArrayList<>();

				while (cursor.hasNext()) {
					Document object = cursor.next();
					result.add(objectCallback.doWith(object));
				}

				return result;
			} finally {

				if (cursor != null) {
					cursor.close();
				}
			}
		} catch (RuntimeException e) {
			throw potentiallyConvertRuntimeException(e, exceptionTranslator);
		}
	}

Quoted from a blog post found online:

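When we query MongoDB with db.collection.find(), what comes back directly is not the data itself but a cursor. Every cursor has a cursor ID, which the server records. The data is actually obtained by iterating over the cursor, mainly through hasNext() and next(), just as with an iterator. (The command-line client appears to return data straight from find() only because it iterates the cursor for you and shows 20 documents by default.) The client does not pull documents from the server one at a time but in batches, which improves I/O performance. Each batch is cached in client memory; once next() has walked through it, the client calls getMore() to fetch the next batch from the server. That request has to carry the cursor ID, which the server uses to work out which data to return. When the server does not have that cursor ID, the cursor-not-found error occurs.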

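So we now know that find retrieves its results over multiple round trips, with batchSize governing how much each one returns. What if the cursor has not expired at all, and simply cannot be found on one of the later fetches?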
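The batch-by-batch protocol described above can be sketched in plain Java. This is a simplified model under stated assumptions: FakeServer, Reply and drain are hypothetical names, documents are plain integers, and a cursor ID of 0 is used to signal that the cursor is exhausted. It is not the driver's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of batched cursor iteration; none of these types come from the MongoDB driver.
class BatchedCursorSketch {

    /** One reply from the toy server: a batch plus the cursor ID (0 means exhausted). */
    static final class Reply {
        final long cursorId;
        final List<Integer> batch;

        Reply(long cursorId, List<Integer> batch) {
            this.cursorId = cursorId;
            this.batch = batch;
        }
    }

    /** Hypothetical server that remembers each open cursor's position by ID. */
    static final class FakeServer {
        private final List<Integer> docs;
        private final Map<Long, Integer> cursors = new HashMap<>(); // cursorId -> next offset
        private long nextId = 1;

        FakeServer(List<Integer> docs) {
            this.docs = docs;
        }

        /** Opens a cursor and returns the first batch together with the cursor ID. */
        Reply find(int batchSize) {
            long id = nextId++;
            cursors.put(id, 0);
            return getMore(id, batchSize);
        }

        /** Returns the next batch; fails if this server does not know the cursor ID. */
        Reply getMore(long cursorId, int batchSize) {
            Integer offset = cursors.get(cursorId);
            if (offset == null) {
                throw new IllegalStateException("cursor " + cursorId + " not found");
            }
            int end = Math.min(offset + batchSize, docs.size());
            List<Integer> batch = new ArrayList<>(docs.subList(offset, end));
            if (end == docs.size()) {
                cursors.remove(cursorId); // exhausted: the server forgets the cursor
                return new Reply(0, batch);
            }
            cursors.put(cursorId, end);
            return new Reply(cursorId, batch);
        }
    }

    /** Drains the whole result set into out and returns the number of getMore round trips. */
    static int drain(FakeServer server, int batchSize, List<Integer> out) {
        Reply reply = server.find(batchSize);
        out.addAll(reply.batch);
        int getMores = 0;
        while (reply.cursorId != 0) {
            reply = server.getMore(reply.cursorId, batchSize);
            out.addAll(reply.batch);
            getMores++;
        }
        return getMores;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(i);
        }
        List<Integer> out = new ArrayList<>();
        int getMores = drain(new FakeServer(docs), 4, out);
        System.out.println(out.size() + " documents, " + getMores + " getMore calls");
    }
}
```

Every getMore in this model depends on the server still holding the cursor ID, which is the dependency the rest of the analysis turns on.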

Reading further into the source, we found the code in the Mongo driver that obtains the connection:

    // From the com.mongodb.operation.QueryBatchCursor class
    private void getMore() {
        Connection connection = connectionSource.getConnection();
        try {
            if (serverIsAtLeastVersionThreeDotTwo(connection.getDescription())) {
                try {
                    initFromCommandResult(connection.command(namespace.getDatabaseName(),
                                                             asGetMoreCommandDocument(),
                                                             NO_OP_FIELD_NAME_VALIDATOR,
                                                             ReadPreference.primary(),
                                                             CommandResultDocumentCodec.create(decoder, "nextBatch"),
                                                             connectionSource.getSessionContext()));
                } catch (MongoCommandException e) {
                    throw translateCommandException(e, serverCursor);
                }
            } else {
                QueryResult<T> getMore = connection.getMore(namespace, serverCursor.getId(),
                        getNumberToReturn(limit, batchSize, count), decoder);
                initFromQueryResult(getMore);
            }
            if (limitReached()) {
                killCursor(connection);
            }
            if (serverCursor == null) {
                this.connectionSource.release();
                this.connectionSource = null;
            }
        } finally {
            connection.release();
        }
    }

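Here we can see that the finally block calls connection.release(): the connection is given back after every getMore rather than held for the lifetime of the cursor, so consecutive batches are not guaranteed to travel over the same connection.
So, with a Mongo cluster, could a later request end up connected to a different machine?
From what I found: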

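Normally, when we use a MongoDB cluster, we pass all of the MongoDB servers to the driver in the form IP1:PORT1,IP2:PORT2,IP3:PORT3. The driver then handles load balancing on its own and keeps forwarding a session to the same server, so no problem arises.
Once we implement load balancing ourselves, that is, put a single domain name or IP in front that distributes requests across the real IPs, each connection may land on a different machine. The cursor cannot be found there, and MongoCursorNotFoundException is thrown.
Of course, if the custom load balancer distributes by client IP and guarantees that requests from the same IP always go to the same machine, this problem does not occur either.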
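The three cases above can be reproduced with a small simulation. This is a sketch under assumptions, not a model of any real proxy or of the driver: Node, Router, roundRobin, sticky and export are hypothetical names, the document counts are arbitrary, and round-robin stands in for any balancer that routes each request independently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: a cursor exists only on the node that created it.
class LoadBalancedCursorSketch {

    private static long nextCursorId = 1; // shared so that IDs never collide across nodes

    /** Hypothetical cluster member that tracks the cursors it has opened. */
    static final class Node {
        private final Set<Long> openCursors = new HashSet<>();

        long openCursor() {
            long id = nextCursorId++;
            openCursors.add(id);
            return id;
        }

        void getMore(long cursorId) {
            if (!openCursors.contains(cursorId)) {
                throw new IllegalStateException("cursor " + cursorId + " not found");
            }
        }
    }

    /** Decides which node receives the next request. */
    interface Router {
        Node pick();
    }

    /** A balancer that sends every request to the next node in turn. */
    static Router roundRobin(List<Node> nodes) {
        int[] next = {0};
        return () -> nodes.get(next[0]++ % nodes.size());
    }

    /** Sticky routing: every request from this client reaches the same node. */
    static Router sticky(List<Node> nodes) {
        return () -> nodes.get(0);
    }

    /** Reads totalDocs documents in batches; returns false if a getMore hits cursor-not-found. */
    static boolean export(Router router, int totalDocs, int batchSize) {
        long cursorId = router.pick().openCursor(); // find: first batch plus cursor ID
        int fetched = Math.min(batchSize, totalDocs);
        try {
            while (fetched < totalDocs) {
                router.pick().getMore(cursorId);    // each getMore is routed independently
                fetched = Math.min(fetched + batchSize, totalDocs);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node(), new Node(), new Node());
        System.out.println("round robin, batch 100:  " + export(roundRobin(nodes), 1000, 100));  // false
        System.out.println("sticky, batch 100:       " + export(sticky(nodes), 1000, 100));      // true
        System.out.println("round robin, batch 1000: " + export(roundRobin(nodes), 1000, 1000)); // true
    }
}
```

The last line shows why a very large batch size can appear to help: when the first batch already holds everything, no getMore is ever sent, so the routing problem never gets a chance to surface.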

I later asked our DBA and confirmed that our Mongo cluster did sit behind a custom load balancer and had exactly this problem.

Conclusion:

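Knowing the cause, we can see that the earlier batchSize change does not really fix anything either. It only appeared to avoid the problem because batchSize was made large enough that the cursor never had to be fetched from more than once.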

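That gives us the solution: change the Mongo configuration to the real addresses of the mongo machines, pass them to the driver in the form IP1:PORT1,IP2:PORT2,IP3:PORT3, and let the driver do the load balancing itself.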
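As a small illustration of that configuration, the sketch below assembles a standard mongodb:// connection string from a list of host:port pairs. The helper name buildUri, the addresses and the database name exportdb are all placeholders chosen for this example; substitute the values your DBA gives you.

```java
import java.util.Arrays;
import java.util.List;

// Builds a seed-list connection string; addresses and database name below are placeholders.
class SeedListUriSketch {

    /** Returns mongodb://host1:port1,host2:port2,.../database for the given members. */
    static String buildUri(List<String> hostPorts, String database) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host:port entry is required");
        }
        return "mongodb://" + String.join(",", hostPorts) + "/" + database;
    }

    public static void main(String[] args) {
        String uri = buildUri(
                Arrays.asList("10.0.0.1:27017", "10.0.0.2:27017", "10.0.0.3:27017"),
                "exportdb");
        // prints mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017/exportdb
        System.out.println(uri);
    }
}
```

Assuming the application reads its connection string from configuration, this is the value that would replace the single load-balanced address.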

posted @ 2020-12-04 16:47 faylinn
