elasticsearch的查询数大于10000查询失败

错误信息

Caused by: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]];
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:491)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:402)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:432)
	at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:598)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:170)
	... 122 more

错误原因

elasticsearch官方提及:不能使用from和size翻阅超过 10,000 个点击。此限制是由index.max_result_window索引设置设置的保护措施。

处理方法:

方法1.增加max_result_window的大小

PUT tweets_v1/_settings
{
    "index": {
        "max_result_window": 1000000
    }
}

注:手动增加max_result_window大小,会导致内存与CPU占用过高,并且不好保证手动设置的大小,在一定时间后会不会任然够用,故而不推荐使用。

方法2.使用Scroll API(滚动查询)

    @Resource
    private ElasticsearchRestTemplate template;

    // 第一次获取数据
    public ScrollPageResult<TweetsEntity> scrollStart(Integer pageSize) {
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                .withSort(new FieldSortBuilder("createTime"))
                .build();
        // 设置每页数据量
        nativeSearchQuery.setMaxResults(pageSize);
        // 设置滚动id的存在时间
        long scrollTimeInMillis = 600 * 1000;
        // 第一次查询
        SearchScrollHits<TweetsEntity> searchScrollHits = template.searchScrollStart(scrollTimeInMillis, nativeSearchQuery, TweetsEntity.class, IndexCoordinates.of("tweets_v1"));

        // 获取下一次的滚动id,时间有限
        String scrollId = searchScrollHits.getScrollId();

        // 获取数据
        List<TweetsEntity> results = new ArrayList<>();
        searchScrollHits.getSearchHits().iterator().forEachRemaining(item -> results.add(item.getContent()));

        // 数据返回
        return ScrollPageResult.of(scrollId, results);
    }

    // 获取下一页数据
    public ScrollPageResult<TweetsEntity> scrollNext(String scrollId) {
        long scrollTimeInMillis = 60 * 1000;
        SearchScrollHits<TweetsEntity> searchScrollHits = template.searchScrollContinue(scrollId, scrollTimeInMillis, TweetsEntity.class, IndexCoordinates.of("tweets_v1"));

        // 获取下一次的滚动id,时间有限
        String scrollId2 = searchScrollHits.getScrollId();

        // 获取数据
        List<TweetsEntity> results = new ArrayList<>();
        searchScrollHits.getSearchHits().iterator().forEachRemaining(item -> results.add(item.getContent()));

        if (searchScrollHits.hasSearchHits()) {
            // 数据返回
            return ScrollPageResult.of(scrollId2, results);
        } else {
            // 清除 scroll
            template.searchScrollClear(Collections.singletonList(scrollId2));
            // 数据返回
            return ScrollPageResult.of(null, results);
        }
    }

注:elasticsearch官方也是不推荐使用Scroll API的。
一是滚动搜索的结果反映了初始搜索请求时的索引状态,随后的索引或文档更改只会影响以后的搜索和滚动请求;
二是Scroll API需要Scroll ID。要获取Scroll ID,则需要设置请求保留搜索上下文多长时间,只有在时间中生成的Scroll ID才有效。

方法3.使用search_after参数

    @Resource
    private ElasticsearchRestTemplate template;
    // 第一次获取
    public SearchAfterPageResult<TweetsEntity> searchAfterStart(Integer pageSize) {
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .withPageable(Pageable.ofSize(pageSize))
                .withSort(Sort.by("createTime")).build();
        SearchHits<TweetsEntity> search = template.search(searchQuery, TweetsEntity.class, IndexCoordinates.of("tweets_v1"));
        List<SearchHit<TweetsEntity>> list = search.getSearchHits();

        // 获取数据
        List<TweetsEntity> results = new ArrayList<>();
        search.getSearchHits().iterator().forEachRemaining(item -> results.add(item.getContent()));

        // 数据返回
        return SearchAfterPageResult.of(list.get(list.size() - 1).getSortValues(), results);
    }

    // 获取下一页数据
    public SearchAfterPageResult<TweetsEntity> searchAfterNext(List<Object> searchAfter, Integer pageSize) {
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .withPageable(Pageable.ofSize(pageSize))
                .withSearchAfter(searchAfter)
                .withSort(Sort.by("createTime")).build();
        SearchHits<TweetsEntity> search = null;
        try {
            search = template.search(searchQuery, TweetsEntity.class, IndexCoordinates.of("tweets_v1"));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        List<SearchHit<TweetsEntity>> list = search.getSearchHits();

        // 获取数据
        List<TweetsEntity> results = new ArrayList<>();
        search.getSearchHits().iterator().forEachRemaining(item -> results.add(item.getContent()));

        // 数据返回
        if (pageSize.equals(results.size())) {
            return SearchAfterPageResult.of(list.get(list.size() - 1).getSortValues(), results);
        } else {
            return SearchAfterPageResult.of(null, results);
        }
    }

注: Elasticsearch 使用 Lucene 的内部文档 ID 作为决胜局。这些内部文档 ID 在相同数据的副本中可能完全不同。当分页搜索命中时,您可能偶尔会看到具有相同排序值的文档的排序不一致。
更多详细资料:https://www.elastic.co/guide/en/elasticsearch/reference/7.17/paginate-search-results.html#search-after

我使用的elasticsearch高版本,在高版本中官方已经不推荐使用RestHighLevelClient,而是使用ElasticsearchRestTemplate。

posted @ 2022-10-15 19:07  purple910  阅读(903)  评论(0编辑  收藏  举报