Scroll with RestHighLevelClient

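By default, Elasticsearch returns at most 10,000 hits for a from/size search. If from + size exceeds 10,000 (for example, when from is set above 10,000), the request returns no results and fails with an error saying the result window is too large.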
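The limit is checked against from + size, not against from alone. Here is a minimal plain-Java check of that rule, so it runs without a cluster. The class and method names are hypothetical. The constant mirrors the default of the index.max_result_window setting.

```java
public class ResultWindowCheck {
    // Default value of the index.max_result_window index setting
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Elasticsearch rejects a from/size search when from + size exceeds the window
    static boolean fitsInWindow(int from, int size, int maxResultWindow) {
        return (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        System.out.println(fitsInWindow(9_990, 10, DEFAULT_MAX_RESULT_WINDOW));  // prints true
        System.out.println(fitsInWindow(10_000, 10, DEFAULT_MAX_RESULT_WINDOW)); // prints false
    }
}
```

So the last reachable page with size 10 starts at from = 9,990. Requesting from = 10,000 fails even though from itself does not exceed the limit.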

There are two ways to deal with this.

Option 1: change max_result_window

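Many people use this approach because it is quick and blunt. That bluntness is also its weakness. It is fine when you occasionally need a little more than 10,000 hits, but it does not scale to deep paging or to exporting a whole index. With from/size paging, every shard has to collect and sort its top from + size hits, and the coordinating node then merges all of them. As a result, memory and CPU usage grow with how deep you page.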
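Here is a back-of-envelope sketch of that cost. It assumes the usual query-then-fetch model, in which each shard sends its top from + size hits to the coordinating node. The helper name and the shard count are illustrative assumptions, not values taken from any real cluster.

```java
public class DeepPagingCost {
    // Each shard returns its top (from + size) hits; the coordinating node merges all of them.
    // Rough model only: it ignores batched reduces and other engine optimizations.
    static long hitsMergedOnCoordinator(int shards, long from, long size) {
        return shards * (from + size);
    }

    public static void main(String[] args) {
        // Assumed example: 5 shards, asking for 10 hits starting at offset 100,000
        System.out.println(hitsMergedOnCoordinator(5, 100_000, 10)); // prints 500050
    }
}
```

In this example, the coordinating node handles about half a million entries to return just 10 documents. This is why simply raising max_result_window becomes expensive for deep pages.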

PUT index/_settings
{
  "index":{
    "max_result_window":100000000
  }
}
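The same settings request can also be issued from Java. This sketch uses only the JDK 11+ java.net.http API and builds the request without sending it. The cluster address http://localhost:9200 is an assumption, and "index" is the placeholder index name from the request above. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MaxResultWindowRequest {
    // Builds (but does not send) a PUT <index>/_settings request that updates max_result_window
    static HttpRequest build(String baseUrl, String index, long maxResultWindow) {
        String body = "{\"index\":{\"max_result_window\":" + maxResultWindow + "}}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                // Elasticsearch 6.0+ requires an explicit Content-Type for requests with a body
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local cluster; "index" is a placeholder index name
        HttpRequest request = build("http://localhost:9200", "index", 100_000_000L);
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```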

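A blog post worth reading: 关于搜索elasticsearch的数据条数大于10000的坑 max_result_window的两种设置方式 (on the pitfall of searching more than 10,000 Elasticsearch documents, and two ways to set max_result_window).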


Option 2: Scroll

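The scroll API can retrieve a large number of results from a single search request, or even all of them, much like a cursor in a traditional database. It is designed for batch processing such as exporting or reindexing data, not for real-time paging in front of users.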
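The cursor-style protocol works like this. The initial search opens a scroll context and returns the first batch plus a scroll id. Each follow-up scroll request returns the next batch. You stop when a batch comes back empty, and then clear the context. The sketch below models that loop in memory. FakeCluster, Page and readAll are hypothetical stand-ins, not Elasticsearch classes, and they only mirror the shape of the real calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScrollLoopSketch {
    // Hypothetical stand-in for one scroll response: a scroll id plus one batch of hits
    static final class Page {
        final String scrollId;
        final List<String> hits;
        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Hypothetical in-memory "cluster": each scroll context is a cursor into the full result set
    static final class FakeCluster {
        private final List<String> docs;
        private final int batchSize;
        private final Map<String, Integer> contexts = new HashMap<>();
        private int nextId = 0;

        FakeCluster(List<String> docs, int batchSize) {
            this.docs = docs;
            this.batchSize = batchSize;
        }

        // Initial search: opens a scroll context and returns the first batch
        Page search() {
            String id = "scroll-" + nextId++;
            contexts.put(id, 0);
            return scroll(id);
        }

        // Follow-up scroll request: returns the next batch and advances the cursor
        Page scroll(String id) {
            int pos = contexts.get(id);
            int end = Math.min(pos + batchSize, docs.size());
            contexts.put(id, end);
            return new Page(id, docs.subList(pos, end));
        }

        // Clear scroll: frees the context instead of waiting for the keep-alive to expire
        void clearScroll(String id) {
            contexts.remove(id);
        }

        int openContexts() {
            return contexts.size();
        }
    }

    // Search once, scroll until a batch is empty, then always clear the context
    static List<String> readAll(FakeCluster cluster) {
        List<String> out = new ArrayList<>();
        Page page = cluster.search();
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                out.addAll(page.hits);
                page = cluster.scroll(scrollId);
                scrollId = page.scrollId; // always continue with the most recent scroll id
            }
        } finally {
            cluster.clearScroll(scrollId);
        }
        return out;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster(List.of("a", "b", "c", "d", "e", "f", "g"), 3);
        System.out.println(readAll(cluster));       // prints [a, b, c, d, e, f, g]
        System.out.println(cluster.openContexts()); // prints 0
    }
}
```

The loop ends on the first empty batch rather than on a count, which matches how the real scroll loop detects the end. The finally block reflects the advice to clear scroll contexts explicitly.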

Official documentation for this approach: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-request-scroll.html#scroll-search-context

Chinese translation of the docs: https://blog.csdn.net/ctwy291314/article/details/82751898

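The code below fetches the nid field of every document in the index and writes the values to a file. It was written as a unit test, and NID is an inner class of the test class.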

Full code:

// Imports needed at the top of the test file
// (Elasticsearch 7.x high-level REST client, fastjson, JUnit 4)
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

// Target type for the "nid" field of each document
public static class NID {
    private String nid;
    public String getNid() {
        return nid;
    }
    public void setNid(String nid) {
        this.nid = nid;
    }
}

@Test
public void testScroll() throws IOException {
    // esConfig is the project's own configuration bean (not shown here)
    RestHighLevelClient client = esConfig.client();
    // Keep-alive of the scroll context. It does not need to cover the whole export, only the
    // processing of one batch: every scroll request carrying the scroll parameter resets the expiry.
    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
    SearchRequest searchRequest = new SearchRequest(esConfig.getCaterIndex()); // search request on the index
    searchRequest.scroll(scroll);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchAllQuery());
    searchSourceBuilder.size(5000); // number of hits returned per batch
    searchSourceBuilder.fetchSource(new String[]{"nid"}, null); // included fields, excluded fields
    searchRequest.source(searchSourceBuilder);

    File outFile = new File("E://cater_nid.csv"); // output CSV file
    String scrollId = null;
    int page = 0;
    // try-with-resources closes the writer even if a request fails
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(outFile))) {
        // The initial search returns the first batch together with the scroll id
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        scrollId = searchResponse.getScrollId();
        SearchHit[] searchHits = searchResponse.getHits().getHits();

        // Write the current batch, then fetch the next one, until a batch comes back empty
        while (searchHits != null && searchHits.length > 0) {
            page++;
            System.out.println("----- page " + page + " -----");
            for (SearchHit searchHit : searchHits) {
                NID t = JSON.parseObject(searchHit.getSourceAsString(), NID.class);
                if (t != null && t.getNid() != null) { // skip documents without a nid field
                    writer.write(t.getNid());
                    writer.newLine();
                }
            }
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
        }
    } finally {
        // Clear the scroll context instead of letting it live until the keep-alive expires
        if (scrollId != null) {
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId); // setScrollIds() clears several scroll ids in one request
            ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            System.out.println("succeeded:" + clearScrollResponse.isSucceeded());
        }
    }
}

Code adapted from: https://www.cnblogs.com/chentop/p/10296517.html

posted @ 2019-08-29 16:01  yeren2046