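Scroll with RestHighLevelClient
By default, Elasticsearch returns at most 10,000 hits through from/size pagination. Once from + size exceeds 10,000 (for example, when from is set above 10,000), the query returns no results and fails with an error saying the result window is too large.
There are two ways around this.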
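The limit applies to the sum of from and size, not to from alone. Below is a minimal sketch of that check, assuming the default index.max_result_window of 10,000. The class and method names are invented for illustration and are not part of any Elasticsearch client.

```java
/** Illustrative helper (hypothetical, not an Elasticsearch API). */
final class ResultWindow {
    /** Default value of the index.max_result_window setting. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private ResultWindow() {}

    /** Returns true when a from/size page fits inside the result window. */
    static boolean fits(int from, int size, int maxResultWindow) {
        // Widen to long so that very large offsets cannot overflow int.
        return (long) from + size <= maxResultWindow;
    }
}
```

For example, ResultWindow.fits(9_990, 10, ResultWindow.DEFAULT_MAX_RESULT_WINDOW) is true, while the same call with from = 10_000 is false, which is the case where the search request is rejected.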
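Option 1: raise max_result_window
Many people use this approach because it is simple and blunt. The drawback is that same bluntness: it is fine for some cases, but the cost of a from/size request grows with from + size, so it may not hold up for very deep pagination over large indices.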
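PUT index/_settings
{
  "index": {
    "max_result_window": 100000000
  }
}
A blog post for reference: "关于搜索elasticsearch的数据条数大于10000的坑 max_result_window的两种设置方式" (roughly: the pitfall of searching more than 10,000 Elasticsearch documents, and two ways to set max_result_window)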
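The same settings update can be issued from Java. The sketch below only builds the HTTP request with the JDK's java.net.http classes (Java 11+) and does not send it. The base URL and index name passed in are placeholders, and the class name is invented for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative helper (hypothetical): builds the PUT <index>/_settings request. */
final class MaxResultWindowRequest {
    private MaxResultWindowRequest() {}

    /** JSON body for the index settings update. */
    static String settingsBody(int maxResultWindow) {
        return "{\"index\":{\"max_result_window\":" + maxResultWindow + "}}";
    }

    /** Builds, but does not send, PUT {baseUrl}/{index}/_settings. */
    static HttpRequest build(String baseUrl, String index, int maxResultWindow) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(maxResultWindow)))
                .build();
    }
}
```

To actually apply the setting, the request would be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a reachable cluster. Picking a value only as large as the deepest page you need is usually safer than an arbitrarily huge one.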
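Option 2: Scroll
The scroll API can be used to retrieve large numbers of results, or even all of them, in the same way you would use a cursor in a traditional database.
Chinese translation of the documentation: https://blog.csdn.net/ctwy291314/article/details/82751898
The code below fetches the nid field of every document in the index and writes the values to a file. It was written as a unit test, and NID is a nested class.
The code: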
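The cursor analogy boils down to one loop: fetch a batch, process it, and ask for the next one until an empty batch comes back. Here is a minimal in-memory sketch of that loop. FakeCursor is a hypothetical stand-in and not an Elasticsearch class. With the real API, the first batch comes from the initial search request and each later batch from a scroll request carrying the most recent scroll ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for a scroll cursor (hypothetical, for illustration only). */
final class FakeCursor {
    private final List<String> data;
    private final int batchSize;
    private int position = 0;

    FakeCursor(List<String> data, int batchSize) {
        this.data = data;
        this.batchSize = batchSize;
    }

    /** Returns the next batch; an empty list means the cursor is exhausted. */
    List<String> nextBatch() {
        int end = Math.min(position + batchSize, data.size());
        List<String> batch = new ArrayList<>(data.subList(position, end));
        position = end;
        return batch;
    }

    /** Pulls batches until an empty one arrives; returns the number of non-empty batches. */
    static int drain(FakeCursor cursor, Consumer<String> sink) {
        int batches = 0;
        List<String> batch = cursor.nextBatch();
        while (!batch.isEmpty()) {
            batches++;
            batch.forEach(sink);
            batch = cursor.nextBatch();
        }
        return batches;
    }
}
```

Draining five items with a batch size of two takes three non-empty batches (2, 2 and 1 items), followed by one empty batch that ends the loop.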
// JSON.parseObject(String, Class) is assumed to be fastjson's.
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
// In newer 7.x client versions TimeValue lives in org.elasticsearch.core instead.
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

// The members below live inside the test class; esConfig is the project's own configuration bean.

public static class NID {
    private String nid;

    public String getNid() {
        return nid;
    }

    public void setNid(String nid) {
        this.nid = nid;
    }
}

@Test
public void testScroll() throws IOException {
    RestHighLevelClient client = esConfig.client();

    // Scroll keep-alive. It does not need to be long enough to process all the data,
    // only long enough to process the previous batch: every scroll request that
    // carries the scroll parameter sets a new expiry time.
    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchAllQuery());
    searchSourceBuilder.size(5000); // number of hits returned per batch
    searchSourceBuilder.fetchSource(new String[]{"nid"}, null); // included fields, excluded fields

    SearchRequest searchRequest = new SearchRequest(esConfig.getCaterIndex());
    searchRequest.scroll(scroll);
    searchRequest.source(searchSourceBuilder);

    File outFile = new File("E://cater_nid.csv"); // output CSV file
    String scrollId = null;
    // try-with-resources closes the writer even if a request fails part-way through.
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(outFile))) {
        // The initial search returns the first batch together with a scroll ID.
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        scrollId = searchResponse.getScrollId();
        SearchHit[] searchHits = searchResponse.getHits().getHits();

        int page = 0;
        // Keep scrolling until a batch comes back empty.
        while (searchHits != null && searchHits.length > 0) {
            page++;
            System.out.println("----- page " + page + " -----");
            for (SearchHit searchHit : searchHits) {
                NID t = JSON.parseObject(searchHit.getSourceAsString(), NID.class);
                writer.write(t.getNid());
                writer.newLine();
            }

            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
        }
    } finally {
        // Clear the scroll context so the cluster can release its resources.
        if (scrollId != null) {
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId); // setScrollIds() clears several scroll IDs at once
            ClearScrollResponse clearScrollResponse =
                    client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            System.out.println("succeeded:" + clearScrollResponse.isSucceeded());
        }
    }
}
Code adapted from: https://www.cnblogs.com/chentop/p/10296517.html