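Using RestHighLevelClient to connect a Spring Boot project to Elasticsearch
Querying Elasticsearch data from Spring Boot
I have recently been working on a project that has to support queries over large volumes of data. After some research I chose Elasticsearch (ES) to store the data and integrated it into a Spring Boot project, which queries the data and returns it through REST endpoints. Fetching the data and inserting it into ES is done by Python scripts, so this post covers only the query side.
I. Connecting to Elasticsearch
I went with RestHighLevelClient, which the official documentation recommended for ES 7.x; it wraps the CRUD operations.
Assuming ES is already deployed on the server, wiring it into a Spring Boot project takes roughly three steps:
1. Add the dependencies
For brevity, everything unrelated has been removed from the pom file. The main point is to pick a suitable ES version; 7.6.0 is used here.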
<properties>
    <java.version>1.8</java.version>
    <es.version>7.6.0</es.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${es.version}</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>${es.version}</version>
    </dependency>
</dependencies>
2. Add the configuration to the YAML file
The configuration mainly covers the addresses of the ES nodes and the credentials.
spring:
  elasticsearch:
    rest:
      connection-timeout: 6s
      uris: test-cluster-01:9200,test-cluster-02:9200
      read-timeout: 10s
      # Remove the two fields below if the cluster can be accessed without a username and password
      username: estest
      password: estest
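3. Create a configuration class that builds the RestHighLevelClient at startup
With the @Configuration annotation, a RestHighLevelClient bean is created when the service starts; after that you only need to inject it wherever it is used.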
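import java.util.List;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ESConfig {
    @Value("${spring.elasticsearch.rest.uris}")
    private List<String> uris;
    // Remove the userName and password fields if the cluster can be accessed without credentials
    @Value("${spring.elasticsearch.rest.username}")
    private String userName;
    @Value("${spring.elasticsearch.rest.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        HttpHost[] httpHosts = createHosts();
        RestClientBuilder restClientBuilder = RestClient.builder(httpHosts)
                .setHttpClientConfigCallback(httpClientBuilder -> {
                    CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                    // Remove the next line if the cluster can be accessed without credentials
                    credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                });
        return new RestHighLevelClient(restClientBuilder);
    }

    // Supports a multi-node (distributed) ES cluster; each entry must have the form host:port
    private HttpHost[] createHosts() {
        HttpHost[] httpHosts = new HttpHost[uris.size()];
        for (int i = 0; i < uris.size(); i++) {
            String hostStr = uris.get(i);
            String[] host = hostStr.split(":");
            httpHosts[i] = new HttpHost(host[0].trim(), Integer.parseInt(host[1].trim()));
        }
        return httpHosts;
    }
}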
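The createHosts helper splits each entry on ":" and therefore expects plain host:port values. As a hedged sketch of how that parsing could be made more tolerant, the standalone class below also accepts an optional scheme and falls back to port 9200 when none is given. EsHostParser, Endpoint, and the two defaults are hypothetical names introduced for this illustration; they are not part of the original project or of the Elasticsearch client.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project: parses entries such as
// "test-cluster-01:9200" or "https://es.example.com" into scheme, host, and port.
final class EsHostParser {
    static final String DEFAULT_SCHEME = "http"; // assumption: plain HTTP unless stated otherwise
    static final int DEFAULT_PORT = 9200;        // assumption: the usual ES HTTP port

    static final class Endpoint {
        final String scheme;
        final String host;
        final int port;

        Endpoint(String scheme, String host, int port) {
            this.scheme = scheme;
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return scheme + "://" + host + ":" + port;
        }
    }

    static Endpoint parse(String raw) {
        String text = raw.trim();
        // java.net.URI only recognises the host part when a scheme is present
        if (!text.contains("://")) {
            text = DEFAULT_SCHEME + "://" + text;
        }
        URI uri = URI.create(text);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("Invalid Elasticsearch address: " + raw);
        }
        int port = uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
        return new Endpoint(uri.getScheme(), uri.getHost(), port);
    }

    static List<Endpoint> parseAll(List<String> uris) {
        List<Endpoint> endpoints = new ArrayList<>();
        for (String uri : uris) {
            endpoints.add(parse(uri));
        }
        return endpoints;
    }

    public static void main(String[] args) {
        System.out.println(parse("test-cluster-01:9200"));
        System.out.println(parse(" https://es.example.com "));
    }
}
```

Each Endpoint could then be turned into an HttpHost through the HttpHost(hostname, port, scheme) constructor that Apache HttpCore provides.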
II. Queries
The data stored in ES has the following structure; all the queries below are written against it.
class Entity {
    private String id;
    private String summary;
    private String name;
    private String introduction;
    // Getters and setters omitted; fastjson uses them to populate the fields
}
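1. Query by id (single index)
GetRequest is used here. It is very convenient, but it can target only one index and one id per request, so it cannot fetch documents in batches.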
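import com.alibaba.fastjson.JSONObject;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import java.io.IOException;

@Component
public class EsEntityClient {
    private static final Logger log = LoggerFactory.getLogger(EsEntityClient.class);
    @Autowired
    private RestHighLevelClient restHighLevelClient;
    // Index name
    private static final String INDEX_NAME = "entity";

    public Entity queryEntityById(String id) {
        GetRequest getRequest = new GetRequest(INDEX_NAME).id(id);
        Entity entity = null;
        try {
            GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            if (response.isExists()) {
                entity = JSONObject.parseObject(response.getSourceAsString(), Entity.class);
            }
        } catch (IOException e) {
            log.warn("failed to get entity, id:{}", id, e);
        }
        return entity;
    }
} // the query methods in the following sections belong to this class as well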
2. Query by ids across one or more indices
IdsQueryBuilder is used to build the query.
public List<Entity> queryByIds(List<String> ids) {
    IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds(ids.toArray(new String[0]));
    // A search returns at most 10 hits by default; call size(...) on the source builder if more ids may be passed
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource()
            .query(idsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest().source(searchSourceBuilder)
            // Several indices can be listed here
            .indices("idx1", "idx2", "idx3");
    List<Entity> entities = new ArrayList<>();
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        // Sort by score, highest first (relevance order). Float.compare avoids the
        // truncation that casting the score difference to int would cause.
        Arrays.sort(hits, (h1, h2) -> Float.compare(h2.getScore(), h1.getScore()));
        for (SearchHit hit : hits) {
            String jsonString = hit.getSourceAsString();
            entities.add(JSONObject.parseObject(jsonString, Entity.class));
        }
    } catch (IOException e) {
        log.warn("search by ids failed. ids:{}", ids, e);
    }
    return entities;
}
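A note on sorting hits by score: a comparator of the form (int) (h2.getScore() - h1.getScore()) truncates any difference smaller than 1 to 0, so hits whose scores differ only in the fractional part compare as equal and keep their incoming order. The sketch below reproduces this with plain Float values standing in for SearchHit scores and contrasts it with Float.compare. ScoreSortDemo and its members are made-up names used only for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: plain floats stand in for SearchHit scores.
final class ScoreSortDemo {
    // Casts the score difference to int, so differences below 1 become 0
    static final Comparator<Float> TRUNCATING_DESC = (a, b) -> (int) (b - a);
    // Compares the float values directly, highest first
    static final Comparator<Float> SAFE_DESC = (a, b) -> Float.compare(b, a);

    static List<Float> sorted(Comparator<Float> order, Float... scores) {
        Float[] copy = Arrays.copyOf(scores, scores.length);
        Arrays.sort(copy, order);
        return Arrays.asList(copy);
    }

    public static void main(String[] args) {
        System.out.println(sorted(TRUNCATING_DESC, 0.2f, 0.9f, 0.5f));
        System.out.println(sorted(SAFE_DESC, 0.2f, 0.9f, 0.5f));
    }
}
```

With the truncating comparator the three values stay in their input order, while Float.compare yields 0.9, 0.5, 0.2. Elasticsearch already returns hits ordered by descending _score when no explicit sort is requested, so a client-side sort mainly matters if the hits are merged or reordered afterwards.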
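3. Query by id (multiple indices)
The problem I ran into in the project is that the crawlers store data from different sources in different indices, so a single id coming from the front end may have to be looked up in several indices. The GetRequest approach above no longer works in that case (you could loop over the indices, of course, but with many indices that means a lot of I/O overhead and a slow response).
So the approach I chose is...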
public Entity queryById(String id) {
    List<Entity> result = queryByIds(Collections.singletonList(id));
    if (CollectionUtils.isEmpty(result)) {
        log.warn("can't find entity by id:{}", id);
        return null;
    }
    // Returning result.get(0) directly should be enough, but it did not work for me without
    // this round trip through JSON; it looks like a fastjson bug
    return JSONObject.parseObject(JSONObject.toJSONString(result.get(0)), Entity.class);
}
4. Exact match on name
public List<Entity> queryEntityByName(String name) {
    BoolQueryBuilder queryBuilder = new BoolQueryBuilder();
    // A termQuery on "<field>.keyword" matches the field value exactly. This relies on the
    // keyword sub-field that the default dynamic mapping creates for string fields.
    queryBuilder.filter(QueryBuilders.termQuery("name" + ".keyword", name));
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource().query(queryBuilder).size(20);
    SearchRequest searchRequest = new SearchRequest().source(searchSourceBuilder).indices(INDEX_NAME);
    List<Entity> entities = new ArrayList<>();
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        // Filter clauses are not scored, so this sort only has an effect once scoring clauses are added
        Arrays.sort(hits, (h1, h2) -> Float.compare(h2.getScore(), h1.getScore()));
        for (SearchHit hit : hits) {
            String jsonString = hit.getSourceAsString();
            entities.add(JSONObject.parseObject(jsonString, Entity.class));
        }
    } catch (IOException e) {
        log.warn("search entities failed. name:{}", name, e);
    }
    return entities;
}
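Why the .keyword suffix matters can be shown with a toy model. A term query compares the search value with the indexed terms without analyzing it. Under the default dynamic mapping, a string field is indexed as text (split into lowercased tokens) and gets a keyword sub-field that keeps the whole value as a single term. The class below imitates that behaviour in plain Java. The tokenizer is only a rough stand-in for the standard analyzer, and TermMatchDemo and its methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: approximates how a text field and its keyword sub-field index a value.
final class TermMatchDemo {
    // Rough stand-in for the standard analyzer: split on non-letters/digits, then lowercase
    static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A keyword field keeps the whole value as one unmodified term
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // A term query matches when the search value equals one indexed term exactly
    static boolean termQueryMatches(List<String> indexedTerms, String searchValue) {
        return indexedTerms.contains(searchValue);
    }

    public static void main(String[] args) {
        String name = "Spring Boot";
        System.out.println(termQueryMatches(textTerms(name), "Spring Boot"));    // false
        System.out.println(termQueryMatches(keywordTerms(name), "Spring Boot")); // true
        System.out.println(termQueryMatches(textTerms(name), "spring"));         // true
    }
}
```

In this model, searching the text field for the full original value finds nothing because only the tokens "spring" and "boot" were indexed, which is why the exact-match query targets name.keyword instead of name.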
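5. Fuzzy search across multiple fields
Each field is matched loosely against its own keyword. The code uses matchPhraseQuery, so "fuzzy" here means phrase matching on the analyzed text rather than typo-tolerant matching.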
public List<Entity> query(String name, String summary, String introduction) {
    BoolQueryBuilder queryBuilder = buildFuzzQueryBuilder(name, summary, introduction);
    // Hard-coded to return 100 hits for now
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource().query(queryBuilder).size(100);
    SearchRequest searchRequest = new SearchRequest().source(searchSourceBuilder);
    // Indices to search
    searchRequest.indices("idx1", "idx2", "idx3");
    List<Entity> entities = new ArrayList<>();
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        // Float.compare avoids the truncation that casting the score difference to int would cause
        Arrays.sort(hits, (h1, h2) -> Float.compare(h2.getScore(), h1.getScore()));
        for (SearchHit hit : hits) {
            String jsonString = hit.getSourceAsString();
            entities.add(JSONObject.parseObject(jsonString, Entity.class));
        }
    } catch (IOException e) {
        log.warn("search failed.", e);
    }
    return entities;
}

// Build the query: one phrase-match filter per non-empty argument
private BoolQueryBuilder buildFuzzQueryBuilder(String name, String summary, String introduction) {
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    if (Strings.isNotEmpty(name)) {
        // Phrase match on the analyzed text
        MatchPhraseQueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery("name", name);
        boolQueryBuilder.filter(queryBuilder);
    }
    if (Strings.isNotEmpty(summary)) {
        // Phrase match on the analyzed text
        MatchPhraseQueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery("summary", summary);
        boolQueryBuilder.filter(queryBuilder);
    }
    if (Strings.isNotEmpty(introduction)) {
        // Phrase match on the analyzed text
        MatchPhraseQueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery("introduction", introduction);
        boolQueryBuilder.filter(queryBuilder);
    }
    return boolQueryBuilder;
}
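To make the conditional assembly more concrete, the following sketch builds, in plain Java, a request body in the Elasticsearch query DSL that is roughly equivalent to what buildFuzzQueryBuilder produces: one match_phrase clause in the bool filter for each non-empty argument. It is string building for illustration only. PhraseFilterDsl and its methods are hypothetical, and the JSON that the real builders serialize carries additional default parameters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: prints the query DSL that the bool/filter/match_phrase builders correspond to.
final class PhraseFilterDsl {
    // Minimal JSON string escaping (quotes, backslashes, and control characters)
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One match_phrase clause per field whose keyword is non-empty
    static String build(Map<String, String> keywordsByField, int size) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> e : keywordsByField.entrySet()) {
            String keyword = e.getValue();
            if (keyword != null && !keyword.isEmpty()) {
                clauses.add("{\"match_phrase\":{" + quote(e.getKey()) + ":" + quote(keyword) + "}}");
            }
        }
        return "{\"query\":{\"bool\":{\"filter\":[" + String.join(",", clauses) + "]}},\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        Map<String, String> keywords = new LinkedHashMap<>();
        keywords.put("name", "spring boot");
        keywords.put("summary", "");
        keywords.put("introduction", "search engine");
        System.out.println(build(keywords, 100));
    }
}
```

When every argument is empty, the filter list stays empty, and a bool query without clauses matches all documents, so callers may want to reject that case before searching.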
A simple demo implementation is available at https://github.com/bupt-yanch/spring-elasticsearch-demo