ES - Getting Started
Getting to know ES
Installing elasticsearch
# 1. Create a Docker network
docker network create es-net
# 2. Load the image from the tar archive
docker load -i es.tar
# 3. Run the container
docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
-p 9300:9300 \
elasticsearch:7.12.1
Parameter notes:
- -d: run in the background
- -e "ES_JAVA_OPTS=-Xms512m -Xmx512m": environment variable setting the JVM heap size
- -e "discovery.type=single-node": single-node (non-cluster) mode
- -v es-data:/usr/share/elasticsearch/data: mount a named volume for the ES data directory
- -v es-plugins:/usr/share/elasticsearch/plugins: mount a named volume for the ES plugins directory
- --privileged: grant access to the mounted volumes
- --network es-net: join the network named es-net
- -p 9200:9200: port mapping (9200 is the HTTP port; 9300 is for inter-node transport)
Visit http://192.168.184.152:9200/; if ES answers with its JSON info, the setup succeeded.
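A quick smoke test from the shell (same host IP as above); ES should answer with a JSON body containing the cluster name and version:
curl http://192.168.184.152:9200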
Installing kibana
# 1. Load the image from the tar archive
docker load -i kibana.tar
# 2. Run the container
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1
- --network=es-net: join the es-net network, the same network as elasticsearch
- -e ELASTICSEARCH_HOSTS=http://es:9200: the elasticsearch address; because Kibana and elasticsearch share a network, the container name resolves directly
- -p 5601:5601: port mapping
Kibana usually takes a while to start. You can follow its logs with:
docker logs -f kibana
Once the log reports that the server is up and listening on port 5601, startup succeeded.
Note that ES must be started before Kibana.
Kibana's Dev Tools console lets you write DSL statements directly.
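For example, a minimal first request to confirm the console works (GET / returns the cluster's basic info):
GET /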
Installing the IK analyzer
The default standard analyzer segments Chinese poorly; the IK analyzer handles Chinese much better.
Download: https://github.com/medcl/elasticsearch-analysis-ik
The IK analyzer offers two modes:
- ik_smart: coarsest segmentation (fewest tokens)
- ik_max_word: finest segmentation (most tokens)
Offline installation:
1. Find the ES plugins volume directory:
docker volume inspect es-plugins
2. Unzip the downloaded IK analyzer and upload it to: /var/lib/docker/volumes/es-plugins/_data
3. Restart the container:
docker restart es
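The three steps consolidated as a shell sketch; the plugin zip name is an assumption, adjust it to the file you downloaded:
# find the volume's mount point on the host
docker volume inspect es-plugins
# unzip the IK plugin into its own folder inside the volume
unzip elasticsearch-analysis-ik-7.12.1.zip -d /var/lib/docker/volumes/es-plugins/_data/ik
# restart ES so the plugin is loaded
docker restart es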
Test the coarse mode (ik_smart):
POST /_analyze
{
"text":"程序员旺财学习JAVA太开心了",
"analyzer":"ik_smart"
}
Result:
{
"tokens" : [
{
"token" : "程序员",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "旺",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "财",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "学习",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "java",
"start_offset" : 7,
"end_offset" : 11,
"type" : "ENGLISH",
"position" : 4
},
{
"token" : "太",
"start_offset" : 11,
"end_offset" : 12,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "开心",
"start_offset" : 12,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "了",
"start_offset" : 14,
"end_offset" : 15,
"type" : "CN_CHAR",
"position" : 7
}
]
}
Test the fine-grained mode (ik_max_word):
POST /_analyze
{
"text":"程序员旺财学习JAVA太开心了",
"analyzer":"ik_max_word"
}
Result:
{
"tokens" : [
{
"token" : "程序员",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "程序",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "员",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "旺",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "财",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "学习",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "java",
"start_offset" : 7,
"end_offset" : 11,
"type" : "ENGLISH",
"position" : 6
},
{
"token" : "太",
"start_offset" : 11,
"end_offset" : 12,
"type" : "CN_CHAR",
"position" : 7
},
{
"token" : "开心",
"start_offset" : 12,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "了",
"start_offset" : 14,
"end_offset" : 15,
"type" : "CN_CHAR",
"position" : 9
}
]
}
IK analyzer: extension and stopword dictionaries
IK maintains its own dictionary, but internet slang such as 奥里给 or 白嫖 is not recognized as a word by default. In such cases we can extend the dictionary.
To extend IK's vocabulary, edit the IKAnalyzer.cfg.xml file in the config directory of the IK plugin folder.
File contents:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- configure your own extension dictionary here -->
<entry key="ext_dict">ext.dic</entry>
<!-- configure your own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
In the same directory, create the file ext.dic and save the following content:
传智教育
奥里给
白嫖
旺财
Stopwords can be maintained in stopword.dic (a file that already exists by default).
After adding ext.dic or modifying stopword.dic, restart es.
After the restart, test:
POST /_analyze
{
"text":"传智教育的课程可以白嫖,而且就业率高达95%,奥里给!",
"analyzer":"ik_smart"
}
Result:
{
"tokens" : [
{
"token" : "传智教育",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "课程",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "可以",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "白嫖",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "而且",
"start_offset" : 12,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "就业率",
"start_offset" : 14,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "高达",
"start_offset" : 17,
"end_offset" : 19,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "95",
"start_offset" : 19,
"end_offset" : 21,
"type" : "ARABIC",
"position" : 7
},
{
"token" : "奥里给",
"start_offset" : 23,
"end_offset" : 26,
"type" : "CN_WORD",
"position" : 8
}
]
}
The new words 白嫖 and 奥里给 are now recognized as single tokens, and stopwords no longer appear in the output.
Index operations
Mapping properties
A mapping defines constraints on the documents in an index, similar to a schema in a database. Common mapping properties:
- type: the field's data type. Common simple types:
  - strings: text (analyzed, tokenizable text), keyword (exact values that are not tokenized, e.g. brand, country, IP address)
  - numeric: long, integer, short, byte, double, float
  - boolean: boolean
  - date: date
  - object: object
- index: whether to build an inverted index for the field; defaults to true
- analyzer: which analyzer to use
- properties: the field's sub-fields
Creating an index
ES manipulates indices and documents through RESTful requests, with request bodies written as DSL statements. The DSL for creating an index with its mapping looks like this:
# create an index
PUT /product
{
"mappings": {
"properties": {
"info": {
"type": "text",
"analyzer": "ik_smart" # 指定分词器
},
"email": {
"type": "keyword",
"index": false # 不需要参数搜索,因此不用创建倒排索引
},
"name": {
"type": "object",
"properties": { # 嵌套字段
"firstName": {
"type": "keyword"
},
"lastName": {
"type": "keyword"
}
}
}
}
}
}
Deleting, querying, and modifying an index
# delete an index
DELETE /product
# query an index
GET /product
# modify an index (only new fields can be added)
PUT /emp/_mapping
{
"properties":{
"age":{
"type":"integer"
}
}
}
Modifying an existing field raises an error; for example, changing the age field's type to long:
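A sketch of the rejected request; ES responds with an illegal_argument_exception because a mapped field's type cannot be changed once created:
PUT /emp/_mapping
{
  "properties":{
    "age":{
      "type":"long"
    }
  }
}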
Document operations
Create, query, delete
Create a document with POST /{index-name}/_doc/{doc-id}. Example:
# add a document
POST /emp/_doc/1
{
"age":23,
"info":"国庆放假好开心",
"email":"xxx@test.com",
"name":{
"firstName":"张",
"lastName":"三"
}
}
# Response:
{
"_index" : "emp",
"_type" : "_doc",
"_id" : "1",
"_version" : 6,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 9
}
# query a document
GET /emp/_doc/1
# Response:
{
"_index" : "emp",
"_type" : "_doc",
"_id" : "1",
"_version" : 6,
"_seq_no" : 7,
"_primary_term" : 9,
"found" : true,
"_source" : {
"age" : 23,
"info" : "国庆放假好开心",
"email" : "xxx@test.com",
"name" : {
"firstName" : "张",
"lastName" : "三"
}
}
}
# delete a document
DELETE /emp/_doc/1
Updating documents
Method 1: full replacement. If the document ID already exists, the old document is deleted and the new one added; if not, a new document is created.
# replace an existing document
POST /emp/_doc/1
{
"age":23,
"info":"国庆放假好开心",
"email":"ZhangSan@test.com",
"name":{
"firstName":"张",
"lastName":"三"
}
}
# Response:
{
"_index" : "emp",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 9
}
# "updating" a document that doesn't exist simply creates it
POST /emp/_doc/3
{
"age":23,
"info":"中秋放假好开心",
"email":"wangcai@test.com",
"name":{
"firstName":"wang",
"lastName":"cai"
}
}
Method 2: partial (incremental) update; only the fields listed under "doc" are changed:
# partial update
POST /emp/_update/3
{
"doc":{
"age": 24
}
}
# Response:
{
"_index" : "emp",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 13,
"_primary_term" : 9
}
Operating on indices with RestClient
Official docs: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html
Preparation:
1. Create the table and import the data into MySQL:
CREATE TABLE `tb_hotel` (
  `id` bigint(20) NOT NULL COMMENT 'hotel id',
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'hotel name',
  `address` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'hotel address',
  `price` int(10) NOT NULL COMMENT 'hotel price',
  `score` int(2) NOT NULL COMMENT 'hotel rating',
  `brand` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'hotel brand',
  `city` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'city',
  `star_name` varchar(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'hotel star rating, 1 to 5 stars / 1 to 5 diamonds',
  `business` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'business district',
  `latitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'latitude',
  `longitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'longitude',
  `pic` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'hotel picture',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Compact;
2. Prepare the mapping for the index:
PUT /hotel
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text",
"ayalzer": "ik_max_work",
"copy_to": "all"
},
"address": {
"type": "keyword",
"index": false
},
"price": {
"type": "integer"
},
"scope": {
"type": "integer"
},
"brand": {
"type": "keyword",
"copy_to": "all"
},
"city": {
"type": "keyword"
},
"starName": {
"type": "keyword"
},
"business": {
"type": "keyword"
},
"location": {
"type": "gen_point"
},
"pic": {
"type": "keyword",
"index": false
},
"all":{
"type":"text",
"analyzer": "ik_max_word"
}
}
}
}
Notes:
- A field that doesn't participate in search can have index set to false
- copy_to copies several fields into one combined field for indexing, so a single match query can cover them all, which improves query efficiency
- ES supports two geo data types (see the sketch after this list):
  - geo_point: a point determined by latitude and longitude, e.g. "32.8752345,120.2981576"
  - geo_shape: a complex geometry composed of multiple geo_points, e.g. a line
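In a document, a geo_point value can be written as a "lat,lon" string; a minimal sketch against the hotel index (the doc id is arbitrary):
POST /hotel/_doc/1
{
  "location": "31.21,121.5"
}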
Initialization
1. Add the dependency:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.6.2</version>
</dependency>
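The client version should match the ES server (7.12.1 here), while Spring Boot may manage an older default. Assuming a Spring Boot parent, one way is to override the managed version property:
<properties>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>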
2. Write a test class:
@SpringBootTest
class HotelDemoApplicationTests {
private RestHighLevelClient client;
@Test
public void testInit(){
System.out.println(client);
}
@BeforeEach
void setUp(){
this.client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.184.152:9200")
));
}
@AfterEach
void tearDown() throws IOException {
this.client.close();
}
}
Index CRUD
private static final String MAPPING_TEMPLATE = "{\n" +
" \"mappings\": {\n" +
" \"properties\": {\n" +
" \"id\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"name\": {\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"address\": {\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"price\": {\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"scope\": {\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"brand\": {\n" +
" \"type\": \"keyword\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"city\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"starName\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"business\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"location\": {\n" +
" \"type\": \"geo_point\"\n" +
" },\n" +
" \"pic\": {\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"all\":{\n" +
" \"type\":\"text\",\n" +
" \"analyzer\": \"ik_max_word\"\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
@Test
public void testCreateIndex() throws IOException {
    //1. create the request object
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    //2. attach the mapping JSON as the request body
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    //3. send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}
//delete the index
@Test
public void deleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    client.indices().delete(request, RequestOptions.DEFAULT);
}
@Test //check whether the index exists
public void getIndex() throws IOException {
    GetIndexRequest getIndexRequest = new GetIndexRequest("hotel");
    boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
    System.err.println(exists ? "index exists" : "index does not exist");
}
Operating on documents with RestClient
The MySQL table's fields don't match the index mapping's field names (separate longitude/latitude columns vs. a single location field), so the entity must be wrapped in a new class; a sketch follows:
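The wrapper class is not shown above, so here is a minimal sketch, assuming Hotel exposes getters and following the mapping fields; location joins the separate latitude/longitude columns into the "lat, lon" string geo_point accepts:
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location; // geo_point as "lat, lon"
    private String pic;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        // merge latitude and longitude into one geo_point field
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
    }

    public Long getId() { return id; }
    // remaining getters/setters omitted
}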
@SpringBootTest
class HotelDocTests {
@Autowired
private IHotelService iHotelService;
private RestHighLevelClient client;
//test adding a document
@Test
public void addDoc() throws IOException {
    //1. query the record from MySQL
    Hotel hotel = iHotelService.getById(395434L);
    //2. convert it to the object matching the index mapping
    HotelDoc hotelDoc = new HotelDoc(hotel);
    //3. create the request
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    //4. attach the JSON body; serialize hotelDoc (not hotel) so the field names match the mapping
    request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
    //5. send the request
    client.index(request, RequestOptions.DEFAULT);
}
@BeforeEach
void setUp(){
this.client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.184.152:9200")
));
}
@AfterEach
void tearDown() throws IOException {
this.client.close();
}
}
Verify the result by querying the document back:
@Test //test fetching a document
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("hotel", "395434");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    //deserialize the JSON source into a Java object
    HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
    System.out.println(hotelDoc);
}
Updating and deleting documents:
@Test //update a document
public void testUpdateDoc() throws IOException {
UpdateRequest request = new UpdateRequest("hotel","395434");
request.doc(
        "address", "东三环北路东方路2号",
        "price", 451
);
client.update(request, RequestOptions.DEFAULT);
}
@Test //delete a document
public void testDeleteDoc() throws IOException {
DeleteRequest deleteRequest = new DeleteRequest("hotel","395434");
client.delete(deleteRequest, RequestOptions.DEFAULT);
}
Bulk insert:
@Test
public void testBatchInsertDoc() throws IOException {
    BulkRequest request = new BulkRequest();
    //query all hotels from MySQL
    List<Hotel> list = iHotelService.list();
    for (Hotel hotel : list) {
        //convert each record, then add an IndexRequest to the bulk request
        HotelDoc hotelDoc = new HotelDoc(hotel);
        request.add(new IndexRequest("hotel")
                .id(hotelDoc.getId().toString())
                .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
    }
    //send all index operations in a single round trip
    client.bulk(request, RequestOptions.DEFAULT);
}
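client.bulk returns a BulkResponse; since a bulk request can succeed overall while individual items fail, it is worth checking:
BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
// report any per-item failures
if (response.hasFailures()) {
    System.err.println(response.buildFailureMessage());
}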
Query all documents to verify the bulk insert:
GET /hotel/_search
DSL query syntax
- Match all: returns every document, generally used for testing. E.g.: match_all
- Full-text queries: the user's input is analyzed into terms, which are then matched against the inverted index. E.g.:
  - match_query
  - multi_match_query
- Exact-value queries: match exact terms, usually on keyword, numeric, date, or boolean fields. E.g.:
  - ids
  - range
  - term
- Geo queries: search by coordinates. E.g.:
  - geo_distance
  - geo_bounding_box
- Compound queries: combine the query types above and merge their conditions. E.g.:
  - bool
  - function_score
Full-text queries
match_query:
GET /hotel/_search
{
"query": {
"match": {
"all": "外滩七天"
}
}
}
multi_match_query:
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "外滩七天",
"fields": ["brand","name","business"]
}
}
}
Exact-value queries
# term query
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "深圳"
}
}
}
}
# range query
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 2000
}
}
}
}
Relevance scoring
Compound queries combine simpler queries to build more complex search logic, for example:
- function_score: a scoring-function query that adjusts documents' relevance scores and hence their ranking, much like paid placement in web search
With a match query, each result is scored by its relevance to the search terms (_score) and results are returned in descending score order. For example, searching for "虹桥如家" returns the matching hotels ranked by _score.
FunctionScoreQuery
A function_score query wraps an original query plus a list of functions; each function may carry a filter and a score function such as weight, and boost_mode controls how the function score combines with the original _score (multiply, sum, replace, etc.).
Example requirement:
- Boost the last hotel in the results, 如家酒店·neo(上海外滩城隍庙小南门地铁站店), to first place:
GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"all": "外滩"
}
},
"functions": [
{
"filter": {"term": { # 过滤,只要如家酒店
"brand": "如家"
}},
"weight": 10
}
],
"boost_mode": "multiply"
}
}
}
The final score is 3.8000445 * 10 = 38.000446, and the hotel moves to first place in the results.
BooleanQuery
A bool query combines one or more query clauses. The clause types are:
- must: every clause must match; contributes to the score (like AND)
- should: optional clauses; contribute to the score (like OR)
- must_not: clauses must not match; does not contribute to the score (like NOT)
- filter: clauses must match, but do not contribute to the score
Case 1:
Search for hotels whose name contains 如家, priced at no more than 400, within 10 km of coordinates 31.21,121.5:
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{"match": {
"name": "如家"
}}
],
"must_not": [
{"range": {
"price": {
"gt": 400
}
}}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
Case 2:
Hotels whose brand is 如家 or 7天, priced at no more than 400, within 10 km of coordinates 31.21,121.5:
GET /hotel/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"brand": {
"value": "如家"
}
}
},
{
"term": {
"brand": {
"value": "7天"
}
}
}
],
"must_not": [
{
"range": {
"price": {
"gt": 400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
Sorting
ES can sort search results; by default they are ordered by relevance score (_score). Sortable field types include keyword, numeric, geo-point, and date.
Case 1: sort hotels by user rating descending, and by price ascending when ratings are equal:
GET /hotel/_search
{
"query": {
"match_all": {}
  },
  "sort": [
    {
      "score": {
        "order": "desc"
      }
    },
    {
      "price": {
        "order": "asc"
      }
    }
]
}
Case 2: find hotels around coordinates 121.612282,31.034661, sorted by distance ascending:
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 31.034661,
"lon": 121.612282
},
"order": "asc",
"unit":"km"
}
}
]
}
Pagination
By default ES returns only the top 10 hits; to fetch more you adjust the pagination parameters.
ES controls which slice of results is returned through the from and size parameters:
# paged query
# page 2 with 10 hits per page: from = (page-1) * size
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": "desc"
}
],
"from": 10,
"size": 10
}
Querying page 100 is where things get expensive.
Deep pagination problem:
ES is distributed, so it faces a deep-paging problem. For example, fetching page 100 of results sorted by price (from=990, size=10):
1. Each data shard sorts its own documents and returns its local top 1000.
2. The results from all nodes are aggregated and re-sorted in memory to select the global top 1000.
3. From those 1000 documents, the 10 starting at position 990 are returned.
The deeper the page, i.e. the larger from + size, the higher the memory and CPU cost.
ES therefore caps from + size at 10,000.
Highlighting
Highlighting makes the search keywords stand out in the search results.
How it works:
- the keywords in each hit are wrapped in a tag
- the page then styles that tag with CSS
Syntax (a sketch follows):
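A sketch of the syntax: highlight.fields names the field(s) to highlight, and pre_tags/post_tags set the wrapping markup (the defaults are <em> and </em>):
GET /hotel/_search
{
  "query": {
    "match": {
      "name": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}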
Note:
1. By default the highlighted field must be the same field the query searched; if they differ, nothing is highlighted.
For example:
GET /hotel/_search
{
"query": {
"match": {
"brand":"如家"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
The result is not highlighted because the query searched brand while the highlight targets name. Adding "require_field_match": "false" to the name field fixes it:
GET /hotel/_search
{
"query": {
"match": {
"brand":"如家"
}
},
"highlight": {
"fields": {
"name": {"require_field_match": "false"}
}
}
}
This article is from cnblogs, by chuangzhou. Please credit the original link when republishing: https://www.cnblogs.com/czzz/p/17700976.html