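Learning ElasticSearch
Introduction to ElasticSearch
Motivation
- Searching massive amounts of data: MySQL is too slow for search over large data sets
- Even when the keywords are not exact, the desired data can still be found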
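About ES
ES is a search engine framework written in Java on top of Lucene. It provides distributed full-text search and stores and searches data in near real time. It exposes a unified RESTful web interface, and official clients provide APIs for many languages.
Lucene: Lucene is the low-level search engine library underneath ES. It is essentially a jar containing ready-made code for building inverted indexes and running searches, including the various algorithms involved.
Full-text search: the indexer scans every word of a document and builds an index entry for each word, recording how often and where it occurs. At query time, the keywords are looked up in this term dictionary to find the matching content.
Structured search: for example, "which products are in the household-goods category?": select * from products where category_id='日化用品'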
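ES vs. Solr
- When querying static data, Solr is somewhat faster than ES. When the data changes in real time, Solr's query speed drops considerably, while ES's query performance stays largely unchanged.
- A Solr cluster (SolrCloud) relies on ZooKeeper for coordination. ES supports clustering natively and needs no third-party component.
- There is little Chinese documentation for Solr. ES has become very popular since its release, and its documentation is comprehensive.
- ES has very good support for cloud computing and big data.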
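Inverted index
The stored data is split into terms (tokenized) according to some rules, and the terms are kept in a separate term dictionary.
When a user searches, the user's query keywords are tokenized in the same way,
then matched against the term dictionary, which yields the ids of the matching documents.
Using those ids, the documents are fetched from where the data is stored.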
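The four steps above can be sketched as a toy in-memory inverted index. This is a minimal illustration and not how Lucene stores data. A whitespace/lowercase tokenizer stands in for a real analyzer such as ik, which is an assumption made to keep the example self-contained. The class name `InvertedIndex` is hypothetical.

```java
import java.util.*;

// Toy inverted index illustrating the idea only (not Lucene's data structures).
final class InvertedIndex {
    // term -> ids of the documents that contain the term (the "term dictionary")
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // id -> original document text ("where the data is stored")
    private final Map<Integer, String> store = new HashMap<>();

    // Assumption: whitespace + lowercase tokenization stands in for a real analyzer.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Indexing: store the document, then add its id to the posting list of every term.
    void add(int id, String text) {
        store.put(id, text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Searching: analyze the query, collect the ids of every matching term,
    // then fetch the stored documents by id.
    List<String> search(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : analyze(query)) {
            ids.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        List<String> docs = new ArrayList<>();
        for (int id : ids) docs.add(store.get(id));
        return docs;
    }
}
```

For example, after indexing "Elasticsearch is built on Lucene" and "Lucene builds inverted indexes", `search("lucene")` returns both documents. It never scans their text, because the lookup goes through the term dictionary.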
Installing ElasticSearch
Installing ES & Kibana
Install ES
version: "3.1"
services:
  elasticsearch:
    image: daocloud.io/library/elasticsearch:6.5.4
    restart: always
    container_name: elasticsearch
    environment: # the heap size must be set explicitly: the default heap (1 GB in ES 6.x) is too large for a small server and runs out of memory
      - "ES_JAVA_OPTS=-Xms128m -Xmx256m"
      - "discovery.type=single-node"
      - "COMPOSE_PROJECT_NAME=elasticsearch-server"
    ports:
      - 9200:9200
  kibana:
    image: daocloud.io/library/kibana:6.5.4
    restart: always
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_URL=http://115.159.222.145:9200
    depends_on:
      - elasticsearch
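ES directory layout
bin: startup scripts
config: configuration files
  - log4j2.properties: logging configuration
  - jvm.options: JVM settings, including the heap size; lower it if the machine is short on memory
  - elasticsearch.yml: Elasticsearch configuration (HTTP port 9200 by default)
lib: bundled jars
logs: log files
modules: built-in feature modules
plugins: plugins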
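Elasticsearch fails to start
Symptom: elasticsearch exited with code 78
Fix:
Switch to the root user
and run:
sysctl -w vm.max_map_count=262144
Check the result:
sysctl -a|grep vm.max_map_count
Expected output:
vm.max_map_count = 262144
The setting above is lost when the VM reboots. To make it permanent,
append this line to /etc/sysctl.conf:
vm.max_map_count=262144
(run sysctl -p to apply the file without rebooting)
If the problem persists, check the server's memory. It may be running low, and you may need to free some.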
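Test after a successful start
Open ES in a browser:
http://host:9200
Install Kibana
Kibana is an open-source analytics and visualization platform for Elasticsearch. It is used to search, view, and interact with the data stored in ES indices, and it supports advanced data analysis and presentation through various charts.
It is easy to use and presents data intuitively.
Then open Kibana at:
http://host:5601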
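Install the ES visualization plugin head
- Start it:
npm install
npm run start
- Fix cross-origin (CORS) errors:
# edit the ES config file elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"
- Restart the ES server and connect again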
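Install the ik analyzer
ik analyzer download address
Find the ES container:
docker ps | grep elastic
Enter the ES container and install the ik analyzer with elasticsearch-plugin from the bin/ directory:
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip
If GitHub is slow, use a mirror hosted in China instead. The plugin version must match the ES version, which is 6.5.4 here.
Test the analyzer through the API
Note:
ES must be restarted to load the newly installed analyzer:
docker restart <es container name/id>
After the restart, test the analyzer: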
POST _analyze
{
"analyzer": "ik_max_word",
"text": "尚硅谷教育"
}
The analyzer type must be specified with analyzer.
Response:
{
"tokens" : [
{
"token" : "尚",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "硅谷",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "教育",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
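ElasticSearch core concepts
ES components
Near real time (NRT)
This has two meanings:
- there is a small delay (about 1 second) between writing data and that data becoming searchable;
- searches and analytics in ES complete within seconds.
Cluster
A cluster contains multiple nodes. A setting, cluster.name (default "elasticsearch"), decides which cluster a node belongs to. For small and medium applications, starting with a single-node cluster is perfectly normal.
Node
A node is one server in the cluster. A node also has a name, which is randomly generated by default. Node names matter when you perform operations and administration. By default a node joins the cluster named "elasticsearch", so several nodes started together form an "elasticsearch" cluster automatically. A single node can also form a cluster on its own.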
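ElasticSearch storage structure
Index (≈ database)
An index holds a set of documents with a similar structure, and an ES server can host many indices.
Examples are a customer index, a product-category index and an order index. Each index has a name.
- By default (ES 6.x) each index is split into 5 primary shards. The default changed to 1 in ES 7.0.
- By default each shard has one replica.
- Replica shards serve search requests just like primary shards, so more replicas increase read throughput. A replica also takes over if its primary is lost.
- A replica is never placed on the same node as its primary shard.
Type (≈ table)
An index can contain one or more types. A type is a logical grouping of data inside an index, and the documents of one type share the same fields.
Note:
- ES 5.x: an index can contain multiple types
- ES 6.x: an index can contain only one type
- ES 7.x: types are deprecated and indices are typeless (_doc is used in URLs); types are removed completely in 8.0
Document (≈ row)
A document is the smallest unit of data that ES indexes and retrieves. A type can hold many documents, and a document usually represents one record, such as one customer.
Field (≈ column)
A field is the smallest unit of data inside a document. A document contains multiple fields, and each field holds one piece of data.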
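RESTful syntax for operating ES
GET requests:
- http://ip:port/index: get information about an index
- http://ip:port/index/type/doc_id: get a document by id
POST requests:
- http://ip:port/index/type/_search: search documents; a JSON body specifies the query
- http://ip:port/index/type/doc_id/_update: update a document; a JSON body specifies the changes
PUT requests:
- http://ip:port/index: create an index; the body specifies the index settings
- http://ip:port/index/type/_mapping: specify the mapping (field definitions) of the documents stored in an index
DELETE requests:
- http://ip:port/index: delete an index
- http://ip:port/index/type/doc_id: delete a document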
Index operations
Create an index
PUT /person
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
}
}
View index information
- In the Kibana UI
- Via the API
# view index information
GET /person
Response:
{
  "person" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1614596113957",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "bC8PJsegQ16t5EAtNWh_vg",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "person"
      }
    }
  }
}
Delete an index
- In the management UI
- Via the API
# delete an index
DELETE /person
Response:
{ "acknowledged" : true }
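The create, view and delete calls above are plain HTTP requests, so they can also be issued without Kibana. The sketch below builds the same three requests with the JDK 11+ `java.net.http` client. It assumes ES listens on `http://localhost:9200`, so replace that with your host. The class `IndexRestCalls` and its methods are hypothetical helpers, not part of any ES client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: the index REST calls issued directly over HTTP (requires Java 11+).
// Assumption: ES is reachable at http://localhost:9200.
final class IndexRestCalls {
    static final String BASE = "http://localhost:9200";

    // PUT /{index} with a settings body -> create the index
    static HttpRequest createIndex(String index, int shards, int replicas) {
        String body = "{\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "}}";
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // GET /{index} -> index information (aliases, mappings, settings)
    static HttpRequest getIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index)).GET().build();
    }

    // DELETE /{index} -> delete the index
    static HttpRequest deleteIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index)).DELETE().build();
    }

    // Sends a request and returns the raw JSON response body (needs a running ES node).
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return resp.body();
    }
}
```

Calling `IndexRestCalls.send(IndexRestCalls.deleteIndex("person"))` against a live node should return the same `{"acknowledged":true}` body shown above. Building the request and sending it are separate steps, so the requests can be inspected without a server.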
Field types in ES
String:
- text: for full-text search; the field value is analyzed (tokenized)
- keyword: the field value is not analyzed
Numeric types:
- long
- integer
- byte
- double
- float
Date type:
- date: the accepted date format(s) can be specified with format
Boolean type:
- boolean: true or false
Binary type:
- binary: accepts a Base64-encoded string
Range types:
- long_range: when indexing, you store a range, given with gt, lt, gte, lte
- float_range:
- integer_range:
- date_range:
- ip_range:
Geo-point type:
- geo_point: stores a latitude/longitude pair
IP type:
- ip: stores an IPv4 or IPv6 address
Others
Create an index with a mapping
# create an index and specify the field types
PUT /book
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings": {
"novel": {
"properties": {
"name": {
"type": "text",
"analyzer": "ik_max_word",
"index": true,
"store": false
},
"author": {
"type": "keyword"
},
"count": {
"type": "long"
},
"onSale": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"desc": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
}
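Explanation:
- number_of_shards: number of primary shards
- number_of_replicas: number of replicas per shard
- mappings: defines the data structure
- novel: the type name
- properties: the field definitions of the document
- name: defines a field called name
- type: the field's data type
- analyzer: the analyzer to use
- index: true means the field is indexed and can be used in query conditions
- store: whether to store the field value separately, in addition to _source
- format: the accepted date format(s), separated by ||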
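A format such as `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis` lists alternatives that are tried from left to right. The sketch below mimics that with `java.time`. It is not ES's actual date parser. It assumes values without a time zone are UTC, which matches ES's default, and it handles only the three formats used in the mapping above. The class name `MultiFormatDate` is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch of reading a "fmt1||fmt2||epoch_millis" date mapping: try each format in order.
// Assumptions: UTC for zone-less values; only patterns like those above plus epoch_millis.
final class MultiFormatDate {
    static long toEpochMillis(String value, String formats) {
        for (String f : formats.split("\\|\\|")) {
            try {
                if (f.equals("epoch_millis")) {
                    return Long.parseLong(value);          // raw milliseconds since 1970-01-01
                }
                DateTimeFormatter fmt = DateTimeFormatter.ofPattern(f);
                if (f.contains("HH")) {                    // pattern with a time part
                    return LocalDateTime.parse(value, fmt).toInstant(ZoneOffset.UTC).toEpochMilli();
                }
                return LocalDate.parse(value, fmt)         // date-only pattern: start of that day
                        .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
            } catch (DateTimeParseException | NumberFormatException e) {
                // this alternative does not match; try the next one
            }
        }
        throw new IllegalArgumentException("no format matches: " + value);
    }
}
```

With the mapping's format string, `"2000-01-01"`, `"2000-01-01 00:00:00"` and `"946684800000"` all resolve to the same instant. That is why the `onSale` values in the documents below can use either style.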
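Document operations
A document is uniquely identified in ES by the combination of _index, _type and _id. A write to an existing combination updates that document; otherwise it creates a new one.
Create a document
Auto-generated id
# add a document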
POST /book/novel
{
"name": "斗罗",
"author": "西红柿",
"count": 10000,
"onSale": "2000-01-01",
"desc": "斗罗大陆修仙小说"
}
Manually specified id
# specify the id manually
PUT /book/novel/1
{
"name": "红楼梦",
"author": "曹雪芹",
"count": 10000,
"onSale": "1758-01-01",
"desc": "红楼梦小说"
}
Update a document
Overwrite (full replacement)
# overwrite: a PUT to an existing id replaces the whole document
PUT /book/novel/1
{
"name": "红楼梦",
"author": "曹雪芹",
"count": 20000,
"onSale": "1758-01-01",
"desc": "红楼梦小说"
}
Partial update with doc
# update a document with doc
POST /book/novel/1/_update
{
"doc": {
"count": 123455
# the fields to change and their new values
}
}
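The difference between the two update styles can be sketched on plain maps. With `doc`, the fields in the request overwrite or add to the stored `_source`, and every other field is kept. A PUT to the same id, by contrast, replaces the source entirely. This sketch assumes flat fields only; ES also merges nested objects recursively, which is skipped here. `PartialUpdate` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the effect of POST /index/type/id/_update {"doc": {...}} on the stored _source.
// Assumption: flat (non-nested) fields only.
final class PartialUpdate {
    static Map<String, Object> applyDoc(Map<String, Object> source, Map<String, Object> doc) {
        Map<String, Object> merged = new HashMap<>(source); // copy; the original stays untouched
        merged.putAll(doc);                                 // fields in "doc" win, others are kept
        return merged;
    }
}
```

Applying `{"count": 123455}` to the 红楼梦 document changes only `count`, and `name`, `author`, `onSale` and `desc` stay as they were.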
Delete a document
# delete a document by id
DELETE /book/novel/Ile37XcBdlEqQ4RmWKpJ
Operating ElasticSearch from Java
Connecting to ES from Java
- Create a Maven project
- Add the dependencies
<dependencies>
    <!-- 1. elasticsearch -->
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>6.5.4</version>
    </dependency>
    <!-- 2. elasticsearch high-level REST client API -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>6.5.4</version>
    </dependency>
    <!-- 3. junit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- 4. lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.16.22</version>
    </dependency>
    <!-- 5. jackson -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.10.2</version>
    </dependency>
</dependencies>
- Create the ES connection
package com.example.utils;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class ESClient {
    public static RestHighLevelClient getClient() {
        // 1. Create the HttpHost (ES address and HTTP port)
        HttpHost httpHost = new HttpHost("115.159.222.145", 9200);
        // 2. Create the RestClientBuilder
        RestClientBuilder clientBuilder = RestClient.builder(httpHost);
        // 3. Create the RestHighLevelClient
        RestHighLevelClient client = new RestHighLevelClient(clientBuilder);
        // return the client
        return client;
    }
}
Operating on indices from Java
Create an index
package com.example.test;
import com.example.utils.ESClient;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/1/21
* @Description:
*/
public class Demo02 {
RestHighLevelClient client = ESClient.getClient();
String index = "person";
String type = "info";
/*
"mappings": {
"novel": {
"properties": {
"name": {
"type": "text",
"analyzer": "ik_max_word",
"index": true,
"store": false
},
"author": {
"type": "keyword"
},
"count": {
"type": "long"
},
"onSale": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"desc": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
*/
@Test
public void createIndex() throws IOException {
// 1. Prepare the index settings
Settings.Builder settings = Settings.builder();
settings.put("number_of_shards", 3);
settings.put("number_of_replicas", 1);
// 2. Prepare the index mapping (field definitions)
XContentBuilder mappings = JsonXContent.contentBuilder()
.startObject()
.startObject("properties")
.startObject("name")
.field("type", "text")
.endObject()
.startObject("age")
.field("type", "integer")
.endObject()
.startObject("birthday")
.field("type", "date")
.field("format", "yyyy-MM-dd")
.endObject()
.endObject()
.endObject();
// 3. Wrap the settings and mappings in the request
CreateIndexRequest request = new CreateIndexRequest(index);
request.settings(settings);
request.mapping(type, mappings);
// 4. Create the index through the client
CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
// 5. Print the response
System.out.println(response.toString());
}
}
Check whether an index exists
// Requires: import org.elasticsearch.action.admin.indices.get.GetIndexRequest;
@Test
public void isExists() throws IOException {
// 1. Prepare the request
GetIndexRequest request = new GetIndexRequest();
request.indices(index);
// 2. Ask ES through the client
boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
// 3. Print the result
System.out.println(exists);
}
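Delete an index
// Requires: import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
//           import org.elasticsearch.action.support.master.AcknowledgedResponse;
@Test
public void deleteIndex() throws IOException {
// 1. Prepare the request
DeleteIndexRequest delete = new DeleteIndexRequest();
delete.indices(index);
// 2. Delete the index through the client
AcknowledgedResponse resp = client.indices().delete(delete, RequestOptions.DEFAULT);
// 3. Print whether ES acknowledged the deletion
System.out.println(resp.isAcknowledged());
}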
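Working with documents from Java
Add a document
Person entity
package com.example.entity;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.util.Date;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Person {
@JsonIgnore // leave id out of the serialized JSON; it is used as the document _id instead
private Integer id;
private String name;
private Integer age;
@JsonFormat(pattern = "yyyy-MM-dd") // serialize the Date as yyyy-MM-dd, matching the mapping's date format
private Date birthday;
}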
Example: creating a document
package com.example.test;
import com.example.entity.Person;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import java.io.IOException;
import java.util.Date;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/1/21
* @Description:
*/
public class Demo03 {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "person";
String type = "info";
@Test
public void createDocument() throws IOException {
// 1. Serialize the entity to JSON (id is skipped because of @JsonIgnore)
Person person = new Person(1, "张三", 23, new Date());
String json = mapper.writeValueAsString(person);
System.out.println(json);
// 2. Build the IndexRequest, using the entity id as the document _id
IndexRequest indexRequest = new IndexRequest(index, type, person.getId().toString());
indexRequest.source(json, XContentType.JSON);
// 3. Index the document through the client
IndexResponse resp = client.index(indexRequest, RequestOptions.DEFAULT);
// 4. Print the result (whether the document was created or updated)
System.out.println(resp.getResult().toString());
}
}
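Update a document
// Requires: import java.util.HashMap; import java.util.Map;
//           import org.elasticsearch.action.update.UpdateRequest;
//           import org.elasticsearch.action.update.UpdateResponse;
@Test
public void updateDocument() throws IOException {
// 1. Put the fields to change into a map
Map<String, Object> doc = new HashMap<String, Object>();
doc.put("name", "李四");
String docId = "1";
// 2. Build the UpdateRequest (a partial "doc" update)
UpdateRequest updateRequest = new UpdateRequest(index, type, docId);
updateRequest.doc(doc);
// 3. Execute it through the client
UpdateResponse response = client.update(updateRequest, RequestOptions.DEFAULT);
// 4. Print the result
System.out.println(response.getResult().toString());
}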
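Delete a document
// Requires: import org.elasticsearch.action.delete.DeleteRequest;
//           import org.elasticsearch.action.delete.DeleteResponse;
@Test
public void deleteDocument() throws IOException {
// 1. Build the DeleteRequest for document id 1
DeleteRequest deleteRequest = new DeleteRequest(index, type, "1");
// 2. Execute it through the client
DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
// 3. Print the result
System.out.println(response.getResult().toString());
}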
Bulk document operations from Java
Bulk-add documents
// Requires: import org.elasticsearch.action.bulk.BulkRequest;
//           import org.elasticsearch.action.bulk.BulkResponse;
@Test
public void bulkCreateDocument() throws IOException {
// 1. Prepare several JSON documents
Person p1 = new Person(1, "张三", 23, new Date());
Person p2 = new Person(2, "李四", 24, new Date());
Person p3 = new Person(3, "王五", 25, new Date());
String json1 = mapper.writeValueAsString(p1);
String json2 = mapper.writeValueAsString(p2);
String json3 = mapper.writeValueAsString(p3);
// 2. Add one IndexRequest per document to a single BulkRequest
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(new IndexRequest(index, type, p1.getId().toString()).source(json1, XContentType.JSON));
bulkRequest.add(new IndexRequest(index, type, p2.getId().toString()).source(json2, XContentType.JSON));
bulkRequest.add(new IndexRequest(index, type, p3.getId().toString()).source(json3, XContentType.JSON));
// 3. Send everything in one round trip
BulkResponse resp = client.bulk(bulkRequest, RequestOptions.DEFAULT);
// 4. Print the response
System.out.println(resp.toString());
}
Bulk delete
@Test
public void bulkDeleteDocument() throws IOException {
// 1. Add one DeleteRequest per document id
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(new DeleteRequest(index, type, "1"));
bulkRequest.add(new DeleteRequest(index, type, "2"));
bulkRequest.add(new DeleteRequest(index, type, "3"));
// 2. Execute through the client
BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
// 3. Print the response
System.out.println(response.toString());
}
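On the wire, a `BulkRequest` becomes the newline-delimited body of the `_bulk` endpoint. Each action line is followed by its source line, a delete has no source line, and the body must end with a newline. The sketch below builds that body by hand to show the format. The high-level client does this for you. `BulkBody` is a hypothetical helper, and the JSON sources are assumed to be already serialized, for example by Jackson.

```java
import java.util.Map;

// Sketch of the NDJSON body sent for a bulk request (POST /_bulk) in ES 6.x.
// Assumption: sources are pre-serialized JSON strings; no escaping of ids is done.
final class BulkBody {
    static String build(String index, String type, Map<String, String> idToJson, Iterable<String> deleteIds) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : idToJson.entrySet()) {
            sb.append(meta("index", index, type, e.getKey())).append('\n'); // action line
            sb.append(e.getValue()).append('\n');                             // source line
        }
        for (String id : deleteIds) {
            sb.append(meta("delete", index, type, id)).append('\n');        // delete: no source line
        }
        return sb.toString();                                                 // ends with '\n' as required
    }

    private static String meta(String action, String index, String type, String id) {
        return "{\"" + action + "\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + id + "\"}}";
    }
}
```

Because all operations travel in one HTTP request, bulk indexing of the three `Person` documents costs a single round trip instead of three.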
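ElasticSearch practice case
Index: sms-logs-index
Type: sms-logs-type
Field name | Description |
---|---|
createDate | creation time |
sendDate | send time |
longCode | long sender code, e.g. "10698886622" |
mobile | phone number, e.g. "13800000000" |
corpName | sending company name, must be searchable by terms (mapped as keyword in the code below) |
smsContent | SMS content, must be searchable by terms |
state | delivery status: 0 success, 1 failure |
operatorId | carrier id: 1 China Mobile, 2 China Unicom, 3 China Telecom |
province | province |
ipAddr | IP address of the sending server |
replyTotal | time until the delivery report returned (s) |
fee | charge (in fen, 0.01 CNY) |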
SmsLogs
package com.example.entity;
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.util.Date;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/1/21
* @Description:
*/
@Data
@NoArgsConstructor
@AllArgsConstructor
public class SmsLogs {
@JsonIgnore
private Integer id;
@JsonFormat(pattern = "yyyy-MM-dd")
private Date createDate;
@JsonFormat(pattern = "yyyy-MM-dd")
private Date sendDate;
private String longCode;
private String mobile;
private String corpName;
private String smsContent;
private Integer state;
private Integer operatorId;
private String province;
private String ipAddr;
private Integer replyTotal;
private String fee;
@JsonIgnore
public static String doc = "乌山镇,玉兰大陆第一山脉‘魔兽山脉’西方的芬莱王国中的一个普通小镇。朝阳初升,乌山镇这个小镇上依旧有着清晨的一丝清冷之气,只是小镇中的居民几乎都已经出来开始工作了,即使是六七岁的稚童,也差不多也都起床开始了传统性的晨练。乌山镇东边的空地上,早晨温热的阳光透过空地旁边的大树,在空地上留下了斑驳的光点。只见一大群孩子,目视过去估摸着差不多有一两百个。这群孩子分成了三个团队,每个团队都是排成几排,孩子们一个个都静静地站在空地上,面色严肃。纠结了好久买多大的屏,全凭感觉和运气,最后确定了65寸的,非常合适,大小刚刚好,我家客厅面积是34平方米,差不多的面积尽管拍就好了,其实在大一些可能更棒吧!双十一下手,电视越大越好,实惠好用,电视功能多了也用不着,之前的什么画中画,三D功能,有几个用的上的,电视这东西简单实用就行。我一共买了4台,一台75寸,一台70寸,两台65寸。松下冰箱一台,华帝油烟机燃气灶两套,马桶2个,西门子开关插座55个,丝涟床垫一个,七七八八加起来一共4万5左右,大家电我只信京东,服务杠杠滴";
}
Initialize the data
package com.example.test;
import com.example.entity.SmsLogs;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Test;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/1/21
* @Description:
*/
public class InitDate {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type="sms-logs-type";
@Test
public void createIndex() throws Exception{
// 1. Prepare the index settings
Settings.Builder settings = Settings.builder()
.put("number_of_shards", 5)
.put("number_of_replicas", 1);
// 2. Prepare the index mapping
XContentBuilder mappings = JsonXContent.contentBuilder()
.startObject()
.startObject("properties")
.startObject("corpName")
.field("type", "keyword")
.endObject()
.startObject("createDate")
.field("type", "date")
.field("format", "yyyy-MM-dd")
.endObject()
.startObject("fee")
.field("type", "long")
.endObject()
.startObject("ipAddr")
.field("type", "ip")
.endObject()
.startObject("longCode")
.field("type", "keyword")
.endObject()
.startObject("mobile")
.field("type", "keyword")
.endObject()
.startObject("operatorId")
.field("type", "integer")
.endObject()
.startObject("province")
.field("type", "keyword")
.endObject()
.startObject("replyTotal")
.field("type", "integer")
.endObject()
.startObject("sendDate")
.field("type", "date")
.field("format", "yyyy-MM-dd")
.endObject()
.startObject("smsContent")
.field("type", "text")
.field("analyzer", "ik_max_word")
.endObject()
.startObject("state")
.field("type", "integer")
.endObject()
.endObject()
.endObject();
// 3. Wrap the settings and mappings in one request
CreateIndexRequest request = new CreateIndexRequest(index)
.settings(settings)
.mapping(type, mappings);
// 4. Create the index through the client
CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
System.out.println("response:"+response.toString());
}
@Test
public void bulkCreateDoc() throws Exception{
// 1. Prepare the data for the documents
String longCode = "1008687";
String mobile ="138340658";
List<String> companies = new ArrayList<String>();
companies.add("腾讯课堂");
companies.add("阿里旺旺");
companies.add("海尔电器");
companies.add("海尔智家公司");
companies.add("格力汽车");
companies.add("苏宁易购");
companies.add("盒马鲜生");
companies.add("途虎养车");
List<String> provinces = new ArrayList<String>();
provinces.add("北京");
provinces.add("重庆");
provinces.add("上海");
provinces.add("晋城");
provinces.add("深圳");
provinces.add("武汉");
Random random = new Random();
BulkRequest bulkRequest = new BulkRequest();
for (int i = 1; i <20 ; i++) {
Thread.sleep(1000);
SmsLogs s1 = new SmsLogs();
s1.setId(i);
s1.setCreateDate(new Date((int) (Math.random() * (854526980000L + 1 - 852526980000L)) + 852526980000L));
s1.setSendDate(new Date((int) (Math.random() * (854526980000L + 1 - 852526980000L)) + 852526980000L));
s1.setLongCode(longCode+i);
s1.setMobile(mobile+2*i);
s1.setCorpName(companies.get(random.nextInt(companies.size())));
s1.setSmsContent(SmsLogs.doc.substring((i-1)*20,i*20));
s1.setState(i%2);
s1.setOperatorId(i%3);
s1.setProvince(provinces.get(random.nextInt(provinces.size())));
s1.setIpAddr("127.0.0."+i);
s1.setReplyTotal(i*3);
s1.setFee(i*6+"");
String json1 = mapper.writeValueAsString(s1);
bulkRequest.add(new IndexRequest(index,type,s1.getId().toString()).source(json1, XContentType.JSON));
System.out.println("数据"+i+s1.toString());
}
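// 2. Execute the bulk request through the client
BulkResponse responses = client.bulk(bulkRequest, RequestOptions.DEFAULT);
// 3. Print the result (getItems() is an array, so report failures or the item count instead of its toString())
System.out.println(responses.hasFailures() ? responses.buildFailureMessage() : "indexed " + responses.getItems().length + " documents");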
}
}
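ElasticSearch queries
term & terms queries
term query
A term query is an exact match. The search keyword is not analyzed before searching; it is matched directly against the terms stored in the index.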
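# term query
POST /sms-logs-index/sms-logs-type/_search
{
"from": 0, # offset of the first hit (like the offset in MySQL LIMIT)
"size": 5, # number of hits to return (like the count in MySQL LIMIT)
"query": {
"term": { # query type: term, exact match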
"province": {
"value": "北京"
}
}
}
}
Response:
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.3862944,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "sms-logs-type",
"_id" : "12",
"_score" : 1.3862944,
"_source" : {
"createDate" : "1997-01-20",
"sendDate" : "1997-01-19",
"longCode" : "100868712",
"mobile" : "13834065824",
"corpName" : "海尔智家公司",
"smsContent" : "了好久买多大的屏,全凭感觉和运气,最后确",
"state" : 0,
"operatorId" : 0,
"province" : "北京",
"ipAddr" : "127.0.0.12",
"replyTotal" : 36,
"fee" : "72"
}
},
...
]
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
import java.util.Map;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class Demo04Query {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void termQuery() throws IOException {
// 1. Create the SearchRequest for the index and type
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: the first 5 hits whose province is exactly "北京"
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.from(0);
builder.size(5);
builder.query(QueryBuilders.termQuery("province", "北京"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
}
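terms query
terms works like term: the query keywords are not analyzed and are matched directly.
Difference:
terms is used when one field should match any of several values.
For example:
- term: where province="北京"
- terms: where province in ("北京", "武汉")
# terms query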
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"terms": {
"province": [
"北京",
"武汉"
]
}
}
}
Java implementation
@Test
public void termsQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: province is "北京" or "武汉"
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.termsQuery("province", "北京", "武汉"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
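match query
match is a high-level query. It chooses how to search depending on the type of the field being queried:
- for a date or numeric field, the query string is converted to a date or number
- for a field that is not analyzed (keyword), match does not analyze the query keyword
- for an analyzed field (text), match analyzes the query content and matches each resulting term against the index
Under the hood, a match query on a text field runs one term query per analyzed token and combines the results.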
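The difference between term and match on a text field can be reproduced with a toy model. The analyzer runs at index time, but term skips it at query time, so an exact term lookup can miss text that match finds. This is a sketch only: a whitespace/lowercase analyzer stands in for ik_max_word, and scoring is ignored. `TermVsMatch` is a hypothetical name.

```java
import java.util.*;

// Toy model of term vs match on an analyzed (text) field.
// Assumption: whitespace + lowercase analysis instead of ik_max_word; no scoring.
final class TermVsMatch {
    // terms produced by the analyzer at index time, per document id
    private final Map<Integer, Set<String>> indexedTerms = new TreeMap<>();

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    void index(int id, String text) {
        indexedTerms.put(id, new HashSet<>(analyze(text)));
    }

    // term: the value is NOT analyzed and must equal one indexed term exactly
    List<Integer> term(String value) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : indexedTerms.entrySet()) {
            if (e.getValue().contains(value)) ids.add(e.getKey());
        }
        return ids;
    }

    // match: analyze the query, run one term lookup per token, union the hits
    List<Integer> match(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String token : analyze(query)) ids.addAll(term(token));
        return new ArrayList<>(ids);
    }
}
```

After indexing "Big living room", `term("Big")` finds nothing because only the lowercased token `big` was stored, while `match("Big")` finds the document. On a keyword field such as `province` there is no analysis, so term and match behave the same.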
match_all query
Returns all documents
# match_all query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"match_all": {}
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
import java.util.Map;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class Demo05Match {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void matchAllQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: match all documents
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchAllQuery());
builder.size(20); // ES returns only 10 hits by default; set size to get more
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
System.out.println(response.getHits().getHits().length);
}
}
match query
Match on a specific field
# match query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"match": {
"smsContent": "面积"
}
}
}
Java implementation
@Test
public void matchQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: analyze "面积" and search smsContent
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchQuery("smsContent", "面积"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
System.out.println(response.getHits().getHits().length);
}
Boolean match query
Combine the tokens of a query on one field with and or or
# boolean match query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"match": {
"smsContent": {
"query": "孩子 团队",
"operator": "or" # with "operator": "and", only documents containing both "孩子" and "团队" match
}
}
}
}
Java implementation
// Requires: import org.elasticsearch.index.query.Operator;
@Test
public void booleanMatchQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: documents containing "团队" or "孩子"
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchQuery("smsContent", "团队 孩子").operator(Operator.OR));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
System.out.println(response.getHits().getHits().length);
}
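The `operator` option decides how the analyzed tokens are combined. OR keeps a document that contains any token, and AND requires all of them. The sketch below shows that rule on a document's set of indexed terms. It assumes the tokens were already produced by the analyzer, and it uses English stand-ins ("kids" for 孩子, "team" for 团队). `MatchOperator` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

// Sketch of match's "operator": OR = any query token present, AND = all query tokens present.
// Assumption: tokens are already analyzed; a document is represented by its indexed terms.
final class MatchOperator {
    static boolean matches(Set<String> docTerms, List<String> queryTokens, boolean and) {
        if (and) return docTerms.containsAll(queryTokens); // every token must be indexed
        for (String t : queryTokens) {
            if (docTerms.contains(t)) return true;           // one token is enough
        }
        return false;
    }
}
```

A passage that mentions only 孩子 is returned for `"孩子 团队"` with OR but dropped with AND. That is why switching to AND usually shrinks the hit count.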
multi_match query
match searches one field, while multi_match searches several fields with the same query text.
# multi_match query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"multi_match": {
"query": "北京",
"fields": ["province", "smsContent"]
}
}
}
# documents whose province or SMS content contains "北京" match
Java implementation
// Requires: import org.elasticsearch.index.query.Operator;
@Test
public void multiMatchQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: "北京" in province or smsContent
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.multiMatchQuery("北京", "province", "smsContent").operator(Operator.OR));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
System.out.println(response.getHits().getHits().length);
}
Other queries
id query
# id query
GET /sms-logs-index/sms-logs-type/1
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo06Other {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void findById() throws IOException {
// 1. Create a GetRequest for document id 1
GetRequest request = new GetRequest(index, type, "1");
// 2. Fetch the document
GetResponse response = client.get(request, RequestOptions.DEFAULT);
// 3. Print its _source
System.out.println(response.getSourceAsMap());
}
}
ids query
Query by multiple ids, like MySQL's where id in (id1, id2, id3...)
# ids query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"ids": {
"values": ["1", "2", "100"]
}
}
}
Java implementation
@Test
public void findByIds() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: documents with id 1, 2 or 100
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.idsQuery().addIds("1", "2", "100"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
prefix query
Prefix query: finds documents whose value in the given field starts with the keyword
# prefix query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"prefix": {
"corpName": {
"value": "海尔"
}
}
}
}
Java implementation
@Test
public void findByPrefix() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: corpName starts with "海尔"
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.prefixQuery("corpName", "海尔"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
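fuzzy query
Fuzzy query
You type roughly what you mean and ES searches based on that, even if there are typos. The results are correspondingly less precise.
# fuzzy query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"fuzzy": {
"corpName": {
"value": "苏拧易购",
"prefix_length": 1 # number of leading characters that must match exactly
}
}
}
}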
Java implementation
@Test
public void findByFuzzy() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: same as the REST example, misspelled "苏拧易购", first character must match
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.fuzzyQuery("corpName", "苏拧易购").prefixLength(1));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
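The rule behind a fuzzy query is edit distance. A term matches if its first `prefix_length` characters are identical and the rest is within the allowed number of single-character edits. ES's default `AUTO` budget allows 0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms. The sketch below uses plain Levenshtein distance. That is a simplification: by default Lucene also counts swapping two adjacent characters as a single edit. `FuzzySketch` is a hypothetical name.

```java
// Sketch of fuzzy matching: exact prefix + Levenshtein distance within the "AUTO" edit budget.
// Assumption: plain Levenshtein (no transpositions), unlike Lucene's default.
final class FuzzySketch {
    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // "AUTO" fuzziness: 0 edits for 1-2 chars, 1 for 3-5, 2 for longer terms
    static int autoEdits(String term) {
        int n = term.length();
        return n <= 2 ? 0 : (n <= 5 ? 1 : 2);
    }

    static boolean fuzzyMatch(String query, String indexed, int prefixLength) {
        int p = Math.min(prefixLength, Math.min(query.length(), indexed.length()));
        if (!query.regionMatches(0, indexed, 0, p)) return false; // prefix must be exact
        return levenshtein(query, indexed) <= autoEdits(query);
    }
}
```

In the example above, 苏拧易购 and 苏宁易购 differ only in the second character, which is one edit within the 4-character budget. With `prefix_length: 1` the query therefore matches. With a prefix length of 2, the differing second character falls inside the exact prefix and the document is rejected.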
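wildcard query
Wildcard query, similar to MySQL's LIKE: * matches any sequence of characters and ? matches exactly one character
# wildcard query: companies whose name starts with "海尔"
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"wildcard": {
"corpName": {
"value": "海尔*" # * matches any sequence, ? matches a single character
}
}
}
}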
Java implementation
@Test
public void findByWildcard() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: corpName matches "海尔*"
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.wildcardQuery("corpName", "海尔*"));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
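A wildcard pattern is equivalent to an anchored regular expression in which `*` becomes "any sequence" and `?` becomes "exactly one character". The sketch below makes the translation explicit and quotes everything else literally. Like a wildcard query on a keyword field, the pattern must cover the whole term. `WildcardSketch` is a hypothetical name, and `java.util.regex` stands in for Lucene's automaton-based matching.

```java
import java.util.regex.Pattern;

// Sketch: translate a wildcard pattern into an equivalent anchored regex and match whole terms.
// '*' -> ".*", '?' -> ".", everything else is quoted literally.
final class WildcardSketch {
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {                 // flush pending literal text
                    re.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                re.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) re.append(Pattern.quote(literal.toString()));
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    // matches() requires the whole term to match, like a wildcard query on a keyword field
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

`海尔*` thus matches both 海尔电器 and 海尔智家公司. A pattern that starts with `*` cannot use the term prefix to narrow the search, which is one reason such queries are slow.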
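range query
Range query: sets greater-than / less-than bounds on one field. It is typically used on numeric fields, but it also works on date, ip and keyword fields.
# range query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"range": {
"fee": {
"gte": 10,
"lte": 50 # gt >, lt <, gte >=, lte <=
}
}
}
}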
Java implementation
@Test
public void findByRange() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: 10 <= fee <= 50, same bounds as the REST example
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.rangeQuery("fee").gte(10).lte(50));
request.source(builder);
// 3. Run the search
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the _source of each hit
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
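The four bound operators can be written as one predicate, where a missing bound means "unbounded". This sketch assumes `long` values to match the `fee` field mapped as long. `RangeSketch` is a hypothetical name.

```java
// Sketch of gt / gte / lt / lte semantics; a null bound is ignored (unbounded on that side).
// Assumption: long values, like the "fee" field mapped as long.
final class RangeSketch {
    static boolean inRange(long v, Long gt, Long gte, Long lt, Long lte) {
        if (gt != null && !(v > gt)) return false;   // gt:  strictly greater
        if (gte != null && !(v >= gte)) return false; // gte: greater or equal
        if (lt != null && !(v < lt)) return false;    // lt:  strictly less
        if (lte != null && !(v <= lte)) return false; // lte: less or equal
        return true;
    }
}
```

Applied to the generated data, where `fee` is `6 * i` for `i = 1..19`, the bounds `gte 10, lte 50` keep the seven fees 12, 18, 24, 30, 36, 42 and 48.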
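regexp query
Regular-expression query: matches content against the regular expression you write
Note: prefix, wildcard, fuzzy and regexp queries are relatively slow; avoid them when performance matters.
# regexp query: 11-digit phone numbers ending in 38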
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"regexp": {
"mobile": "[0-9]{9}38"
}
}
}
Java implementation
@Test
public void findByRegexp() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Set the query conditions
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.regexpQuery("mobile", "[0-9]{9}38"));
request.source(builder);
// 3. Execute
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Read the results
for (SearchHit hit : response.getHits().getHits()) {
Map<String, Object> result = hit.getSourceAsMap();
System.out.println(result);
}
}
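Lucene regexp patterns are always anchored: the pattern has to match the whole term, not just part of it. The hypothetical `RegexpSketch` below contrasts a whole-term match with an unanchored search using `java.util.regex`. Lucene's regular-expression syntax is not identical to Java's, but the character class and `{n}` repetition used in `[0-9]{9}38` behave the same in both.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch contrasting anchored (Lucene-style) and unanchored regex matching. */
final class RegexpSketch {
    /** Anchored match, like a regexp query: the pattern must cover the whole term. */
    static boolean matchesWholeTerm(String pattern, String term) {
        return Pattern.compile(pattern).matcher(term).matches();
    }

    /** Unanchored search, for contrast: any substring of the term may match. */
    static boolean containsMatch(String pattern, String term) {
        return Pattern.compile(pattern).matcher(term).find();
    }
}
```

An 11-digit number ending in 38 matches `[0-9]{9}38`; a 12-digit number that merely contains such a run does not, even though an unanchored search would find it.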
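Deep pagination with Scroll
ES limits from + size: by default their sum cannot exceed 10,000 (the index.max_result_window index setting)
How it works:
How ES retrieves data with from + size
- Analyze (tokenize) the keywords the user specified
- Look the terms up in the term dictionary to get a set of document ids
- Fetch the matching data from every shard, which is slow
- Sort the data by score, which is slow
- Discard results according to the from value
- Return the results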
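The cost of from + size grows with page depth because every shard has to return its own top from + size hits, which the coordinating node then merges and sorts before dropping the first `from` of them. The plain-Java sketch below (hypothetical `FromSizeSketch`) only does that arithmetic and applies the 10,000 default of `index.max_result_window`; it does not talk to ES.

```java
/** Hypothetical arithmetic behind the from + size limit; no ES client involved. */
final class FromSizeSketch {
    // Default value of the index.max_result_window index setting
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Throws when from + size exceeds the result window, as ES rejects such a search. */
    static void checkWindow(int from, int size, int maxResultWindow) {
        long window = (long) from + size;
        if (window > maxResultWindow) {
            throw new IllegalArgumentException(
                    "from + size must be <= " + maxResultWindow + " but was " + window);
        }
    }

    /** Hits the coordinating node must merge and sort: each shard sends its top (from + size). */
    static long hitsSortedOnCoordinator(int shards, int from, int size) {
        return (long) shards * ((long) from + size);
    }
}
```

For instance, reading 10 hits starting at 10,000 from a 5-shard index means sorting 5 × 10,010 = 50,050 hits on the coordinating node, and the request is already beyond the default window.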
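How ES retrieves data with Scroll + size
- Analyze (tokenize) the keywords the user specified
- Look the terms up in the term dictionary to get a set of document ids
- Store the document ids in the ES search context
- Fetch size documents from ES; the ids of the documents already returned are removed from the context
- For the next page, read the following results straight from the ES context
- Repeat steps 4 and 5 until all results have been read
Scroll is not suitable for real-time queries: results come from a snapshot taken at the first request, so later changes to the index are not visible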
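The Scroll steps above can be modelled with an in-memory queue. In this hypothetical `ScrollSketch`, the matching ids are copied once when the context is created (so later changes to the source list are invisible, as with a real scroll), and each page removes the ids it hands out. Real scroll contexts live on the shards and expire after the keep-alive; this sketch ignores both.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory model of a scroll context; not the ES implementation. */
final class ScrollSketch {
    private final Deque<String> remainingIds;

    // Steps 1-3: the matching ids are snapshotted into the context once
    ScrollSketch(List<String> matchingIds) {
        this.remainingIds = new ArrayDeque<>(matchingIds);
    }

    // Steps 4-5: each call returns up to `size` ids and removes them from the context;
    // an empty page means the scroll is exhausted
    List<String> nextPage(int size) {
        List<String> page = new ArrayList<>();
        while (page.size() < size && !remainingIds.isEmpty()) {
            page.add(remainingIds.pollFirst());
        }
        return page;
    }
}
```

Looping on `nextPage` until it returns an empty list mirrors the `while (true)` loop in the Java client example that follows.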
# Scroll query: returns the first page and keeps the document ids in the ES context with a keep-alive of 1m
POST /sms-logs-index/sms-logs-type/_search?scroll=1m
{
"query": {
"match_all": {}
},
"size": 2,
"sort": [
{
"fee": {
"order": "desc"
}
}
]
}
# Fetch the next page with the scroll_id
POST /_search/scroll
{
"scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAWlZFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpWhZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253AAAAAAABaVsWeGNJTkwzREFST2FvYTNrWDk5a3NudwAAAAAAAWlcFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpXRZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253", # the scroll_id returned by the previous request
"scroll": "1m" # keep-alive of the scroll context
}
# Delete the scroll context from ES
DELETE /_search/scroll/DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAWlZFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpWhZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253AAAAAAABaVsWeGNJTkwzREFST2FvYTNrWDk5a3NudwAAAAAAAWlcFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpXRZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253 # scroll_id
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.*;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Test;
import java.io.IOException;
import java.util.Map;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo07Scroll {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void scrollQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Set the scroll keep-alive
request.scroll(TimeValue.timeValueMinutes(1L));
// 3. Set the query conditions
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.size(4);
builder.sort("fee", SortOrder.DESC);
builder.query(QueryBuilders.matchAllQuery());
request.source(builder);
// 4. Execute, then read the scrollId and the first page
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
String scrollId = response.getScrollId();
System.out.println("-------- first page --------");
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getSourceAsMap());
}
while (true) {
// 5. Create a SearchScrollRequest
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
// 6. Renew the keep-alive of the scroll context
scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
// 7. Execute the scroll request
SearchResponse scrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
// Always continue with the most recently returned scrollId
scrollId = scrollResponse.getScrollId();
// 8. Print the hits if this page returned any
SearchHit[] hits = scrollResponse.getHits().getHits();
if (hits != null && hits.length > 0) {
System.out.println("-------- next page --------");
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsMap());
}
} else {
// 9. No more hits: leave the loop
System.out.println("-------- done --------");
break;
}
}
// 10. Create a ClearScrollRequest
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
// 11. Add the scrollId
clearScrollRequest.addScrollId(scrollId);
// 12. Clear the scroll context
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
// 13. Print the result
System.out.println("Scroll cleared: " + clearScrollResponse.isSucceeded());
}
}
delete-by-query
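Deletes a large number of documents matched by a query such as term or match
Note: if the documents to delete make up most of the index, it is better to create a new index and add only the documents you want to keep to it
# delete-by-query: delete the documents matched by a query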
POST /sms-logs-index/sms-logs-type/_delete_by_query
{
"query": {
"range": {
"fee": {
"gte": 10,
"lte": 15
}
}
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo08Delete {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void deleteByQuery() throws IOException {
// 1. Create the DeleteByQueryRequest
DeleteByQueryRequest request = new DeleteByQueryRequest(index);
request.types(type);
// 2. Set the query conditions: 10 <= fee <= 15, the same range as the REST example
request.setQuery(QueryBuilders.rangeQuery("fee").gte(10).lte(15));
// 3. Execute the delete
BulkByScrollResponse response = client.deleteByQuery(request, RequestOptions.DEFAULT);
// 4. Print the result
System.out.println(response.toString());
}
}
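Compound queries
bool query
A compound query: combines several query conditions with boolean logic.
- must: every condition must match, i.e. AND
- must_not: none of the conditions may match, i.e. NOT
- should: at least one of the conditions should match, i.e. OR. Note: when the bool query also has a must (or filter) clause, should clauses only raise the score and no longer restrict the results, unless minimum_should_match is set
# compound query
# 1. province is 武汉 or 北京
# 2. operator is not China Telecom (operatorId 3)
# 3. smsContent contains both 客厅 and 面积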
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"province": {
"value": "北京"
}
}
},
{
"term": {
"province": {
"value": "武汉"
}
}
}
],
"must_not": [
{
"term": {
"operatorId": {
"value": "3"
}
}
}
],
"must": [
{
"match": {
"smsContent": "客厅"
}
},
{
"match": {
"smsContent": "面积"
}
}
],
"minimum_should_match": 1
}
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo08complex {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void boolQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query conditions
SearchSourceBuilder builder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// province is 武汉 or 北京
boolQueryBuilder.should(QueryBuilders.termQuery("province", "武汉"));
boolQueryBuilder.should(QueryBuilders.termQuery("province", "北京"));
// require at least one should clause; with must clauses present they would otherwise be optional
boolQueryBuilder.minimumShouldMatch(1);
// operator is not China Telecom
boolQueryBuilder.mustNot(QueryBuilders.termQuery("operatorId", 3));
// smsContent contains both 面积 and 客厅
boolQueryBuilder.must(QueryBuilders.matchQuery("smsContent", "面积"));
boolQueryBuilder.must(QueryBuilders.matchQuery("smsContent", "客厅"));
builder.query(boolQueryBuilder);
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the results
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getSourceAsMap());
}
}
}
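Why `minimum_should_match` matters here: per the ES documentation, if a bool query has at least one should clause and no must or filter clause, minimum_should_match defaults to 1; otherwise it defaults to 0, so the should clauses only raise the score. The hypothetical `BoolSketch` below models must, must_not and should with that default as plain Java predicates; scoring and filter clauses are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical plain-Java model of a bool query's matching logic (no scoring). */
final class BoolSketch<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private Integer minimumShouldMatch; // null means "use the default"

    BoolSketch<T> must(Predicate<T> p) { must.add(p); return this; }
    BoolSketch<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }
    BoolSketch<T> should(Predicate<T> p) { should.add(p); return this; }
    BoolSketch<T> minimumShouldMatch(int n) { minimumShouldMatch = n; return this; }

    boolean matches(T doc) {
        for (Predicate<T> p : must) if (!p.test(doc)) return false;   // AND
        for (Predicate<T> p : mustNot) if (p.test(doc)) return false; // NOT
        // Default: 1 when there are should clauses but no must clause, otherwise 0
        int required = minimumShouldMatch != null
                ? minimumShouldMatch
                : (must.isEmpty() && !should.isEmpty() ? 1 : 0);
        long matched = should.stream().filter(p -> p.test(doc)).count(); // OR
        return matched >= required;
    }
}
```

Without `minimumShouldMatch(1)`, a document from any province passes as long as it satisfies the must and must_not clauses.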
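boosting query
A boosting query lets us influence the scores of the results
- positive: only documents that match the positive query are returned
- negative: documents that match positive and also match negative get a lower score
- negative_boost: the factor by which such scores are reduced; a value between 0 and 1.0
How scores are calculated:
- The more often the search keyword occurs in a document, the higher the score
- The shorter the field content, the higher the score
- The search keywords are analyzed too; the more of the resulting terms match, the higher the score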
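The three rules above match the behaviour of BM25, the default similarity in ES since 5.0. Below is a minimal sketch of the textbook BM25 term score in plain Java, assuming the ES defaults k1 = 1.2 and b = 0.75; Lucene's implementation differs in some constants, so the numbers are illustrative only. `Bm25Sketch` is a hypothetical name.

```java
/** Hypothetical sketch of the textbook BM25 formula; not Lucene's exact code. */
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (ES default)
    static final double B = 0.75;  // field-length normalization (ES default)

    /** Inverse document frequency: terms found in fewer documents weigh more. */
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    /** Score contribution of one query term in one field of one document. */
    static double termScore(double termFreq, double fieldLength, double avgFieldLength,
                            long docCount, long docsWithTerm) {
        // Longer-than-average fields get a larger norm, which lowers the score
        double norm = K1 * (1 - B + B * fieldLength / avgFieldLength);
        // Higher term frequency raises the score, with diminishing returns
        return idf(docCount, docsWithTerm) * termFreq * (K1 + 1) / (termFreq + norm);
    }
}
```

A document's score for a multi-term query is the sum of the per-term scores, which is why matching more of the analyzed terms yields a higher score.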
# boosting query: 客厅面积
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"boosting": {
"positive": {
"match": {
"smsContent": "客厅面积"
}
},
"negative": {
"match": {
"smsContent": "差不多"
}
},
"negative_boost": 0.5
}
}
}
Java implementation
@Test
public void boostingQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the query: positive query, negative query, then the negative_boost factor
SearchSourceBuilder builder = new SearchSourceBuilder();
BoostingQueryBuilder boostingQueryBuilder = QueryBuilders.boostingQuery(
QueryBuilders.matchQuery("smsContent", "客厅面积"),
QueryBuilders.matchQuery("smsContent", "差不多")
).negativeBoost(0.5f);
builder.query(boostingQueryBuilder);
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the results
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getSourceAsMap());
}
}
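The effect of negative_boost on a single document comes down to one multiplication: a document that matches positive keeps its score, and if it also matches negative that score is multiplied by negative_boost. A hypothetical plain-Java sketch:

```java
/** Hypothetical sketch of how a boosting query adjusts one document's score. */
final class BoostingSketch {
    static double score(double positiveScore, boolean matchesNegative, double negativeBoost) {
        if (negativeBoost < 0 || negativeBoost > 1.0) {
            throw new IllegalArgumentException("negative_boost should be between 0 and 1.0");
        }
        // Documents that also match the negative query have their score scaled down
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }
}
```

With `negative_boost` 0.5 as in the example, an SMS containing both 客厅面积 and 差不多 keeps half of the score it would otherwise get.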
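filter query
Difference between query and filter:
- query: computes how well each document matches the conditions, producing a score that is used for sorting; results are not cached
- filter: only checks whether documents match, without computing a score; frequently used filters are cached
# filter query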
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"corpName": "苏宁易购"
}
},
{
"range": {
"fee": {
"gt": 4
}
}
}
]
}
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.BoostingQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo10Filter {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void filter() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the filter conditions: corpName is 苏宁易购 and fee > 4, as in the REST example
SearchSourceBuilder builder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.termQuery("corpName", "苏宁易购"));
boolQueryBuilder.filter(QueryBuilders.rangeQuery("fee").gt(4));
builder.query(boolQueryBuilder);
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the results
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getSourceAsMap());
}
}
}
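The difference between query and filter can be sketched as follows: a filter yields only a yes/no answer per document (here a `BitSet` of matching doc ids, with no score), and that answer can be cached under a key and reused. This hypothetical `FilterCacheSketch` caches every filter unconditionally; ES's query cache is more selective (it caches per segment and only filters that are used frequently), so treat it as an illustration only.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a cached, score-free filter; not the ES query cache. */
final class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    int evaluations = 0; // how many times a filter actually scanned the documents

    /** Returns the matching doc ids as a BitSet, computing it only on a cache miss. */
    BitSet filter(String cacheKey, int docCount, IntPredicate matches) {
        return cache.computeIfAbsent(cacheKey, key -> {
            evaluations++;
            BitSet bits = new BitSet(docCount);
            for (int doc = 0; doc < docCount; doc++) {
                if (matches.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }
}
```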
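Highlight query
Highlighting shows the keywords the user entered in a special style, so the user can see why a result was returned
The highlighted data comes from a field of the document; ES returns that field separately, in a highlight section of each hit
ES provides a highlight property at the same level as query: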
-
fields: which fields to highlight
-
fragment_size: how many characters each highlighted fragment contains
-
pre_tags: the opening tag, e.g. <font color="red">
-
post_tags: the closing tag, e.g. </font>
# highlight query
POST /sms-logs-index/sms-logs-type/_search
{
"query": {
"match": {
"smsContent": "面积"
}
},
"highlight": {
"fields": {
"smsContent": {}
},
"pre_tags": "<font color='red'>",
"post_tags": "</font>",
"fragment_size": 10
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class demo11HighLight {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void highlightQuery() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Build the search
SearchSourceBuilder builder = new SearchSourceBuilder();
// 2.1 Query conditions
builder.query(QueryBuilders.matchQuery("smsContent", "面积"));
// 2.2 Highlight settings: field, fragment size of 10 characters, opening and closing tags
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("smsContent", 10).preTags("<font color='red'>").postTags("</font>");
builder.highlighter(highlightBuilder);
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the highlighted smsContent of each hit
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getHighlightFields().get("smsContent"));
}
}
}
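How pre_tags, post_tags and fragment_size combine can be shown with a deliberately simplified highlighter. The hypothetical `HighlightSketch` below starts a fragment at each match, cuts it to roughly fragment_size characters, and wraps the matched term in the tags. ES's highlighters analyze the text and choose fragment boundaries far more carefully, so this only sketches the shape of the output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified highlighter; not one of ES's highlighter implementations. */
final class HighlightSketch {
    static List<String> fragments(String text, String term, int fragmentSize,
                                  String preTag, String postTag) {
        List<String> result = new ArrayList<>();
        if (term.isEmpty()) return result; // nothing to highlight
        int searchFrom = 0;
        while (true) {
            int start = text.indexOf(term, searchFrom);
            if (start < 0) break;
            // The fragment starts at the match and spans fragmentSize characters,
            // but never less than the term itself
            int end = Math.min(text.length(), Math.max(start + fragmentSize, start + term.length()));
            String fragment = text.substring(start, end);
            result.add(fragment.replace(term, preTag + term + postTag));
            searchFrom = end;
        }
        return result;
    }
}
```

For the text `area 120 sqm, living room area 80 sqm` with a fragment size of 10, this yields `<em>area</em> 120 s` and `<em>area</em> 80 sq`.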
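Aggregations
ES aggregations are similar to aggregate queries in MySQL but more powerful, offering many different ways to compute statistics
# aggregation syntax (RESTful)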
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg_name": { # any name, e.g. agg; the result is returned under this name
"agg_type": {
"property": "value"
}
}
}
}
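Distinct count
Distinct count: cardinality
- Deduplicates the values of a specified field in the matching documents and counts how many distinct values there are. The count is approximate: ES uses the HyperLogLog++ algorithm
# cardinality (distinct count) query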
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg": {
"cardinality": {
"field": "province"
}
}
}
}
Java implementation
package com.example.test;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.range.Range;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
import org.elasticsearch.search.aggregations.metrics.stats.extended.ExtendedStats;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class Demo12Aggs {
ObjectMapper mapper = new ObjectMapper();
RestHighLevelClient client = ESClient.getClient();
String index = "sms-logs-index";
String type = "sms-logs-type";
@Test
public void cardinality() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Specify the aggregation to use
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.aggregation(AggregationBuilders.cardinality("agg").field("province"));
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Read the result
Cardinality agg = response.getAggregations().get("agg"); // look the aggregation up by name, typed as Cardinality
long value = agg.getValue();
System.out.println(value);
}
}
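For comparison, an exact distinct count has to remember every distinct value, which is why ES approximates it with HyperLogLog++ in bounded memory. A hypothetical exact version in plain Java, skipping documents that have no value for the field:

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical exact distinct count; ES's cardinality aggregation approximates this. */
final class CardinalitySketch {
    static long distinctCount(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull) // documents without the field are not counted
                .distinct()
                .count();
    }
}
```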
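Range aggregations
Counts the documents falling into each range, e.g. how many documents have a field value between 0 and 100, between 100 and 200, and so on. Each range includes its from value and excludes its to value
Range aggregations work on plain numbers, on dates and on IP addresses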
-
range: numeric ranges
# numeric range aggregation
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg": {
"range": {
"field": "fee",
"ranges": [
{ "from": 10, "to": 50 },
{ "from": 50, "to": 100 },
{ "from": 100 }
]
}
}
}
}
-
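date_range: date ranges
# date range aggregation (the type must be date_range; the format is also used to parse the from/to strings)
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg": {
"date_range": {
"field": "createDate",
"format": "yyyy",
"ranges": [
{ "to": "1996" },
{ "from": "1996" }
]
}
}
}
}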
-
ip_range: IP ranges
# IP range aggregation
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg": {
"ip_range": {
"field": "ipAddr",
"ranges": [
{ "to": "127.0.0.10" },
{ "from": "127.0.0.10" }
]
}
}
}
}
Java implementation
Numeric range aggregation
@Test
public void range() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Specify the aggregation: buckets (-inf, 10), [10, 50) and [50, +inf)
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.aggregation(AggregationBuilders.range("agg").field("fee")
.addUnboundedTo(10)
.addRange(10, 50)
.addUnboundedFrom(50)
);
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print every bucket
Range agg = response.getAggregations().get("agg");
for (Range.Bucket bucket : agg.getBuckets()) {
String key = bucket.getKeyAsString();
Object from = bucket.getFrom();
Object to = bucket.getTo();
long docCount = bucket.getDocCount();
System.out.println(String.format("key: %s, from: %s, to: %s, doc: %s",key, from, to, docCount));
}
}
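The bucket rule (from inclusive, to exclusive, a missing bound meaning unbounded) can be checked with a small plain-Java sketch. `RangeBucketSketch` is hypothetical and takes each range as a `{from, to}` pair where `null` stands for an open end; a value that falls into several overlapping ranges is counted in each of them.

```java
/** Hypothetical sketch of range-aggregation bucketing over in-memory values. */
final class RangeBucketSketch {
    static long[] countBuckets(double[] values, Double[][] ranges) {
        long[] counts = new long[ranges.length];
        for (double v : values) {
            for (int i = 0; i < ranges.length; i++) {
                Double from = ranges[i][0], to = ranges[i][1];
                boolean aboveFrom = from == null || v >= from; // from is inclusive
                boolean belowTo = to == null || v < to;        // to is exclusive
                if (aboveFrom && belowTo) counts[i]++;
            }
        }
        return counts;
    }
}
```

With the buckets from the Java example, fees 5, 10, 49.9, 50 and 120 give counts 1, 2 and 2: 10 lands in [10, 50) and 50 in [50, +inf).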
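The date_range and ip_range aggregations are written the same way (AggregationBuilders.dateRange and AggregationBuilders.ipRange)
Stats aggregation
Returns the maximum, minimum, average, sum of squares and other statistics of a field
Use extended_stats
# extended_stats aggregation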
POST /sms-logs-index/sms-logs-type/_search
{
"aggs": {
"agg": {
"extended_stats": {
"field": "fee"
}
}
}
}
Java implementation of the stats aggregation
@Test
public void extendedStats() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Specify the aggregation to use
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.aggregation(AggregationBuilders.extendedStats("agg").field("fee"));
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Read the result
ExtendedStats agg = response.getAggregations().get("agg");
double max = agg.getMax();
double min = agg.getMin();
System.out.println("max fee: " + max + ", min fee: " + min);
}
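A plain-Java sketch of what extended_stats computes over a set of fee values, assuming that `variance` is the population variance (sum of squares divided by count, minus the squared average), as in ES's own `variance` field. `ExtendedStatsSketch` is a hypothetical name.

```java
/** Hypothetical in-memory version of the extended_stats metrics. */
final class ExtendedStatsSketch {
    final long count;
    final double min, max, sum, avg, sumOfSquares, variance, stdDeviation;

    ExtendedStatsSketch(double... values) {
        count = values.length;
        double mn = Double.POSITIVE_INFINITY, mx = Double.NEGATIVE_INFINITY, s = 0, sq = 0;
        for (double v : values) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
            s += v;
            sq += v * v;
        }
        min = mn;
        max = mx;
        sum = s;
        sumOfSquares = sq;
        avg = count == 0 ? Double.NaN : s / count;
        // Population variance: E[x^2] - (E[x])^2
        variance = count == 0 ? Double.NaN : sq / count - avg * avg;
        stdDeviation = Math.sqrt(variance);
    }
}
```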
Geo (latitude/longitude) search
ES provides a geo_point data type for storing latitude/longitude pairs
# create an index with a name field and a location field
PUT /map
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 5
},
"mappings": {
"map": {
"properties": {
"name": {
"type": "text"
},
"location": {
"type": "geo_point"
}
}
}
}
}
# add test data
PUT /map/map/1
{
"name": "天安门",
"location": {
"lon": 116.402981,
"lat": 39.914492
}
}
PUT /map/map/2
{
"name": "海淀公园",
"location": {
"lon": 116.302509,
"lat": 39.991152
}
}
PUT /map/map/3
{
"name": "北京动物园",
"location": {
"lon": 116.343184,
"lat": 39.947468
}
}
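ES geo query types
- geo_distance: finds the points within a given straight-line distance of a center point
- geo_bounding_box: defines a rectangle by two corner points and returns everything inside it
- geo_polygon: defines a polygon by several points and returns everything inside it
Geo search via the RESTful API
geo_distance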
# geo_distance
POST /map/map/_search
{
"query": {
"geo_distance": {
"location": { # the center point
"lon": 116.433733,
"lat": 39.908404
},
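"distance": 3000, # the radius; without a unit the value is in meters
"distance_type": "arc" # how distance is calculated: arc (default, most accurate) or plane (faster, less accurate over long distances)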
}
}
}
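With distance_type arc, the distance is a great-circle distance on a sphere. The sketch below uses the haversine formula with a mean Earth radius of about 6,371 km; both the class name `GeoDistanceSketch` and the exact radius constant are assumptions, so results may differ slightly from ES. Applied to the test data, only 天安门 (about 2.7 km away) falls within 3000 m of the target point.

```java
/** Hypothetical great-circle distance check; not ES's internal GeoUtils code. */
final class GeoDistanceSketch {
    // Assumed mean Earth radius in meters
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    /** Haversine distance in meters between two lat/lon points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    /** True when the point lies within distanceMeters of the center. */
    static boolean withinDistance(double centerLat, double centerLon,
                                  double lat, double lon, double distanceMeters) {
        return haversineMeters(centerLat, centerLon, lat, lon) <= distanceMeters;
    }
}
```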
geo_bounding_box
# geo_bounding_box
POST /map/map/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lon": 116.326943,
"lat": 39.95499
},
"bottom_right": {
"lon": 116.347783,
"lat": 39.939281
}
}
}
}
}
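A bounding-box test is just two interval checks, one on latitude and one on longitude. A hypothetical plain-Java sketch that ignores boxes crossing the 180° meridian; with the box from the REST example, 北京动物园 is the only test point inside it.

```java
/** Hypothetical bounding-box containment check (no antimeridian handling). */
final class BoundingBoxSketch {
    /** True when (lat, lon) lies in the box given by its top-left and bottom-right corners. */
    static boolean contains(double topLat, double leftLon, double bottomLat, double rightLon,
                            double lat, double lon) {
        return lat <= topLat && lat >= bottomLat && lon >= leftLon && lon <= rightLon;
    }
}
```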
geo_polygon
# geo_polygon
POST /map/map/_search
{
"query": {
"geo_polygon": {
"location": {
"points": [
{
"lon": 116.298916,
"lat": 39.99878
},
{
"lon": 116.29561,
"lat": 39.972576
},
{
"lon": 116.327661,
"lat": 39.984736
}
]
}
}
}
}
Java implementation: geo_polygon
package com.example.test;
import com.example.utils.ESClient;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* @author : ryxiong728
* @email : ryxiong728@126.com
* @date : 3/2/21
* @Description:
*/
public class Demo13GeoSearch {
RestHighLevelClient client = ESClient.getClient();
String index = "map";
String type = "map";
@Test
public void geoPolygon() throws IOException {
// 1. Create the SearchRequest
SearchRequest request = new SearchRequest(index);
request.types(type);
// 2. Specify the geo query: polygon vertices as GeoPoint(lat, lon), the same points as the REST example
SearchSourceBuilder builder = new SearchSourceBuilder();
List<GeoPoint> points = new ArrayList<>();
points.add(new GeoPoint(39.99878, 116.298916));
points.add(new GeoPoint(39.972576, 116.29561));
points.add(new GeoPoint(39.984736, 116.327661));
builder.query(QueryBuilders.geoPolygonQuery("location", points));
request.source(builder);
// 3. Execute the query
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// 4. Print the results
for (SearchHit hit : response.getHits().getHits()) {
System.out.println(hit.getSourceAsMap());
}
}
}
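geo_polygon asks whether a point lies inside a polygon. A standard way to answer that is ray casting: cast a ray from the point and count how many polygon edges it crosses; an odd count means inside. The hypothetical `PolygonSketch` below implements it on raw latitude/longitude values, treating them as planar coordinates, which is a reasonable approximation for a polygon a few kilometres across; it is not ES's implementation. With the three vertices from the example, 海淀公园 lies inside and 天安门 does not.

```java
import java.util.List;

/** Hypothetical point-in-polygon check using ray casting on planar lat/lon. */
final class PolygonSketch {
    static final class LatLon {
        final double lat, lon;
        LatLon(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    static boolean contains(List<LatLon> polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.size() - 1; i < polygon.size(); j = i++) {
            LatLon a = polygon.get(i), b = polygon.get(j);
            // The edge straddles the point's latitude and is crossed east of the point
            if ((a.lat > lat) != (b.lat > lat)
                    && lon < (b.lon - a.lon) * (lat - a.lat) / (b.lat - a.lat) + a.lon) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```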