ElasticSearch学习

ElasticSearch介绍

引言

  1. 在海量数据中执行搜索功能, Mysql对于大数据的搜索,效率太低
  2. 如果关键字不准确, 一样可以搜索到想要的数据

es介绍

es是使用java语言并且基于Lucene编写的搜索引擎框架, 提供了分布式的全文检索功能, 可以近乎实时的存储, 检索数据, 提供了统一的基于RESTful风格的web接口, 官方客户端对多种语言提供了相应的API

Lucene: Lucene本身就是一个搜索引擎的底层, 本质是一个jar包,里面包含了封装好的各种建立倒排索引,以及进行搜索的代码,包括各种算法。

全文检索是指计算机索引程序通过扫描文章中的每一个词,对每一个词建立一个索引,指明该词在文章中出现的次数和位置,当用户查询时,根据关键字去分词库中进行检索, 找到匹配内容

结构化检索:我想搜索商品分类为日化用品的商品都有哪些,select * from products where category_id='日化用品'

es和solr

  1. Solr在查询死数据时, 速度相对于es会更快. 但是如果数据是实时改变的, Solr的查询速度会降低很多, ES的查询效率基本没有变化

  2. Solr搭建基于需要依赖Zookeeper来帮助管理. ES本身就支持集群的搭建, 不需要第三方介入

  3. Solr针对国内的文档并不多, 在ES出现后, 火爆程度直线上升, 文档非常健全

  4. ES对云计算和大数据支持特别好

倒排索引

将存放的数据, 按照一定的方式进行分词, 并且将分词的内容存放到一个单独的分词库中

当用户去查询时, 会将用户的查询关键词进行分词

然后去分词库中匹配内容, 最终得到数据的id标识

根据id标识去存放数据的位置拉取到指定的数据

image-20210301165903923

ElasticSearch安装

安装ES&Kibana

安装ES

version: "3.1"
services:
  elasticsearch:
    image: daocloud.io/library/elasticsearch:6.5.4
    restart: always
    container_name: elasticsearch
    environment:  # 分配的内存,必须指定,因为es默认指定2g,直接内存溢出了,必须改
      - "ES_JAVA_OPTS=-Xms128m -Xmx256m"
      - "discovery.type=single-node"
      - "COMPOSE_PROJECT_NAME=elasticsearch-server"
    ports:
      - 9200:9200
  kibana:
    image: daocloud.io/library/kibana:6.5.4
    restart: always
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      - elasticsearch_url:http://115.159.222.145:9200
    depends_on:
      - elasticsearch

es文件目录

bin 启动文件
config 配置文件
	-log4j2 日志配置
	-jvm.options java虚拟机配置, 配置运行所需内存, 内存不够时配置小一点
	-elasticsearch.yml elasticsearch配置文件, 端口9200
lib 相关jar包
logs 日志
module 功能模块
plugins 插件

elasticsearch启动不起来

elasticsearch exited with code 78

解决:

切换到root用户

执行命令:

sysctl -w vm.max_map_count=262144

查看结果:

sysctl -a|grep vm.max_map_count

显示:

vm.max_map_count = 262144

上述方法修改之后,如果重启虚拟机将失效,所以:

解决办法:

在 /etc/sysctl.conf文件最后添加一行

vm.max_map_count=262144

如果还有问题,注意服务器的内存状态, 可能是内存不够, 需要清理出一些内存.

启动成功后测试

浏览器访问es

http://host:9200
image-20210301180515025

安装Kibana

kibana是一个针对ElasticSearch的开源分析及可视化平台, 用来搜索, 查看交互存储在es索引中的数据.可以通过各种图标进行高级数据分析及展示.

操作简单方便, 数据展示直观

在访问kibana

http://host:5601

image-20210301180444514

安装可视化ES插件head

  1. 下载地址:http://mobz.github.io/elasticsearch-head

  2. 启动

    npm install
    npm run start
    
  3. 跨域问题解决

    # 修改es配置文件elasticsearch.yml
    http.cors.enabled: true
    http.cors.allow-origin: "*"
    
  4. 重启es服务器, 再次连接

安装ik分词器

ik分词器下载地址

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

查看es容器

docker ps | grep elastic

进入es容器内部, 执行bin/目录下elasticsearch-plugin安装ik分词器

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

如果github网络不好,可以找其他版本使用国内路径

使用接口测试分词效果

注意:

需要重启es加载安装的分词器

docker restart es容器名/id

等待重启后测试分词

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "尚硅谷教育"
}

需要指定分词器类型 analyzer

返回值

{
  "tokens" : [
    {
      "token" : "尚",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "硅谷",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

image-20210301180750084

ElasticSearch核心

ES组件

近实时

分为两个意思

  1. 从写入数据到数据可以被搜索到有一个小延迟(大概1秒);
  2. 基于es执行搜索和分析可以达到秒级。

Cluster(集群)

集群包含多个节点,每个节点属于哪个集群是通过一个配置(集群名称,默认是elasticsearch)来决定的,对于中小型应用来说,刚开始一个集群就一个节点很正常

node(节点)

集群中的一个节点,节点也有一个名称(默认是随机分配的),节点名称很重要(在执行运维管理操作的时候),默认节点会去加入一个名称为“elasticsearch”的集群,如果直接启动一堆节点,那么它们会自动组成一个elasticsearch集群,当然一个节点也可以组成一个elasticsearch集群。

ElasticSearch存储结构

image-20210301184034340

image-20210301184107129

Index(索引-数据库)

索引包含一堆有相似结构的文档数据,ES服务中,可以建立多个索引

如可以有一个客户索引,商品分类索引,订单索引,索引有一个名称。

  1. 每一个索引默认分为5片存储
  2. 每个分片会存在至少一个备份
  3. 备份分片默认不会帮助检索,当检索压力特别大时, 备份才会帮助检索
  4. 备份分片需要放在不同的服务器中

Type(类型-表)

每个索引里都可以有一个或多个type,type是index中的一个逻辑数据分类,一个type下的document,都有相同的field

注意:

  • ES5.x版本,一个Index下可以创建多个Type
  • ES5.x版本,一个Index下可以创建一个Type
  • ES5.x版本,一个Index没有Type

Document(文档-行)

文档是es中的最小数据单元,一个类型下可以有多个document, 一个document可以是一条或多条客户数据

Field(字段-列)

Field是Elasticsearch的最小单位。一个document里面有多个field,每个field就是一个数据字段。

操作ES的RESTful语法

GET请求:

  • http://ip:port/index: 查询索引信息
  • http://ip:port/index/type/doc_id: 查询指定的文档信息

POST请求:

  • http://ip:port/index/type/_search: 查询文档, 可以在请求体中添加json字符串代表查询条件
  • http://ip:port/index/type/doc_id/_update: 修改文档, 可以在请求体中添加json字符串代表修改的具体内容

PUT请求:

  • http://ip:port/index: 创建一个索引, 需要在请求体中指定索引的具体信息
  • http://ip:port/index/type/_mapping: 代表创建索引时, 指定索引文档存储的属性信息

DELETE请求:

索引的操作

创建一个索引

PUT /person
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

查看索引信息

  1. kibana图形界面查询

image-20210301185729297

  1. 接口查询

    # 查看索引信息
    GET /person
    

    返回

    {
      "person" : {
        "aliases" : { },
        "mappings" : { },
        "settings" : {
          "index" : {
            "creation_date" : "1614596113957",
            "number_of_shards" : "5",
            "number_of_replicas" : "1",
            "uuid" : "bC8PJsegQ16t5EAtNWh_vg",
            "version" : {
              "created" : "6050499"
            },
            "provided_name" : "person"
          }
        }
      }
    }
    

删除索引

  1. 图形管理界面

    image-20210301190106655

  2. 接口删除

    # 删除索引
    DELETE /person
    

    返回

    {
      "acknowledged" : true
    }
    

ES中Field类型

String:

  • text: 用于全文检索, 将当前Field进行分词
  • keyworld: 当前Field不会被分词

数值类型:

  • long
  • integer
  • byte
  • double
  • float

时间类型:

  • date类型: 针对时间类型指定具体的格式

布尔类型:

  • boolean类型, 表达true和false

二进制类型:

  • binary类型暂时支持Base64 encoding string

范围类型:

  • long_range: 赋值是,只需存储一个范围即可, 指定gt, lt, gte, lte
  • float_range:
  • integer_range:
  • date_range:
  • ip_range:

经纬度类型:

  • geo_point: 用来存储经纬度的

ip类型:

  • ip: 可以存储ipv4或者ipv6

其他

创建索引并指定数据结构

# 创建索引, 指定数据类型
PUT /book
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "novel": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "index": true,
          "store": false 
        },
        "author": {
          "type": "keyword"
        },
        "count": {
          "type": "long"
        },
        "onSale": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd|epoch_millis"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}

解释:

  • number_of_shards: 分片
  • number_of_replicas: 备份
  • mappings: 指定数据结构
  • novel: 指定的类型名
  • properties: 文档中字段的定义
  • name: 指定一个字段名为name
  • type: 指定该字段的类型
  • analyzer: 指定使用的分词器
  • index: true指定当前的field可以被作为查询条件
  • store: 是否需要额外存储
  • format: 指定时间存储的格式

文档的操作

文档在ES服务器中唯一的标识, _index, _type, _id三个内容为组合, 锁定一个文档, 操作时添加还是修改

新建文档

自动生成id

# 添加文档
POST /book/novel
{
  "name": "斗罗",
  "author": "西红柿",
  "count": 10000,
  "onSale": "2000-01-01",
  "desc": "斗罗大陆修仙小说"
}

手动指定id

# 手动指定id
PUT /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": 10000,
  "onSale": "1758-01-01",
  "desc": "红楼梦小说"
}

修改文档

覆盖式修改

# 手动指定id
PUT /book/novel/1
{
  "name": "红楼梦",
  "author": "曹雪芹",
  "count": 20000,
  "onSale": "1758-01-01",
  "desc": "红楼梦小说"
}

doc修改方式

# 修改文档,基于doc方式
POST /book/novel/1/_update
{
  "doc": {
    "count": 123455
    # 指定修改的field和对应的值
  }
}

删除文档

# 根据id删除文档
DELETE /book/novel/Ile37XcBdlEqQ4RmWKpJ

Java操作ElasticSearch

java连接ES

  1. 创建maven工程

  2. 导入依赖

    <dependencies>
        <!-- 1.elasticsearch -->
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.5.4</version>
        </dependency>
        <!-- 2.elasticsearch API -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>6.5.4</version>
        </dependency>
        <!-- 3. junit-->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!-- 4. lombok-->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.16.22</version>
        </dependency>
        <!-- 5. jackson -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.10.2</version>
        </dependency>
    </dependencies>
    
  3. 创建es连接

    package com.example.utils;
    
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;
    import org.elasticsearch.client.RestHighLevelClient;
    
    /**
     * @author : ryxiong728
     * @email : ryxiong728@126.com
     * @date : 3/1/21
     * @Description:
     */
    public class ESClient {
        public static RestHighLevelClient getClient() {
            // 1.创建HttpHost对象
            HttpHost httpHost = new HttpHost("115.159.222.145", 9200);
    
            // 2. 创建RestClientBuilder
            RestClientBuilder clientBuilder = RestClient.builder(httpHost);
    
            // 3. 创建RestHighLevelClient
            RestHighLevelClient client = new RestHighLevelClient(clientBuilder);
            // 返回client对象
            return client;
        }
    }
    

java操作索引

创建索引

package com.example.test;

import com.example.utils.ESClient;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class Demo02 {
    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "info";
    /*
    "mappings": {
        "novel": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "ik_max_word",
              "index": true,
              "store": false
            },
            "author": {
              "type": "keyword"
            },
            "count": {
              "type": "long"
            },
            "onSale": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            },
            "desc": {
              "type": "text",
              "analyzer": "ik_max_word"
            }
          }
        }
      }
     */
    @Test
    public void createIndex() throws IOException {
        // 1. 准备索引的settings
        Settings.Builder settings = Settings.builder();
        settings.put("number_of_shards", 3);
        settings.put("number_of_replicas", 1);

        // 2. 准备关于索引的结构mappings
        XContentBuilder mappings = JsonXContent.contentBuilder()
                .startObject()
                    .startObject("properties")
                        .startObject("name")
                            .field("type", "text")
                        .endObject()
                        .startObject("age")
                            .field("type", "integer")
                        .endObject()
                        .startObject("birthday")
                            .field("type", "date")
                            .field("format", "yyyy-MM-dd")
                        .endObject()
                    .endObject()
                .endObject();

        // 3. 将settings和mappings封装到Request对象中
        CreateIndexRequest request = new CreateIndexRequest(index);
        request.settings(settings);
        request.mapping(type, mappings);

        // 4. 通过client连接ES并创建索引
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);

        // 5. 输出
        System.out.println(response.toString());

    }
}

检查索引是否存在

@Test
public void isExists() throws IOException {
    // 1. 准备request对象
    GetIndexRequest request = new GetIndexRequest();
    request.indices(index);

    // 2. 通过client去操作
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);

    // 3. 打印
    System.out.println(exists);
}

删除索引

@Test
public void deleteIndex() throws IOException {
    // 1. 准备request对象
    DeleteIndexRequest delete = new DeleteIndexRequest();
    delete.indices(index);

    // 2. 通过client操作
    AcknowledgedResponse resp = client.indices().delete(delete, RequestOptions.DEFAULT);

    // 3. 获取返回结果
    System.out.println(resp.isAcknowledged());
}

java操作文档

添加文档

person实例

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Person {
    @JsonIgnore  // 注解: 序列化是忽略id字段
    private Integer id;
    private String name;
    private Integer age;
    @JsonFormat(pattern = "yyyy-MM-dd")  // 序列化是将date格式化为 yyyy-MM-dd类型
    private Date birthday;

}

创建案例

package com.example.test;

import com.example.entity.Person;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;

import java.io.IOException;
import java.util.Date;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class Demo03 {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "person";
    String type = "info";

    @Test
    public void createDocument() throws IOException {
        // 1. 准备一个json数据
        Person person = new Person(1, "张三", 23, new Date());
        String json = mapper.writeValueAsString(person);
        System.out.println(json);

        // 2. 准备一个request对象
        IndexRequest indexRequest = new IndexRequest(index, type, person.getId().toString());
        indexRequest.source(json, XContentType.JSON);

        // 3. 通过client对象添加文档
        IndexResponse resp = client.index(indexRequest, RequestOptions.DEFAULT);

        // 4. 打印结果
        System.out.println(resp.getResult().toString());
    }
}


修改文档

@Test
public void updateDocument() throws IOException {
    // 1. 创建一个map, 指定修改的内容
    Map<String, Object> doc = new HashMap<String, Object>();
    doc.put("name", "李四");
    String docId = "1";

    // 2. 创建request对象, 封装数据
    UpdateRequest updateRequest = new UpdateRequest(index, type, docId);
    updateRequest.doc(doc);

    // 3. 通过client对象执行
    UpdateResponse response = client.update(updateRequest, RequestOptions.DEFAULT);

    // 4. 输出返回结果
    System.out.println(response.getResult().toString());

}

删除文档

@Test
public void deleteDocument() throws IOException {
    // 1. 封装request对象
    DeleteRequest deleteRequest = new DeleteRequest(index, type, "1");

    // 2. 通过client执行
    DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);

    // 3. 输出结果
    System.out.println(response.getResult().toString());

}

java批量操作文档

批量添加文档

@Test
public void bulkCreateDocument() throws IOException {
    // 1. 准备多个json数据
    Person p1 = new Person(1, "张三", 23, new Date());
    Person p2 = new Person(2, "李四", 24, new Date());
    Person p3 = new Person(3, "王五", 25, new Date());

    String json1 = mapper.writeValueAsString(p1);
    String json2 = mapper.writeValueAsString(p2);
    String json3 = mapper.writeValueAsString(p3);

    // 2. 创建Request, 将准备好的数据封装
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new IndexRequest(index, type, p1.getId().toString()).source(json1, XContentType.JSON));
    bulkRequest.add(new IndexRequest(index, type, p2.getId().toString()).source(json2, XContentType.JSON));
    bulkRequest.add(new IndexRequest(index, type, p3.getId().toString()).source(json3, XContentType.JSON));

    // 3. client执行
    BulkResponse resp = client.bulk(bulkRequest, RequestOptions.DEFAULT);

    // 4. 打印
    System.out.println(resp.toString());
}

批量删除

@Test
public void bulkDeleteDocument() throws IOException {
    // 1. 封装Request对象
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new DeleteRequest(index, type, "1"));
    bulkRequest.add(new DeleteRequest(index, type, "2"));
    bulkRequest.add(new DeleteRequest(index, type, "3"));

    // 2. client执行
    BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);

    // 3. 输出
    System.out.println(response.toString());

}

ElasticSearch练习案例

索引: sms-logs-index

类型: sms-logs-type

字段名称 备注
createDate 创建时间
senDate 发送时间
longCode 发送的长号码 如"10698886622"
mobile 电话, 如"13800000000"
corpName 发送公司名, 需要分词检索
smsContent 发送短信内容, 需要分词检索
state 短信发送状态, 0成功, 1失败
operateId 运营商编号1-移动,2-联通,3-电信
province 省份
ipAddr 下发服务器IP地址
replyTotal 短信状态报告返回时长(s)
fee 扣费(分)

SmsLogs

package com.example.entity;
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Date;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class SmsLogs {
    @JsonIgnore
    private Integer id;
    @JsonFormat(pattern = "yyyy-MM-dd")
    private Date createDate;
    @JsonFormat(pattern = "yyyy-MM-dd")
    private Date sendDate;
    private String longCode;
    private String mobile;
    private String corpName;
    private String smsContent;
    private Integer state;
    private Integer operatorId;
    private String province;
    private String ipAddr;
    private Integer replyTotal;
    private String fee;

    @JsonIgnore
    public static String doc = "乌山镇,玉兰大陆第一山脉‘魔兽山脉’西方的芬莱王国中的一个普通小镇。朝阳初升,乌山镇这个小镇上依旧有着清晨的一丝清冷之气,只是小镇中的居民几乎都已经出来开始工作了,即使是六七岁的稚童,也差不多也都起床开始了传统性的晨练。乌山镇东边的空地上,早晨温热的阳光透过空地旁边的大树,在空地上留下了斑驳的光点。只见一大群孩子,目视过去估摸着差不多有一两百个。这群孩子分成了三个团队,每个团队都是排成几排,孩子们一个个都静静地站在空地上,面色严肃。纠结了好久买多大的屏,全凭感觉和运气,最后确定了65寸的,非常合适,大小刚刚好,我家客厅面积是34平方米,差不多的面积尽管拍就好了,其实在大一些可能更棒吧!双十一下手,电视越大越好,实惠好用,电视功能多了也用不着,之前的什么画中画,三D功能,有几个用的上的,电视这东西简单实用就行。我一共买了4台,一台75寸,一台70寸,两台65寸。松下冰箱一台,华帝油烟机燃气灶两套,马桶2个,西门子开关插座55个,丝涟床垫一个,七七八八加起来一共4万5左右,大家电我只信京东,服务杠杠滴";
}

初始化数据

package com.example.test;
import com.example.entity.SmsLogs;
import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Test;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/1/21
 * @Description:
 */
public class InitDate {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client =  ESClient.getClient();
    String index = "sms-logs-index";
    String type="sms-logs-type";

    @Test
    public void createIndex() throws  Exception{
        // 1.准备关于索引的setting
        Settings.Builder settings = Settings.builder()
                .put("number_of_shards", 5)
                .put("number_of_replicas", 1);

        // 2.准备关于索引的mapping
        XContentBuilder mappings = JsonXContent.contentBuilder()
                .startObject()
                .startObject("properties")
                .startObject("corpName")
                .field("type", "keyword")
                .endObject()
                .startObject("createDate")
                .field("type", "date")
                .field("format", "yyyy-MM-dd")
                .endObject()
                .startObject("fee")
                .field("type", "long")
                .endObject()
                .startObject("ipAddr")
                .field("type", "ip")
                .endObject()
                .startObject("longCode")
                .field("type", "keyword")
                .endObject()
                .startObject("mobile")
                .field("type", "keyword")
                .endObject()
                .startObject("operatorId")
                .field("type", "integer")
                .endObject()
                .startObject("province")
                .field("type", "keyword")
                .endObject()
                .startObject("replyTotal")
                .field("type", "integer")
                .endObject()
                .startObject("sendDate")
                .field("type", "date")
                .field("format", "yyyy-MM-dd")
                .endObject()
                .startObject("smsContent")
                .field("type", "text")
                .field("analyzer", "ik_max_word")
                .endObject()
                .startObject("state")
                .field("type", "integer")
                .endObject()
                .endObject()
                .endObject();
        // 3.将settings和mappings 封装到到一个Request对象中
        CreateIndexRequest request = new CreateIndexRequest(index)
                .settings(settings)
                .mapping(type, mappings);
        // 4.使用client 去连接ES
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);

        System.out.println("response:"+response.toString());

    }

    @Test
    public void  bulkCreateDoc() throws  Exception{
        // 1.准备多个json 对象
        String longCode = "1008687";
        String mobile ="138340658";
        List<String> companies = new ArrayList<String>();
        companies.add("腾讯课堂");
        companies.add("阿里旺旺");
        companies.add("海尔电器");
        companies.add("海尔智家公司");
        companies.add("格力汽车");
        companies.add("苏宁易购");
        companies.add("盒马鲜生");
        companies.add("途虎养车");
        List<String> provinces = new ArrayList<String>();
        provinces.add("北京");
        provinces.add("重庆");
        provinces.add("上海");
        provinces.add("晋城");
        provinces.add("深圳");
        provinces.add("武汉");
        Random random = new Random();
        BulkRequest bulkRequest = new BulkRequest();
        for (int i = 1; i <20 ; i++) {
            Thread.sleep(1000);
            SmsLogs s1 = new SmsLogs();
            s1.setId(i);
            s1.setCreateDate(new Date((int) (Math.random() * (854526980000L + 1 - 852526980000L)) + 852526980000L));
            s1.setSendDate(new Date((int) (Math.random() * (854526980000L + 1 - 852526980000L)) + 852526980000L));
            s1.setLongCode(longCode+i);
            s1.setMobile(mobile+2*i);
            s1.setCorpName(companies.get(random.nextInt(companies.size())));
            s1.setSmsContent(SmsLogs.doc.substring((i-1)*20,i*20));
            s1.setState(i%2);
            s1.setOperatorId(i%3);
            s1.setProvince(provinces.get(random.nextInt(provinces.size())));
            s1.setIpAddr("127.0.0."+i);
            s1.setReplyTotal(i*3);
            s1.setFee(i*6+"");
            String json1  = mapper.writeValueAsString(s1);
            bulkRequest.add(new IndexRequest(index,type,s1.getId().toString()).source(json1, XContentType.JSON));
            System.out.println("数据"+i+s1.toString());
        }

        // 3.client 执行
        BulkResponse responses = client.bulk(bulkRequest, RequestOptions.DEFAULT);

        // 4.输出结果
        System.out.println(responses.getItems().toString());
    }
}

ElasticSearch查询

Term&terms查询

term查询

term的查询代表完全匹配, 搜索之前不会对你搜索的关键词进行分词,直接去文档分词库中匹配内容

# term查询
POST /sms-logs-index/sms-logs-type/_search
{
  "from": 0,  # limit起始
  "size": 5,  # limit查询条数
  "query": {
    "term": {  # 查询类型, term全匹配
      "province": {
        "value": "北京"
      }
    }
  }
}

返回

{
  "took" : 31,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.3862944,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "sms-logs-type",
        "_id" : "12",
        "_score" : 1.3862944,
        "_source" : {
          "createDate" : "1997-01-20",
          "sendDate" : "1997-01-19",
          "longCode" : "100868712",
          "mobile" : "13834065824",
          "corpName" : "海尔智家公司",
          "smsContent" : "了好久买多大的屏,全凭感觉和运气,最后确",
          "state" : 0,
          "operatorId" : 0,
          "province" : "北京",
          "ipAddr" : "127.0.0.12",
          "replyTotal" : 36,
          "fee" : "72"
        }
      },
      ...
      }
    ]
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.IOException;
import java.util.Map;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class Demo04Query {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void termQuery() throws IOException {
        // 1. 创建Request对象
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定查询条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.from(0);
        builder.size(5);
        builder.query(QueryBuilders.termQuery("province", "北京"));
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 获取_source中的数据, 并展示
        for (SearchHit hit : response.getHits().getHits()) {
            Map<String, Object> result = hit.getSourceAsMap();
            System.out.println(result);
        }
    }
}

terms查询

与term查询机制一样, 不会对查询关键字分词, 直接匹配

不同点:

terms针对一个 字段包含多个值的时候使用

如:

  • term: where province="北京"
  • terms: where province="北京" or province="?"
# terms查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "terms": {
      "province": [
        "北京",
        "武汉"
        ]
    }
  }
}

java代码实现

@Test
public void termsQuery() throws IOException {
    // 1. 创建Request对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.termsQuery("province", "北京", "武汉"));
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取_source中的数据, 并展示
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

match查询

match查询属于高层查询, 她会根据你查询的字段类型不一样, 采用不同的查询方式

  • 查询的是日期或者数值的话, 他会将你基于的字符串查询内容转换为日期或数值对待
  • 如果查询的内容是一个不能被分词的内容(keyword), match查询不会对你指定的查询关键字进行分词
  • 如果查询内容是一个可以被分词的内容(text), match会将指定的查询内容根据一定方式去分词, 去分词库匹配指定的内容

match底层实际是多个term查询, 将查到的结果封装在一起

match_all查询

查询全部结果

# match_all查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match_all": {}
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.IOException;
import java.util.Map;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class Demo05Match {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void matchAllQuery() throws IOException {
        // 1. 创建Request对象
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定查询条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.query(QueryBuilders.matchAllQuery());
        builder.size(20);  // es默认只查询10条, 查询更多需要指定
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 获取_source中的数据, 并展示
        for (SearchHit hit : response.getHits().getHits()) {
            Map<String, Object> result = hit.getSourceAsMap();
            System.out.println(result);
        }
        System.out.println(response.getHits().getHits().length);
    }
}

match查询

指定field查询条件

# match查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": "面积"
    }
  }
}

java代码实现

@Test
public void matchQuery() throws IOException {
    // 1. 创建Request对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("smsContent", "面积"));
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取_source中的数据, 并展示
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
    System.out.println(response.getHits().getHits().length);
}

布尔match查询

基于一个field查询条件,进行and或or的连接方式查询

# 布尔match查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": {
        "query": "孩子 团队",
        "operator": "or"  # "operator": "and" 按照and,既包含"孩子"又包含"团队"的
      }
    }
  }
}

java代码实现

@Test
public void booleanMatchQuery() throws IOException {
    // 1. 创建Request对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("smsContent", "团队 孩子").operator(Operator.OR));
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取_source中的数据, 并展示
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
    System.out.println(response.getHits().getHits().length);
}

multi_match查询

match针对一个field做检索, multi_match针对多个field进行检索, 多个filed针对一个text

# multi_match查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "multi_match": {
      "query": "北京",
      "fields": ["province", "smsContent"]
    }
  }
}
# 省份或信息中包含"北京"的都符合

java代码实现

@Test
public void multiMatchQuery() throws IOException {
    // 1. 创建Request对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.multiMatchQuery("北京", "province", "smsContent").operator(Operator.OR));
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取_source中的数据, 并展示
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
    System.out.println(response.getHits().getHits().length);
}

其他查询

id查询

# id查询
GET /sms-logs-index/sms-logs-type/1

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo06Other {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void findById() throws IOException {
        // 1. 创建GetRequest
        GetRequest request = new GetRequest(index, type, "1");

        // 2. 执行查询
        GetResponse response = client.get(request, RequestOptions.DEFAULT);

        // 3. 输出结果
        System.out.println(response.getSourceAsMap());
    }
}

Ids查询

根据多个id查询, 类似Mysql中的where id in (id1, id2, id3...)

# ids查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "100"]
    }
  }
}

java代码实现

@Test
public void findByIds() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.idsQuery().addIds("1", "2", "100"));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }

}

prefix查询

前缀查询, 通过一个关键字去指定一个field的前缀, 从而查询到指定的文档

# prefix查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "prefix": {
      "corpName": {
        "value": "海尔"
      }
    }
  }
}

java代码实现

@Test
public void findByPrefix() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.prefixQuery("corpName", "海尔"));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

fuzzy查询

模糊查询

输入字符的大概, ES可以根据输入的内容进行查询, 即时有错别字也可以, 查询结果相应不会太精确

# fuzzy查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "fuzzy": {
      "corpName": {
        "value": "苏拧易购",
        "prefix_length": 1  # 指定前面几个字符不允许出错
      }
    }
  }
}

java代码实现

@Test
public void findByFuzzy() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.fuzzyQuery("corpName", "苏宁易购").prefixLength(2));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

wildcard查询

通配查询, 和Mysql中的like是一个套路, 在查询时, 指定通配符*和占位符

# wildcard查询公司以"海尔"开头的
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "wildcard": {
      "corpName": {
        "value": "海尔*"  # 可以使用 * 和 ? 指定通配符和占位符
      }
    }
  }
}

java代码实现

@Test
public void findByWillCard() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.wildcardQuery("corpName", "海尔*"));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

range查询

范围查询, 只针对数值类型, 对某一个field进行大于或者小于的范围指定

# range查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "range": {
      "fee": {
        "gte": 10,
        "lte": 50  # gt >, lt <, gte >=, lte <=
      }
    }
  }
}

java代码实现

@Test
public void findByRange() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.rangeQuery("fee").lt(50).gt(10));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

regexp查询

正则表达式查询, 根据编写的正则表达式去匹配内容

注意: prefix, willcard, fuzzy和regexp查询效率相对较低, 对效率要求高时, 避免使用

# regexp查询 电话以38结尾的
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "regexp": {
      "mobile": "[0-9]{9}38"
    }
  }
}

java代码实现

@Test
public void findByRegexp() throws IOException {
    // 1. 创建searchRequest对象
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定查询条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.regexpQuery("mobile", "[0-9]{9}38"));
    request.source(builder);

    // 3. 执行
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 获取结果
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> result = hit.getSourceAsMap();
        System.out.println(result);
    }
}

深分页Scroll

ES对from + size是有限制的, from和size二者之和不能超过1W

原理:

from + size 在ES查询数据的方式

  1. 将用户指定的关键字进行分词,
  2. 将词汇去分词库中进行检索, 得到多个文档id
  3. 去各个分片中去拉取指定的数据, 耗时较长
  4. 将数据根据score分数进行排序, 耗时较长
  5. 根据from的值,将查询的数据进行取舍
  6. 返回结果

Scroll + size 在ES中查询数据的方式

  1. 将用户指定的关键字进行分词,
  2. 将词汇去分词库中进行检索, 得到多个文档id
  3. 将文档的id存放在es的上下文中
  4. 根据指定的size去ES中检索指定的数据, 拿完数据的文档id, 会从上下文中移除
  5. 如果需要下一页数据, 直接去ES的上下文中, 找后续内容
  6. 循环第四和第五步,获取查询内容

Scroll查询方式, 不适合做实时的查询

# Scroll查询, 返回第一页数据, 将文档id存放在ES上下文中, 指定生存时间1m
POST /sms-logs-index/sms-logs-type/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 2,
  "sort": [
    {
      "fee": {
        "order": "desc"
      }
    }
  ]
}

# 根据scroll查询下一页数据
POST /_search/scroll
{
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAWlZFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpWhZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253AAAAAAABaVsWeGNJTkwzREFST2FvYTNrWDk5a3NudwAAAAAAAWlcFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpXRZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253",  # 根据第一步得到的scroll_id
  "scroll": "1m"  # scroll信息的生存时间
}

# 删除scroll在ES上下文中的数据
DELETE /_search/scroll/DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAWlZFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpWhZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253AAAAAAABaVsWeGNJTkwzREFST2FvYTNrWDk5a3NudwAAAAAAAWlcFnhjSU5MM0RBUk9hb2Eza1g5OWtzbncAAAAAAAFpXRZ4Y0lOTDNEQVJPYW9hM2tYOTlrc253 # scroll_id

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.*;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Test;

import java.io.IOException;
import java.util.Map;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo07Scroll {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void scrollQuery() throws IOException {
        // 1. 创建searchRequest对象
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定scroll信息
        request.scroll(TimeValue.timeValueMinutes(1L));

        // 3. 指定查询条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.size(4);
        builder.sort("fee", SortOrder.DESC);
        builder.query(QueryBuilders.matchAllQuery());
        request.source(builder);

        // 4. 获取返回结果的scrollId, source
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        String scrollId = response.getScrollId();
        System.out.println("--------首页-------");
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }

        while (true) {
            // 5. 创建SearchScrollRequest
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);

            // 6. 指定scrollId的生存时间
            scrollRequest.scroll(TimeValue.timeValueMinutes(1L));

            // 7. 执行查询获取返回结果
            SearchResponse scrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);

            // 8. 判断是否查询到了数据输出
            SearchHit[] hits = scrollResponse.getHits().getHits();
            if (hits != null && hits.length > 0) {
                System.out.println("---------下一页--------");
                for (SearchHit hit : hits) {
                    System.out.println(hit.getSourceAsMap());
                }
            } else {
                // 9. 判断没有查询到的数据- 退出循环
                System.out.println("---------结束--------");
                break;
            }
        }
        // 10. 创建ClearScrollRequest
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();

        // 11. 指定ScrollId
        clearScrollRequest.addScrollId(scrollId);

        // 12. 删除scrollId
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

        // 13. 输出结果
        System.out.println("删除scroll" + clearScrollResponse.isSucceeded());
    }

}

delete-by-query

根据term, match等查询方式去删除大量的文档

注意: 如果你需要删除的内容, 是index下的大部分数据, 推荐创建一个全新的index, 将保留的文档内容, 添加到全新的索引

# delete-by-query查询删除
POST /sms-logs-index/sms-logs-type/_delete_by_query
{
  "query": {
    "range": {
      "fee": {
        "gte": 10,
        "lte": 15
      }
    }
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.*;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo08Delete {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void deleteByQuery() throws IOException {
        // 1. 创建DeleteByQueryRequest
        DeleteByQueryRequest request = new DeleteByQueryRequest(index);
        request.types(type);

        // 2. 指定检索条件
        request.setQuery(QueryBuilders.rangeQuery("fee").gt(10).lt(20));

        // 3. 执行删除
        BulkByScrollResponse response = client.deleteByQuery(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        System.out.println(response.toString());
    }
}

复合查询

bool查询

符合过滤器, 将你的多个查询条件, 以一定的逻辑组合在一起.

  • must: 所有条件都符合,表示and
  • must_not : 所有条件都不匹配, 表示not
  • should: 所有条件, 满足其一即可, 表示or
# 复合查询
# 1. 省份为武汉或北京
# 2. 运营商不是电信
# 3. smsContent中包含 客厅 和 面积
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "province": {
              "value": "北京"
            }
          }
        },
        {
          "term": {
            "province": {
              "value": "武汉"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "operatorId": {
              "value": "3"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "smsContent": "客厅"
          }
        },
        {
          "match": {
            "smsContent": "面积"
          }
        }
      ]
    }
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo08complex {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void boolQuery() throws IOException {
        // 1. 创建SearchRequest
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定检索条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // 指定省份为武汉或北京
        boolQueryBuilder.should(QueryBuilders.termQuery("province", "武汉"));
        boolQueryBuilder.should(QueryBuilders.termQuery("province", "北京"));
        // 指定运营方不为电信
        boolQueryBuilder.mustNot(QueryBuilders.termQuery("operatorId", 3));
        // smsContent中包含 面积 和 客厅
        boolQueryBuilder.must(QueryBuilders.matchQuery("smsContent", "面积"));
        boolQueryBuilder.must(QueryBuilders.matchQuery("smsContent", "客厅"));

        builder.query(boolQueryBuilder);
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }

}

boosting查询

boosting查询可以帮助我们影响查询后的score

  • positive: 只有匹配上positive查询的内容, 才会放到返回的结果中
  • negative: 在匹配上positive后同时匹配上了negative, 可以降低这样的文档score
  • negative_boost: 指定降低的系数, 必须小于1.0

关于查询时 分数如何计算:

  1. 搜索的关键字在文档中出现的频次越高, 分数越高
  2. 指定的文档内容越短, 分数就越高
  3. 指定的关键字也会被分词, 被分词的内容在分词库匹配的个数越多, 分数越高
# boosting查询 客厅面积
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "smsContent": "客厅面积"
        }
      },
      "negative": {
        "match": {
          "smsContent": "差不多"
        }
      },
      "negative_boost": 0.5
    }
  }
}

java代码实现

@Test
public void boostingQuery() throws IOException {
    // 1. 创建SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定检索条件
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoostingQueryBuilder boostingQueryBuilder = QueryBuilders.boostingQuery(
            QueryBuilders.matchQuery("smsContent", "客厅面积"),
            QueryBuilders.matchQuery("smsContent", "差不多")
    ).negativeBoost(0.5f);

    builder.query(boostingQueryBuilder);
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 输出返回结果
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}

filter查询

query和filter区别:

  • query: 根据你的查询条件, 去计算文档的匹配度获取一个分数, 根据分数进行排序, 不会作缓存
  • filter: 根据你的查询条件,去查询文档, 不会计算匹配分数, 但是filter会对经常被过滤的数据进行缓存
# filter查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "corpName": "苏宁易购"
          }
        },
        {
          "range": {
            "fee": {
              "gt": 4
            }
          }
        }
      ]
    }
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.BoostingQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo10Filter {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void filter() throws IOException {
        // 1. 创建SearchRequest
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定检索条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.filter(QueryBuilders.termQuery("corpName", "苏宁易购"));
        boolQueryBuilder.filter(QueryBuilders.rangeQuery("fee").gt(10));

        builder.query(boolQueryBuilder);
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}

高亮查询

高亮查询, 将用户输入的关键字, 以一定的特殊样式展示给用户, 让用户知道为什么结果被检索出来

高亮展示的数据, 本身是文档中的一个field, 单独将Field以highlight的形式返回给你

ES中提供一个highlight属性, 和query同级别

  • fields: 指定那几个字段以高亮显示

  • fragment_size: 指定高亮数据展示多少个字符

  • pre_tags: 指定前缀标签, 如<font color="red">

  • post_tags: 指定后缀标签, 如</font>

image-20210302133006505

# highlight查询
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "match": {
      "smsContent": "面积"
    }
  },
  "highlight": {
    "fields": {
      "smsContent": {}
    },
    "pre_tags": "<font color='red'>",
    "post_tags": "</font>",
    "fragment_size": 10
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class demo11HighLight {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void highlightQuery() throws IOException {
        // 1. 创建SearchRequest
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定检索条件
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // 2.1 指定查询条件
        builder.query(QueryBuilders.matchQuery("smsContent", "面积"));
        // 2.2 指定高亮
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("smsContent", 10).preTags("<font color='red'>").postTags("</font>");

        builder.highlighter(highlightBuilder);
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getHighlightFields().get("smsContent"));
        }
    }
}

聚合查询

ES的聚合查询和mysql的聚合查询类似, 相比mysql更强大, 提供了多种多样的统计数据方法

# 聚合查询RESTful语法
POST /sms-logs-index/sms-logs-type/_search
{
	"aggs": {
        "名字(agg)": {
            "agg_type": {
                "属性": "值"
            }
        }
    }
}

去重计数查询

去重计数Cardinality

  1. 将返回的文档中的一个指定的field进行去重, 统计一共有多少条
# cardinality去重计数查询
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "cardinality": {
        "field": "province"
      }
    }
  }
}

java代码实现

package com.example.test;

import com.example.utils.ESClient;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.junit.Test;

import java.io.IOException;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class Demo12Aggs {
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = ESClient.getClient();
    String index = "sms-logs-index";
    String type = "sms-logs-type";

    @Test
    public void cardinality() throws IOException {
        // 1. 创建SearchRequest
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定使用的聚合查询方式
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.aggregation(AggregationBuilders.cardinality("agg").field("province"));
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        Cardinality agg = response.getAggregations().get("agg");  // 向下转型
        long value = agg.getValue();
        System.out.println(value);
    }
}

范围统计

统计一定范围内出现的文档个数, 比如, 针对一个Field的值在0~100, 100~200等之间出现的个数分别是多少

范围统计可以针对普通的数值, 也可以针对时间类型, 针对ip类型都可以做响应的统计

  • range: 数值统计

    # 数值方式范围统计
    POST /sms-logs-index/sms-logs-type/_search
    {
      "aggs": {
        "agg": {
          "range": {
            "field": "fee",
            "ranges": [
              {
                "from": 10,
                "to": 50
              },
              {
                "from": 50,
                "to": 100
              },{
                "from": 100
              }
            ]
          }
        }
      }
    }
    
  • date_range: 时间统计

    # 时间方式范围统计
    POST /sms-logs-index/sms-logs-type/_search
    {
      "aggs": {
        "agg": {
          "range": {
            "field": "createDate",
            "format": "yyyy",
            "ranges": [
              {
                "to": 1996
              },
              {
                "from": 1996
              }
            ]
          }
        }
      }
    }
    
  • ip_range: ip统计

    # ip方式范围统计
    POST /sms-logs-index/sms-logs-type/_search
    {
      "aggs": {
        "agg": {
          "ip_range": {
            "field": "ipAddr",
            "ranges": [
              {
                "to": "127.0.0.10"
              },
              {
                "from": "127.0.0.10"
              }
            ]
          }
        }
      }
    }
    

java代码实现

数值方式范围统计

@Test
public void range() throws IOException {
    // 1. 创建SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定使用的聚合查询方式
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.range("agg").field("fee")
            .addUnboundedTo(10)
            .addRange(10, 50)
            .addUnboundedFrom(50)
    );
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 输出返回结果
    Range agg = response.getAggregations().get("agg");
    for (Range.Bucket bucket : agg.getBuckets()) {
        String key = bucket.getKeyAsString();
        Object from = bucket.getFrom();
        Object to = bucket.getTo();
        long docCount = bucket.getDocCount();
        System.out.println(String.format("key: %s, from: %s, to: %s, doc: %s",key, from, to, docCount));
    }
}

其他类似

统计聚合

查询指定field的最大值, 最小值, 平均值, 平方和...

使用 extended_stats

# 统计聚合查询
POST /sms-logs-index/sms-logs-type/_search
{
  "aggs": {
    "agg": {
      "extended_stats": {
        "field": "fee"
      }
    }
  }
}

java实现统计聚合查询

@Test
public void extendedStats() throws IOException {
    // 1. 创建SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. 指定使用的聚合查询方式
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.extendedStats("agg").field("fee"));
    request.source(builder);

    // 3. 执行查询
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. 输出返回结果
    ExtendedStats agg = response.getAggregations().get("agg");
    double max = agg.getMax();
    double min = agg.getMin();
    System.out.println("fee的最大是为:" + max + ", 最小值为: " + min);
}

地图经纬度搜索

ES中提供了一个数据类型geo_point, 用来存储经纬度

# 创建一个索引, 一个name, 一个location
PUT /map
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  },
  "mappings": {
    "map": {
      "properties": {
        "name": {
          "type": "text"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

# 添加测试数据
PUT /map/map/1
{
  "name": "天安门",
  "location": {
    "lon": 116.402981,
    "lat": 39.914492
  }
}

PUT /map/map/2
{
  "name": "海淀公园",
  "location": {
    "lon": 116.302509,
    "lat": 39.991152
  }
}

PUT /map/map/3
{
  "name": "北京动物园",
  "location": {
    "lon": 116.343184,
    "lat": 39.947468
  }
}

ES地图检索方式

  • geo_distance: 直线距离检索方式
  • geo_bounding_box: 以两个点确定一个矩形, 获取矩形内的全部数据
  • geo_polygon: 以多个点确定一个多边形, 获取多边形内的全部数据

基于RESTful实现地图检索

geo_distance

# geo_distance
POST /map/map/_search
{
  "query": {
    "geo_distance": {
      "location": {  # 找一个目标点
        "lon": 116.433733,
        "lat": 39.908404
      },
      "distance": 3000,  # 确定半径
      "distance_type": "arc"  # 确定形状为园
    }
  }
}

geo_bounding_box

# geo_bounding_box
POST /map/map/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lon": 116.326943,
          "lat": 39.95499
        },
        "bottom_right": {
          "lon": 116.347783,
          "lat": 39.939281
        }
      }
    }
  }
}

geo_polygen

# geo_polygon
POST /map/map/_search
{
  "query": {
    "geo_polygon": {
      "location": {
        "points": [
          {
            "lon": 116.298916,
            "lat": 39.99878
          },
          {
            "lon": 116.29561,
            "lat": 39.972576
          },
          {
            "lon": 116.327661,
            "lat": 39.984736
          }
        ]
      }
    }
  }
}

java实现代码geo_polygon

package com.example.test;

import com.example.utils.ESClient;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.range.Range;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
import org.elasticsearch.search.aggregations.metrics.stats.extended.ExtendedStats;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * @author : ryxiong728
 * @email : ryxiong728@126.com
 * @date : 3/2/21
 * @Description:
 */
public class Demo13GeoSearch {
    RestHighLevelClient client = ESClient.getClient();
    String index = "map";
    String type = "map";

    @Test
    public void geoPolygon() throws IOException {
        // 1. 创建SearchRequest
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. 指定检索方式
        SearchSourceBuilder builder = new SearchSourceBuilder();
        List<GeoPoint> points = new ArrayList<GeoPoint>();
        points.add(new GeoPoint(39.99878, 116.298916));
        points.add(new GeoPoint(39.972576, 116.29561));
        points.add(new GeoPoint(39.984739, 116.327661));
        builder.query(QueryBuilders.geoPolygonQuery("location", points));
        request.source(builder);

        // 3. 执行查询
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 4. 输出返回结果
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}
posted @ 2021-03-02 15:22  ryxiong728  阅读(101)  评论(0编辑  收藏  举报