Elasticsearch Tutorial: From Beginner to Expert (Notes)

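Concepts

  • ELK

    • Elasticsearch: storage and search; a distributed full-text search engine
    • Logstash (Beats): data collection
    • Kibana: visualization UI
  • Elasticsearch and Solr are both built on Lucene. Lucene itself is only a core library that provides full-text search functionality.

    • ES is written in Java.
    • 9300 is the port for communication between cluster nodes; 9200 is the HTTP port used by browsers and REST clients.
    • Requests follow a RESTful style, with JSON request and response bodies.
  • ES is a document-oriented database: one record is one document.

  • Inverted index: maps each keyword to the ids of the documents that contain it.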
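
To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is not how Lucene stores its index; the class name InvertedIndexSketch, the simple split-and-lowercase tokenizer, and the sample sentences are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Stand-in for a real analyzer: lowercase, then split on anything that is not a letter or digit
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document: every term points back to this document id
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up a keyword and return the matching document ids
    SortedSet<Integer> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a search engine");
        index.add(2, "Kibana visualizes Elasticsearch data");
        index.add(3, "Logstash collects data");
        System.out.println(index.search("elasticsearch")); // [1, 2]
        System.out.println(index.search("data"));          // [2, 3]
    }
}
```

The lookup goes from keyword to document ids, which is the reverse of a forward index that goes from document id to content.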

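Getting Started: Installation

Installing from the official download

1. Point ES at the local JDK on the Mac. macOS flags the JDK bundled with ES as software from an unidentified developer and blocks it, so edit the startup script bin/elasticsearch
and add (adjust the path to the JDK home on your machine): export JAVA_HOME=/opt/homebrew/Cellar/openjdk@17
# Finding the JDK path
# Show the current default JDK path
/usr/libexec/java_home
# Example output:
/Library/Java/JavaVirtualMachines/jdk-17.jdk/Contents/Home
# List all installed JDK paths
/usr/libexec/java_home -V

# Set the heap size in config/jvm.options
-Xms1g
-Xmx1g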

2. Run ES (from the bin directory)
./elasticsearch -d   # run in the background as a daemon
./elasticsearch      # run in the foreground



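3. Verify the installation
curl http://localhost:9200

# Problem 1: the test request fails with "curl: (52) Empty reply from server". ES 8.x enables security (including TLS on the HTTP layer) by default, so a plain-HTTP request gets no reply. To turn security off (development environments only), add the following to config/elasticsearch.yml:
xpack.security.enabled: false
xpack.security.http.ssl:
  enabled: false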


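4. Stop Elasticsearch
If it runs in the foreground, press Ctrl+C.

# If it runs as a background process:
# Find the process ID
ps aux | grep elasticsearch
# Terminate the process (plain kill sends SIGTERM so ES can shut down cleanly; use kill -9 only if it does not exit)
kill <pid>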

The startup script bin/elasticsearch after the change:

#!/bin/bash
export JAVA_HOME=/opt/homebrew/Cellar/openjdk@17
CLI_NAME=server
CLI_LIBS=lib/tools/server-cli
source "`dirname "$0"`"/elasticsearch-cli

Installing with Homebrew

TODO

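Installing the ik analyzer

  • First find the ik analyzer release on GitHub that matches your ES version.
  • After downloading, unzip it into the plugins folder and delete the archive itself; otherwise ES fails to start.
  • If there is a version mismatch, you can try editing the version in plugin-descriptor.properties. If ES still does not start after that, the only option is to find the exactly matching release.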

Details

[3 screenshots]

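CRUD

  • PUT requests must be idempotent; POST requests are not.
  • Use PUT for a full overwrite of a document (POST also works). Use POST for a partial update, because a partial update is not idempotent while a full overwrite is. A partial update must target _update explicitly; a request to _doc is treated as indexing the whole document (create or overwrite).
  • Query keywords are case-insensitive on analyzed text fields, because the default analyzer lowercases terms; keyword fields match case-sensitively.
  • Keyword matching depends on how the text was tokenized into the inverted index: any document whose indexed terms match one of the query terms is returned, so the result is the union of all matches.
  • Field mapping settings
    • type: keyword means the value is not tokenized and must match exactly, for example a reqid or a user id.
    • index: true means the field is indexed and can be searched.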
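Details

[2 screenshots]
Get an index [screenshot]
Create a document [2 screenshots]
Get a document by id [2 screenshots]
Update a document by id [screenshot]
Delete a document by id [screenshot]
Delete documents by query [2 screenshots]
Create an index mapping [3 screenshots]
Get all documents in an index [2 screenshots]
Conditional query [2 screenshots]
Compound query [screenshot]
Range query [screenshot]
Sorting [2 screenshots]
Highlighting [screenshot]
Pagination [screenshot]
Aggregation: maximum [screenshot]
Minimum [screenshot]
Sum [screenshot]
Average [screenshot]
Distinct count (cardinality means the number of distinct values) [screenshot]
Statistics on a field [2 screenshots]
Grouped statistics [2 screenshots]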

Java API CRUD (version 7.8.0)

pom dependencies

   <properties>
        <!-- Note: ES 7.8.0 targets Java 8; for any 7.x version set the JDK level to 8 -->
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <es.version>7.8.0</es.version>
    </properties>
    <dependencies>
        <!-- Note: ES 7.8.0 targets Java 8 -->
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${es.version}</version>
        </dependency>
        <!-- Elasticsearch client. If the ES server is 8.0.0 or later, elasticsearch-rest-high-level-client is not recommended -->
        <!-- RestHighLevelClient has been deprecated since Elasticsearch 7.15; the new Java API Client is recommended instead -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>${es.version}</version>
        </dependency>
        <!-- Elasticsearch depends on log4j 2.x -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.15.3</version>
        </dependency>
        <!-- JUnit for unit tests -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.fastjson2</groupId>
            <artifactId>fastjson2</artifactId>
            <version>2.0.34</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
        </dependency>
    </dependencies>

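Unit tests

[screenshot]

Test classes
Bean class
package com.roy.test.bean;

import java.util.List;

import lombok.Data;
import lombok.NoArgsConstructor;

// Lombok generates the getters, setters, and no-arg constructor
@NoArgsConstructor
@Data
public class Person {
    private String name;
    private String sex;
    private Integer age;
    private String birthDate;
    private String about;
    private List<String> interests;
}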
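Index CRUD class
package com.roy.test;

import com.alibaba.fastjson2.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;

import java.io.IOException;

public class ESIndexTest {

    public static void main(String[] args) throws IOException {
        // Create the client
        RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200,"http")));

        // Create the index
        createIndex(client);

        // Get the index
//        getIndex(client);

        // Delete the index
//        deleteIndex(client);

        // Close the client
        client.close();

    }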

    private static void createIndex(RestHighLevelClient client) throws IOException {
        CreateIndexRequest indexRequest = new CreateIndexRequest("person");
        CreateIndexResponse person = client.indices().create(indexRequest,RequestOptions.DEFAULT);
        System.out.println(person.isAcknowledged());
    }

    private static void getIndex(RestHighLevelClient client) throws IOException {
        GetIndexRequest indexRequest = new GetIndexRequest("person");
        GetIndexResponse person = client.indices().get(indexRequest,RequestOptions.DEFAULT);
        System.out.println(person.getAliases());
        System.out.println(JSON.toJSONString(person.getMappings().entrySet()));
        System.out.println(person.getSettings());
    }

    private static void deleteIndex(RestHighLevelClient client) throws IOException {
        DeleteIndexRequest indexRequest = new DeleteIndexRequest("person");
        AcknowledgedResponse person = client.indices().delete(indexRequest,RequestOptions.DEFAULT);
        System.out.println(person.isAcknowledged());
    }

}
Document CRUD class
package com.roy.test;

import com.alibaba.fastjson2.JSON;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.roy.test.bean.Person;
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.SortOrder;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;

/**
 * @ClassName: ESDocumentTest
 * @Description: document CRUD, query, and aggregation examples against the "person" index
 * @version : 1.0
 */
public class ESDocumentTest {

    public static void main(String[] args) throws IOException {
        // Create the client
        RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Create a document (note: Mac M2 chips do not support ES 7.8.0)
        createDocument(client);

        // Partial update
//        updateDocument(client);

        // Delete a document
//        deleteDocument(client);
        // Get a document
//        getDocument(client);
        // Bulk insert
        batchInsertDocument(client);
        // Bulk delete
        batchDeleteDocument(client);
        // Close the client
        client.close();

    }

    private static void createDocument(RestHighLevelClient client) throws IOException {
        IndexRequest indexRequest = new IndexRequest();
        indexRequest.index("person").id("3001");
        Person person = new Person();
        person.setName("艾米");
        person.setAbout("艾米帝国佣兵");
        person.setAge(18);
        person.setInterests(Collections.singletonList("贪财"));
        //Caused by: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [1880-02-02] with format [yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];
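        // As the error above shows, this field's date format is yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis, so the value must use one of those formats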
        person.setBirthDate("1998/02/02 12:12:12");
//        ObjectMapper objectMapper = new ObjectMapper();
//        String value = objectMapper.writeValueAsString(person);
        String value = JSON.toJSONString(person);
        indexRequest.source(value, XContentType.JSON);
        IndexResponse response = client.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(response.getResult());
    }

    private static void updateDocument(RestHighLevelClient client) throws IOException {
        UpdateRequest updateRequest = new UpdateRequest();
        updateRequest.index("person").id("3001");
        // Partial update: only the listed fields change
        updateRequest.doc(XContentType.JSON,"name","大青山");
        UpdateResponse response = client.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(response.getResult());
    }

    private static void getDocument(RestHighLevelClient client) throws IOException {
        GetRequest request = new GetRequest();
        request.index("person").id("3001");
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        System.out.println(response.getSourceAsString());
    }

    private static void deleteDocument(RestHighLevelClient client) throws IOException {
        DeleteRequest request = new DeleteRequest();
        request.index("person").id("3001");
        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.getResult());
    }

    /**
     * Bulk insert
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void batchInsertDocument(RestHighLevelClient client) throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.add(new IndexRequest().index("person").id("5001").source(XContentType.JSON,"name","霍恩斯"));
        bulkRequest.add(new IndexRequest().index("person").id("5002").source(XContentType.JSON,"name","池傲天"));
        bulkRequest.add(new IndexRequest().index("person").id("5003").source(XContentType.JSON,"name","池长风"));
        BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        // getItems() returns an array, so printing it directly shows only an object reference; print a summary instead
        System.out.println("took=" + response.getTook() + ", hasFailures=" + response.hasFailures());
    }

    private static void batchDeleteDocument(RestHighLevelClient client) throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.add(new DeleteRequest().index("person").id("5001"));
        bulkRequest.add(new DeleteRequest().index("person").id("5002"));
        bulkRequest.add(new DeleteRequest().index("person").id("5003"));
        BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
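        // getItems() returns an array, so printing it directly shows only an object reference; print a summary instead
        System.out.println("took=" + response.getTook() + ", hasFailures=" + response.hasFailures());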
    }

    /**
     * Query all documents
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryAllDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        request.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

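    /**
     * Query documents whose name is 艾米
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryConditionDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        // A term query is not analyzed, so it has to target an exact-value field.
        // name.keyword exists under the default dynamic mapping; adjust it if the index uses an explicit mapping.
        request.source(new SearchSourceBuilder().query(QueryBuilders.termQuery("name.keyword","艾米")));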
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

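    /**
     * Paginated query
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryConditionDocumentPage(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        // Term queries are not analyzed; name.keyword assumes the default dynamic mapping
        SearchSourceBuilder builder = new SearchSourceBuilder().query(QueryBuilders.termQuery("name.keyword", "艾米"));
        // from is the offset of the first hit, size is the page size
        builder.from(0);
        builder.size(10);
        // Sort by age, ascending
        builder.sort("age", SortOrder.ASC);
        // Source filtering: each field name is its own array element
        String[] includes = {"name", "age"};
        String[] excludes = {};
        builder.fetchSource(includes,excludes);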

        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }


    /**
     * Compound (bool) query
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryCondiDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        // must: the name is 大青山 and the age is 30; mustNot excludes matches
//        boolQuery.must(QueryBuilders.matchQuery("name","大青山"));
//        boolQuery.must(QueryBuilders.matchQuery("age",30));
//        boolQuery.mustNot(QueryBuilders.matchQuery("age",30));


        // should: an age of either 20 or 30 matches
        boolQuery.should(QueryBuilders.matchQuery("age",20));
        boolQuery.should(QueryBuilders.matchQuery("age",30));

        SearchSourceBuilder builder = new SearchSourceBuilder().query(boolQuery);
        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

    /**
     * Range query
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryRangeDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        // Documents with 30 <= age < 50
        RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("age");
        rangeQuery.gte(30);
        rangeQuery.lt(50);

        SearchSourceBuilder builder = new SearchSourceBuilder().query(rangeQuery);
        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

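    /**
     * Fuzzy query with highlighting
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void queryFuzzyDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");
        // Fuzzy match allowing one character of difference; a larger fuzziness can be set
        FuzzyQueryBuilder fuzzyQueryBuilder = QueryBuilders.fuzzyQuery("name", "amy").fuzziness(Fuzziness.ONE);

        SearchSourceBuilder builder = new SearchSourceBuilder().query(fuzzyQueryBuilder);

        // Configure highlighting
        HighlightBuilder highlightBuilder =new HighlightBuilder();
        // Tags placed before and after each highlighted fragment
        highlightBuilder.preTags("<font color='red'>");
        highlightBuilder.postTags("</font>");
        // Field to highlight
        highlightBuilder.field("name");
        // Attach the highlighter to the search; calling highlighter() without an argument only reads the current one
        builder.highlighter(highlightBuilder);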

        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits());
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

    /**
     * Aggregation query
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void aggrDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");

        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Average age; avg("avgAge") sets the aggregation name
        AggregationBuilder aggregationBuilder = AggregationBuilders.avg("avgAge").field("age");
        // Maximum age
//        AggregationBuilder aggregationBuilder = AggregationBuilders.max("maxAge").field("age");
        builder.aggregation(aggregationBuilder);
        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // The aggregation result is in the response's aggregations section, not in the hits,
        // so print the whole response (SearchResponse.toString() renders it as JSON)
        System.out.println(response);
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }

    /**
     * Grouped query (terms aggregation)
     * @param client the high-level REST client
     * @throws IOException if the request fails
     */
    private static void groupDocument(RestHighLevelClient client) throws IOException {
        SearchRequest request = new SearchRequest();
        request.indices("person");

        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Group by age
        AggregationBuilder aggregationBuilder = AggregationBuilders.terms("ageGroup").field("age");
        builder.aggregation(aggregationBuilder);
        request.source(builder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // The buckets are in the response's aggregations section, not in the hits,
        // so print the whole response (SearchResponse.toString() renders it as JSON)
        System.out.println(response);
        SearchHits hits = response.getHits();
        SearchHit[] hitsHits = hits.getHits();
        for (SearchHit hitsHit : hitsHits) {
            System.out.println(hitsHit.getSourceAsString());
        }
    }
}
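
The "one character of difference" in the fuzzy query is an edit distance. The following is a minimal sketch of the classic Levenshtein distance in plain Java, added to show what Fuzziness.ONE tolerates. It is not Elasticsearch's implementation: by default ES also counts a swap of two adjacent characters as a single edit, which this sketch does not do. The class name EditDistanceSketch and the sample words are assumptions.

```java
public class EditDistanceSketch {
    // Minimum number of single-character inserts, deletes, or substitutions that turn a into b
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning an empty prefix of a into the first j characters of b takes j inserts
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("amy", "amy"));  // 0
        System.out.println(levenshtein("amy", "amyx")); // 1
        System.out.println(levenshtein("amy", "emy"));  // 1
        System.out.println(levenshtein("amy", "bob"));  // 3
    }
}
```

With a fuzziness of one, the query term amy would match indexed terms such as amyx or emy but not bob.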

Deployment

  • The default cluster name is elasticsearch. The name matters because a node can only join a cluster by specifying that cluster's name.

Environment

[2 screenshots]

Single node

[7 screenshots]

Cluster

[3 screenshots]

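Advanced

  • The essence of Elasticsearch index design: everything is built to make search faster.

  • The number of shards and replicas can be specified when an index is created. After creation you can change the number of replicas dynamically at any time, but you cannot change the number of primary shards.

  • By default an index has 1 primary shard and 1 replica. Set both when creating the index if you need different values.

  • Allocation rules for shards and replicas after a new node joins

    • A primary shard and its replica cannot sit on the same node.
    • Shards are spread evenly across the nodes.
  • Because the primary shard count is fixed at creation, raise the replica count instead to increase query throughput. The cluster can usefully grow to at most as many nodes as there are shard copies (primaries plus replicas), that is, one shard per node.

  • Routing: shard = hash(routing) % number of primary shards, where the routing value defaults to the document id. This is why the primary shard count cannot change after creation.

  • Shard control: a client can send a request to any node, and the node that receives it acts as the coordinating node. If that node is busy, it forwards the query to other nodes.

  • Inverted index

    • Analyzer (tokenizer)
    • A field mapped as keyword is not tokenized.
    • ik_max_word: the finest-grained split
    • ik_smart: the coarsest-grained split
    • Term: the smallest unit of storage and search in the index
    • Term dictionary: the collection of terms, implemented conceptually with a B+ tree or a hash map.
    • Posting list
  • Document search

  • Near-real-time search

  • Document analysis (analyzer)

    • Character filters
    • Tokenizer
    • Token filters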
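
The three analyzer stages run in a fixed order: character filters, then the tokenizer, then token filters. Below is a minimal sketch of that pipeline in plain Java. It is not the Elasticsearch or ik implementation; the class name AnalyzerSketch, the tag-stripping rule, the split rule, and the small stop word list are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class AnalyzerSketch {
    // Hypothetical stop word list used by the token filter stage
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the", "is"));

    // Stage 1, character filter: rewrite the raw text, here by replacing HTML-like tags with spaces
    static String charFilter(String text) {
        return text.replaceAll("<[^>]*>", " ");
    }

    // Stage 2, tokenizer: split the filtered text into tokens on non-letter, non-digit characters
    static List<String> tokenizer(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Stage 3, token filters: lowercase every token, then drop stop words
    static List<String> tokenFilter(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .filter(t -> !STOP_WORDS.contains(t))
                .collect(Collectors.toList());
    }

    // The analyzer is the composition of the three stages in order
    static List<String> analyze(String text) {
        return tokenFilter(tokenizer(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<p>The Quick brown fox is FAST</p>")); // [quick, brown, fox, fast]
    }
}
```

The terms that come out of the last stage are what end up in the inverted index, which is why the same analyzer is normally applied to query text as well.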

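Kibana

  • Download it from the official site.
  • Watch for configuration changes: in 8.x the index name setting below must no longer be configured; if it is present, Kibana fails with an unknown-setting error.
    • kibana.index: ".kibana"

kibana.yml configuration

# Default port
server:
  port: 5601
# Address of the ES server
elasticsearch:
  hosts: ["http://localhost:9200"]
# Index name; not needed in newer versions
# kibana:
#   index: '.kibana'
# Chinese UI locale
i18n:
  locale: "zh-CN"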

Detail diagrams

Routing calculation and shard control [2 screenshots]
Write flow, using three shards with one replica as the example [screenshot]
Write consistency settings [2 screenshots]
Query flow [screenshot]
Near-real-time search [2 screenshots]
ik analyzer [3 screenshots]
Mapping [screenshot]
Shards [screenshot]
Routing [5 screenshots]
Update [3 screenshots]
Refresh [2 screenshots]

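Integration

Spring Data official site

Then ask DeepSeek for an example, with a prompt such as: "The Elasticsearch server version is 8.17.3 and a Spring Boot project needs to do CRUD with Spring Data Elasticsearch; give a code example."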

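Optimization

Hardware requirements

  • SSD
  • RAID 0
  • Multiple disks
  • Do not use remotely mounted storage such as NFS or SMB/CIFS.

Allocation strategy

  • A shard is like an independent search engine (a Lucene index underneath), so more shards consume more resources (file handles, memory, CPU). Setting more shards up front is not always better.
  • Every search request hits every shard of the index; shards on the same node compete for the same resources.
  • The term statistics used for relevance scoring are computed per shard, so too many shards degrades relevance.

[screenshot]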

Routing selection

[2 screenshots]
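
The routing formula from the notes, hash(id) % number of primary shards, can be sketched in a few lines of Java. Elasticsearch hashes the routing value with Murmur3; this sketch substitutes String.hashCode() as a stand-in, so the shard numbers it prints will not match a real cluster. The class name ShardRoutingSketch and the sample ids are assumptions.

```java
public class ShardRoutingSketch {
    // shard = hash(routing) % primaryShards; the routing value defaults to the document id.
    // String.hashCode() is a stand-in for the Murmur3 hash that Elasticsearch uses.
    static int shardFor(String routing, int primaryShards) {
        // floorMod keeps the result in [0, primaryShards) even when the hash is negative
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        String[] ids = {"3001", "5001", "5002", "5003"};
        for (String id : ids) {
            // The same id always lands on the same shard while the shard count stays fixed
            System.out.println(id + " -> shard " + shardFor(id, 3) + " of 3");
        }
        // Changing the shard count changes the result for existing ids, so documents
        // written earlier could no longer be found on the shard the formula points to
        System.out.println("3001 with 4 shards -> shard " + shardFor("3001", 4));
    }
}
```

Passing a custom routing value instead of the id sends related documents to the same shard, so a query that supplies the same routing value only needs to visit that one shard.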

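Write speed optimization

[screenshot]

  • You can turn replicas off during a large write (set the replica count to 0) and restore the replica count once writing has finished and the index is needed for queries.

Memory settings

  • The default heap is 1 GB: Xms is the initial heap size and Xmx is the maximum heap size, and both are set to 1 GB.

[screenshot]

Important settings

[2 screenshots]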

Interview questions

[13 screenshots]

Reference

ElasticSearch教程入门到精通 (Elasticsearch Tutorial: From Beginner to Expert)

Posted 2025-04-22 02:03 by 卡斯特梅的雨伞