Elasticsearch (version 7.5.0) Learning Guide

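Introduction: Elasticsearch is an open-source search engine built on Apache Lucene and written in Java. It provides a distributed, highly scalable, near-real-time engine for full-text search and data analytics. It works well as a NoSQL data store but lacks distributed transactions. ES hides the complexity of Lucene behind a simple RESTful API, which makes full-text search easy to use. Together with Logstash and Kibana it forms the ELK stack, which covers log collection, log search, and log analysis in one setup.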

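1. Basic Concepts

Name                 Description
Node                 A single server running the ES service; nodes provide failover and scaling capacity
Cluster              One or more nodes working together, jointly holding the full data set and balancing load; a cluster has only one elected master node at a time
Document             The basic unit of information that can be indexed
Index                A collection of documents that share somewhat similar characteristics
Field                The smallest unit of data in ES, comparable to a column in a table
Shards               ES splits an index into several parts and distributes them across nodes; each part is a shard
Replicas             One or more copies of an index's shards; they improve fault tolerance and automatically load-balance search requests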
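
To make the shard concept concrete: the Elasticsearch reference describes document routing as shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document ID. The sketch below only illustrates that idea. ShardRoutingSketch and shardFor are hypothetical names, and String.hashCode() stands in for the Murmur3 hash that Elasticsearch actually uses, so the shard numbers will not match a real cluster.

```java
final class ShardRoutingSketch {

    /**
     * Picks a primary shard for a routing value (by default the document ID).
     * Assumption: String.hashCode() replaces Elasticsearch's Murmur3 hash.
     */
    static int shardFor(String routing, int numPrimaryShards) {
        if (numPrimaryShards <= 0) {
            throw new IllegalArgumentException("numPrimaryShards must be positive");
        }
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }
}
```

Because the result depends on the number of primary shards, that number cannot be changed freely after an index is created, whereas the number of replicas can.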

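   Data types

    keyword: indexes structured content (IDs, tags, status codes) as a single unanalyzed term; roughly comparable to a string column such as VARCHAR in MySQL;

    Arrays: ES has no dedicated array type. By default any field can hold zero or more values, but all values in an array must be of the same type. Typical cases are arrays of strings, arrays of integers, nested arrays, and arrays of objects.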
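
As a small illustration of the concepts above, the following sketch builds the JSON body for creating an index (PUT /<index>) with explicit shard and replica counts and one keyword field, plus a sample document whose field holds an array. The class name IndexBodySketch and the field name tags are made up for this example. The body uses the typeless 7.x mapping layout.

```java
final class IndexBodySketch {

    /** JSON body for PUT /<index>: settings plus a mapping with one keyword field. */
    static String createIndexBody(int shards, int replicas) {
        return "{"
                + "\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "},"
                + "\"mappings\":{\"properties\":{"
                + "\"tags\":{\"type\":\"keyword\"}" // hypothetical field name
                + "}}}";
    }

    /** A document whose keyword field holds several values; no array mapping is needed. */
    static String sampleDocument() {
        return "{\"tags\":[\"search\",\"java\"]}";
    }
}
```

The mapping declares tags as a plain keyword field. The same field accepts either a single value or an array of values of that type.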

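2. Elasticsearch Compared with Solr

  A. Both are popular full-text search engines;

  B. Elasticsearch focuses more on real-time data analysis, an area where Solr is less efficient;

  C. Solr accepts more input formats, such as HTML, PDF, Word, Excel, and CSV, whereas Elasticsearch only accepts JSON documents.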

 

3. Maven Dependencies

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.5.0</version>
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.5.0</version>
</dependency>

 

4. Elasticsearch Configuration Class That Injects RestHighLevelClient

package com.ruhuanxingyun.elasticsearch.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.Arrays;

/**
 * @description: Elasticsearch configuration class
 * @author: ruphie
 * @date: Create in 2020/12/1 21:08
 * @company: ruhuanxingyun
 */
@Configuration
public class ElasticsearchConfig {

    @Value("${spring.data.elasticsearch.rest.hosts}")
    private String hosts;

    @Bean
    public RestClientBuilder restClientBuilder() {
        // hosts is a comma-separated list of host:port pairs
        String[] hostsArr = hosts.split(",");
        HttpHost[] httpHosts = Arrays.stream(hostsArr)
                .map(host -> {
                    // Trim so that spaces after commas do not break parsing
                    String[] hostArr = host.trim().split(":");
                    String ip = hostArr[0];
                    int port = Integer.parseInt(hostArr[1]);

                    // DEFAULT_SCHEME_NAME is "http"
                    return new HttpHost(ip, port, HttpHost.DEFAULT_SCHEME_NAME);
                }).toArray(HttpHost[]::new);

        return RestClient.builder(httpHosts);
    }

    @Bean
    public RestHighLevelClient highLevelClient(RestClientBuilder restClientBuilder) {
        return new RestHighLevelClient(restClientBuilder);
    }

}
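
The configuration class reads spring.data.elasticsearch.rest.hosts as a comma-separated list of host:port pairs, for example 127.0.0.1:9200,127.0.0.2:9200 (illustrative values). The parsing step can be sketched on its own, without Spring or the Elasticsearch client, as below. HostListParser is a hypothetical helper, and falling back to port 9200 (Elasticsearch's default HTTP port) when no port is given is an assumption added here, not behaviour of the class above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HostListParser {

    static final int DEFAULT_PORT = 9200; // Elasticsearch's default HTTP port

    /** Parses "host1:9200, host2" into (host, port) pairs. */
    static List<Map.Entry<String, Integer>> parse(String hosts) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (String raw : hosts.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate blank entries such as "a:9200,,b:9200"
            }
            int colon = entry.lastIndexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0
                    ? DEFAULT_PORT
                    : Integer.parseInt(entry.substring(colon + 1));
            result.add(Map.entry(host, port));
        }
        return result;
    }
}
```

Each pair could then be turned into an HttpHost exactly as the configuration class does.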

 

5. Official Documentation

  A. https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html

  B. https://www.elastic.co/guide/en/elasticsearch/reference/7.5/index.html

 

6. Environment Setup

  A. Elasticsearch setup: https://www.cnblogs.com/ruhuanxingyun/p/11399484.html

  B. Filebeat setup: https://www.cnblogs.com/ruhuanxingyun/p/11414708.html

  C. Logstash setup: https://www.cnblogs.com/ruhuanxingyun/p/11414719.html

 

7. Core HTTP APIs

  A. Index APIs: create, delete, get, and otherwise manage indices;

    I. Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_index_apis.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/11429347.html

  B. Document APIs: index (create), delete, and get individual documents; lookups are by document ID;

    I. Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-supported-apis.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/docs.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/11434385.html

  C. Search APIs: search documents in an index by query conditions;

    I. Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_search_apis.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/12201644.html

  D. cat APIs: query various index-related information in a human-readable form;

    I. Java client (low-level REST client): performing requests - https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low-usage-requests.html, reading responses - https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low-usage-responses.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/cat.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/12174465.html

  E. Cluster APIs: query various cluster-related information;

    I. Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_cluster_apis.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/cluster.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/12193148.html

  F. Query DSL: the structured query language

    I. Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-query-builders.html

    II. elasticsearch-head / Postman / Kibana: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl.html

    III. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/11322670.html

  G. Text analysis: tokenization

    I. https://www.elastic.co/guide/en/elasticsearch/reference/7.5/analysis.html

    II. Worked example:

  H. Mapping: field mappings

    I. https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping.html

    II. Worked example:

  I. Aggregations: aggregation queries

    I. https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-aggregations.html

    II. Worked example: https://www.cnblogs.com/ruhuanxingyun/p/12304502.html
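
For orientation, the REST endpoints behind several of the API groups listed above can be sketched with nothing but the JDK 11+ java.net.http API. This is a dependency-free illustration, not the approach used elsewhere in this post: with RestHighLevelClient the same calls are made through typed requests such as CreateIndexRequest, IndexRequest, GetRequest, and SearchRequest. EsRequests is a hypothetical helper. It only builds requests and does not send them, and it does not URL-encode index names or IDs.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class EsRequests {

    private final String baseUrl; // e.g. "http://localhost:9200" (assumed address)

    EsRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    private HttpRequest.Builder json(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Content-Type", "application/json");
    }

    /** Index APIs: PUT /<index> creates an index. */
    HttpRequest createIndex(String index, String bodyJson) {
        return json("/" + index).PUT(BodyPublishers.ofString(bodyJson)).build();
    }

    /** Document APIs: PUT /<index>/_doc/<id> indexes a document under a given ID. */
    HttpRequest indexDocument(String index, String id, String docJson) {
        return json("/" + index + "/_doc/" + id).PUT(BodyPublishers.ofString(docJson)).build();
    }

    /** Document APIs: GET /<index>/_doc/<id> fetches a document by ID. */
    HttpRequest getDocument(String index, String id) {
        return json("/" + index + "/_doc/" + id).GET().build();
    }

    /** Search APIs: POST /<index>/_search runs a Query DSL search. */
    HttpRequest search(String index, String queryJson) {
        return json("/" + index + "/_search").POST(BodyPublishers.ofString(queryJson)).build();
    }

    /** cat APIs: GET /_cat/indices?v lists indices with column headers. */
    HttpRequest catIndices() {
        return json("/_cat/indices?v").GET().build();
    }

    /** Cluster APIs: GET /_cluster/health reports cluster status. */
    HttpRequest clusterHealth() {
        return json("/_cluster/health").GET().build();
    }
}
```

A request built this way could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running node.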

 

8. Exporting Data from ES

  1. Using elasticsearch-dump via Docker

docker run --rm -ti -v /es:/tmp taskrabbit/elasticsearch-dump --input=http://127.0.0.1:8200/report_log_2021.02 --output=/tmp/report_log_2021.02.json --searchBody='{"query":{"bool":{"filter":[{"term":{"type": 8}}, {"range":{"timestamp":{"gte":"2021-02-01 00:00:00.000", "lte":"2021-02-07 23:59:59.999"}}}]}}, "_source":["user.name","user.phone","timestamp","access.url","access.title"]}' --type=data --sourceOnly=true

  2. Using elasticdump natively, installed with npm: npm install -g elasticdump

elasticdump \
  --input=http://username:password@10.10.10.10:9200/company \
  --output=company_data.json \
  --type=data \
  --searchBody='{
    "_source": ["company_name", "credit_code", "company_org_type", "category_name", "company_status", "legal_person_name", "registered_address", "registered_capital", "established_date", "business_scope", "registered_authority"],
    "query": {
        "terms": {
            "company_name": ["中市建设工程质量中心有限公司", "中市伟泰用品有限公司"]
        }
    }
  }'
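
The --searchBody values passed to the export commands in this section are ordinary Query DSL search bodies. As a rough illustration of how the bool/filter query from the Docker command is put together, the sketch below assembles the same structure with plain string concatenation. QueryDslSketch and its methods are hypothetical helpers, not part of any Elasticsearch library, and they perform no JSON escaping, so they are only suitable for trusted field names and values.

```java
final class QueryDslSketch {

    /** A term clause: exact match on a numeric value. */
    static String term(String field, long value) {
        return "{\"term\":{\"" + field + "\":" + value + "}}";
    }

    /** A range clause with inclusive lower (gte) and upper (lte) bounds. */
    static String range(String field, String gte, String lte) {
        return "{\"range\":{\"" + field + "\":{\"gte\":\"" + gte
                + "\",\"lte\":\"" + lte + "\"}}}";
    }

    /** Wraps clauses in a bool query's filter context (all must match, no scoring). */
    static String boolFilter(String... clauses) {
        return "{\"query\":{\"bool\":{\"filter\":[" + String.join(",", clauses) + "]}}}";
    }
}
```

With the high-level Java client, the equivalent query would normally be built with QueryBuilders.boolQuery() and its filter(...) method rather than by hand.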

  

9. ES data migration or synchronization: https://cloud.tencent.com/developer/article/1621564

 

  See also: Elasticsearch data export methods

posted @ 如幻行云