elasticsearch

ES

These notes are compiled from Kuangshen's (狂神) course on Bilibili.

Recommended notes

Configuration files

jvm.options

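# Heap size (ES sizes the heap automatically by default). To set it, remove the
# leading ## and keep -Xms equal to -Xmx, e.g. lower both on a small machine.
## -Xms4g
## -Xmx4g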

elasticsearch.yml

# http.port: 9200

Starting (Windows)

Changing the password

# Start ES first, then open cmd in the installation's bin directory
elasticsearch-reset-password -u elastic
# Prints the newly generated password, e.g.: *+6+PRpu-tH_dfnFnSY-

elasticsearch-head

# Open cmd in the elasticsearch-head directory
cnpm install
npm run start

Edit the configuration file elasticsearch.yml

# Allow cross-origin requests (elasticsearch-head on port 9100 calls ES on port 9200)
http.cors.enabled: true
http.cors.allow-origin: '*'
# If password authentication is enabled, also add:
http.cors.allow-headers: Authorization, Content-Type
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

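# If the above does not work, disable password authentication instead
# (change the "true" above to "false"; do not keep both lines, ES refuses to start on duplicate keys)
xpack.security.enabled: false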

Restart

# Double-click elasticsearch.bat, then start elasticsearch-head again:
npm run start

Kibana

ELK

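ELK is the acronym of three open-source projects: Elasticsearch, Logstash and Kibana; the stack is also known as the Elastic Stack. Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful API. Large-scale full-text search scenarios like Baidu or Google can use Elasticsearch as the underlying engine, which shows how powerful its search capabilities are. Elasticsearch is commonly abbreviated as ES.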

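Logstash is the central data-flow engine of ELK: it collects data in different formats from various sources (files, data stores, message queues), filters it, and ships it to various destinations (files, MQ, Redis, Elasticsearch, Kafka, etc.). Kibana presents Elasticsearch data in a friendly web interface and provides real-time analysis.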

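Many developers will tell you that ELK is the name of a log-analysis stack. In fact ELK is not limited to log analysis: it supports any data collection and analysis scenario. Log collection and analysis is simply the most representative use case, not the only one.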

Core concepts

What are clusters, nodes, indices, types, documents, shards and mappings?

Elasticsearch is document-oriented. Here is how a relational database compares with Elasticsearch. Everything is JSON!

Relational DB	Elasticsearch
Database	Index (indices)
Table	Type (types; deprecated in 7.x, removed in 8.x)
Row	Document (documents)
Column	Field (fields)

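An index (type) contains many documents, e.g. document 1, document 2. To locate a specific document, follow the path index >> type >> document ID; this combination identifies exactly one document. Note that the ID does not have to be an integer: it is actually a string.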

A document is simply one record of data, for example:

user
1 Kite 	18
2 Lee 	26

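As mentioned, Elasticsearch is document-oriented, which means the smallest unit of indexing and searching is the document. In Elasticsearch, documents have several important properties:

Self-contained: a document holds both the fields and their values, i.e. key: value pairs.
Hierarchical: a document can contain sub-documents; that is how complex logical entities are built.
Flexible structure: documents do not depend on a predefined schema. In a relational database, columns must be defined before they can be used; in Elasticsearch fields are very flexible: we can omit a field, or add a new field dynamically.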

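Although we can freely add or omit fields, the type of each field is very important: an age field, for instance, could be a string or an integer. Elasticsearch keeps the mapping between fields and types, along with other settings. This mapping is specific to each type of each index, which is why types are sometimes called mapping types in Elasticsearch.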

Type

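A type is a logical container for documents, just as in a relational database a table is a container for rows. The definition of the fields in a type is called a mapping, e.g. name is mapped as a string type. We say documents are schemaless: they do not need to contain every field defined in the mapping. So what does Elasticsearch do when a new field shows up? It automatically adds the new field to the mapping, but since it does not know the field's type, it guesses: if the value is 18, Elasticsearch assumes a whole number (mapped as long). Elasticsearch may guess wrong, though, so the safest approach is to define the mapping you need in advance. In this respect it ends up like a relational database: define the fields first, then use them.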

e.g. name: a string value, mapped as text
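The type guessing described above can be sketched in Java. This is a simplified model of the default dynamic-mapping rules (a JSON whole number becomes long, a number with a fraction becomes float, true/false becomes boolean, a string that looks like a date becomes date, any other string becomes text with a keyword sub-field). The class and method names are made up for illustration, and the date check is only a rough stand-in for Elasticsearch's real date detection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified model of Elasticsearch's default dynamic mapping for a field seen for
// the first time. Real behaviour is configurable (date_detection, numeric_detection,
// dynamic templates); only the defaults are approximated here.
final class DynamicMappingSketch {

    // Rough stand-in for default date detection: yyyy-MM-dd, optionally followed by a time.
    private static final Pattern DATE_LIKE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}(T.*)?");

    static String guessType(Object jsonValue) {
        if (jsonValue instanceof Boolean) return "boolean";
        if (jsonValue instanceof Integer || jsonValue instanceof Long) return "long";
        if (jsonValue instanceof Float || jsonValue instanceof Double) return "float";
        if (jsonValue instanceof Map) return "object";
        if (jsonValue instanceof String) {
            String s = (String) jsonValue;
            if (DATE_LIKE.matcher(s).matches()) return "date";
            // numeric_detection is off by default, so "18" stays a string
            return "text (+ keyword sub-field)";
        }
        return "unknown";
    }

    // Guess a type for every top-level field of a document, keeping field order.
    static Map<String, String> guessMapping(Map<String, Object> document) {
        Map<String, String> mapping = new LinkedHashMap<>();
        document.forEach((field, value) -> mapping.put(field, guessType(value)));
        return mapping;
    }
}
```

Because a guess can be wrong (a numeric string stays text, for example), defining the mapping explicitly is the safer choice.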

Index

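An index is the equivalent of a database.
An index is a container of mapping types; an index in Elasticsearch is a very large collection of documents. The index stores the fields of its mapping types and other settings, and the data is then stored across its shards.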

Physical design: how nodes and shards work

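A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indices. Before 7.0, a newly created index consisted of 5 primary shards by default; since 7.0 the default is 1 primary shard. Each primary shard has one replica shard by default.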

Inverted index

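Elasticsearch uses a structure called an inverted index, with Lucene's inverted index underneath. This structure is well suited to fast full-text search: it consists of a list of all unique terms that appear in the documents, and for each term, a list of the documents that contain it. For example, suppose there are two documents with the following content: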

Study every day, good good up to forever   # content of document 1
To forever, study every day, good good up  # content of document 2

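To build an inverted index, first split each document into separate words (also called terms or tokens), then create a sorted list of all unique terms, and record which documents each term appears in: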

term	doc_1	doc_2
Study	√	×
To	×	√
every	√	√
forever	√	√
day	√	√
study	×	√
good	√	√
to	√	×
up	√	√

Searching for "to forever" (terms are case-sensitive here, so To and to are different terms):

term	doc_1	doc_2
to	√	×
forever	√	√
total	2	1

Both documents match, but the first one matches more terms than the second. With no other criteria, both documents containing the keywords are returned.
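A minimal in-memory version of the structure above, assuming naive tokenization (split on non-letters, no lowercasing, so Study/study and To/to stay distinct as in the table). The class is hypothetical, and the per-document count only mirrors the "total" row; Elasticsearch ranks hits with a relevance score (BM25 by default).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal inverted index: term -> sorted set of document IDs (a "posting list").
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Naive tokenizer: split on anything that is not an ASCII letter, keep case as-is.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Record, for every term of the document, that this document contains it.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    Set<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.emptySortedSet());
    }

    // For each document, count how many distinct query terms it contains.
    Map<Integer, Integer> search(String query) {
        Map<Integer, Integer> hits = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (int doc : docsFor(term)) {
                hits.merge(doc, 1, Integer::sum);
            }
        }
        return hits;
    }
}
```

Searching "to forever" only touches two posting lists instead of scanning both documents, which is where the speed of an inverted index comes from.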
When searching blog posts by tag, the inverted index looks like this:

Blog posts (raw data)
Post ID	Tags
1	python
2	python
3	linux, python
4	linux

Tag list (inverted index)
Tag	Post IDs
python	1, 2, 3
linux	3, 4

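To find posts tagged python, looking up the inverted index is much faster than scanning all the raw data: just check the tag column and read off the matching post IDs. All irrelevant data is skipped, which improves efficiency.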

Elasticsearch indices vs. Lucene indices

In Elasticsearch, an index is split into multiple shards, and each shard is a Lucene index. An Elasticsearch index is therefore made up of multiple Lucene indices.

Unless stated otherwise, "index" refers to an Elasticsearch index.
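How a document ends up in one particular shard (one Lucene index) can be sketched with the basic routing rule from the Elasticsearch documentation, shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document _id. Assumptions: Elasticsearch hashes with Murmur3 and newer versions refine the formula with a routing_num_shards factor; this hypothetical helper uses String.hashCode() instead, so its numbers will not match a real cluster.

```java
// Sketch of document-to-shard routing: shard_num = hash(_routing) % num_primary_shards.
// String.hashCode() stands in for Elasticsearch's Murmur3 hash (illustration only).
final class RoutingSketch {
    static int shardFor(String routing, int numPrimaryShards) {
        if (numPrimaryShards <= 0) {
            throw new IllegalArgumentException("an index needs at least one primary shard");
        }
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hash codes
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }
}
```

Because the result depends on the number of primary shards, that number cannot simply be changed on an existing index; it takes a reindex or the split/shrink APIs.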

IK analysis plugin

Verifying the installation

# Plugins are installed under the plugins directory of the elasticsearch installation
# In cmd, run elasticsearch-plugin list from the bin directory:
D:\Program_Files\elasticsearch-8.1.0\bin> elasticsearch-plugin list

Testing

Start

# Double-click elasticsearch.bat
npm run start	# elasticsearch-head (optional)
# Double-click kibana.bat

# Open http://localhost:5601
# Open Dev Tools

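IK provides two analyzers, ik_smart and ik_max_word (also available as tokenizers of the same names): ik_smart produces the coarsest split, ik_max_word the finest-grained one, emitting every word it can find.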

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "人民共和国"
}

# Result:
{
  "tokens" : [
    {
      "token" : "人民共和国",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "人民共和国"
}

# Result:
{
  "tokens" : [
    {
      "token" : "人民共和国",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "人民",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "共和",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "共和国",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
# (Edited: the original example involved a sensitive term, so this is not verbatim program output.)

Creating a custom dictionary

Edit plugins\elasticsearch-analysis-ik-8.1.0\config\IKAnalyzer.cfg.xml and register the extension dictionary:

<entry key="ext_dict">kite.dic</entry>

Create kite.dic in the same config directory (UTF-8, one word per line, here 李三四), then restart Elasticsearch and run:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是李三四"
}

# Before adding the custom word
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "李三",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "三四",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "TYPE_CNUM",
      "position" : 3
    }
  ]
}
# After adding 李三四 to kite.dic
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    #####################
    {
      "token" : "李三四",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    #####################
    {
      "token" : "李三",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "三四",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "TYPE_CNUM",
      "position" : 4
    }
  ]
}
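To see why adding 李三四 to kite.dic changes the ik_max_word output, here is a toy dictionary segmenter. It is not IK's real algorithm (IK also has dedicated recognizers for numbers, quantifiers and letters, which is where TYPE_CNUM comes from); it only imitates the two output styles: maxWord emits every dictionary word at every position (longest first) plus uncovered single characters, and smart does forward maximum matching. The class name and the small dictionaries are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Toy dictionary-based segmenter imitating the two IK output styles.
// NOT the IK plugin's actual algorithm; dictionaries are supplied by the caller.
final class ToySegmenter {

    private static int longest(Set<String> dict) {
        return dict.stream().mapToInt(String::length).max().orElse(1);
    }

    // "max_word" style: at each position emit every dictionary word starting there,
    // longest first; characters not covered by any word are emitted on their own.
    static List<String> maxWord(String text, Set<String> dict) {
        int maxLen = longest(dict);
        List<String> out = new ArrayList<>();
        boolean[] covered = new boolean[text.length()];
        for (int i = 0; i < text.length(); i++) {
            for (int end = Math.min(text.length(), i + maxLen); end > i + 1; end--) {
                String word = text.substring(i, end);
                if (dict.contains(word)) {
                    out.add(word);
                    Arrays.fill(covered, i, end, true);
                }
            }
            if (!covered[i]) {
                out.add(String.valueOf(text.charAt(i)));
            }
        }
        return out;
    }

    // "smart" style: forward maximum matching, i.e. take the longest dictionary word at
    // the current position and jump past it; fall back to a single character.
    static List<String> smart(String text, Set<String> dict) {
        int maxLen = longest(dict);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }
}
```

With the dictionary {李三, 三四}, maxWord("我是李三四", ...) returns [我, 是, 李三, 三四]; adding 李三四 gives [我, 是, 李三四, 李三, 三四], the same token sequence as the IK output above.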

Restful

Basic tests

Index operations

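PUT test02		# create an index (like a database)
GET test02		# get index information
GET _cat/indices?v	# list all indices with their current status

POST test01/_doc	# add a document to test01 (auto-generated id; the index is created if missing)
{
  "name":"Kite"
}

DELETE test01	# delete the index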

PUT test01		# create the index with explicit mapping rules
{
  "mappings": {
   "properties": {
     "name":{
       "type": "text"
     },
     "age":{
       "type": "long"
     },
     "birthday": {
       "type": "date"
     }
   }
  }
}

PUT test01/_doc/1	# add a document to test01 with id 1
{
  "name": "Kite",
  "age": 15,
  "birthday": "2000-01-01"
}
# Result:
{
  "_index" : "test01",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

PUT test01/_doc/1			# update: the whole document is replaced with this body
{
  "name": "Kite Lee",
  "age": 15,
  "birthday": "2000-01-01"
}

{
  "_index" : "test01",
  "_id" : "1",	
  "_version" : 2,			
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

POST test01/_update/1		# partial update: only the listed fields are changed
{
  "doc": {
    "name": "Alice"
  }
}
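The difference between the two update styles above can be modelled with plain maps standing in for the stored _source. A minimal sketch with hypothetical names: a real partial update also merges nested objects recursively and increments _version, which this top-level merge ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Models PUT (full replacement) vs. POST _update with "doc" (partial merge) on a _source map.
final class UpdateSemanticsSketch {

    // PUT index/_doc/id: the new body replaces the stored document entirely,
    // so the stored map is deliberately ignored.
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST index/_update/id {"doc": {...}}: listed fields overwrite, all others are kept.
    // Works on a copy, so the caller's stored map is not modified.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }
}
```

So a PUT that leaves out age and birthday drops them from the document, while the _update call above only changes name.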

Document operations

PUT test01/_doc/1
{
  "name":"Kite",
  "age":15,
  "desc":"for you, thousands of times",
  "tags":["romantic", "honest", "brave"]
}

PUT test01/_doc/2
{
  "name":"Ermu",
  "age":28,
  "desc":"for myself, once",
  "tags":["stubborn", "dishonest", "coward"]
}
# get by id
GET test01/_doc/1
GET test01/_doc/2
# search with a query string
GET test01/_search?q=name:Kite
GET test01/_search?q=name:Kite Lee	# Lee is not bound to name here; name:(Kite Lee) searches both words in name
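Outside Kibana (curl, a browser, application code) the query string must be URL-encoded because it contains a space. A small sketch, assuming Java 10+ for the URLEncoder.encode(String, Charset) overload; the helper name is made up. It groups the words as name:(Kite Lee) so that both are searched in name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds an "<index>/_search?q=..." path with the query-string query percent-encoded.
final class QueryStringUrl {
    static String searchUrl(String index, String query) {
        // URLEncoder does form encoding (space -> '+'); a literal '+' becomes %2B,
        // so turning every remaining '+' into %20 is safe and unambiguous.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return index + "/_search?q=" + encoded;
    }
}
```

For example, searchUrl("test01", "name:(Kite Lee)") yields test01/_search?q=name%3A%28Kite%20Lee%29.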

Queries

# Insert four documents
PUT test01/_doc/1
{
  "name":"Kite Lee",
  "age":15,
  "desc":"for you, thousands of times",
  "tags":["romantic", "honest", "brave"]
}
PUT test01/_doc/2
{
  "name":"Ermu",
  "age":28,
  "desc":"for myself, once",
  "tags":["stubborn", "dishonest", "coward"]
}

PUT test01/_doc/3
{
  "name":"Kite Liu",
  "age":29,
  "desc":"Strong and Man",
  "tags":["stubborn", "dishonest", "coward"]
}

PUT test01/_doc/4
{
  "name":"Kite Li",
  "age":89,
  "desc":"talk is cheap, show me the code",
  "tags":["crazy"]
}

# Query 1: get by id
GET test01/_doc/1

# Query 2: query string
GET test01/_search?q=name:Kite
GET test01/_search?q=name:Kite Lee
# Query 3: match query with source filtering
GET test01/_search
{
  "query": {
    "match": {
      "name": "Kite"
    }
  },
  "_source": ["name", "desc"]	# return only these _source fields
}
# Query 4: source filtering, sorting, pagination
GET test01/_search
{
  "query": {
    "match": {
      "name": "Kite"
    }
  },
  "sort": [						# sort
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "_source": ["name", "age"],	# source filtering
  "from": 0,					# pagination: from = 0-based offset, size = page size
  "size": 2
}
# Query 5: bool must, multiple conditions
GET test01/_search
{
  "query": {
    "bool": {
      "must": [			# every clause must match
        {
          "match": {
            "name": "Kite"		# name is a text field: the query is analyzed and matches any name containing the token kite
          }
        },
        {
          "match": {
            "age": 15
          }
        }
      ]
    }
  }
}
# Query 6: bool should, multiple conditions
GET test01/_search
{
  "query": {
    "bool": {
      "should": [		# at least one clause should match; more matching clauses give a higher score
        {
          "match": {
            "name": "Kite"		# name is a text field: the query is analyzed and matches any name containing the token kite
          }
        },
        {
          "match": {
            "age": 15	
          }
        }
      ]
    }
  }
}
# Query 7: bool must_not, exclusion
GET test01/_search
{
  "query": {
    "bool": {
      "must_not": [			
        {
          "match": {
            "age": 15		# keep documents whose age != 15
          }
        }
      ]
    }
  }
}
# Query 8: bool must + filter (range); filter clauses narrow the results without affecting the score
GET test01/_search
{
  "query": {
    "bool": {
      "must": {
          "match": {
            "name": "Kite"
          }
      },
      "filter": {
        "range": {
          "age": {
            "gte": 10,		# >= 10 (use gt for >)
            "lte": 20		# <= 20 (use lt for <)
          }
        }
      }
    }
  }
}

# Query 9: match with several words
GET test01/_search
{
  "query": {
    "match": {
      "tags": "brave crazy"	# matches if tags contains brave or crazy
    }
  }
}
# Query 10: exact search with term

# create an index
PUT test02
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "desc": {
        "type": "keyword"
      }
    }
  }
}

# add two documents
PUT test02/_doc/1
{
  "name": "Kite Lee is active",
  "desc": "Kite Lee is negative"
}

PUT test02/_doc/2
{
  "name": "Kite Lee is active2",
  "desc": "Kite Lee is negative2"
}


# keyword analyzer: the whole string "Kite Lee is active" stays one token (not split)
GET _analyze
{
  "analyzer": "keyword",
  "text": "Kite Lee is active"
}

# standard analyzer: split into kite, lee, is, active (lowercased)
GET _analyze
{
  "analyzer": "standard",
  "text": "Kite Lee is active"
}
# name is text, indexed as separate tokens, so a term query for the whole string returns nothing
GET test02/_search
{
  "query": {
    "term": {
      "name": "Kite Lee is active"
    }
  }
}

# desc is keyword, indexed as one exact value, so the term query finds document 1
GET test02/_search
{
  "query": {
    "term": {
      "desc": "Kite Lee is negative"
    }
  }
}

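# match on desc.keyword: test02 maps desc as a plain keyword without a .keyword sub-field, so this returns nothing (.keyword sub-fields are only created for dynamically mapped strings)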
GET test02/_search
{
  "query": {
    "match": {
      "desc.keyword": "Kite Lee"
    }
  }
}

# match on the text field name: the query is analyzed, so both documents match
GET test02/_search
{
  "query": {
    "match": {
      "name": "Kite Lee"
    }
  }
}
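The term/match results above can be reproduced with a tiny in-memory model: a text field stores analyzed tokens, a keyword field stores the whole value as one token, term looks the value up unchanged, and match analyzes the query first. The analyzer is a simplified stand-in for the standard analyzer (split on non-letter/digit characters, lowercase); class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model of text vs. keyword fields and term vs. match queries.
final class TermVsMatchSketch {
    // field -> token -> document ids
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Simplified standard analyzer: split on non-letter/digit characters, lowercase.
    static List<String> standardAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    void indexText(String field, String docId, String value) {
        for (String token : standardAnalyze(value)) post(field, token, docId);
    }

    // keyword field: the exact, unmodified value is the only token
    void indexKeyword(String field, String docId, String value) {
        post(field, value, docId);
    }

    private void post(String field, String token, String docId) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(token, t -> new TreeSet<>())
             .add(docId);
    }

    // term: look the value up exactly as given, without analysis
    Set<String> term(String field, String value) {
        return index.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(value, Collections.emptySet());
    }

    // match on a text field: analyze the query, then OR the postings of its tokens
    Set<String> matchText(String field, String query) {
        Set<String> hits = new TreeSet<>();
        for (String token : standardAnalyze(query)) hits.addAll(term(field, token));
        return hits;
    }
}
```

Indexing the two test02 documents this way, term("name", "Kite Lee is active") is empty, term("desc", "Kite Lee is negative") returns document 1, and matchText("name", "Kite Lee") returns both.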

Highlighting

GET test02/_search
{
  "query": {
    "match": {
      "name": "Kite Lee"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

# Result
{
  "took" : 66,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.36464313,
    "hits" : [
      {
        "_index" : "test02",
        "_id" : "1",
        "_score" : 0.36464313,
        "_source" : {
          "name" : "Kite Lee is active",
          "desc" : "Kite Lee is negative"
        },
        "highlight" : {
          "name" : [
            "<em>Kite</em> <em>Lee</em> is active"
          ]
        }
      },
      {
        "_index" : "test02",
        "_id" : "2",
        "_score" : 0.36464313,
        "_source" : {
          "name" : "Kite Lee is active2",
          "desc" : "Kite Lee is negative2"
        },
        "highlight" : {
          "name" : [
            "<em>Kite</em> <em>Lee</em> is active2"
          ]
        }
      }
    ]
  }
}
# Custom highlight tags
GET test02/_search
{
  "query": {
    "match": {
      "name": "Kite Lee"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}

# Result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.36464313,
    "hits" : [
      {
        "_index" : "test02",
        "_id" : "1",
        "_score" : 0.36464313,
        "_source" : {
          "name" : "Kite Lee is active",
          "desc" : "Kite Lee is negative"
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>Kite</p> <p class='key' style='color:red'>Lee</p> is active"
          ]
        }
      },
      {
        "_index" : "test02",
        "_id" : "2",
        "_score" : 0.36464313,
        "_source" : {
          "name" : "Kite Lee is active2",
          "desc" : "Kite Lee is negative2"
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>Kite</p> <p class='key' style='color:red'>Lee</p> is active2"
          ]
        }
      }
    ]
  }
}
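Roughly what the highlighter does for these match queries: wrap each token of the original text that matches a query term in pre_tags/post_tags and leave everything else untouched. A simplified sketch with a hypothetical helper; Elasticsearch's real highlighters (unified by default, plus plain and fvh) also handle fragments, phrases and stored offsets.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wraps every token of the text that matches a query term in the given tags.
final class HighlightSketch {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    static String highlight(String text, String query, String preTag, String postTag) {
        // Query terms, lowercased the way a standard-analyzer field would store them
        Set<String> terms = new HashSet<>();
        Matcher q = TOKEN.matcher(query);
        while (q.find()) terms.add(q.group().toLowerCase(Locale.ROOT));

        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy the text between tokens unchanged
            String token = m.group();
            if (terms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append(preTag).append(token).append(postTag);
            } else {
                out.append(token);
            }
            last = m.end();
        }
        out.append(text.substring(last));         // trailing text after the last token
        return out.toString();
    }
}
```

highlight("Kite Lee is active", "Kite Lee", "<em>", "</em>") returns "<em>Kite</em> <em>Lee</em> is active", matching the default output shown earlier.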

Integrating ES with Spring Boot

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html

Maven dependency

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.3</version>
</dependency>

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

client.close();

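Check the version by following the three locations in turn

or look at the dependencies directly

To change the dependency version, override it in pom.xml: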

<properties>
    <elasticsearch.version>7.15.2</elasticsearch.version>
</properties>

package com.lee.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

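    // XML equivalent: <bean id="restHighLevelClient" class="org.elasticsearch.client.RestHighLevelClient"/>
    // The bean name is the method name; the tests inject it with @Qualifier("restHighLevelClient")
    @Bean
    public RestHighLevelClient restHighLevelClient() {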
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
        return client;
    }
}

Where the configuration lives in the source

Index tests

Create an index: new CreateIndexRequest("kite_index")

Check whether an index exists: new GetIndexRequest("kite_index")

Delete an index: new DeleteIndexRequest("kite_index")

package com.lee;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * ES API tests
 */
@SpringBootTest
class Es01ApiApplicationTests {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;
    // private RestHighLevelClient restHighLevelClient;

    // test creating an index
    @Test
    void testCreatIndex() throws IOException {
        // 1. build the create-index request
        CreateIndexRequest request = new CreateIndexRequest("kite_index");
        // 2. execute it with the client; returns a CreateIndexResponse
        CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }

    // test checking whether the index exists
    @Test
    void testGetIndex() throws IOException {

        GetIndexRequest request = new GetIndexRequest("kite_index");
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);

    }

    // test deleting the index
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("kite_index");
        AcknowledgedResponse isDelete = client.indices().delete(request, RequestOptions.DEFAULT);
        // note: read the result via isDelete.isAcknowledged()
        System.out.println(isDelete.isAcknowledged());
    }
}

文档测试

Add an entity class User

package com.lee.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.stereotype.Component;

@Data
@AllArgsConstructor
@NoArgsConstructor
@Component
public class User {
    private String name;
    private int age;
}

package com.lee;

import com.alibaba.fastjson.JSON;
import com.lee.pojo.User;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;

/**
 * ES API tests
 */
@SpringBootTest
class Es01ApiApplicationTests {

    private final String kite_index = "kite_index";
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;
    // private RestHighLevelClient restHighLevelClient;

    // test creating an index
    @Test
    void testCreatIndex() throws IOException {
        // 1. build the create-index request
        CreateIndexRequest request = new CreateIndexRequest(kite_index);
        // 2. execute it with the client; returns a CreateIndexResponse
        CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }

    // test checking whether the index exists
    @Test
    void testGetIndex() throws IOException {

        GetIndexRequest request = new GetIndexRequest(kite_index);
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);

    }

    // test deleting the index
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest(kite_index);
        AcknowledgedResponse isDelete = client.indices().delete(request, RequestOptions.DEFAULT);
        // note: read the result via isDelete.isAcknowledged()
        System.out.println(isDelete.isAcknowledged());

    }

    // -------------------------------- document tests ----------------------------------------------

    /**
     * Add a document
     * @throws IOException
     */

    @Test
    void testAddDocument() throws IOException {

        // create the entity
        User user = new User("Kite", 15);
        // build the index request
        IndexRequest request = new IndexRequest(kite_index);

        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1));
        // request.timeout("1s");

        // put the entity into the request as JSON source
        request.source(JSON.toJSONString(user), XContentType.JSON);

        // ObjectMapper jackson = new ObjectMapper();
        // request.source(jackson.writeValueAsString(user), XContentType.JSON);

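        // send the request to the cluster
        IndexResponse indexResponse = null;
        try {
            indexResponse = client.index(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            // Throws "IOException: Unable to parse response body for Response", but the document is still added.
            // Likely cause (an assumption, not verified here): a 7.x high-level client talking to an 8.x server;
            // 7.16+ clients can enable compatibility mode via
            // new RestHighLevelClientBuilder(lowLevelRestClient).setApiCompatibilityMode(true).build()
            // e.printStackTrace();
        }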

        // System.out.println(indexResponse.status());
    }

    /**
     * Check whether /kite_index/_doc/1 exists
     * @throws IOException
     */

    @Test
    void testExistsDocument() throws IOException {
        GetRequest getRequest = new GetRequest(kite_index, "1");
        // don't fetch _source: we only need to know whether the document exists
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");

        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    /**
     * Get a document
     * @throws IOException
     */

    @Test
    void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest(kite_index, "1");
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString());    // {"age":15,"name":"Kite"}
        System.out.println(getResponse);
        
            /*
             Same format as the console output
            {
                "_index":"kite_index",
                "_type":null,"_id":"1",
                "_version":1,
                "_seq_no":0,
                "_primary_term":1,
                "found":true,
                "_source":{"age":15,"name":"Kite"}
            }
             */
    }

    /**
     * Update a document
     * @throws IOException
     */
    @Test
    void testUpdateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("kite_index", "1");
        updateRequest.timeout(TimeValue.timeValueSeconds(1));

        User user2 = new User("Lee", 28);

        // ObjectMapper jackson = new ObjectMapper();
        // updateRequest.doc(jackson.writeValueAsString(user2), XContentType.JSON);
        updateRequest.doc(JSON.toJSONString(user2), XContentType.JSON);
            // with fastjson the update once went wrong and added an extra JSON entry, then worked normally again; cause unknown

        UpdateResponse updateResponse;
        try {
            updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            // same IOException as in testAddDocument
        }
        // System.out.println(updateResponse.status());
    }


    /**
     * Delete a document
     * @throws IOException
     */
    @Test
    void testDeleteDocument() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest("kite_index", "1");
        deleteRequest.timeout("1s");

        try {
            DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            // e.printStackTrace();
        }
        // System.out.println(deleteResponse.status());
    }

    /**
     * Bulk-insert documents
     * @throws IOException
     */
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout(TimeValue.timeValueSeconds(10));

        ArrayList<User> users = new ArrayList<>();

        users.add(new User("Alice", 25));
        users.add(new User("Bob", 50));
        users.add(new User("Cindy", 75));
        users.add(new User("David", 100));

        for (int i = 0; i < users.size(); i++) {
            bulkRequest.add(
                    new IndexRequest("kite_index")
                            .id(String.valueOf(i + 1))
                            .source(JSON.toJSONString(users.get(i)), XContentType.JSON)
            );
        }

        try {
            BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            // e.printStackTrace();
        }
        // System.out.println(bulkResponse.hasFailures());
    }

    /**
     * Search
     * @throws IOException
     */

    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest(kite_index);
        // build the search source
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

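        // query conditions
        //      exact match (term): name is a dynamically mapped text field whose indexed tokens
        //      are lowercased, so the exact value is matched on the name.keyword sub-field
        TermQueryBuilder termQuery = QueryBuilders.termQuery("name.keyword", "Cindy");
        //      match all documents
        MatchAllQueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();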

        searchSourceBuilder.query(termQuery);
        // pagination (from is a 0-based offset)
        // searchSourceBuilder.from(2);
        // searchSourceBuilder.size(1);

        searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));

        searchRequest.source(searchSourceBuilder);

        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(searchResponse.getHits()));

        System.out.println("========================================");
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }
}

Imitating JD (京东) search

application.properties

server.port=9090
# disable the Thymeleaf cache
spring.thymeleaf.cache=false

Crawling the data

package com.lee.utils;

import com.lee.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {

    public List<Content> parseJD(String keywords) throws IOException {

        String url = "https://search.jd.com/Search?keyword=" + keywords + "&enc=utf-8";

        Document document = Jsoup.parse(new URL(url), 60000);

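        Element element = document.getElementById("J_goodsList");
        if (element == null) {
            // the expected product list is missing (e.g. JD returned a login or verification page)
            return new ArrayList<>();
        }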

        // System.out.println(element.html());
        Elements elements = element.getElementsByTag("li");

        ArrayList<Content> goodsList = new ArrayList<>();

        for (Element elem : elements) {
            String img = elem.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = elem.getElementsByClass("p-price").eq(0).text();
            String title = elem.getElementsByClass("p-name").eq(0).text();
            Content content = new Content();
            content.setTitle(title);
            content.setImg(img);
            content.setPrice(price);

            goodsList.add(content);

            // System.out.println("===============================================");
            // System.out.println("img : " + img + "\n price : " + price + ",\t title : " + title);
        }

        return goodsList;
    }

    public static void main(String[] args) throws IOException {
        new HtmlParseUtil().parseJD("深入理解").forEach(System.out::println);

    }

}

Writing the data into ES and implementing search

package com.lee.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    // XML equivalent: <bean id="restHighLevelClient" class="org.elasticsearch.client.RestHighLevelClient"/>
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
        return client;
    }
}

Create the entity class Content

package com.lee.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
    private String title;
    private String img;
    private String price;
}

package com.lee.service;

import com.alibaba.fastjson.JSON;
import com.lee.pojo.Content;
import com.lee.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Service
public class ContentService {

    @Autowired	
    private RestHighLevelClient restHighLevelClient;	// injected client bean

    // 1. parse the crawled data and bulk-index it into ES
    public boolean parseContent(String keywords) throws IOException {

        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();

    }

    // 2. search the indexed data, with pagination
    public List<Map<String, Object>> searchPage(String keywords, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) {
            pageNo = 1;
        }

        // conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // pagination: from is a 0-based offset, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);

        // match query: the keywords are analyzed the same way as the title field
        // (a term query would only find titles containing the keywords as one exact token)
        searchSourceBuilder.query(QueryBuilders.matchQuery("title", keywords));
        searchSourceBuilder.timeout(TimeValue.timeValueMinutes(1));

        // execute the search
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // collect the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            list.add(documentFields.getSourceAsMap());
        }
        return list;
    }


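    // No main() here: restHighLevelClient is injected by Spring, so calling
    // new ContentService().parseContent(...) directly would throw a NullPointerException.
    // Trigger the import through the /parse/{keywords} endpoint instead.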

}
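In searchPage, from is a 0-based offset into the hit list, not a page number. A sketch of the conversion with a hypothetical helper name; by default Elasticsearch also rejects requests where from + size exceeds index.max_result_window (10,000).

```java
// Converts a 1-based page number into the 0-based "from" offset used by Elasticsearch.
final class PageMath {
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);   // treat 0 or negative page numbers as the first page
        return (page - 1) * pageSize;
    }
}
```

With the front end requesting page 0 and size 20, this gives from = 0, i.e. the first 20 hits.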

package com.lee.controller;

import com.lee.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;


@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keywords}")
    public boolean parseContent(@PathVariable("keywords") String keywords) throws IOException {

        return contentService.parseContent(keywords);
    }
}

Connecting the front end

npm init

npm install vue

npm install axios

Front-end code layout

Resource import paths in index.html (Thymeleaf)

<link rel="stylesheet" th:href="@{/css/style.css}"/>
<script th:src="@{/js/jquery.min.js}"></script>
<script th:src="@{/js/vue.min.js}"></script>
<script th:src="@{/js/axios.min.js}"></script>

controller

package com.lee.controller;

import com.lee.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;


@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keywords}")
    public boolean parseContent(@PathVariable("keywords") String keywords) throws IOException {

        return contentService.parseContent(keywords);
    }

    @GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keywords") String keywords,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize)
            throws IOException {

        return contentService.searchPage(keywords, pageNo, pageSize);
    }
}
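Because the keyword travels as a path segment (`/search/{keywords}/{pageNo}/{pageSize}`), a client that builds this URL should percent-encode it, otherwise spaces or reserved characters produce an invalid or ambiguous path. A minimal client-side sketch using only the JDK, assuming UTF-8; `SearchPath`, `encodeSegment` and `searchPath` are hypothetical names. `URLEncoder` implements form encoding, where a space becomes `+`, so the sketch rewrites that to `%20`, which is the correct form inside a path:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchPath {

    // Percent-encodes one path segment. URLEncoder targets form encoding
    // (space -> "+"), so "+" is rewritten to "%20" for use inside a path.
    // A literal "+" in the input is already encoded as "%2B", so it is unaffected.
    static String encodeSegment(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Hypothetical helper that builds the path mapped by the controller above.
    static String searchPath(String keywords, int pageNo, int pageSize) {
        return "/search/" + encodeSegment(keywords) + "/" + pageNo + "/" + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(searchPath("java", 1, 20));     // /search/java/1/20
        System.out.println(searchPath("java vue", 1, 20)); // /search/java%20vue/1/20
        System.out.println(searchPath("编程", 1, 20));
    }
}
```

A keyword containing `/` becomes `%2F`, which many servlet containers reject in paths by default; if such keywords matter, a query parameter (`@RequestParam`) may be the more robust design.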

Passing the search parameters from the front end

<script>
    new Vue({
        el:"#app",
        data:{
            "keyword": '', // search keyword
            "results":[] // results returned by the back end
        },
        methods:{
            searchKey(){
                var keywords = this.keyword;
                console.log(keywords);
                axios.get('search/'+encodeURIComponent(keywords)+'/0/20').then(response=>{	// GET request
                    console.log(response.data);
                    this.results=response.data;
                })
            }
        }
    });
</script>

Rendering the results on the page

v-for="result in results"

<!-- Product details -->
<div class="view grid-nosku" >
    <div class="product" v-for="result in results">
        <div class="product-iWrap">
            <!-- Product cover image -->
            <div class="productImg-wrap">
                <a class="productImg">
                    <img :src="result.img">
                </a>
            </div>
            <!-- Price -->
            <p class="productPrice">
                <em v-text="result.price"></em>
            </p>
            <!-- Title (v-html renders the highlight markup) -->
            <p class="productTitle">
                <a v-html="result.title"></a>
            </p>
            <!-- Shop name (static placeholder) -->
            <div class="productShop">
                <span>Shop: Kite Lee </span>
            </div>
            <!-- Sales info (static placeholder) -->
            <p class="productStatus">
                <span>Monthly sales <em>999 orders</em></span>
                <span>Reviews <a>3</a></span>
            </p>
        </div>
    </div>
</div>

Highlighting the search term

// Search the indexed data and highlight the keyword in the results
public List<Map<String, Object>> searchPageHighlight(String keywords, int pageNo, int pageSize) throws IOException {
    if (pageNo <= 1) {
        pageNo = 1;
    }

    // Conditional search against the jd_goods index
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    // Paging: from() takes a zero-based hit offset, not a page number
    searchSourceBuilder.from((pageNo - 1) * pageSize);
    searchSourceBuilder.size(pageSize);

    // Exact match: a term query does not analyze the keyword
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keywords);
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(TimeValue.timeValueMinutes(1));

    // Highlighting: the key addition compared with searchPage
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    highlightBuilder.requireFieldMatch(false);  // false: also highlight configured fields the query did not target
    highlightBuilder.preTags("<span style='color:red'>");
    highlightBuilder.postTags("</span>");
    searchSourceBuilder.highlighter(highlightBuilder);

    // Execute the search
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

    // Parse the results
    ArrayList<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : searchResponse.getHits().getHits()) {

        // Read the highlighted fragments for "title"
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField title = highlightFields.get("title");

        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        if (title != null) {
            Text[] fragments = title.fragments();
            StringBuilder newTitle = new StringBuilder();
            for (Text text : fragments) {
                newTitle.append(text);
            }
            sourceAsMap.put("title", newTitle.toString());    // replace the original field with the highlighted version
        }

        list.add(sourceAsMap);
    }
    return list;
}
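The loop above joins the highlighted `title` fragments into one string and writes it back into the source map. The same merge step can be sketched with plain JDK types, assuming the fragments have already been turned into strings (for example via `Text.toString()`); `HighlightMerge` and `applyHighlight` are hypothetical names, and the map stands in for `hit.getSourceAsMap()`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HighlightMerge {

    // Returns a copy of source in which source[field] is replaced by the joined
    // highlight fragments. A null or empty fragment list leaves the value as-is,
    // mirroring the "title != null" check in searchPageHighlight.
    static Map<String, Object> applyHighlight(Map<String, Object> source,
                                              String field,
                                              List<String> fragments) {
        Map<String, Object> result = new HashMap<>(source);
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("title", "Java programming");
        source.put("price", "59.00");

        List<String> fragments = List.of("<span style='color:red'>Java</span> programming");
        Map<String, Object> merged = applyHighlight(source, "title", fragments);
        System.out.println(merged.get("title"));
        // <span style='color:red'>Java</span> programming
    }
}
```

This sketch copies the map instead of mutating it, so the caller still holds the unhighlighted source; writing into `sourceAsMap` directly works just as well when the raw value is not needed.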

controller

@GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(@PathVariable("keywords") String keywords,
                                        @PathVariable("pageNo") int pageNo,
                                        @PathVariable("pageSize") int pageSize)
        throws IOException {

    // return contentService.searchPage(keywords, pageNo, pageSize);    // plain search without highlighting
    return contentService.searchPageHighlight(keywords, pageNo, pageSize);   // search with the keyword highlighted
}

posted @ 2022-05-14 17:35 Kite_Lee