elasticsearch
ES
These notes follow Kuangshen's (狂神) Elasticsearch course on Bilibili.
Recommended notes
Configuration files
jvm.options
## -Xms4g
## -Xmx4g
elasticsearch.yml
# http.port: 9200
Starting up (Windows)
Resetting the password
# Start ES first, then open cmd in the installation directory
elasticsearch-reset-password -u elastic
# The command prints the newly generated password, for example: *+6+PRpu-tH_dfnFnSY-
elasticsearch-head
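# Open cmd in the elasticsearch-head installation directory
cnpm install
npm run start
Edit the configuration file elasticsearch.yml
# Allow cross-origin requests (elasticsearch-head on port 9100 calls ES on port 9200)
http.cors.enabled: true
http.cors.allow-origin: '*'
# If password authentication is enabled, also add
http.cors.allow-headers: Authorization, Content-Type
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
# If the settings above do not work, disable password authentication instead
xpack.security.enabled: false
Restart
# Double-click elasticsearch.bat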
npm run start
Kibana
ELK
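ELK is an acronym formed from the first letters of three open-source projects: Elasticsearch, Logstash, and Kibana. It is also known as the Elastic Stack. Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful API. It can serve as the underlying engine for large-scale full-text search scenarios similar to Baidu or Google, which shows how capable its search features are. Elasticsearch is often abbreviated as ES.
Logstash is the central data-flow engine of ELK. It collects data in different formats from different sources (files, data stores, message queues), filters it, and sends it to different destinations (files, message queues, Redis, Elasticsearch, Kafka, and so on). Kibana presents Elasticsearch data through a friendly web interface and provides real-time analysis.
Many developers describe ELK simply as a log-analysis stack, but it is not limited to logs: it supports any data collection and analysis scenario. Log analysis is the most representative use case, not the only one.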
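Core concepts
What are clusters, nodes, indices, types, documents, shards, and mappings?
Elasticsearch is document-oriented. The table below compares it with a relational database. Everything is JSON.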
Relational DB | Elasticsearch |
---|---|
Database | Indices |
Tables | Types |
Rows | Documents |
Columns (fields) | Fields |
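An index (type) contains multiple documents, for example document 1 and document 2. When a document is indexed, it can be located through the path index > type > document ID; this combination identifies one specific document. Note that the ID does not have to be an integer; it is actually a string.
A document is a single record of data, for example: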
user
1 Kite 18
2 Lee 26
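As noted above, Elasticsearch is document-oriented, which means the smallest unit that is indexed and searched is the document. Documents in Elasticsearch have several important properties:
- Self-contained: a document holds both the fields and their values, that is, key:value pairs.
- Hierarchical: a document can contain sub-documents, which is how complex logical entities are modelled.
- Flexible structure: a document does not depend on a predefined schema. In a relational database, columns must be defined before use; in Elasticsearch, a field can be omitted, or a new field can be added dynamically.
Although fields can be added or omitted freely, the type of each field still matters: an age field, for example, could be a string or an integer. Elasticsearch stores the mapping between fields and their types, along with other settings. Because this mapping is specific to each type, types are sometimes called mapping types in Elasticsearch.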
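Types
A type is a logical container for documents, just as a table is a container for rows in a relational database. (Mapping types were deprecated in Elasticsearch 7.x and removed in 8.x; the concept is kept here as the course presents it.) The definition of the fields in a type is called the mapping; for example, name is mapped to a string type. Documents are schemaless: they do not need to have every field defined in the mapping. So what happens when a new field appears? Elasticsearch adds it to the mapping automatically, but since the field's type is unknown, it guesses: if the value is 18, Elasticsearch treats it as an integer. The guess can be wrong, so the safest approach is to define the required mapping in advance, which is the same as in a relational database: define the fields first, then use them.
- name: string (text)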
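The type guessing described above can be illustrated with a small Java sketch. This is not Elasticsearch's actual code: the class `DynamicMappingSketch`, the method `guessType`, and the single date pattern are assumptions made for illustration, and real dynamic mapping has more rules (for example, a dynamically mapped string also gets a keyword sub-field).

```java
final class DynamicMappingSketch {

    // Simplified, hypothetical version of the guess Elasticsearch makes for a new field.
    static String guessType(Object value) {
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Integer || value instanceof Long) {
            return "long";   // whole numbers are mapped as long
        }
        if (value instanceof Float || value instanceof Double) {
            return "float";  // fractional numbers are mapped as float
        }
        if (value instanceof String) {
            // Assumption: only the plain yyyy-MM-dd form is treated as a date here.
            boolean looksLikeDate = ((String) value).matches("\\d{4}-\\d{2}-\\d{2}");
            return looksLikeDate ? "date" : "text";
        }
        return "object";
    }

    public static void main(String[] args) {
        System.out.println(guessType(18));            // long
        System.out.println(guessType("18"));          // text
        System.out.println(guessType("2000-01-01"));  // date
    }
}
```

The second call shows why the guess can be wrong: an age sent as the string "18" ends up as text, which is the case an explicit mapping avoids.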
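Indices
An index is like a database.
An index is a container for mapping types; an Elasticsearch index is a very large collection of documents. The index stores the fields of its mapping types and other settings, and its data is then stored across shards.
Physical design: how nodes and shards work
A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indices. When an index is created, it is made up of primary shards (5 by default before version 7.0, 1 by default since 7.0), and each primary shard has one replica shard by default.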
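Inverted index
Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is well suited to fast full-text search. An inverted index consists of a list of all the distinct terms that appear in any document and, for each term, the list of documents that contain it. For example, suppose there are two documents with the following content:
Study every day, good good up to forever #content of document 1
To forever, study every day, good good up #content of document 2
To build the inverted index, each document is first split into separate words (called terms or tokens). A sorted list of all the distinct terms is then created, recording which documents each term appears in:
term | doc_1 | doc_2 |
---|---|---|
Study | √ | × |
To | × | √ |
every | √ | √ |
forever | √ | √ |
day | √ | √ |
study | × | √ |
good | √ | √ |
to | √ | × |
up | √ | √ |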
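Searching for "to forever":
term | doc_1 | doc_2 |
---|---|---|
to | √ | × |
forever | √ | √ |
total | 2 | 1 |
Both documents match, but the first matches better than the second. With no other conditions, both documents containing the keywords are returned.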
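When searching blog posts by tag, the inverted index looks like this:
Blog posts (raw data) | | Index list (inverted index) | |
---|---|---|---|
Post ID | Tags | Tag | Post IDs |
1 | python | python | 1, 2, 3 |
2 | python | linux | 3, 4 |
3 | linux, python | | |
4 | linux | | |
To find posts tagged python, looking up the inverted index is much faster than scanning all the raw data: only the tag column needs to be checked to get the matching post IDs. All irrelevant data is skipped, which improves efficiency.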
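A minimal Java sketch of the idea, assuming a case-sensitive split on non-alphanumeric characters and a score that simply counts matching query terms. The class `InvertedIndexSketch` and its methods are made up for illustration; Lucene's real index and scoring are far more elaborate.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {

    // term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    void add(int docId, String text) {
        // Split on anything that is not a letter or digit; terms stay case-sensitive.
        for (String term : text.split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    Set<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // For each document, counts how many of the query terms it contains.
    Map<Integer, Integer> score(String query) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String term : query.trim().split("\\s+")) {
            for (int docId : docsFor(term)) {
                scores.merge(docId, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Study every day, good good up to forever");
        index.add(2, "To forever, study every day, good good up");
        System.out.println(index.score("to forever")); // {1=2, 2=1}
    }
}
```

Document 1 scores 2 and document 2 scores 1, because "to" and "To" are different terms when nothing is lowercased.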
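Elasticsearch indices versus Lucene indices
In Elasticsearch, an index is divided into multiple shards, and each shard is a Lucene index. An Elasticsearch index is therefore made up of multiple Lucene indices.
Unless stated otherwise, "index" refers to an Elasticsearch index.
IK analysis plugin
Verifying the installation
# Plugins are installed in the plugins folder of the Elasticsearch installation directory
# In cmd, run: elasticsearch-plugin list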
D:\Program_Files\elasticsearch-8.1.0\bin> elasticsearch-plugin list
Testing
Startup
# Double-click elasticsearch.bat
npm run start
# kibana.bat
# Open http://localhost:5601
# Go to Dev Tools
Analyzer: ik_smart, ik_max_word; Tokenizer: ik_smart, ik_max_word
GET _analyze
{
"analyzer": "ik_smart",
"text": "人民共和国"
}
# Result:
{
"tokens" : [
{
"token" : "人民共和国",
"start_offset" : 0,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 0
}
]
}
GET _analyze
{
"analyzer": "ik_max_word",
"text": "人民共和国"
}
# Result:
{
"tokens" : [
{
"token" : "人民共和国",
"start_offset" : 0,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "人民",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "共和",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "共和国",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 3
}
]
}
# The sample text was altered to avoid a sensitive word, so this is not the program's actual output
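Creating a custom dictionary
plugins\elasticsearch-analysis-ik-8.1.0\config\IKAnalyzer.cfg.xml
<entry key="ext_dict">kite.dic</entry>
Create kite.dic in the same config directory, with one custom word per line (here 李三四), then restart Elasticsearch so the dictionary is loaded.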
GET _analyze
{
"analyzer": "ik_max_word",
"text": "我是李三四"
}
# Before adding the custom dictionary
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "李三",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "三四",
"start_offset" : 3,
"end_offset" : 5,
"type" : "TYPE_CNUM",
"position" : 3
}
]
}
# After adding the custom dictionary
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
#####################
{
"token" : "李三四",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
#####################
{
"token" : "李三",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "三四",
"start_offset" : 3,
"end_offset" : 5,
"type" : "TYPE_CNUM",
"position" : 4
}
]
}
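RESTful API
Basic tests
Index operations
PUT test02 # create an index (comparable to a database)
GET test02 # get the index information
GET _cat/indices?v # list the current state of all indices
POST test01/_doc # add a document to test01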
{
"name":"Kite"
}
DELETE test01 # delete the index
PUT test01 # create the index with explicit mappings
{
"mappings": {
"properties": {
"name":{
"type": "text"
},
"age":{
"type": "long"
},
"birthday": {
"type": "date"
}
}
}
}
PUT test01/_doc/1 # add a document to test01 with id 1
{
"name": "Kite",
"age": 15,
"birthday": "2000-01-01"
}
# Result:
{
"_index" : "test01",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
PUT test01/_doc/1 # update the document (it is replaced entirely by the data written here)
{
"name": "Kite Lee",
"age": 15,
"birthday": "2000-01-01"
}
{
"_index" : "test01",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
POST test01/_update/1 # update the document (only the given fields change)
{
"doc": {
"name": "Alice"
}
}
Document operations
PUT test01/_doc/1
{
"name":"Kite",
"age":15,
"desc":"for you, thousands of times",
"tags":["romantic", "honest", "brave"]
}
PUT test01/_doc/2
{
"name":"Ermu",
"age":28,
"desc":"for myself, once",
"tags":["stubborn", "dishonest", "coward"]
}
# Look up by id
GET test01/_doc/1
GET test01/_doc/2
# Search with a query string
GET test01/_search?q=name:Kite
GET test01/_search?q=name:Kite Lee
Queries
# Insert four documents
PUT test01/_doc/1
{
"name":"Kite Lee",
"age":15,
"desc":"for you, thousands of times",
"tags":["romantic", "honest", "brave"]
}
PUT test01/_doc/2
{
"name":"Ermu",
"age":28,
"desc":"for myself, once",
"tags":["stubborn", "dishonest", "coward"]
}
PUT test01/_doc/3
{
"name":"Kite Liu",
"age":29,
"desc":"Strong and Man",
"tags":["stubborn", "dishonest", "coward"]
}
PUT test01/_doc/4
{
"name":"Kite Li",
"age":89,
"desc":"talk is cheap, show me the code",
"tags":["crazy"]
}
# Query 1: look up by id
GET test01/_doc/1
# Query 2: query string
GET test01/_search?q=name:Kite
GET test01/_search?q=name:Kite Lee
# Query 3: match query
GET test01/_search
{
"query": {
"match": {
"name": "Kite"
}
},
"_source": ["name", "desc"] # restrict the fields returned in _source
}
# Query 4: source filtering, sorting, and paging
GET test01/_search
{
"query": {
"match": {
"name": "Kite"
}
},
"sort": [ # sorting
{
"age": {
"order": "desc"
}
}
],
"_source": ["name", "age"], # source filtering
"from": 0, # paging
"size": 2
}
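# Query 5: bool (must), multiple conditions
GET test01/_search
{
"query": {
"bool": {
"must": [ # every clause must match
{
"match": {
"name": "Kite" # name is a text field, so it is analyzed (tokenized); any name containing the token Kite matches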
}
},
{
"match": {
"age": 15
}
}
]
}
}
}
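# Query 6: bool (should), multiple conditions
GET test01/_search
{
"query": {
"bool": {
"should": [ # at least one clause should match
{
"match": {
"name": "Kite" # name is a text field, so it is analyzed (tokenized); any name containing the token Kite matches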
}
},
{
"match": {
"age": 15
}
}
]
}
}
}
# Query 7: bool (must_not), excluding documents
GET test01/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"age": 15 # keeps only documents where age != 15
}
}
]
}
}
}
# Query 8: bool (must) with a range filter
GET test01/_search
{
"query": {
"bool": {
"must": {
"match": {
"name": "Kite"
}
},
"filter": {
"range": {
"age": {
"gte": 10, # greater than or equal to 10 (gt: greater than)
"lte": 20 # less than or equal to 20 (lt: less than)
}
}
}
}
}
}
# Query 9: matching several terms
GET test01/_search
{
"query": {
"match": {
"tags": "brave crazy" # matches documents whose tags contain brave or crazy
}
}
}
# Query 10: exact matching with term
# Create a new index (if test02 still exists from the earlier step, delete it first)
PUT test02
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"desc": {
"type": "keyword"
}
}
}
}
# Insert two documents
PUT test02/_doc/1
{
"name": "Kite Lee is active",
"desc": "Kite Lee is negative"
}
PUT test02/_doc/2
{
"name": "Kite Lee is active2",
"desc": "Kite Lee is negative2"
}
# Analyze with the keyword analyzer: Kite Lee is active stays a single token
GET _analyze
{
"analyzer": "keyword",
"text": "Kite Lee is active"
}
# Analyze with the standard analyzer: Kite Lee is active is split into tokens
GET _analyze
{
"analyzer": "standard",
"text": "Kite Lee is active"
}
# name is a text field: it was indexed as separate lowercase tokens, so a term query for the whole string returns nothing
GET test02/_search
{
"query": {
"term": {
"name": "Kite Lee is active"
}
}
}
# desc is a keyword field: the whole value is indexed as one token, so the term query finds it
GET test02/_search
{
"query": {
"term": {
"desc": "Kite Lee is negative"
}
}
}
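The two term queries above can be mimicked with a small Java simulation. It is only a sketch: `TermVsMatchSketch` and its methods are hypothetical, `analyzeAsText` is a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), and real queries run against Lucene rather than in-memory sets.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TermVsMatchSketch {

    // Rough stand-in for the standard analyzer used by text fields.
    static Set<String> analyzeAsText(String value) {
        Set<String> tokens = new HashSet<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A keyword field indexes the whole value as one unmodified token.
    static Set<String> analyzeAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // term query: the query string is looked up as-is, without analysis.
    static boolean termMatches(Set<String> indexedTokens, String query) {
        return indexedTokens.contains(query);
    }

    // match query: the query string is analyzed too; one shared token is enough.
    static boolean matchMatches(Set<String> indexedTokens, String query) {
        for (String token : analyzeAsText(query)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> name = analyzeAsText("Kite Lee is active");
        Set<String> desc = analyzeAsKeyword("Kite Lee is negative");
        System.out.println(termMatches(name, "Kite Lee is active"));   // false
        System.out.println(termMatches(desc, "Kite Lee is negative")); // true
        System.out.println(matchMatches(name, "Kite Lee"));            // true
    }
}
```

In this model a term query on the text field only succeeds for a single lowercase token such as kite, which is why term is usually paired with keyword fields.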
# match on desc.keyword (with the explicit mapping above, desc has no .keyword sub-field, so this returns no hits)
GET test02/_search
{
"query": {
"match": {
"desc.keyword": "Kite Lee"
}
}
}
GET test02/_search
{
"query": {
"match": {
"name": "Kite Lee"
}
}
}
Highlighting
GET test02/_search
{
"query": {
"match": {
"name": "Kite Lee"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
# Result
{
"took" : 66,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.36464313,
"hits" : [
{
"_index" : "test02",
"_id" : "1",
"_score" : 0.36464313,
"_source" : {
"name" : "Kite Lee is active",
"desc" : "Kite Lee is negative"
},
"highlight" : {
"name" : [
"<em>Kite</em> <em>Lee</em> is active"
]
}
},
{
"_index" : "test02",
"_id" : "2",
"_score" : 0.36464313,
"_source" : {
"name" : "Kite Lee is active2",
"desc" : "Kite Lee is negative2"
},
"highlight" : {
"name" : [
"<em>Kite</em> <em>Lee</em> is active2"
]
}
}
]
}
}
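To make the highlight field in the response above more concrete, here is a naive Java sketch that wraps every word of a text that also occurs in the query. The class `HighlightSketch` and its whitespace-based matching are assumptions for illustration; the real highlighter works on the analyzed tokens and their offsets.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HighlightSketch {

    // Wraps each word of text that also appears in the query with the given tags.
    static String highlight(String text, String query, String preTag, String postTag) {
        Set<String> queryTerms = new HashSet<>();
        for (String term : query.trim().split("\\s+")) {
            queryTerms.add(term.toLowerCase(Locale.ROOT));
        }
        StringBuilder out = new StringBuilder();
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            boolean hit = queryTerms.contains(words[i].toLowerCase(Locale.ROOT));
            out.append(hit ? preTag + words[i] + postTag : words[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <em>Kite</em> <em>Lee</em> is active
        System.out.println(highlight("Kite Lee is active", "Kite Lee", "<em>", "</em>"));
    }
}
```

Passing other tags for `preTag` and `postTag` corresponds to setting pre_tags and post_tags in the request.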
# Custom highlight tags
GET test02/_search
{
"query": {
"match": {
"name": "Kite Lee"
}
},
"highlight": {
"pre_tags": "<p class='key' style='color:red'>",
"post_tags": "</p>",
"fields": {
"name": {}
}
}
}
# Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.36464313,
"hits" : [
{
"_index" : "test02",
"_id" : "1",
"_score" : 0.36464313,
"_source" : {
"name" : "Kite Lee is active",
"desc" : "Kite Lee is negative"
},
"highlight" : {
"name" : [
"<p class='key' style='color:red'>Kite</p> <p class='key' style='color:red'>Lee</p> is active"
]
}
},
{
"_index" : "test02",
"_id" : "2",
"_score" : 0.36464313,
"_source" : {
"name" : "Kite Lee is active2",
"desc" : "Kite Lee is negative2"
},
"highlight" : {
"name" : [
"<p class='key' style='color:red'>Kite</p> <p class='key' style='color:red'>Lee</p> is active2"
]
}
}
]
}
}
Integrating ES with Spring Boot
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html
Maven dependency
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.17.3</version>
</dependency>
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("localhost", 9200, "http"),
new HttpHost("localhost", 9201, "http")));
client.close();
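Open these three locations in turn to check the version
Or inspect the dependencies directly
To change the dependency version, set it in pom.xml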
<properties>
<elasticsearch.version>7.15.2</elasticsearch.version>
</properties>
package com.lee.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
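public class ElasticSearchConfig {
// Equivalent to <bean id="restHighLevelClient" class="RestHighLevelClient"> in XML configuration.
// The bean name is the method name; the tests reference it through @Qualifier("restHighLevelClient").
@Bean
public RestHighLevelClient restHighLevelClient() {
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(new HttpHost("localhost", 9200, "http")));
return client;
}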
}
Location of the configuration source code
Index tests
Creating an index
new CreateIndexRequest("kite_index")
Checking whether an index exists
new GetIndexRequest("kite_index")
Deleting an index
new DeleteIndexRequest("kite_index")
package com.lee;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;
/**
* ES API tests
*/
@SpringBootTest
class Es01ApiApplicationTests {
@Autowired
@Qualifier("restHighLevelClient")
private RestHighLevelClient client;
// private RestHighLevelClient restHighLevelClient;
// Test creating an index
@Test
void testCreatIndex() throws IOException {
// 1. Build the create-index request
CreateIndexRequest request = new CreateIndexRequest("kite_index");
// 2. Execute the request with the client and receive a CreateIndexResponse
CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
System.out.println(createIndexResponse);
}
// Test whether the index exists
@Test
void testGetIndex() throws IOException {
GetIndexRequest request = new GetIndexRequest("kite_index");
boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}
// Test deleting the index
@Test
void testDeleteIndex() throws IOException {
DeleteIndexRequest request = new DeleteIndexRequest("kite_index");
AcknowledgedResponse isDelete = client.indices().delete(request, RequestOptions.DEFAULT);
// Note: check isDelete.isAcknowledged()
System.out.println(isDelete.isAcknowledged());
}
}
Document tests
Add an entity class, User
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.stereotype.Component;
@Data
@AllArgsConstructor
@NoArgsConstructor
@Component
public class User {
private String name;
private int age;
}
import com.alibaba.fastjson.JSON;
import com.lee.pojo.User;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;
import java.util.ArrayList;
/**
* ES API tests
*/
@SpringBootTest
class Es01ApiApplicationTests {
private final String kite_index = "kite_index";
@Autowired
@Qualifier("restHighLevelClient")
private RestHighLevelClient client;
// private RestHighLevelClient restHighLevelClient;
// Test creating an index
@Test
void testCreatIndex() throws IOException {
// 1. Build the create-index request
CreateIndexRequest request = new CreateIndexRequest(kite_index);
// 2. Execute the request with the client and receive a CreateIndexResponse
CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
System.out.println(createIndexResponse);
}
// Test whether the index exists
@Test
void testGetIndex() throws IOException {
GetIndexRequest request = new GetIndexRequest(kite_index);
boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}
// Test deleting the index
@Test
void testDeleteIndex() throws IOException {
DeleteIndexRequest request = new DeleteIndexRequest(kite_index);
AcknowledgedResponse isDelete = client.indices().delete(request, RequestOptions.DEFAULT);
// Note: check isDelete.isAcknowledged()
System.out.println(isDelete.isAcknowledged());
}
// -------------------------------- Document tests ----------------------------------------------
/**
* Add a document
* @throws IOException
*/
@Test
void testAddDocument() throws IOException {
// Create the object
User user = new User("Kite", 15);
// Build the request
IndexRequest request = new IndexRequest(kite_index);
request.id("1");
request.timeout(TimeValue.timeValueSeconds(1));
// request.timeout("1s");
// Put the data into the request
request.source(JSON.toJSONString(user), XContentType.JSON);
// ObjectMapper jackson = new ObjectMapper();
// request.source(jackson.writeValueAsString(user), XContentType.JSON);
// Send the request with the client
IndexResponse indexResponse = null;
try {
indexResponse = client.index(request, RequestOptions.DEFAULT);
} catch (IOException e) {
// Throws IOException: Unable to parse response body for Response, although the document is indexed
// (probably because a 7.x client is parsing the response of the 8.x server)
// e.printStackTrace();
}
// System.out.println(indexResponse.status());
}
/**
* Check whether /kite_index/_doc/1 exists
* @throws IOException
*/
@Test
void testExistsDocument() throws IOException {
GetRequest getRequest = new GetRequest(kite_index, "1");
// Do not fetch the _source content
getRequest.fetchSourceContext(new FetchSourceContext(false));
getRequest.storedFields("_none_");
boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
System.out.println(exists);
}
/**
* Get the document
* @throws IOException
*/
@Test
void testGetDocument() throws IOException {
GetRequest getRequest = new GetRequest(kite_index, "1");
GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
System.out.println(getResponse.getSourceAsString()); // {"age":15,"name":"Kite"}
System.out.println(getResponse);
/*
Same format as the console response
{
"_index":"kite_index",
"_type":null,"_id":"1",
"_version":1,
"_seq_no":0,
"_primary_term":1,
"found":true,
"_source":{"age":15,"name":"Kite"}
}
*/
}
/**
* Update a document
* @throws IOException
*/
@Test
void testUpdateDocument() throws IOException {
UpdateRequest updateRequest = new UpdateRequest("kite_index", "1");
updateRequest.timeout(TimeValue.timeValueSeconds(1));
User user2 = new User("Lee", 28);
// ObjectMapper jackson = new ObjectMapper();
// updateRequest.doc(jackson.writeValueAsString(user2), XContentType.JSON);
updateRequest.doc(JSON.toJSONString(user2), XContentType.JSON);
// Updating with fastjson once went wrong (an extra JSON entry was added), then worked again later; cause unknown
UpdateResponse updateResponse;
try {
updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
// Same IOException as when adding a document
}
// System.out.println(updateResponse.status());
}
/**
* Delete a document
* @throws IOException
*/
@Test
void testDeleteDocument() throws IOException {
DeleteRequest deleteRequest = new DeleteRequest("kite_index", "1");
deleteRequest.timeout("1s");
try {
DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
// e.printStackTrace();
}
// System.out.println(deleteResponse.status());
}
/**
* Bulk insert
* @throws IOException
*/
@Test
void testBulkRequest() throws IOException {
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.timeout(TimeValue.timeValueSeconds(10));
ArrayList<User> users = new ArrayList<>();
users.add(new User("Alice", 25));
users.add(new User("Bob", 50));
users.add(new User("Cindy", 75));
users.add(new User("David", 100));
for (int i = 0; i < users.size(); i++) {
bulkRequest.add(
new IndexRequest("kite_index")
.id(String.valueOf(i + 1))
.source(JSON.toJSONString(users.get(i)), XContentType.JSON)
);
}
try {
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
// e.printStackTrace();
}
// System.out.println(bulkResponse.hasFailures());
}
/**
* Search
* @throws IOException
*/
@Test
void testSearch() throws IOException {
SearchRequest searchRequest = new SearchRequest(kite_index);
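// Build the search conditions
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Query conditions
// Exact (term) match. Note: on a dynamically mapped text field the indexed token is lowercased (cindy),
// so a term query for "Cindy" may return no hits; the name.keyword sub-field holds the exact value.
TermQueryBuilder termQuery = QueryBuilders.termQuery("name", "Cindy");
// Match all documents
MatchAllQueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();
searchSourceBuilder.query(termQuery);
// Paging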
// searchSourceBuilder.from(2);
// searchSourceBuilder.size(1);
searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(JSON.toJSONString(searchResponse.getHits()));
System.out.println("========================================");
for (SearchHit documentFields : searchResponse.getHits().getHits()) {
System.out.println(documentFields.getSourceAsMap());
}
}
}
A JD.com-style search demo
application.properties
server.port=9090
# Disable the Thymeleaf cache
spring.thymeleaf.cache=false
Scraping the data
package com.lee.utils;
import com.lee.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
public class HtmlParseUtil {
public List<Content> parseJD(String keywords) throws IOException {
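// URL-encode the keyword so that non-ASCII input such as Chinese forms a valid URL
String url = "https://search.jd.com/Search?keyword=" + java.net.URLEncoder.encode(keywords, "UTF-8") + "&enc=utf-8";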
Document document = Jsoup.parse(new URL(url), 60000);
Element element = document.getElementById("J_goodsList");
// System.out.println(element.html());
Elements elements = element.getElementsByTag("li");
ArrayList<Content> goodsList = new ArrayList<>();
for (Element elem : elements) {
String img = elem.getElementsByTag("img").eq(0).attr("data-lazy-img");
String price = elem.getElementsByClass("p-price").eq(0).text();
String title = elem.getElementsByClass("p-name").eq(0).text();
Content content = new Content();
content.setTitle(title);
content.setImg(img);
content.setPrice(price);
goodsList.add(content);
// System.out.println("===============================================");
// System.out.println("img : " + img + "\n price : " + price + ",\t title : " + title);
}
return goodsList;
}
public static void main(String[] args) throws IOException {
new HtmlParseUtil().parseJD("深入理解").forEach(System.out::println);
}
}
Writing the data to ES and implementing search
package com.lee.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class ElasticSearchConfig {
// Equivalent to <bean id="restHighLevelClient" class="RestHighLevelClient"> in XML configuration
@Bean
public RestHighLevelClient restHighLevelClient() {
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(new HttpHost("localhost", 9200, "http")));
return client;
}
}
Create the entity class Content
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
private String title;
private String img;
private String price;
}
package com.lee.service;
import com.alibaba.fastjson.JSON;
import com.lee.pojo.Content;
import com.lee.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import javax.naming.directory.SearchResult;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
@Service
public class ContentService {
@Autowired
private RestHighLevelClient restHighLevelClient; // injected bean
// 1. Parse the scraped data and store it in the ES index
public boolean parseContent(String keywords) throws IOException {
List<Content> contents = new HtmlParseUtil().parseJD(keywords);
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.timeout("2m");
for (int i = 0; i < contents.size(); i++) {
bulkRequest.add(new IndexRequest("jd_goods")
.source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
}
BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
return !bulkResponse.hasFailures();
}
// 2. Read the data back to implement search
public List<Map<String, Object>> searchPage(String keywords, int pageNo, int pageSize) throws IOException {
if (pageNo <= 1) {
pageNo = 1;
}
// Conditional search
SearchRequest searchRequest = new SearchRequest("jd_goods");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Paging: from is the offset of the first hit, not the page number
searchSourceBuilder.from((pageNo - 1) * pageSize);
searchSourceBuilder.size(pageSize);
// Exact (term) match
TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keywords);
searchSourceBuilder.query(termQueryBuilder);
searchSourceBuilder.timeout(TimeValue.timeValueMinutes(1));
// Execute the search
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
// Parse the results
ArrayList<Map<String, Object>> list = new ArrayList<>();
for (SearchHit documentFields : searchResponse.getHits().getHits()) {
list.add(documentFields.getSourceAsMap());
}
return list;
}
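public static void main(String[] args) throws IOException {
// Caution: outside the Spring context restHighLevelClient is never injected (it stays null),
// so the bulk call fails with a NullPointerException; use the /parse/{keywords} endpoint instead.
new ContentService().parseContent("java");
}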
}
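The paging step in `searchPage` depends on turning a 1-based page number into the offset that `from` expects. A minimal sketch of that arithmetic follows; the class `PagingSketch` and the method `offsetFor` are hypothetical helpers, not part of the course code.

```java
final class PagingSketch {

    // Converts a 1-based page number into the zero-based offset of its first hit.
    static int offsetFor(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // page numbers below 1 fall back to the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(1, 20)); // 0
        System.out.println(offsetFor(3, 20)); // 40
    }
}
```

Passing the page number itself as the offset would skip the first hit on page 1 and shift every later page by only one document.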
package com.lee.controller;
import com.lee.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.io.IOException;
import java.util.List;
import java.util.Map;
@RestController
public class ContentController {
@Autowired
private ContentService contentService;
@GetMapping("/parse/{keywords}")
public boolean parseContent(@PathVariable("keywords") String keywords) throws IOException {
return contentService.parseContent(keywords);
}
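@GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(@PathVariable("keywords") String keywords,
@PathVariable("pageNo") int pageNo,
@PathVariable("pageSize") int pageSize)
throws IOException {
return contentService.searchPage(keywords, pageNo, pageSize);
}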
}
Connecting the front end
npm init
npm install vue
npm install axios
Front-end file layout
Resource paths imported in index.html
<link rel="stylesheet" th:href="@{/css/style.css}"/>
<script th:src="@{/js/jquery.min.js}"></script>
<script th:src="@{/js/vue.min.js}"></script>
<script th:src="@{/js/axios.min.js}"></script>
controller
package com.lee.controller;
import com.lee.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.io.IOException;
import java.util.List;
import java.util.Map;
@RestController
public class ContentController {
@Autowired
private ContentService contentService;
@GetMapping("/parse/{keywords}")
public boolean parseContent(@PathVariable("keywords") String keywords) throws IOException {
return contentService.parseContent(keywords);
}
@GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(@PathVariable("keywords") String keywords,
@PathVariable("pageNo") int pageNo,
@PathVariable("pageSize") int pageSize)
throws IOException {
return contentService.searchPage(keywords, pageNo, pageSize);
}
}
Passing parameters
<script>
new Vue({
el:"#app",
data:{
"keyword": '', // the search keyword
"results":[] // results returned by the back end
},
methods:{
searchKey(){
var keywords = this.keyword;
console.log(keywords);
axios.get('search/'+keywords+'/0/20').then(response=>{ // GET request
console.log(response.data);
this.results=response.data;
})
}
}
});
</script>
Rendering the data on the page
v-for="result in results"
<!-- Product details -->
<div class="view grid-nosku" >
<div class="product" v-for="result in results">
<div class="product-iWrap">
<!-- Product cover image -->
<div class="productImg-wrap">
<a class="productImg">
<img :src="result.img">
</a>
</div>
<!-- Price -->
<p class="productPrice">
<em v-text="result.price"></em>
</p>
<!-- Title (the Content entity exposes title, not name) -->
<p class="productTitle">
<a v-html="result.title"></a>
</p>
<!-- Shop name -->
<div class="productShop">
<span>Shop: Kite Lee </span>
</div>
<!-- Sales information -->
<p class="productStatus">
<span>Monthly sales <em>999</em></span>
<span>Reviews <a>3</a></span>
</p>
</div>
</div>
</div>
Highlighting the search terms
// Search the data and highlight the matched terms
public List<Map<String, Object>> searchPageHighlight(String keywords, int pageNo, int pageSize) throws IOException {
if (pageNo <= 1) {
pageNo = 1;
}
// Conditional search
SearchRequest searchRequest = new SearchRequest("jd_goods");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Paging: from is the offset of the first hit, not the page number
searchSourceBuilder.from((pageNo - 1) * pageSize);
searchSourceBuilder.size(pageSize);
// Exact (term) match
TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keywords);
searchSourceBuilder.query(termQueryBuilder);
searchSourceBuilder.timeout(TimeValue.timeValueMinutes(1));
// Highlighting
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
highlightBuilder.requireFieldMatch(false); // highlight the field even when the query did not match on it
highlightBuilder.preTags("<span style='color:red'>");
highlightBuilder.postTags("</span>");
searchSourceBuilder.highlighter(highlightBuilder);
// Execute the search
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
// Parse the results
ArrayList<Map<String, Object>> list = new ArrayList<>();
for (SearchHit hit : searchResponse.getHits().getHits()) {
// Read the highlighted fragments of the title field
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
HighlightField title = highlightFields.get("title");
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
if (title != null) {
Text[] fragments = title.fragments();
StringBuilder newTitle = new StringBuilder();
for (Text text : fragments) {
newTitle.append(text);
}
sourceAsMap.put("title", newTitle.toString()); // replace the original title with the highlighted one
}
list.add(sourceAsMap);
}
return list;
}
controller
@GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(@PathVariable("keywords") String keywords,
@PathVariable("pageNo") int pageNo,
@PathVariable("pageSize") int pageSize)
throws IOException {
// return contentService.searchPage(keywords, pageNo, pageSize);
return contentService.searchPageHighlight(keywords, pageNo, pageSize); // search with highlighted terms
}