Gulimall Distributed Advanced (II): ElasticSearch Full-Text Search

 


I. ElasticSearch Full-Text Search

1. Introduction

https://www.elastic.co/cn/what-is/elasticsearch
Full-text search is one of the most common requirements, and the open-source Elasticsearch is currently the first choice among full-text search engines.
It can store, search, and analyze massive volumes of data quickly. Wikipedia, Stack Overflow, and GitHub all use it.
  Elastic is built on top of the open-source library Lucene. You cannot use Lucene directly, however; you have to write code against its interfaces yourself. Elastic wraps Lucene and exposes a REST API, usable out of the box.

  REST API: naturally cross-platform.
  Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
  Official Chinese docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/foreword_id.html
  Community Chinese docs:
      https://es.xiaoleilu.com/index.html
      http://doc.codingdict.com/elasticsearch/0/

2. Basic Concepts

1) Index
    As a verb, it is the equivalent of MySQL's INSERT;
    as a noun, it is the equivalent of a MySQL database.
2) Type
    One or more types can be defined inside an Index.
    A type is similar to a table in MySQL; data of the same kind is stored together.
3) Document
    A single piece of data (a Document) of a given Type saved under a given Index; documents are in JSON format.
A Document is like a row of content inside a MySQL table.
4) Inverted index mechanism
    When a document is saved, its text is split into words, and each word is mapped to the list of documents containing it; a search looks the query terms up in this inverted index to locate and score the matching documents.

3. Installing ES with Docker

1) Download the image files
  docker pull elasticsearch:7.4.2      # stores and searches data
  docker pull kibana:7.4.2             # visualizes the retrieved data
  Note: the two versions must match.
First, check the virtual machine's available memory.
2) Install ElasticSearch
(1) Create the mount directories
    mkdir -p /mydata/elasticsearch/plugins
    mkdir -p /mydata/elasticsearch/config
    mkdir -p /mydata/elasticsearch/data

 (2) Allow es to be accessed from any remote machine

     echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml

 (3) Recursively change the permissions; es needs access

    chmod -R 777 /mydata/elasticsearch

    Note: be sure to grant these permissions, otherwise startup will fail later with an access-denied error.

 (4) Create the instance and start Elasticsearch

    docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
      -e "discovery.type=single-node" \
      -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
      -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
      -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
      -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
      -d elasticsearch:7.4.2

  Note:

    # 9200 is the port for client interaction; 9300 is the port for inter-node cluster communication (heartbeat)

    # -e "discovery.type=single-node" runs es in single-node mode

    # -e ES_JAVA_OPTS sets the memory to use; in production this can be set to 32G

  From now on, plugins can be installed on the host side and picked up with a simple restart.

  Special note:

    -e ES_JAVA_OPTS="-Xms64m -Xmx128m" sets ES's initial and maximum heap size for a test environment; without it, the default heap is too large and ES may fail to start.

(5) Make the container start automatically with Docker
   docker update elasticsearch --restart=always

 (6) Test the access

    Check the elasticsearch version info: http://192.168.56.10:9200/

  Show the elasticsearch node info: http://192.168.56.10:9200/_cat/nodes

  127.0.0.1 69 99 9 1.07 0.78 0.56 dilm * 79966af1bf0e

  79966af1bf0e is the node name above; the * marks the master node

3) Install Kibana
(1) Create the instance and start it
    docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2
  Note: be sure to change http://192.168.56.10:9200 to your own virtual machine's address
(2) Make the container start automatically with Docker
   docker update kibana --restart=always
(3) Test
  Visit: http://192.168.56.10:5601/app/kibana

4. Basic Retrieval

1) _cat
(1) GET /_cat/nodes: list all nodes
    Request: http://192.168.56.10:9200/_cat/nodes
    Response: 127.0.0.1 53 99 18 1.64 1.34 0.89 dilm * 79966af1bf0e
    Note: 79966af1bf0e is the node name; the * marks the master node
(2) GET /_cat/health: check the es health status
    Request: http://192.168.56.10:9200/_cat/health
    Response: 1639660246 13:10:46 elasticsearch green 1 1 2 2 0 0 0 0 - 100.0%
    Note: green means the cluster is healthy
(3) GET /_cat/master: show the master node
    Request: http://192.168.56.10:9200/_cat/master
    Response: yV_3GAgSRlCZRbsaZGmwvg 127.0.0.1 127.0.0.1 79966af1bf0e
    Note: the master node's unique id and the virtual machine address
(4) GET /_cat/indices: list all indices, equivalent to MySQL's SHOW DATABASES;
    Request: http://192.168.56.10:9200/_cat/indices
    Response: green open .kibana_task_manager_1 TgtHkBPEQa27TRzx7csj_A 1 0 2 0 38.3kb 38.3kb
         green open .kibana_1 WX3cdamiRh-ylxDECyU5qg 1 0 3 0 14.8kb 14.8kb
2) Index a document (save)
  Saving a piece of data means choosing which index and which type to save it under, and specifying its unique identifier.
  (1) PUT customer/external/1: save document 1 under the external type of the customer index
      Request: http://192.168.56.10:9200/customer/external/1
         Body:
         { "name": "John Doe" }
    Note:
      Both PUT and POST work. POST is for creation: without an id, one is generated automatically; with an existing id, the data is updated and the version number increased.
      PUT can create or update, and it must specify an id; because the id is required, PUT is generally used for updates, and omitting the id is an error. The one unambiguous case: a POST without an id always creates a new document.
A 201 Created status after the request indicates the record was inserted successfully.
The response:
Fields whose names start with an underscore are metadata reflecting the document's basic information.
{
    "_index": "customer",   // which index (database) the data lives in
    "_type": "external",    // which type the data lives in
    "_id": "1",             // the id of the saved document
    "_version": 1,          // the version of the saved document
    "result": "created",    // a document was created; PUT the same id again and this becomes "updated" while the version increases
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Now use the POST method instead:
Adding data without specifying an id generates one automatically, and the operation is a creation:
{
    "_index": "customer",
    "_type": "external",
    "_id": "5MIjvncBKdY1wAQm-wNZ",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 11,
    "_primary_term": 6
}

POSTing again without an id still creates a brand-new document:
{
    "_index": "customer",
    "_type": "external",
    "_id": "5cIkvncBKdY1wAQmcQNk",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 12,
    "_primary_term": 6
}


When adding data with a specified id, that id is used, and the operation is a creation:
{
    "_index": "customer",
    "_type": "external",
    "_id": "2",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 13,
    "_primary_term": 6
}


POSTing again with the same id turns the operation into an update:
{
    "_index": "customer",
    "_type": "external",
    "_id": "2",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 14,
    "_primary_term": 6
}
3) Query a document & the optimistic-locking fields
(1) GET /customer/external/1: query the document
    Request: http://192.168.56.10:9200/customer/external/1
    Response:
    {
        "_index": "customer",
        "_type": "external",
        "_id": "1",
        "_version": 10,
        "_seq_no": 18,        // concurrency-control field, incremented on every update; used for optimistic locking
        "_primary_term": 6,   // same purpose; changes when the primary shard is reassigned, e.g. after a restart
        "found": true,
        "_source": {
            "name": "John Doe"
        }
    }
  By appending "if_seq_no=1&if_primary_term=1" to an update request, the modification is performed only when the sequence number matches; otherwise it is rejected.

(2) Example: update the document with id=1 to name=1, then update it again to name=2; at the start _seq_no=24 and _primary_term=1
  (a) Update name to 1

     PUT http://192.168.56.10:9200/customer/external/1

  (b) Update name to 2, passing seq_no=24 with the request

     PUT http://192.168.56.10:9200/customer/external/1?if_seq_no=24&if_primary_term=1

  This update fails with a version-conflict error.

  Querying the latest data shows _seq_no=25, the value left after the first update finished.

  Updating once more with the current sequence number succeeds:

  PUT http://192.168.56.10:9200/customer/external/1?if_seq_no=26&if_primary_term=1
4) Update a document
(1) Updating a document with _update
  (a) POST with _update
    POST customer/external/1/_update
    {
        "doc": {
            "name": "111"
        }
    }
  or
  (b) POST without _update
    POST customer/external/1
    {
        "name": "222"
    }
  or
  (c) PUT without _update
    PUT customer/external/1
    {
        "name": "222"
    }
(2) Differences
  (a) A POST with _update compares against the source document; if nothing changed, no operation is performed and the document version does not increase.
  (b) A PUT always re-saves the data and increases the version.
  (c) In other words, POST with _update performs no operation at all when the new data equals the existing data.
(3) Which to use
  (a) For updates under heavy write concurrency, go without _update.
  (b) For heavy read concurrency with only occasional updates, use _update: it compares before updating and recomputes only when needed.
(4) Example (a request sketch follows)
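  For instance, repeating a partial-update request like the following (the field value is illustrative) produces the noop response below:

    POST customer/external/1/_update
    {
        "doc": {
            "name": "John Doew"
        }
    }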
 

  If the same update is executed again, no operation is performed and the sequence number does not change. The response:

  {
    "_index": "customer",
      "_type": "external",
    "_id": "1",
    "_version": 9,
    "result": "noop",
    "_shards": {
      "total": 0,
      "successful": 0,
      "failed": 0
    },
    "_seq_no": 28,
    "_primary_term": 2
  }

  The _update style of POST compares against the existing data; when identical, nothing is executed and neither version nor _seq_no changes.

 Updating a document with POST, without _update:
 Repeating the same update still succeeds every time; no comparison with the existing data is made, and _seq_no changes on every call.

  {
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 13,
    "result": "updated",
    "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
    },
    "_seq_no": 32,
    "_primary_term": 2
  }

5) Delete a document & an index
(1) Syntax
    DELETE customer/external/1
    DELETE customer
  Note: elasticsearch provides no operation for deleting a type; it only offers deletion of indices and documents.
(2) Example
  Delete the document with id=1, then query it again
  DELETE http://192.168.56.10:9200/customer/external/1

  {
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 14,
    "result": "deleted",
    "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
    },
    "_seq_no": 33,
    "_primary_term": 2
  }

    Execute DELETE http://192.168.56.10:9200/customer/external/1 once more:

  {
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 1,
    "result": "not_found",
    "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
    },
    "_seq_no": 34,
    "_primary_term": 2
  }

    GET http://192.168.56.10:9200/customer/external/1

  {
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "found": false
  }

 (3) Delete the whole customer index

    Before deletion, list all indices: http://192.168.56.10:9200/_cat/indices

  green open .kibana_task_manager_1 TgtHkBPEQa27TRzx7csj_A 1 0 2 0 30.4kb 30.4kb
  green open .kibana_1 WX3cdamiRh-ylxDECyU5qg 1 0 5 0 18.3kb 18.3kb
  yellow open customer Vaet8xVXTOSEnDQzcmfnyw 1 1 6 4 13.9kb 13.9kb

  Delete the "customer" index: DELETE http://192.168.56.10:9200/customer
  Response:

  {
    "acknowledged": true
  }

    After deletion, list all indices again: http://192.168.56.10:9200/_cat/indices

  green open .kibana_task_manager_1 TgtHkBPEQa27TRzx7csj_A 1 0 2 0 30.4kb 30.4kb
  green open .kibana_1 WX3cdamiRh-ylxDECyU5qg 1 0 5 0 18.3kb 18.3kb

6) Bulk batch API
 (1) Batch-import data
  POST http://192.168.56.10:9200/customer/external/_bulk
  Every two lines form one unit:
  {"index":{"_id":"1"}}
  {"name":"a"}
  {"index":{"_id":"2"}}
  {"name":"b"}
  Note: in Postman neither the json nor the text body type works for this; use the Dev Tools in Kibana.

Syntax:

  { action: { metadata }}\n
  { request body }\n
  { action: { metadata }}\n
  { request body }\n
  The bulk API executes all the actions in order. If a single action fails for any reason, it keeps processing the remaining actions after it. When the bulk API returns, it reports the status of every action (in the same order they were sent), so you can check whether a particular action failed.
 
(2) Example 1: execute several operations on one index
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}
  Result:
#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 105,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
(3) Example 2: perform bulk operations across whole indices
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
  Result:
#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 625,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "u3xrxn0BOYHMBBh2yP0L",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
7) Sample test data
  A fictitious JSON document sample of customer bank account information is prepared. Every document has the following schema:
{ "account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "gender": "M", "address": "880 Holmes Lane", "employer": "Pyrami", "email": "amberduke@pyrami.com", "city": "Brogan", "state": "IL" }
Get the data from https://gitee.com/xlh_blog/common_content/blob/master/es%E6%B5%8B%E8%AF%95%E6%95%B0%E6%8D%AE.json and import it:
POST bank/account/_bulk with the data above as the request body
http://192.168.56.10:9200/_cat/indices then shows the 1000 freshly imported documents: yellow open bank GAry5upkQga9NZ6gY2Gyfw 1 1 1000 0 427.7kb 427.7kb

 Data content

5. Advanced Retrieval

1) Search API
  ES supports two basic ways of searching:
  (a) send the search parameters through the REST request URI (uri + search parameters);
  (b) send them through the REST request body (uri + request body).
(1) Retrieving information
(a) Every search starts from _search:
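  For example, a URI-style search returning all documents in bank sorted by account number (the standard example from the official docs):

    GET bank/_search?q=*&sort=account_number:asc

  The response carries took, timed_out, _shards, hits.total, hits.max_score, and by default the first 10 hits.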

 (b) Search with uri + request body:
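  The same search expressed as a request body:

    GET bank/_search
    {
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }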

2) Query DSL
(1) Basic syntax format
  Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, known as the Query DSL. The query language is very comprehensive and feels a little complex at first; the way to really learn it is to start from a few basic examples.
(a) The typical structure of a query statement:
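  As sketched in the official guide:

    {
      "QUERY_NAME": {
        "ARGUMENT": VALUE,
        "ARGUMENT": VALUE, ...
      }
    }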

 (b) If the query targets a specific field, the structure is as follows:
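  The field name is nested one level deeper:

    {
      "QUERY_NAME": {
        "FIELD_NAME": {
          "ARGUMENT": VALUE,
          "ARGUMENT": VALUE, ...
        }
      }
    }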

(2) Returning only part of the fields
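  The _source parameter lists the fields to bring back, for example:

    GET bank/_search
    {
      "query": { "match_all": {} },
      "_source": ["age", "balance"]
    }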

(3) match [match query] (sketches of the three cases follow this list)

  (a) Basic types (non-string): exact match

  (b) String: full-text search

  (c) String with multiple words: tokenization + full-text search
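  Sketches against the bank index:

    # (a) exact value on a non-string field
    GET bank/_search
    { "query": { "match": { "account_number": 20 } } }

    # (b) full-text search on a string field; hits are scored by relevance
    GET bank/_search
    { "query": { "match": { "address": "mill" } } }

    # (c) multiple words are tokenized first; any document containing "mill" or "road" matches
    GET bank/_search
    { "query": { "match": { "address": "mill road" } } }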

(4) match_phrase [phrase match]

  To match a text field exactly, use its keyword sub-field: the condition must equal the field's whole value, a strict exact match.

  match_phrase does phrase matching: any text that contains the match condition as a whole phrase is matched.
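  A sketch contrasting the two:

    # phrase match: matches when address contains the phrase "mill road"
    GET bank/_search
    { "query": { "match_phrase": { "address": "mill road" } } }

    # keyword match: matches only when address equals the entire value
    GET bank/_search
    { "query": { "match": { "address.keyword": "990 Mill Road" } } }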

(5) multi_match [multi-field match]
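  multi_match searches the same text in several fields at once (the query text is tokenized as well); the fields chosen here are illustrative:

    GET bank/_search
    {
      "query": {
        "multi_match": {
          "query": "mill",
          "fields": ["address", "city"]
        }
      }
    }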

(6) bool [compound query]

  Compound statements can combine any other query statements, including other compound statements. It is important to understand this: compound statements can be nested inside one another and can express very complex logic. A combined sketch follows this list.
 (a) must: every condition listed under must has to be met
  Example: query for gender=M and address containing mill

 (b) should: conditions listed under should ought to be met; meeting them raises the matching documents' relevance score without changing the result set. If the query consists of only should with a single matching rule, the should condition becomes the default matching condition and does change the result

 (c) must_not: the document must not match the listed conditions
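  A sketch combining the three clauses (the must part is the gender=M, address=mill example from (a); the should and must_not values are illustrative):

    GET bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "gender": "M" } },
            { "match": { "address": "mill" } }
          ],
          "should": [
            { "match": { "address": "lane" } }
          ],
          "must_not": [
            { "match": { "email": "baluba.com" } }
          ]
        }
      }
    }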

(7) filter [result filtering]
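  Clauses placed under filter must match, but unlike must they do not contribute to the relevance score. A sketch restricting balance to a range:

    GET bank/_search
    {
      "query": {
        "bool": {
          "filter": {
            "range": {
              "balance": { "gte": 10000, "lte": 20000 }
            }
          }
        }
      }
    }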

(8) term

  Like match, it matches a field against a value. Use match for full-text fields; for exact matches on other, non-text fields, use term.

  Do not use term for queries on text fields:

  es analyzes (tokenizes) text values when storing them by default, so searching a text value requires match.

  field.keyword: must match the whole value one-to-one.

  match_phrase: matches when the phrase is contained as a substring.
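  A sketch of an exact match on a numeric field:

    GET bank/_search
    { "query": { "term": { "age": 28 } } }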

(9) aggregations (executing aggregations)

  Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a search can return hits and aggregation results at the same time, separated from each other in one response. This is very powerful and efficient: you can run a query and multiple aggregations and get each of their results back in a single round trip, through one concise and simplified API, avoiding extra network round trips.
 (a) Search all people whose address contains mill, and return their age distribution and average age without returning their documents:
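  A query consistent with the result below ("size": 0 suppresses the hit documents):

    GET bank/_search
    {
      "query": { "match": { "address": "mill" } },
      "aggs": {
        "group_by_state": {
          "terms": { "field": "age" }
        },
        "ageAvg": {
          "avg": { "field": "age" }
        }
      },
      "size": 0
    }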

Result:

{
  "took" : 274,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAvg" : {
      "value" : 34.0
    },
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    }
  }
}

 (b) Complex: aggregate by age, and also request the average balance within each age bucket (see the sketch below)
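  A sketch nesting an avg sub-aggregation inside the terms aggregation (aggregation names are illustrative):

    GET bank/_search
    {
      "query": { "match_all": {} },
      "aggs": {
        "ageAgg": {
          "terms": { "field": "age", "size": 100 },
          "aggs": {
            "balanceAvg": { "avg": { "field": "balance" } }
          }
        }
      },
      "size": 0
    }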

 (c) Complex: find the distribution of all ages, and within each age bucket the average balance of gender M, of gender F, and of the bucket as a whole (see the sketch below)
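  A sketch with two levels of nesting (names illustrative; gender is a text field, so its keyword sub-field is aggregated):

    GET bank/_search
    {
      "query": { "match_all": {} },
      "aggs": {
        "ageAgg": {
          "terms": { "field": "age", "size": 100 },
          "aggs": {
            "genderAgg": {
              "terms": { "field": "gender.keyword" },
              "aggs": {
                "balanceAvg": { "avg": { "field": "balance" } }
              }
            },
            "ageBalanceAvg": { "avg": { "field": "balance" } }
          }
        }
      },
      "size": 0
    }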

3) Mapping
  A mapping defines how a document and the fields it contains are stored and indexed.
(1) Field types
  (Core field types include text, keyword, long, integer, short, byte, double, float, date, boolean, object, nested, geo_point, and so on.)

(2) Mapping operations

 (a) View the mapping information

    GET bank/_mapping

 (b) Modify the mapping information
    https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

(3) Changes in the new version
  Mapping types are deprecated in ES 7.x (an index effectively holds a single type, _doc) and removed entirely in 8.x.

1) Create a mapping
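  A request consistent with the mapping shown below:

    PUT /my_index
    {
      "mappings": {
        "properties": {
          "age": { "type": "integer" },
          "email": { "type": "keyword" },
          "name": { "type": "text" }
        }
      }
    }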

  Output:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}

  View the mapping: GET /my_index

{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1639723434557",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "O71P-B5WRa6z9YkMLt3vMg",
        "version" : {
          "created" : "7040299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

2) Add a new field mapping
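  A request matching the note below; the field name follows the example in the official docs and is illustrative:

    PUT /my_index/_mapping
    {
      "properties": {
        "employee-id": {
          "type": "keyword",
          "index": false
        }
      }
    }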

  Output:

{
  "acknowledged" : true
}

  Note: "index": false here means the newly added field cannot be searched; it is only a redundant stored field.

3) Update a mapping

  Fields that already exist in a mapping cannot be updated. To change them, a new index must be created and the data migrated.

4) Data migration

  First create new_twitter with the correct mapping, then migrate the data as follows (sketched below):
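  A sketch of the _reindex call; for an old index that still carries a type, the type can be named under source:

    POST _reindex
    {
      "source": { "index": "twitter" },
      "dest": { "index": "new_twitter" }
    }

    POST _reindex
    {
      "source": { "index": "twitter", "type": "tweet" },
      "dest": { "index": "new_twitter" }
    }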

4) Tokenization
  A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and outputs a stream of tokens.
  For example, the whitespace tokenizer splits text on whitespace characters: it turns the text "Quick brown fox!" into [Quick, brown, fox!].
  The tokenizer is also responsible for recording the order or position of each term (used for phrase and word-proximity queries), as well as the start and end character offsets of the original word each term represents (used for highlighting matched content).
  Elasticsearch ships with many built-in tokenizers that can be used to build custom analyzers.
1) Install the ik tokenizer
  Note: the default automatic installation, elasticsearch-plugin install xxx.zip, cannot be used here
  https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2 (install the release matching your es version)

 (a) Enter the es container's plugins directory
    docker exec -it <container id> /bin/bash
    Note: when installing es we already mapped the container's plugins directory to /mydata/elasticsearch/plugins/, so we can simply work in the mapped directory.

  (b) Download the ik tokenizer

    wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip

  Note:

  (1) This step can be done with wget, or by downloading the file locally and uploading it with Xshell.

  (2) If wget or unzip is not found, see point 13 of "1. Installing the Linux virtual machine" in Gulimall Distributed Basics (II): Environment Setup (VM & JDK & Maven & docker & mysql & redis & vue).

  We can also enter the es container and see the same files through the mapped directory.

  (c) Unzip the downloaded file

  Afterwards, remember to delete the downloaded zip archive.

  (d) Set permissions

  (e) Confirm the tokenizer is installed

  (f) Restart es

  (A command sketch for steps (c) through (f) follows.)
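  A typical command sequence for steps (c) through (f), assuming the zip was downloaded into /mydata/elasticsearch/plugins:

    # (c) unzip into an ik directory and delete the archive
    cd /mydata/elasticsearch/plugins
    unzip elasticsearch-analysis-ik-7.4.2.zip -d ik
    rm -f elasticsearch-analysis-ik-7.4.2.zip
    # (d) grant the permissions es needs
    chmod -R 777 ik/
    # (e) list the installed plugins inside the container; the output should contain "ik"
    docker exec -it elasticsearch /bin/bash -c "elasticsearch-plugin list"
    # (f) restart es
    docker restart elasticsearch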

2) Test the tokenizer
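  With the default (standard) analyzer, which splits Chinese into single characters:

    POST _analyze
    {
      "text": "我是中国人"
    }

  The output: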

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}
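  With the ik_smart analyzer:

    POST _analyze
    {
      "analyzer": "ik_smart",
      "text": "我是中国人"
    }

  The output: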

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
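  With the ik_max_word analyzer, which enumerates every word combination:

    POST _analyze
    {
      "analyzer": "ik_max_word",
      "text": "我是中国人"
    }

  The output: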

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

  You can see that different tokenizers split the text in clearly different ways, so from now on an index should not use the default mapping; build the mapping by hand, because the tokenizer has to be chosen.
3) Custom word dictionary
  Note: first install nginx by following "2. Nginx" in Gulimall Distributed Advanced (I): Environment Setup (Advanced Supplement) (ElasticSearch)
   (a) Modify IKAnalyzer.cfg.xml under /mydata/elasticsearch/plugins/ik/config/
    The original xml

    The modified xml (sketched below):
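  A sketch of the modified file; the remote dictionary URL is an assumption and should point at a word list served by your nginx:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- local extension dictionaries -->
        <entry key="ext_dict"></entry>
        <!-- local extension stop-word dictionaries -->
        <entry key="ext_stopwords"></entry>
        <!-- remote extension dictionary served by nginx (URL assumed) -->
        <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
        <!-- remote extension stop-word dictionary -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    </properties>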

  (b) Then restart the es service and restart nginx

    Check whether they started successfully:

      http://192.168.56.10:9200/

      http://192.168.56.10:5601/app/kibana

  (c) Test the tokenization in kibana


6. Elasticsearch-Rest-Client


In the end we choose Elasticsearch-Rest-Client (elasticsearch-rest-high-level-client):

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html
1) Integrating ElasticSearch into SpringBoot
  Create the search service gulimall-search
(1) New Module, choose Spring Initializr

  Select Web > Spring Web

(2) Adjust pom.xml so that the version environment stays consistent

(3) Bring in the common dependency gulimall-common

  gulimall-common already includes the dependencies for service registration/discovery and the config center:

<dependency>
    <groupId>com.atguigu.gulimall</groupId>
    <artifactId>gulimall-common</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <exclusions>
        <exclusion>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
        </exclusion>
    </exclusions>
</dependency>

(4) Import the es dependency

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>

The ES version managed by spring-boot-dependencies is 6.8.5; it has to be overridden:
<properties>
    <java.version>1.8</java.version>
    <spring-cloud.version>Greenwich.SR3</spring-cloud.version>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>

(5) Configure the registry

  (a) Modify application.yml

  (b) Enable service registration/discovery (a sketch of both follows)
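  A minimal sketch, assuming the module registers with Nacos like the other gulimall services (the service name and server address are assumptions):

    spring:
      application:
        name: gulimall-search
      cloud:
        nacos:
          discovery:
            server-addr: 127.0.0.1:8848

  and annotate the main application class with @EnableDiscoveryClient.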

(6) Add the es configuration class

package com.atguigu.gulimall.search.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * 1. Import the dependency
 * 2. Write the configuration and inject a RestHighLevelClient into the container
 * 3. Refer to the API docs
 */
@Configuration
public class GulimallElasticSearchConfig {

    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.56.10", 9200, "http")));
        return client;
    }

}

(7) Unit test obtaining the client object

package com.atguigu.gulimall.search;

import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import javax.annotation.Resource;

@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimallSearchApplicationTests {

    @Resource
    private RestHighLevelClient client;

    @Test
    public void contextLoads() {
        System.out.println("client: " + client);
    }

}

Result:
2) Testing
  Request options: if es has security access rules configured, requests to es must carry a security header, which can be set through RequestOptions.
  The official docs suggest creating the RequestOptions as a single shared instance.

package com.atguigu.gulimall.search.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * 1. Import the dependency
 * 2. Write the configuration and inject a RestHighLevelClient into the container
 * 3. Refer to the API docs
 */
@Configuration
public class GulimallElasticSearchConfig {

    // shared request options; security headers and the like can be added to the builder here
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.56.10", 9200, "http")));
        return client;
    }

}
(1) Test saving data
  Saving can be synchronous or asynchronous; the asynchronous variant just adds a listener callback.

import com.alibaba.fastjson.JSON;
import lombok.Data;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.common.xcontent.XContentType;

/**
 * Test storing data in es
 */
@Test
public void indexData() throws IOException {
    // target index
    IndexRequest indexRequest = new IndexRequest("users");
    indexRequest.id("1"); // the document id

    User user = new User();
    user.setUserName("张三");
    user.setAge(20);
    user.setGender("男");
    String jsonString = JSON.toJSONString(user);

    // set the content to save, specifying the data and its content type
    indexRequest.source(jsonString, XContentType.JSON);

    // execute: create the index and save the data
    IndexResponse index = client.index(indexRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    System.out.println(index);
}

@Data
class User {
    private String userName;
    private String gender;
    private Integer age;
}
 Result:

  Check it in kibana

(2) Test retrieving data

  https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html

@Test
public void find() throws IOException {
    // 1. build the search request
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // build the search conditions
    //sourceBuilder.query();
    //sourceBuilder.from();
    //sourceBuilder.size();
    //sourceBuilder.aggregation();
    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    System.out.println(sourceBuilder.toString());

    searchRequest.source(sourceBuilder);

    // 2. execute the search
    SearchResponse response = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    // 3. analyse the response
    System.out.println(response.toString());
}

Query result:

{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}
{"took":11,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]}}

 

@Test
public void find() throws IOException {
    // 1. build the search request
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // build the search conditions
    //sourceBuilder.query();
    //sourceBuilder.from();
    //sourceBuilder.size();
    //sourceBuilder.aggregation();

    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));

    // the AggregationBuilders utility class builds AggregationBuilder instances
    // first aggregation: the distribution of age values
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("agg1").field("age").size(10); // aggregation name
    // the parameter is an AggregationBuilder
    sourceBuilder.aggregation(agg1);

    // second aggregation: the average balance
    AvgAggregationBuilder agg2 = AggregationBuilders.avg("agg2").field("balance");
    sourceBuilder.aggregation(agg2);

    System.out.println("检索条件" + sourceBuilder.toString());

    searchRequest.source(sourceBuilder);

    // 2. execute the search
    SearchResponse response = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    // 3. analyse the response
    System.out.println(response.toString());
}

 Result:

检索条件{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"agg1":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"agg2":{"avg":{"field":"balance"}}}}
{"took":5,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"avg#agg2":{"value":25208.0},"lterms#agg1":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]}}}

 

(3) Encapsulate the search result into a Java bean

@Test
public void find() throws IOException {
    // 1. build the search request
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // build the search conditions
    //sourceBuilder.query();
    //sourceBuilder.from();
    //sourceBuilder.size();
    //sourceBuilder.aggregation();

    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));

    // the AggregationBuilders utility class builds AggregationBuilder instances
    // first aggregation: the distribution of age values
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("agg1").field("age").size(10); // aggregation name
    // the parameter is an AggregationBuilder
    sourceBuilder.aggregation(agg1);

    // second aggregation: the average balance
    AvgAggregationBuilder agg2 = AggregationBuilders.avg("agg2").field("balance");
    sourceBuilder.aggregation(agg2);

    //System.out.println("检索条件" + sourceBuilder.toString());

    searchRequest.source(sourceBuilder);

    // 2. execute the search
    SearchResponse response = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    // 3. analyse the response
    //System.out.println(response.toString());
    //Map map = JSON.parseObject(response.toString(), Map.class);
    // 3.1 get all the returned hits
    SearchHits hits = response.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit hit : searchHits) {
        // metadata accessors available on each hit
        hit.getId();
        hit.getType();
        hit.getIndex();
        String sourceAsString = hit.getSourceAsString();
        Account account = JSON.parseObject(sourceAsString, Account.class);
        System.out.println("account:" + account);
    }
}

@Data
static class Account {
    private int account_number;
    private int balance;
    private String firstname;
    private String lastname;
    private int age;
    private String gender;
    private String address;
    private String employer;
    private String email;
    private String city;
    private String state;
}

Result:

account:GulimallSearchApplicationTests.Account(account_number=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
account:GulimallSearchApplicationTests.Account(account_number=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
account:GulimallSearchApplicationTests.Account(account_number=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
account:GulimallSearchApplicationTests.Account(account_number=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)


 
