Elasticsearch (3)
1. ES Query Operations

1.1 Filtering: query first, then filter (post_filter)

```
GET movie_index/_search
{
  "query": {
    "match": { "name": "red" }
  },
  "post_filter": {
    "term": { "actorList.id": "3" }
  }
}
```
1.2 Filtering: match and filter in one query

```
GET movie_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "red" } }
      ],
      "filter": {
        "term": { "actorList.id": 3 }
      }
    }
  }
}
```
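Clauses under `filter` do not influence the relevance score and can be cached by Elasticsearch, so they are well suited for exact conditions. `filter` also accepts an array, so several conditions can be combined; a minimal sketch (the extra range condition is added here purely for illustration):

```
GET movie_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "red" } }
      ],
      "filter": [
        { "term":  { "actorList.id": 3 } },
        { "range": { "doubanScore": { "gte": 5 } } }
      ]
    }
  }
}
```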
1.3 Filtering: by range

```
# doubanScore >= 6 and <= 9
GET movie_index/_search
{
  "query": {
    "range": {
      "doubanScore": { "gte": 6, "lte": 9 }
    }
  }
}
```
The supported comparison operators are:

| Operator | Meaning |
| --- | --- |
| gt | greater than |
| lt | less than |
| gte | greater than or equal to |
| lte | less than or equal to |
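For exclusive bounds, use gt/lt instead; a minimal sketch that matches scores strictly between 6 and 9:

```
GET movie_index/_search
{
  "query": {
    "range": {
      "doubanScore": { "gt": 6, "lt": 9 }
    }
  }
}
```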
1.4 Sorting

Sort descending by doubanScore:

```
GET movie_index/_search
{
  "sort": [
    {
      "doubanScore": {
        "order": "desc"
      }
    }
  ]
}
```

Sort only the documents whose name matches a given term:

```
GET movie_index/_search
{
  "query": {
    "match": { "name": "red" }
  },
  "sort": [
    { "doubanScore": { "order": "desc" } }
  ]
}
```
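Several sort keys can be listed; later keys act as tie-breakers. A minimal sketch that sorts by doubanScore and falls back to relevance (`_score`):

```
GET movie_index/_search
{
  "query": {
    "match": { "name": "red" }
  },
  "sort": [
    { "doubanScore": { "order": "desc" } },
    "_score"
  ]
}
```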
1.5 Pagination

`from` is the offset of the first hit to return, and `size` is the number of hits per page:

```
# Pagination
GET movie_index/_search
{
  "from": 0,
  "size": 2
}
```
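For page n (counting from 1) the offset is (n - 1) * size; a minimal sketch for the second page with two hits per page:

```
# page 2, 2 hits per page: from = (2 - 1) * 2 = 2
GET movie_index/_search
{
  "from": 2,
  "size": 2
}
```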
1.6 Returning specific fields

```
# Return only the specified fields
GET movie_index/_search
{
  "_source": ["id", "name"]
}
```
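`_source` also accepts an object with `includes` and `excludes` patterns, which is handy for dropping large nested fields; a minimal sketch (the field choice is only an example):

```
GET movie_index/_search
{
  "_source": {
    "includes": ["id", "name", "doubanScore"],
    "excludes": ["actorList.*"]
  }
}
```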
1.7 Highlighting

```
# Highlight matches in the name field
GET movie_index/_search
{
  "query": {
    "match": { "name": "red" }
  },
  "highlight": {
    "fields": { "name": {} }
  }
}
```
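By default the matched terms are wrapped in `<em>` tags; `pre_tags` and `post_tags` override this. A minimal sketch using a hypothetical CSS class:

```
GET movie_index/_search
{
  "query": {
    "match": { "name": "red" }
  },
  "highlight": {
    "pre_tags":  ["<span class='hl'>"],
    "post_tags": ["</span>"],
    "fields": { "name": {} }
  }
}
```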
1.8 Aggregations

Aggregations group and summarize data, much like GROUP BY and aggregate functions in SQL. Elasticsearch can return the search hits and the aggregation results computed over them in a single response, which is both powerful and efficient.

Requirement 1: count how many movies each actor has appeared in.

```
# How many movies has each actor appeared in?
# terms bucket aggregation
GET movie_index/_search
{
  "aggs": {
    "myAggs": {
      "terms": {
        "field": "actorList.name.keyword",
        "size": 10
      }
    }
  }
}
```

Requirement 2: compute the average score of each actor's movies and order the buckets by that average.

```
GET movie_index/_search
{
  "aggs": {
    "groupByname": {
      "terms": {
        "field": "actorList.name.keyword",
        "size": 10,
        "order": { "avgScore": "asc" }
      },
      "aggs": {
        "avgScore": {
          "avg": { "field": "doubanScore" }
        }
      }
    }
  }
}
```
Result:

{ "took" : 38, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "movie_index", "_type" : "movie", "_id" : "2", "_score" : 1.0, "_source" : { "id" : 101, "name" : "peration meigong river", "doubanScore" : 8.1, "actorList" : [ { "id" : 1, "name" : "zhang han yu" } ] } }, { "_index" : "movie_index", "_type" : "movie", "_id" : "1", "_score" : 1.0, "_source" : { "id" : 100, "name" : "operation red sea", "doubanScore" : 8.5, "actorlist" : [ { "id" : 1, "name" : "zhang yi" }, { "id" : 2, "name" : "hai qing" }, { "id" : 3, "name" : "zhang han yu" } ] } }, { "_index" : "movie_index", "_type" : "movie", "_id" : "3", "_score" : 1.0, "_source" : { "id" : 300, "name" : "incident red sea", "doubanScore" : 5.0, "actorList" : [ { "id" : 4, "name" : "zhang san feng" } ] } } ] }, "aggregations" : { "groupByname" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "zhang san feng", "doc_count" : 1, "avgScore" : { "value" : 5.0 } }, { "key" : "zhang han yu", "doc_count" : 1, "avgScore" : { "value" : 8.100000381469727 } } ] } } }
2. Analysis (Tokenization)

2.1 The default analyzer

```
# Default analyzer
GET _analyze
{
  "text": "hello world"
}
```
Result:

{ "tokens" : [ { "token" : "hello", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 0 }, { "token" : "world", "start_offset" : 6, "end_offset" : 11, "type" : "<ALPHANUM>", "position" : 1 } ] }
2.2 Chinese text with the default analyzer

```
# Default analyzer
GET _analyze
{
  "text": "我是中国人"
}
```
Result:

{ "tokens" : [ { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 }, { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 }, { "token" : "中", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 }, { "token" : "国", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 }, { "token" : "人", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4 } ] }
2.3 Chinese analyzers

As the query above shows, the built-in handling of Chinese simply splits the text into individual characters and has no notion of words. In practice, however, users search by words: if the text is segmented into words, queries match user intent much more closely and also run faster. The table below compares some common open-source Chinese analyzers; we use the IK analyzer.
| Analyzer | Strengths | Weaknesses |
| --- | --- | --- |
| Smart Chinese Analysis | Official plugin | Very poor Chinese segmentation quality |
| IKAnalyzer | Simple to use; supports custom and remote dictionaries | Dictionary must be maintained by hand; no part-of-speech tagging |
| jieba (结巴分词) | Can discover new words | No part-of-speech tagging |
| Ansj | Good segmentation accuracy; supports part-of-speech tagging | Smaller dictionary than HanLP; steeper learning curve |
| HanLP | The most complete dictionary and the richest feature set | Needs tuning to get the best segmentation; steeper learning curve |
2.4 The IK analyzer

Create a plugin directory under the Elasticsearch installation directory:

```
[hui@hadoop201 plugins]$ pwd
/opt/module/elasticsearch/plugins
[hui@hadoop201 plugins]$ mkdir ik
```

Unzip the plugin into that directory:

```
[hui@hadoop201 software]$ unzip elasticsearch-analysis-ik-6.6.0.zip -d /opt/module/elasticsearch/plugins/ik
[hui@hadoop201 software]$ cd /opt/module/elasticsearch/plugins/ik
[hui@hadoop201 ik]$ ll
total 1432
-rw-r--r--. 1 hui hui 263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r--. 1 hui hui  61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x. 2 hui hui   4096 Aug 26  2018 config
-rw-r--r--. 1 hui hui  54693 Jan 30  2019 elasticsearch-analysis-ik-6.6.0.jar
-rw-r--r--. 1 hui hui 736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r--. 1 hui hui 326724 May  6  2018 httpcore-4.4.4.jar
-rw-r--r--. 1 hui hui   1805 Jan 30  2019 plugin-descriptor.properties
-rw-r--r--. 1 hui hui    125 Jan 30  2019 plugin-security.policy
```
Distribute the ik plugin to the other nodes:

```
[hui@hadoop201 plugins]$ sxync.sh ik/
```

Remember to restart the Elasticsearch cluster after distributing the plugin.
```
# ik: coarse-grained segmentation (ik_smart)
GET /_analyze
{
  "text": "我是中国人",
  "analyzer": "ik_smart"
}
```

Result:

```
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "中国人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }
  ]
}
```
```
# ik: fine-grained segmentation (ik_max_word)
GET /_analyze
{
  "text": "我是中国人",
  "analyzer": "ik_max_word"
}
```

Result:

```
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "中国人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 },
    { "token" : "中国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 },
    { "token" : "国人", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 4 }
  ]
}
```
2.5 Customizing the IK dictionary

Sometimes the bundled dictionary does not contain domain-specific terms or new internet slang used in a project, so the dictionary needs to be extended. The steps are as follows.

Create the custom dictionary file:

```
[hui@hadoop201 config]$ pwd
/opt/module/elasticsearch/plugins/ik/config
[hui@hadoop201 config]$ less myword.txt
蓝瘦香菇
蓝廋
香菇
瘦香
```
Point the IK configuration at the custom dictionary:

```
[hui@hadoop201 config]$ vim IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!--用户可以在这里配置自己的扩展字典 -->
    <entry key="ext_dict">./myword.txt</entry>
    <!--用户可以在这里配置自己的扩展停止词字典-->
    <entry key="ext_stopwords"></entry>
    <!--用户可以在这里配置远程扩展字典 -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!--用户可以在这里配置远程扩展停止词字典-->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```
Distribute the configuration:

```
[hui@hadoop201 config]$ sxync.sh myword.txt
[hui@hadoop201 config]$ sxync.sh IKAnalyzer.cfg.xml
```
Restart the cluster and test:

```
# ik: fine-grained segmentation
GET /_analyze
{
  "text": "蓝瘦香菇",
  "analyzer": "ik_max_word"
}
```
Result:

{ "tokens" : [ { "token" : "蓝瘦香菇", "start_offset" : 0, "end_offset" : 4, "type" : "CN_WORD", "position" : 0 }, { "token" : "瘦香", "start_offset" : 1, "end_offset" : 3, "type" : "CN_WORD", "position" : 1 }, { "token" : "香菇", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 } ] }
3. Mapping

The data type of every field in a type is defined by its mapping. If no mapping is provided when an index is created, Elasticsearch infers the field types from the format of the first document it sees, roughly as follows:

- true/false → boolean
- 1020 → long
- 20.1 → float
- "2018-02-01" → date
- "hello world" → text + keyword

By default only text fields are analyzed; keyword is a string type that is not analyzed. Besides this automatic (dynamic) mapping, a mapping can also be defined manually, but only for newly added fields that hold no data yet; once a field contains data its mapping can no longer be changed.
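As an illustration (a sketch based on 6.x behaviour; the exact output can vary between versions), checking the dynamically generated mapping shows that a string field such as name becomes a text field with a keyword sub-field, which is why the aggregations above could use actorList.name.keyword:

```
GET movie_index/_mapping

# excerpt of the response for a dynamically mapped string field:
# "name" : {
#   "type" : "text",
#   "fields" : {
#     "keyword" : { "type" : "keyword", "ignore_above" : 256 }
#   }
# }
```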
Documents can be indexed directly; if the index does not exist yet it is created automatically with a dynamic mapping:

```
PUT /movie_chn_1/movie/1
{
  "id": 1,
  "name": "红海行动",
  "doubanScore": 8.5,
  "actorList": [
    { "id": 1, "name": "张译" },
    { "id": 2, "name": "海清" },
    { "id": 3, "name": "张涵予" }
  ]
}

PUT /movie_chn_1/movie/2
{
  "id": 2,
  "name": "湄公河行动",
  "doubanScore": 8.0,
  "actorList": [
    { "id": 3, "name": "张涵予" }
  ]
}

PUT /movie_chn_1/movie/3
{
  "id": 3,
  "name": "红海事件",
  "doubanScore": 5.0,
  "actorList": [
    { "id": 4, "name": "张三丰" }
  ]
}
```
Query tests:

```
GET /movie_chn_1/_search
GET /movie_chn_1
GET /movie_chn_1/_mapping

# match query against the dynamically mapped name field
# (the standard analyzer splits Chinese into single characters)
GET /movie_chn_1/_search
{
  "query": {
    "match": { "name": "海行" }
  }
}
```

Define the mapping manually (note the ik_smart analyzer on name and the keyword type for actor names):

```
PUT movie_chn_2
{
  "mappings": {
    "movie": {
      "properties": {
        "id":          { "type": "long" },
        "name":        { "type": "text", "analyzer": "ik_smart" },
        "doubanScore": { "type": "double" },
        "actorList": {
          "properties": {
            "id":   { "type": "long" },
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}

# Index data into the manually mapped index
PUT /movie_chn_2/movie/1
{
  "id": 1,
  "name": "红海行动",
  "doubanScore": 8.5,
  "actorList": [
    { "id": 1, "name": "张译" },
    { "id": 2, "name": "海清" },
    { "id": 3, "name": "张涵予" }
  ]
}

PUT /movie_chn_2/movie/2
{
  "id": 2,
  "name": "湄公河行动",
  "doubanScore": 8.0,
  "actorList": [
    { "id": 3, "name": "张涵予" }
  ]
}

PUT /movie_chn_2/movie/3
{
  "id": 3,
  "name": "红海事件",
  "doubanScore": 5.0,
  "actorList": [
    { "id": 4, "name": "张三丰" }
  ]
}
```

An index can also be created with a manual mapping and an alias in one request; the alias is declared with `"aliases": { "movie_chn_3_aliase": {} }`:

```
PUT movie_chn_3
{
  "aliases": {
    "movie_chn_3_aliase": {}
  },
  "mappings": {
    "movie": {
      "properties": {
        "id":          { "type": "long" },
        "name":        { "type": "text", "analyzer": "ik_smart" },
        "doubanScore": { "type": "double" },
        "actorList": {
          "properties": {
            "id":   { "type": "long" },
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```
4. Aliases

```
# List aliases
GET /_cat/aliases

# Add another alias to an existing index
POST /_aliases
{
  "actions": [
    { "add": { "index": "movie_chn_3", "alias": "movie_chn_3_wdh01" } }
  ]
}

# List aliases again
GET /_cat/aliases

# An alias is queried exactly like an index
GET /movie_chn_3_wdh01/_search
GET /movie_chn_3_aliase/_search

# Remove an alias
POST /_aliases
{
  "actions": [
    { "remove": { "index": "movie_chn_3", "alias": "movie_chn_3_aliase" } }
  ]
}

# Create a filtered alias over a subset of an index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movie_chn_1",
        "alias": "movie_chn_1_sub_query",
        "filter": {
          "term": { "actorList.id": "4" }
        }
      }
    }
  ]
}

GET /movie_chn_1_sub_query/_search
```
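Because all actions in one `_aliases` call are applied atomically, an alias can be switched from an old index to a new one without ever pointing at neither. A minimal sketch (movie_chn_4 is a hypothetical new index):

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "movie_chn_3", "alias": "movie_chn_3_wdh01" } },
    { "add":    { "index": "movie_chn_4", "alias": "movie_chn_3_wdh01" } }
  ]
}
```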
5. Index Templates

```
# Define a template
PUT _template/template_movie2020
{
  "index_patterns": ["movie_test*"],
  "settings": {
    "number_of_shards": 1
  },
  "aliases" : {
    "{index}-query": {},
    "movie_test-query": {}
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": { "type": "keyword" },
        "movie_name": { "type": "text", "analyzer": "ik_smart" }
      }
    }
  }
}

# Indexing into a name that matches the pattern applies the template automatically
POST movie_test_220101/_doc
{
  "id": "0101",
  "name": "令狐冲"
}

GET /movie_test_220101/_mapping
GET /movie_test_220101-query/_mapping

# List templates
GET /_cat/templates
```

An index template used later for the gmall2020 DAU data:

```
PUT _template/gmall2020_dau_info_template
{
  "index_patterns": ["gmall2020_dau_info*"],
  "settings": {
    "number_of_shards": 3
  },
  "aliases" : {
    "{index}-query": {},
    "gmall2020_dau_info-query": {}
  },
  "mappings": {
    "_doc": {
      "properties": {
        "mid": { "type": "keyword" },
        "uid": { "type": "keyword" },
        "ar":  { "type": "keyword" },
        "ch":  { "type": "keyword" },
        "vc":  { "type": "keyword" },
        "dt":  { "type": "keyword" },
        "hr":  { "type": "keyword" },
        "mi":  { "type": "keyword" },
        "ts":  { "type": "date" }
      }
    }
  }
}
```
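To inspect or remove a single template, the standard template endpoints can be used; a minimal sketch:

```
# Show the full definition of one template
GET /_template/template_movie2020

# Delete a template (indices already created from it are not affected)
DELETE /_template/template_movie2020
```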