5765809 - cnblogs

October 2024

66_Index Management_Complex hands-on lab: zero-downtime reindexing with scroll + bulk + index aliases

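Summary: Course outline 1. Reindexing. A field's settings cannot be modified. To change a field, create a new index with the new mapping, then query the data out in batches and write it into the new index with the bulk API. For the batch queries, use the scroll API, and reindex the data concurrently with multiple threads; each …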
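The reindex flow in this post (create a new index with the new mapping, scroll the old data out, bulk it into the new index, then repoint an alias so clients never notice the switch) can be sketched as an in-memory simulation. This is a minimal sketch, not Elasticsearch client code: the `indices`/`aliases` dictionaries and the names `my_index`, `my_index_new`, and `my_alias` are hypothetical. The alias swap returns the body a real `POST /_aliases` request with `remove` and `add` actions would carry; because both actions travel in one request, Elasticsearch applies the switch atomically.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins: index name -> {doc_id: source}, alias -> index name.
indices: Dict[str, Dict[str, dict]] = {}
aliases: Dict[str, str] = {}


def scroll(index: str, size: int):
    """Yield batches of (doc_id, source) from a snapshot of the index, like scroll."""
    snapshot = list(indices[index].items())  # copy taken when iteration starts
    for start in range(0, len(snapshot), size):
        yield snapshot[start:start + size]


def bulk_index(index: str, batch: List[tuple]) -> None:
    """Write one batch into the target index, like a bulk request of index actions."""
    for doc_id, source in batch:
        indices[index][doc_id] = dict(source)


def swap_alias(alias: str, old: str, new: str) -> dict:
    """Repoint the alias in one step and return the equivalent POST /_aliases body."""
    body = {"actions": [
        {"remove": {"index": old, "alias": alias}},
        {"add": {"index": new, "alias": alias}},
    ]}
    aliases[alias] = new  # single assignment: readers never see the alias missing
    return body


def reindex(alias: str, new_index: str, batch_size: int = 2) -> dict:
    old_index = aliases[alias]
    indices[new_index] = {}  # in a real cluster: PUT new_index with the new mapping first
    for batch in scroll(old_index, batch_size):
        bulk_index(new_index, batch)
    return swap_alias(alias, old_index, new_index)


# Usage with hypothetical names; clients always query "my_alias".
indices["my_index"] = {str(i): {"title": f"doc {i}"} for i in range(1, 6)}
aliases["my_alias"] = "my_index"
body = reindex("my_alias", "my_index_new")
print(aliases["my_alias"])  # my_index_new
```

The post also recommends reindexing with multiple threads; the sketch processes batches sequentially to stay short, but each batch is independent and could be handed to a worker.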

posted @ 2024-10-02 13:09 5765809 Views (43) Comments (0) Recommended (0)

65_Index Management_Customizing your own dynamic mapping strategy

Summary: Course outline 1. Customizing the dynamic strategy. true: unknown fields are dynamically mapped; false: unknown fields are ignored; strict: unknown fields cause an error. PUT /my_index { "mappings": { "my_type": { "dynamic": "strict", " …
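A minimal Python sketch of how the three `dynamic` settings treat a field that is not yet in the mapping. The type guessing in `guess_type` is a simplification (recent versions, for example, map JSON strings to `text` with a `keyword` sub-field), and `apply_dynamic` is a hypothetical helper, not an Elasticsearch API. With `strict`, Elasticsearch itself rejects the document with a `strict_dynamic_mapping_exception`; here a local exception stands in for it.

```python
from typing import Any, Dict, Union

Dynamic = Union[bool, str]  # True, False, or "strict", mirroring the JSON setting


class StrictDynamicMappingError(Exception):
    """Raised when dynamic is "strict" and a document contains an unmapped field."""


def guess_type(value: Any) -> str:
    # Simplified dynamic type detection (not the full Elasticsearch rules).
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"


def apply_dynamic(properties: Dict[str, dict], doc: Dict[str, Any],
                  dynamic: Dynamic) -> Dict[str, dict]:
    """Return the mapping properties after indexing doc under the given setting."""
    updated = dict(properties)
    for field, value in doc.items():
        if field in updated:
            continue
        if dynamic is True:
            updated[field] = {"type": guess_type(value)}  # add a new mapped field
        elif dynamic is False:
            continue  # not mapped or searchable, but still kept in _source
        elif dynamic == "strict":
            raise StrictDynamicMappingError(f"unknown field [{field}]")
        else:
            raise ValueError(f"invalid dynamic setting: {dynamic!r}")
    return updated


mapping = {"title": {"type": "text"}}
doc = {"title": "my article", "views": 10, "published": True}
print(apply_dynamic(mapping, doc, True))
print(apply_dynamic(mapping, doc, False))
try:
    apply_dynamic(mapping, doc, "strict")
except StrictDynamicMappingError as err:
    print("rejected:", err)
```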

posted @ 2024-10-02 13:08 5765809 Views (30) Comments (0) Recommended (0)

64_Index Management_An in-depth look at the mapping root object

Summary: Course outline 1. The root object is the mapping JSON for a given type. It includes properties, metadata (_id, _source, _type), settings (analyzer), and other settings (such as include_in_all). PUT /my_index { " …

posted @ 2024-10-02 13:08 5765809 Views (21) Comments (0) Recommended (0)

63_Index Management_Under the hood: the data structure behind types

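Summary: A type is used within an index to distinguish similar data. Similar data may have different fields, and different properties that control indexing and analyzers. When a field's value is indexed in the underlying Lucene, it is always stored as opaque bytes, with no distinction between field types. Lucene has no concept of type; in a document …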
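The summary is cut off before it explains how types are represented, so this sketch rests on an assumption: that, as in Elasticsearch versions that still had types, each Lucene document stores its type name in a `_type` field, and all types in an index share one flat set of field names, so a document simply has empty values for fields that belong to other types. The type names, field names, and helper functions are hypothetical, and the opaque-bytes storage of values is not modeled.

```python
from typing import Any, Dict, List


def flatten_to_lucene(typed_docs: Dict[str, List[dict]]) -> List[Dict[str, Any]]:
    """Merge documents of several types into one flat, type-less collection.

    Each output document gets a _type field, and every field name used by any
    type in the index appears on every document (None where the type lacks it).
    """
    all_fields = sorted({f for docs in typed_docs.values() for d in docs for f in d})
    flat = []
    for type_name, docs in typed_docs.items():
        for doc in docs:
            row: Dict[str, Any] = {"_type": type_name}
            for field in all_fields:
                row[field] = doc.get(field)  # field from another type -> empty value
            flat.append(row)
    return flat


def filter_by_type(flat_docs: List[Dict[str, Any]], type_name: str) -> List[Dict[str, Any]]:
    """Searching within one type is just a filter on the stored _type field."""
    return [d for d in flat_docs if d["_type"] == type_name]


# Hypothetical types sharing name/price but each with one field of its own.
flat = flatten_to_lucene({
    "electronic_goods": [{"name": "air conditioner", "price": 1999.0, "service_period": "one year"}],
    "fresh_goods": [{"name": "lobster", "price": 199.0, "eat_period": "one week"}],
})
for row in flat:
    print(row)
```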

posted @ 2024-10-02 13:07 5765809 Views (23) Comments (0) Recommended (0)

62_Index Management_Quick hands-on lab: changing analyzers and defining custom analyzers

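Summary: 1. The default analyzer is standard. standard tokenizer: splits on word boundaries; standard token filter: does nothing; lowercase token filter: converts all letters to lowercase; stop token filter (disabled by default): removes stop words such as a, the, and it …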
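The standard analyzer pipeline described above can be approximated in a few lines of Python. The regex tokenizer is only an approximation of the standard tokenizer (which follows Unicode text segmentation rules), and `DEFAULT_STOPWORDS` is a small hypothetical list, not Elasticsearch's built-in English stop list. Passing `stopwords` plays the role of enabling the stop token filter.

```python
import re
from typing import List, Optional, Set

# Hypothetical short stop list for illustration only.
DEFAULT_STOPWORDS: Set[str] = {"a", "an", "and", "is", "it", "of", "the", "to"}


def standard_tokenizer(text: str) -> List[str]:
    # Approximation: keep runs of word characters, split on everything else.
    return re.findall(r"\w+", text)


def analyze(text: str, stopwords: Optional[Set[str]] = None) -> List[str]:
    """standard tokenizer -> standard filter (no-op) -> lowercase -> optional stop filter."""
    tokens = standard_tokenizer(text)
    tokens = [t.lower() for t in tokens]  # lowercase token filter
    if stopwords is not None:  # stop filter stays disabled unless stop words are given
        tokens = [t for t in tokens if t not in stopwords]
    return tokens


print(analyze("The Quick-Brown fox, it is!"))
print(analyze("The Quick-Brown fox, it is!", DEFAULT_STOPWORDS))
```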

posted @ 2024-10-02 13:06 5765809 Views (44) Comments (0) Recommended (0)

61_Index Management_Quick hands-on lab: creating, modifying, and deleting indices

Summary: 1. Why create indices manually? 2. Creating an index. The syntax for creating an index: PUT /my_index { "settings": { ... any settings ... }, "mappings": { "type_one": { ... any mappings ... }, "type_two": { …
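The create/modify/delete workflow can be sketched as plain request data in the same console style the summary uses. The helper functions and the `my_type`/`my_field` names are hypothetical; `number_of_shards`, `number_of_replicas`, `PUT /<index>/_settings`, and `DELETE /<index>` are the standard index settings and endpoints. The mapping uses the type-based layout from the summary, which only applies to Elasticsearch versions that still had mapping types.

```python
import json
from typing import Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body or None)


def create_index_request(index: str, shards: int, replicas: int, mappings: dict) -> Request:
    """PUT /<index> with settings and per-type mappings, following the syntax above."""
    body = {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": mappings,
    }
    return ("PUT", f"/{index}", body)


def update_replicas_request(index: str, replicas: int) -> Request:
    # The replica count can be changed on a live index; number_of_shards cannot be changed in place.
    return ("PUT", f"/{index}/_settings", {"number_of_replicas": replicas})


def delete_index_request(index: str) -> Request:
    return ("DELETE", f"/{index}", None)


def render(request: Request) -> str:
    """Format a request in console style: method and path, then the pretty-printed body."""
    method, path, body = request
    return f"{method} {path}" + ("" if body is None else "\n" + json.dumps(body, indent=2))


req = create_index_request("my_index", 1, 0,
                           {"my_type": {"properties": {"my_field": {"type": "text"}}}})
print(render(req))
print(render(update_replicas_request("my_index", 1)))
print(render(delete_index_request("my_index")))
```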

posted @ 2024-10-02 13:05 5765809 Views (24) Comments (0) Recommended (0)

60_Getting Started with Search Engines_Hands-on lab: paging through large result sets with scroll

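Summary: Fetching, say, 100,000 documents in a single query performs poorly, so scroll is generally used instead to query batch by batch until all the data has been retrieved and processed. With a scroll search, you fetch one batch, then the next, and so on, until every document has been returned. A scroll search saves a snapshot of the data as it was at the time of the first search; afterwards it only …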
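A toy model of the scroll behavior described here: the first search takes a snapshot and returns the first batch, and each later scroll call returns the next batch from that snapshot, so documents written afterwards are not seen. The class, its method names, and the scroll id format are hypothetical. In Elasticsearch the first request is a search with a `scroll` parameter (for example `?scroll=1m`), and later batches come from `POST /_search/scroll` with the returned `scroll_id`.

```python
from typing import Dict, List, Tuple


class ScrollingIndex:
    """Toy index illustrating scroll semantics: a snapshot taken on the first search."""

    def __init__(self) -> None:
        self.docs: List[dict] = []
        self._contexts: Dict[str, Tuple[List[dict], int, int]] = {}  # id -> (snapshot, pos, size)
        self._next_id = 0

    def search(self, size: int) -> Tuple[str, List[dict]]:
        snapshot = list(self.docs)  # later writes to self.docs are invisible to this scroll
        scroll_id = f"scroll-{self._next_id}"  # hypothetical id format
        self._next_id += 1
        self._contexts[scroll_id] = (snapshot, 0, size)
        return scroll_id, self._advance(scroll_id)

    def scroll(self, scroll_id: str) -> List[dict]:
        return self._advance(scroll_id)

    def _advance(self, scroll_id: str) -> List[dict]:
        snapshot, pos, size = self._contexts[scroll_id]
        batch = snapshot[pos:pos + size]
        self._contexts[scroll_id] = (snapshot, pos + len(batch), size)
        return batch


idx = ScrollingIndex()
idx.docs = [{"id": i} for i in range(25)]
scroll_id, batch = idx.search(size=10)
batches = []
while batch:  # an empty batch means the snapshot is exhausted
    batches.append(batch)
    idx.docs.append({"id": 100 + len(batches)})  # written after the first search
    batch = idx.scroll(scroll_id)
print([len(b) for b in batches])  # [10, 10, 5]
```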

posted @ 2024-10-02 13:05 5765809 Views (144) Comments (0) Recommended (0)

Summary: 1. preference determines which shards are used to execute a search: _primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3. The bouncing results problem: two documents being sorted, fiel …
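The bouncing results problem can be illustrated with a small simulation. It assumes one shard with a primary and a replica that hold the same two documents with equal sort values but break the tie in different orders, and a default routing that alternates between copies. Passing a fixed custom `preference` string (for example a user or session id) makes every request hit the same copy, so the order stays stable. The CRC32-based choice is illustrative only and is not how Elasticsearch hashes preference values.

```python
import itertools
import zlib
from typing import Dict, List, Optional

# One shard, two copies with the same docs. Both docs tie on the sort field,
# and each copy happens to break the tie in a different internal order.
COPIES: Dict[str, List[str]] = {
    "primary": ["doc1", "doc2"],
    "replica": ["doc2", "doc1"],
}
_round_robin = itertools.cycle(sorted(COPIES))


def pick_copy(preference: Optional[str] = None) -> str:
    if preference is None:
        return next(_round_robin)  # alternate copies: result order can "bounce"
    # Same preference string -> same copy every time (illustrative hash, not ES internals).
    names = sorted(COPIES)
    return names[zlib.crc32(preference.encode("utf-8")) % len(names)]


def search(preference: Optional[str] = None) -> List[str]:
    return COPIES[pick_copy(preference)]


without_pref = [search() for _ in range(4)]
with_pref = [search("session-abc") for _ in range(4)]  # hypothetical session id
print(without_pref)
print(with_pref)
```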

posted @ 2024-10-02 13:04 5765809 Views (33) Comments (0) Recommended (0)

58_Getting Started with Search Engines_Inside the distributed search engine: the fetch phase

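Summary: Course outline 1. The fetch phase workflow: (1) after the coordinating node has built the priority queue, it sends mget requests to the shards to fetch the corresponding documents; (2) each shard returns the documents to the coordinating node; (3) the coordinating node …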
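The fetch phase workflow can be sketched in Python. The input is the coordinating node's priority queue left by the query phase, as `(score, shard_id, doc_id)` tuples sorted best first; only the requested page is fetched, with one multi-get per shard, and the results are put back into queue order. The shard contents and the `mget` helper are hypothetical stand-ins, and because step (3) of the summary is cut off, the final reassembly step is an assumption about what follows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical shard contents: shard id -> {doc_id: _source}.
SHARDS: Dict[int, Dict[str, dict]] = {
    0: {"a": {"title": "alpha"}, "c": {"title": "gamma"}},
    1: {"b": {"title": "beta"}, "d": {"title": "delta"}},
}


def mget(shard_id: int, doc_ids: List[str]) -> Dict[str, dict]:
    """Stand-in for a multi-get against one shard."""
    return {doc_id: SHARDS[shard_id][doc_id] for doc_id in doc_ids}


def fetch_phase(queue: List[Tuple[float, int, str]], from_: int, size: int) -> List[dict]:
    """queue holds (score, shard_id, doc_id), best first, as left by the query phase."""
    page = queue[from_:from_ + size]  # only this page's documents are fetched
    by_shard: Dict[int, List[str]] = defaultdict(list)
    for _, shard_id, doc_id in page:
        by_shard[shard_id].append(doc_id)
    fetched: Dict[Tuple[int, str], dict] = {}
    for shard_id, doc_ids in by_shard.items():  # one mget request per shard
        for doc_id, source in mget(shard_id, doc_ids).items():
            fetched[(shard_id, doc_id)] = source
    # Reassemble in priority-queue order before returning the hits to the client.
    return [{"_id": doc_id, "_score": score, "_source": fetched[(shard_id, doc_id)]}
            for score, shard_id, doc_id in page]


queue = [(3.2, 1, "b"), (2.7, 0, "a"), (1.9, 1, "d"), (0.8, 0, "c")]
hits = fetch_phase(queue, from_=1, size=2)
print([h["_id"] for h in hits])  # ['a', 'd']
```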

posted @ 2024-10-02 13:04 5765809 Views (18) Comments (0) Recommended (0)

57_Getting Started with Search Engines_Inside the distributed search engine: the query phase

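Summary: 1. The query phase: (1) the search request is sent to a coordinating node, which builds a priority queue whose length is from + size as set by the paging parameters (size defaults to 10); (2) the coordinating node forwards the request to all shards, and each shard searches locally and builds a local priori…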
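The query phase described above amounts to a top-k merge. In this sketch each shard returns its local top `from + size` hits as `(score, shard_id, doc_id)` tuples without document sources, and the coordinating node keeps the global top `from + size`. The shard scores are made-up example data and the helper names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, int, str]  # (score, shard_id, doc_id)


def shard_query(shard_id: int, scores: Dict[str, float], from_: int, size: int) -> List[Hit]:
    """Each shard returns only its local top from + size hits (ids and scores, no sources)."""
    local = [(score, shard_id, doc_id) for doc_id, score in scores.items()]
    return heapq.nlargest(from_ + size, local)


def query_phase(shards: Dict[int, Dict[str, float]], from_: int = 0, size: int = 10) -> List[Hit]:
    """Coordinating node: merge shard results into a global queue of length from + size."""
    merged: List[Hit] = []
    for shard_id, scores in shards.items():
        merged.extend(shard_query(shard_id, scores, from_, size))
    return heapq.nlargest(from_ + size, merged)


shards = {0: {"a": 2.7, "c": 0.8, "e": 0.5}, 1: {"b": 3.2, "d": 1.9}}
queue = query_phase(shards, from_=1, size=2)
print(queue)  # [(3.2, 1, 'b'), (2.7, 0, 'a'), (1.9, 1, 'd')]
```

Because every shard has to return `from + size` hits, the work grows with `from`, which is why deep paging is expensive.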

posted @ 2024-10-02 13:03 5765809 Views (29) Comments (0) Recommended (0)
