Elastic Stack: Getting Started with ES Indices

1. Index Operations

If you PUT data directly, e.g. PUT index/_doc/1, ES creates the index automatically and builds a dynamic mapping for it.

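In production, however, we should create indices and mappings ourselves so that they are easier to manage, much like writing the CREATE TABLE statements for a database.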

Syntax for creating an index:

PUT /index
{
  "settings": { ... any settings ... },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  },
  "aliases": {
    "default_index": {}
  }
}

Example:

PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1":{
        "type": "text"
      },
      "field2":{
        "type": "text"
      }
    }
  },
  "aliases": {
    "default_index": {}
  } 
}
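For readers who script this outside Kibana, here is a minimal Python sketch that uses only the standard library to build the same PUT /my_index request. It assumes a node at http://localhost:9200 with no authentication or TLS, and the helper name build_es_request is hypothetical. It only constructs the request and does not send it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth/TLS


def build_es_request(method, path, body=None, base_url=ES_URL):
    """Build (but do not send) a REST request to Elasticsearch.

    Hypothetical helper: serializes `body` as JSON and sets the
    Content-Type header that Elasticsearch expects for JSON bodies.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


# Same body as the PUT /my_index example above
create_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "field1": {"type": "text"},
            "field2": {"type": "text"},
        }
    },
    "aliases": {"default_index": {}},
}

req = build_es_request("PUT", "/my_index", create_body)
```

To send the request, pass it to urllib.request.urlopen(req) against a running cluster. The same helper covers the other requests in this section, for example build_es_request("GET", "/my_index/_mapping") or build_es_request("DELETE", "/my_index").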

Querying an index's mapping and settings:

GET /my_index/_mapping

GET /my_index/_settings

Changing the number of replicas:

PUT /my_index/_settings
{
    "index" : {
        "number_of_replicas" : 2
    }
}

Deleting indices:

DELETE /my_index

DELETE /my_index*

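For safety, and to keep indices from being deleted maliciously or by accident through a wildcard such as my_index* or through _all, you can require every deletion to name its indices explicitly. To do so, set action.destructive_requires_name: true in elasticsearch.yml.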
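With action.destructive_requires_name: true, the cluster rejects DELETE requests whose target is _all or contains a wildcard. The Python function below is a hypothetical client-side check that mimics that rule. It is a sketch of the behavior, not how Elasticsearch implements the setting.

```python
def is_explicit_delete_target(target: str) -> bool:
    """Return True if every index in a DELETE target is named explicitly.

    Hypothetical client-side check modeled on
    action.destructive_requires_name: true, which rejects `_all`
    and wildcard expressions such as `my_index*`.
    """
    # A target may be a comma-separated list of index names
    for name in target.split(","):
        name = name.strip()
        if not name or name == "_all" or "*" in name:
            return False
    return True
```

For example, is_explicit_delete_target("my_index") is accepted, while "my_index*" and "_all" are refused, matching the two DELETE forms shown above.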

2. Custom Analyzers

Default analyzer: standard

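An analyzer has three components:

character filter: preprocesses a piece of text before it is tokenized

tokenizer: splits the text into tokens

token filter: normalizes the tokens, e.g. lowercasing, stop-word removal, synonyms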

standard tokenizer: splits text on word boundaries

standard token filter: does nothing

lowercase token filter: converts all letters to lowercase

stop token filter (disabled by default): removes stop words such as a, the, it
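The three components run in a fixed order. Character filters transform the raw string, one tokenizer splits it, and token filters then transform the token list in sequence. The toy Python model below follows that order. The regex tokenizer is only a crude stand-in for the standard tokenizer, which actually follows Unicode word-boundary rules. All function names are hypothetical.

```python
import re
from typing import Callable, Iterable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str, char_filters: Iterable[CharFilter],
            tokenizer: Tokenizer,
            token_filters: Iterable[TokenFilter]) -> List[str]:
    # 1. character filters: preprocess the raw text, in order
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. exactly one tokenizer: split text into tokens
    tokens = tokenizer(text)
    # 3. token filters: normalize or drop tokens, in order
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


def simple_word_tokenizer(text: str) -> List[str]:
    # crude stand-in for the standard tokenizer: runs of word chars/apostrophes
    return re.findall(r"[\w']+", text)


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop(stopwords: Iterable[str]) -> TokenFilter:
    # factory: returns a token filter that drops the given stop words
    stopset = set(stopwords)
    return lambda tokens: [t for t in tokens if t not in stopset]
```

With only lowercase, the text "The Quick fox" yields ['the', 'quick', 'fox']. Adding stop(["the", "a", "it"]) after lowercase also drops 'the'. The stop filter has to come after lowercase, or "The" would not match the lowercase stop list.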

Enabling the English stop-word filter on the standard analyzer:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Defining your own custom analyzer:

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          # all three components can be customized
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
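As a rough mental model of my_analyzer, the sketch below chains the same steps in plain Python: strip HTML tags, replace & with and, split into words, lowercase, then drop the and a. The regex-based html_strip and tokenizer are simplifications. The real html_strip also decodes HTML entities, and the real standard tokenizer follows Unicode segmentation rules. The function names are hypothetical.

```python
import re
from typing import List

MY_STOPWORDS = {"the", "a"}  # same list as the my_stopwords filter


def html_strip(text: str) -> str:
    # simplified html_strip: replace tags with spaces (no entity decoding)
    return re.sub(r"<[^>]*>", " ", text)


def amp_to_and(text: str) -> str:
    # the "&_to_and" mapping char filter: "&" => "and"
    return text.replace("&", "and")


def my_analyzer(text: str) -> List[str]:
    # char_filter: ["html_strip", "&_to_and"], applied in order
    for char_filter in (html_strip, amp_to_and):
        text = char_filter(text)
    # tokenizer: crude stand-in for "standard"
    tokens = re.findall(r"\w+", text)
    # filter: ["lowercase", "my_stopwords"], applied in order
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in MY_STOPWORDS]
```

To check the real analyzer on a cluster, use the _analyze API, e.g. GET /my_index/_analyze with the body {"analyzer": "my_analyzer", "text": "<p>The Quick & the Dead</p>"}.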

3. Customizing Dynamic Mapping

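true (the default): when an unknown field is encountered, map it dynamically

false: newly detected fields are ignored. They are not indexed and therefore not searchable, but they still appear in the _source of returned hits. They are not added to the mapping, so new fields must be added to it explicitly.

strict: when an unknown field is encountered, reject the document with an error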

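Setting it when creating the mapping (strict at the root, but dynamic mapping allowed inside the address object):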

PUT /my_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "address": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}
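The effect of this mapping on a document with an unknown field can be modeled as a small decision function. In this Python sketch, the effective dynamic value is found by walking from the root down to the field's parent object. An inner object's own setting overrides the one it inherits, as with address above. Only the three values described in this section are modeled, and the function names and result strings are hypothetical.

```python
EXAMPLE_MAPPINGS = {  # same "mappings" body as the request above
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "address": {"type": "object", "dynamic": "true"},
    },
}


def resolve_dynamic(mappings: dict, path: list) -> str:
    """Effective `dynamic` value for a new field under the object at `path`.

    `path` lists object field names from the root, e.g. ["address"];
    the nearest explicit setting wins and the default is "true".
    """
    effective = str(mappings.get("dynamic", "true")).lower()
    node = mappings
    for name in path:
        node = node["properties"][name]
        if "dynamic" in node:
            effective = str(node["dynamic"]).lower()  # inner override
    return effective


def on_unknown_field(mappings: dict, path: list) -> str:
    setting = resolve_dynamic(mappings, path)
    if setting == "true":
        return "add field to mapping and index it"
    if setting == "false":
        return "keep in _source only (not indexed, not in mapping)"
    if setting == "strict":
        return "reject the document with an error"
    raise ValueError(f"value not modeled here: {setting}")
```

With EXAMPLE_MAPPINGS, an unknown top-level field such as author is rejected, while an unknown address.city is added to the mapping.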

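date_detection: date detection, enabled by default. Strings that match certain date formats (such as yyyy-MM-dd) are mapped as date. If necessary, explicitly map the field as date yourself.

numeric_detection: numeric detection (strings that look like numbers are mapped as numeric types), disabled by default.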
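Elasticsearch's default dynamic field mapping works as follows. JSON booleans become boolean, whole numbers become long, floating-point numbers become float, objects become object, and null adds no field. Strings become text with a keyword sub-field, unless date detection recognizes them as dates or numeric detection (when enabled) recognizes them as numbers. The Python sketch below follows those rules for single values. It recognizes only the yyyy-MM-dd date pattern, which is a simplification of the real default date formats, and the function name is hypothetical.

```python
import re
from typing import Optional

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplification: yyyy-MM-dd only


def infer_type(value, date_detection: bool = True,
               numeric_detection: bool = False) -> Optional[str]:
    """Approximate the field type dynamic mapping picks for one value."""
    if value is None:
        return None  # null adds no field
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        if date_detection and DATE_RE.match(value):
            return "date"
        if numeric_detection:
            for parse, es_type in ((int, "long"), (float, "float")):
                try:
                    parse(value)
                    return es_type
                except ValueError:
                    pass
        return "text"  # real mapping: text plus a "keyword" sub-field
    raise TypeError(f"not modeled here: {value!r}")
```

Both flags are set at the top level of mappings, for example "date_detection": false or "numeric_detection": true.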

Defining your own dynamic template:

PUT /my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "en": {
          "match": "*_en",
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    ]
  }
}
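The en template applies only when both conditions hold: the new field's name matches *_en, and its detected JSON type is string. Templates are tried in order, and the first match wins. The Python sketch below models that check with a simple "*" wildcard matcher. It ignores date detection, which would report a date-like string as date rather than string. All names in it are hypothetical.

```python
import re
from typing import Optional

DYNAMIC_TEMPLATES = [  # same template list as the request above
    {"en": {"match": "*_en",
            "match_mapping_type": "string",
            "mapping": {"type": "text", "analyzer": "english"}}},
]


def wildcard_match(pattern: str, name: str) -> bool:
    # simple wildcard: "*" matches any sequence of characters
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def json_type(value) -> str:
    # rough JSON type detection as used by match_mapping_type
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not modeled here: {value!r}")


def pick_template(field_name: str, value,
                  templates=DYNAMIC_TEMPLATES) -> Optional[dict]:
    """Return the mapping of the first matching template, else None."""
    detected = json_type(value)
    for entry in templates:
        for tpl in entry.values():  # each entry is {template_name: body}
            if "match" in tpl and not wildcard_match(tpl["match"], field_name):
                continue
            wanted = tpl.get("match_mapping_type", "*")
            if wanted != "*" and wanted != detected:
                continue
            return tpl["mapping"]
    return None  # fall back to the default dynamic mapping
```

For example, a new string field title_en gets the english analyzer, while title or a numeric count_en falls back to the default dynamic mapping.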

posted @ 2020-06-10 10:40 秋风飒飒吹
