Elasticsearch analyzers

1. The default analyzer

standard

standard tokenizer: splits text on word boundaries
standard token filter: does nothing
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it
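
The chain above can be sketched in plain Python. This is only an illustration of the order of the steps, not Elasticsearch's implementation: the function name standard_like_analyze is hypothetical, and the regular expression \w+ is a rough stand-in for the Unicode word-boundary rules the real standard tokenizer follows.

```python
import re


def standard_like_analyze(text, stopwords=None):
    """Approximate the standard analyzer: tokenize, lowercase, optional stop filter."""
    # Step 1: tokenizer. \w+ only approximates real word-boundary segmentation.
    tokens = re.findall(r"\w+", text)
    # Step 2: lowercase token filter.
    tokens = [t.lower() for t in tokens]
    # Step 3: stop token filter, skipped unless a stop word set is supplied
    # (mirrors "disabled by default").
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens


print(standard_like_analyze("The QUICK brown-fox"))
```

Because no stop word set is passed, "the" survives in the result, which matches the default behavior described above.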

2. Changing analyzer settings

Enable the English stop word token filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
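
To make the difference between the two requests concrete, here is a small Python sketch of the same comparison. It is not the Elasticsearch code path: the analyze helper is hypothetical, and ENGLISH_STOPWORDS_SUBSET is an assumed handful of words taken from the _english_ list, just enough for this sentence.

```python
import re

# Assumption: a small subset of the _english_ stop word list, not the full list.
ENGLISH_STOPWORDS_SUBSET = frozenset({"a", "an", "and", "in", "is", "it", "the"})


def analyze(text, stopwords=frozenset()):
    """Tokenize on word characters, lowercase, then drop any stop words."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]


text = "a dog is in the house"
without_stop = analyze(text)                         # like "analyzer": "standard"
with_stop = analyze(text, ENGLISH_STOPWORDS_SUBSET)  # like "analyzer": "es_std"

print(without_stop)
print(with_stop)
```

With stop words enabled, only the content words of the sentence remain.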

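3. Building a custom analyzer

If my_index still exists from section 2, delete it first (DELETE /my_index), because creating an index that already exists returns an error.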

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
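
The order in which my_analyzer processes text (character filters, then the tokenizer, then token filters) can be sketched in Python as follows. Everything here is a simplified stand-in with hypothetical helper names. In particular, the sketch replaces & with " and " padded by spaces, which is an assumption about the intended effect; how Elasticsearch treats whitespace in a mapping rule should be checked against the real _analyze response.

```python
import re


def html_strip(text):
    # Character filter 1: crude stand-in for html_strip, drops anything tag-like.
    return re.sub(r"<[^>]+>", "", text)


def map_chars(text, mapping):
    # Character filter 2: stand-in for the "mapping" char filter.
    for src, dst in mapping.items():
        text = text.replace(src, dst)
    return text


def my_analyzer(text):
    # Character filters run first, on the raw string.
    text = html_strip(text)
    text = map_chars(text, {"&": " and "})  # assumption: padded with spaces
    # Tokenizer: \w+ approximates the standard tokenizer.
    tokens = re.findall(r"\w+", text)
    # Token filters run last, in the configured order: lowercase, then stop words.
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in {"the", "a"}]


result = my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!")
print(result)
```

Note that "in" and "are" stay in the result, since my_stopwords lists only "the" and "a".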

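Use the custom analyzer for a field in the mapping (mapping types such as my_type were removed in Elasticsearch 7.0, where the request becomes PUT /my_index/_mapping):

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}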

posted @ 2018-03-11 22:02 秦先生的客栈
