Custom Analyzers in Elasticsearch
A char filter of type mapping can rewrite characters before tokenization; here it maps & to and:
PUT /my_index?pretty
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [ "& => and " ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "the", "a" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip", "&_to_and" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}
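For the string "a & b", the char filter rewrites the text to a and b before it is tokenized (the my_stopwords filter then drops a, so the emitted tokens are and and b). This feels a bit odd, and I am not yet sure what the practical use case is, but I am recording the feature here for now.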
GET /my_index/_analyze?pretty
{
  "analyzer": "my_analyzer",
  "text": "a & b"
}
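To make the order of the stages easier to follow, here is a rough local sketch of what my_analyzer does to the text. It is only an approximation: the function name `my_analyzer` is borrowed from the settings for illustration, `re.findall(r"\w+", ...)` is a simplified stand-in for the standard tokenizer, and html_strip is left out because the sample text contains no HTML.

```python
import re


def my_analyzer(text):
    """Approximate the char_filter -> tokenizer -> filter chain locally."""
    # char_filter "&_to_and": rewrite "&" before tokenization
    text = text.replace("&", "and ")
    # simplified stand-in for the standard tokenizer: runs of word characters
    tokens = re.findall(r"\w+", text)
    # filter "lowercase"
    tokens = [t.lower() for t in tokens]
    # filter "my_stopwords": drop "the" and "a"
    stopwords = {"the", "a"}
    return [t for t in tokens if t not in stopwords]


print(my_analyzer("a & b"))  # ['and', 'b']
```

The point of the sketch is the ordering: char filters see the raw string, the tokenizer sees the rewritten string, and token filters only see the individual tokens.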
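Another option is a pattern_replace char filter, which matches characters with a regular expression. In the example below, com.test.abc is tokenized into com, test, abc. (The index name my_index is reused here, so the earlier index has to be deleted first, or a different name used.)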
PUT /my_index?pretty
{
  "settings": {
    "analysis": {
      "char_filter": {
        "dot": {
          "type": "pattern_replace",
          "pattern": "(\\w+)\\.(?=\\w)",
          "replacement": "$1 "
        }
      },
      "analyzer": {
        "my_analyzer": {
          "char_filter": [ "dot" ],
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
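The regular expression can be tried out locally before putting it into the index settings. The sketch below uses Python's `re` module as a stand-in for the Java regex engine that Elasticsearch uses; for this particular pattern the two behave the same, except that the Java replacement `$1 ` is written `\1 ` in Python. The function name `dot_analyzer` is made up for illustration, and `str.split()` approximates the whitespace tokenizer.

```python
import re


def dot_analyzer(text):
    """Approximate the "dot" char filter followed by the whitespace tokenizer."""
    # Replace "word." with "word " only when another word character follows,
    # so a trailing dot at the end of the text is left untouched.
    filtered = re.sub(r"(\w+)\.(?=\w)", r"\1 ", text)
    # whitespace tokenizer: split on runs of whitespace
    return filtered.split()


print(dot_analyzer("com.test.abc"))  # ['com', 'test', 'abc']
```

The lookahead `(?=\w)` is what keeps the character after the dot from being consumed, so each following segment can still be matched in turn.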