Custom Analyzers in Elasticsearch

A char filter of type mapping can rewrite characters before tokenization. Here it maps & to and:

PUT /my_index?pretty
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":     "mapping",
                    "mappings": [ "& => and " ]
                }
            },
            "filter": {
                "my_stopwords": {
                    "type":      "stop",
                    "stopwords": [ "the", "a" ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":        "custom",
                    "char_filter": [ "html_strip", "&_to_and" ],
                    "tokenizer":   "standard",
                    "filter":      [ "lowercase", "my_stopwords" ]
                }
            }
        }
    }
}

 

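For the string "a & b", the char filter rewrites the text to a and b. Note that a is in the my_stopwords list, so the tokens the analyzer finally emits are and, b. The result feels a little odd, and I am not yet sure what the practical use cases are, so I am just recording the feature here for now.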

GET /my_index/_analyze?pretty
{
    "analyzer": "my_analyzer",
    "text":     "a & b"
}
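
The steps my_analyzer applies to "a & b" can be approximated outside Elasticsearch. The Python sketch below is only an illustration, not Elasticsearch's implementation: the function name analyze is hypothetical, the regex \w+ is a crude stand-in for the standard tokenizer, and html_strip is left out because the sample text contains no HTML. It shows that a is removed by the stop word list, so only and and b survive.

```python
import re

STOPWORDS = {"the", "a"}  # same entries as the my_stopwords filter


def analyze(text):
    """Rough imitation of my_analyzer (hypothetical helper)."""
    # char_filter "&_to_and": replace every & with "and"
    text = text.replace("&", "and")
    # tokenizer: \w+ is an approximation of the standard tokenizer
    tokens = re.findall(r"\w+", text)
    # filter "lowercase"
    tokens = [t.lower() for t in tokens]
    # filter "my_stopwords": drop "the" and "a"
    return [t for t in tokens if t not in STOPWORDS]


print(analyze("a & b"))  # ['and', 'b']
```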

 

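Another option is a char filter of type pattern_replace, which matches characters with a regular expression. The example below replaces each dot between word characters with a space, so that com.test.abc is tokenized as com, test, abc: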

PUT /my_index?pretty
{
    "settings": {
        "analysis": {
            "char_filter": {
                "dot": {
                    "type":        "pattern_replace",
                    "pattern":     "(\\w+)\\.(?=\\w)",
                    "replacement": "$1 "
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "char_filter": [ "dot" ],
                    "tokenizer":   "whitespace"
                }
            }
        }
    }
}
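
The effect of the dot char filter followed by the whitespace tokenizer can be tried out with Python's re module. This is a sketch under assumptions: the helper name dot_analyze is hypothetical, Python writes the back-reference as \1 where the filter uses $1, and str.split() only approximates the whitespace tokenizer.

```python
import re

# Same pattern as the "dot" char filter: a word followed by a dot,
# but only when another word character comes right after the dot.
DOT = re.compile(r"(\w+)\.(?=\w)")


def dot_analyze(text):
    """Replace inner dots with spaces, then split on whitespace."""
    replaced = DOT.sub(r"\1 ", text)
    return replaced.split()


print(dot_analyze("com.test.abc"))  # ['com', 'test', 'abc']
```

Because of the lookahead (?=\w), a trailing dot such as the one in "end." is left untouched.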

 

posted @ 2018-01-10 17:08  woniu4