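Elasticsearch Usage Notes

A recent business requirement, a product list searchable by full-text keyword, led me to start learning ES. After the better part of a month of study, I am writing these notes to summarize what I have learned so far.

Learning a new technology on your own almost always involves a few detours, so here are the basic ES tutorials I recommend. In short: work through the official documentation carefully and you will be fine!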

1) Elasticsearch: The Definitive Guide (Chinese edition, 权威指南)

https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

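The authoritative Chinese-language tutorial. For readers who are not comfortable with English, it is a quick way to get started.

2) Elasticsearch Reference (the official English documentation)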

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

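To really master ES, read the official English documentation; it covers the material most thoroughly.

Choosing an ES Version

When I first started learning ES, most of the material I found targeted 2.x, and I originally planned to use 2.x as well. After reading the notes on the new features in ES 5.x, however, I decided on 5.x, because according to those notes the new release performs considerably better than 2.x. I had also wanted to use spring-data-elasticsearch, but at the time of writing spring-data does not support 5.x. ES is also evolving quickly: the 5.5.0 release I started with half a month ago has already been superseded by 5.5.1.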

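ES Client Request Methods

1) Java API: create a TransportClient; recommended for complex applications

2) Java REST Client: create a RestClient

3) HTTP RESTful API: access ES with plain HTTP requests

For now I am using the third option. Our ES use case is simple and does not need to create or delete indices dynamically, so this option is easy to pick up: all you need to learn is the ES REST syntax. Later we could switch to RestClient, which keeps persistent HTTP connections (with a plain HttpClient you would also have to manage an HTTP connection pool yourself). Its features, as the official documentation puts it: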

The low-level client’s features include:

minimal dependencies
load balancing across all available nodes
failover in case of node failures and upon specific response codes
failed connection penalization (whether a failed node is retried depends on how many consecutive times it failed; the more failed attempts the longer the client will wait before trying that same node again)
persistent connections
trace logging of requests and responses
optional automatic discovery of cluster nodes
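
For comparison, the third option (plain HTTP) needs nothing beyond an HTTP library. The sketch below uses Python's standard library to build a search request without sending it. The host `localhost:9200` and the index name `product` are assumptions for illustration, not names taken from this project.

```python
import json
import urllib.request


def build_es_request(base_url, method, path, body=None):
    """Build (but do not send) an HTTP request for the ES REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, for illustration only.
request = build_es_request(
    "http://localhost:9200", "POST", "product/_search",
    {"query": {"match_all": {}}},
)
# To actually send it: urllib.request.urlopen(request).read()
```

Sending each request this way opens a fresh connection every time, which is the gap that a connection pool or RestClient's persistent connections would fill.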

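Common ES Plugins

1) head plugin

Tutorial for 5.5: http://www.cnblogs.com/xing901022/p/6030296.html

2) ik Chinese analyzer: essential for Chinese word segmentation; supports custom dictionaries

GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Plugin downloads: https://github.com/medcl/elasticsearch-analysis-ik/releases (the author updates it promptly; a build for 5.5.1 is already available).

3) pinyin analyzer

GitHub: https://github.com/medcl/elasticsearch-analysis-pinyin

ik and pinyin are by the same author, the founder of the Elastic Chinese community.

4) elasticsearch-analysis-lc-pinyin analyzer

This plugin is also quite good, though less well known than pinyin. It supports searching by full pinyin, by initials, and by a mix of pinyin and Chinese. I plan to test it later for pinyin full-text search; for now the analyzers in use are still ik and pinyin.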

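ES Cluster

The Definitive Guide covers cluster configuration only roughly. I spent several days testing over and over before realizing that I simply had not fully understood the configuration file parameters.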

Minimum Master Nodes

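The minimum master nodes setting is very important for cluster stability. It helps prevent split brain, that is, a single cluster ending up with two masters.

This setting tells Elasticsearch not to elect a master unless enough master-eligible nodes are available; an election takes place only when there are enough of them.

It should always be set to a quorum of the master-eligible nodes, where quorum = (number of master-eligible nodes / 2) + 1, with the division rounded down. For example:

1. If you have 10 master-eligible nodes, the quorum is 6.

2. If you have 3 master-eligible nodes and 100 data nodes, the quorum is 2; only the master-eligible nodes count.

3. If you have 2 master-eligible nodes, the quorum should be 2, but that means the whole cluster becomes unavailable as soon as one node goes down. Setting it to 1 keeps the cluster working but no longer protects against split brain. In this situation the best solution is to have at least 3 master-eligible nodes.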
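
The quorum rule above is easy to get wrong by one, so here is a minimal sketch of the arithmetic. The function name is my own; in ES 5.x the resulting value goes into the `discovery.zen.minimum_master_nodes` setting.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (n / 2) + 1, using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# The three cases from the list above: 10 -> 6, 3 -> 2, 2 -> 2.
for n in (10, 3, 2):
    print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
```

Data-only nodes are deliberately not an input: only master-eligible nodes take part in the count.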

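Small clusters or local tests do not need to distinguish between master nodes, data nodes, and client nodes. In production, however, the official recommendation is to separate the node types for maximum scalability. By default, an Elasticsearch node is both a master node and a data node. For more on node types, see the reposted article "Elasticsearch节点类型" (Elasticsearch Node Types).

My current cluster setup: one client node and three mixed master/data nodes. Using RestClient would make the client node unnecessary.

Index and Type Creation Examples

1. Create the index and configure the analyzers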

{
    "settings":{
        "index":{
            "number_of_shards":3,
            "number_of_replicas":1,
            "analysis":{
                "analyzer":{
                    "ik_analyzer":{
                        "type":"custom",
                        "tokenizer":"ik_smart"
                    },
                    "pinyin_analyzer":{
                        "tokenizer":"my_pinyin"
                    }
                },
                "tokenizer":{
                    "my_pinyin":{
                        "type":"pinyin",
                        "keep_original":true
                    }
                }
            }
        }
    }
}

2. Create the type and set its mapping

{
    "ProductTour":{
        "properties":{
            "companyId":{
                "type":"integer"
            },
            "productCode":{
                "type":"keyword"
            },
            "productType":{
                "type":"text",
                "analyzer":"ik_analyzer",
                "fields":{
                    "pinyin":{
                        "type":"text",
                        "analyzer":"pinyin_analyzer"
                    }
                }
            },
            "gType":{
                "type":"keyword"
            },
            "lineType":{
                "type":"keyword"
            },
            "productState":{
                "type":"boolean"
            },
            "auditState":{
                "type":"integer"
            },
            "productMainTitle":{
                "type":"text",
                "analyzer":"ik_analyzer",
                "fields":{
                    "pinyin":{
                        "type":"text",
                        "analyzer":"pinyin_analyzer"
                    }
                }
            },
            "productSubTitle":{
                "type":"text",
                "analyzer":"ik_analyzer",
                "fields":{
                    "pinyin":{
                        "type":"text",
                        "analyzer":"pinyin_analyzer"
                    }
                }
            },
            "supplyProductName":{
                "type":"keyword"
            },
            "productMainPic":{
                "type":"keyword"
            },
            "productPic":{
                "type":"keyword"
            },
            "dpt":{
                "type":"keyword"
            },
            "arr":{
                "type":"text",
                "analyzer":"ik_analyzer",
                "fields":{
                    "pinyin":{
                        "type":"text",
                        "analyzer":"pinyin_analyzer"
                    }
                }
            },
            "productFeatures":{
                "type":"text",
                "analyzer":"ik_analyzer",
                "fields":{
                    "pinyin":{
                        "type":"text",
                        "analyzer":"pinyin_analyzer"
                    }
                }
            },
            "tripDay":{
                "type":"integer"
            },
            "tripNight":{
                "type":"integer"
            },
            "advanceDays":{
                "type":"integer"
            },
            "auditResult":{
                "type":"keyword"
            },
            "createTime":{
                "type":"date"
            }
        }
    }
}
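
The mapping above repeats the same "ik text field with a pinyin sub-field" block five times. If the mapping were generated rather than written by hand, a sketch like the following could keep the pattern in one place. The helper names are hypothetical, and only three of the non-text fields are listed to keep the example short.

```python
def text_with_pinyin():
    """A text field analyzed by ik_analyzer, with a .pinyin sub-field."""
    return {
        "type": "text",
        "analyzer": "ik_analyzer",
        "fields": {
            "pinyin": {"type": "text", "analyzer": "pinyin_analyzer"}
        },
    }


def build_mapping(type_name, full_text_fields, other_fields):
    """Assemble a mapping body: full-text fields plus explicitly typed ones."""
    properties = {name: text_with_pinyin() for name in full_text_fields}
    properties.update({name: {"type": t} for name, t in other_fields.items()})
    return {type_name: {"properties": properties}}


mapping = build_mapping(
    "ProductTour",
    ["productType", "productMainTitle", "productSubTitle", "arr", "productFeatures"],
    # Subset of the remaining fields, for brevity.
    {"companyId": "integer", "productCode": "keyword", "createTime": "date"},
)
```

Serializing `mapping` with `json.dumps` gives a body of the same shape as the hand-written one.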

3. Full-text search

{
    "from":0,
    "size":10,  // pagination
    "query":{
        "bool":{
            "must":[
                {
                    "multi_match":{   // full-text search
                        "query":"1日",  // search keyword
                        "fields":[   // fields to search
                            "productType",
                            "productMainTitle",
                            "productSubTitle",
                            "arr",
                            "productFeatures"
                        ]
                    }
                }
            ],
            "filter":[   // filter conditions
                {
                    "term":{
                        "productType":"themt"
                    }
                }
            ]
        }
    }
}
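
Since the application only sends REST bodies, the query above has to be assembled for each page and keyword. The following is a minimal sketch of that assembly, assuming a 1-based page number; the function and parameter names are my own.

```python
FULL_TEXT_FIELDS = [
    "productType",
    "productMainTitle",
    "productSubTitle",
    "arr",
    "productFeatures",
]


def build_search_body(keyword, page=1, size=10, filters=None):
    """Build a bool query: multi_match in `must`, exact terms in `filter`."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "from": (page - 1) * size,  # offset of the first hit on this page
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": keyword, "fields": FULL_TEXT_FIELDS}}
                ],
                "filter": [
                    {"term": {field: value}}
                    for field, value in (filters or {}).items()
                ],
            }
        },
    }


body = build_search_body("1日", page=1, size=10, filters={"productType": "themt"})
```

With these arguments `body` has the same content as the request shown above; page 3 with the same size would start at offset 20.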

If the keyword mixes Latin letters and Chinese characters, replace the full-text search fields with:

"fields": [   // fields to search
    "productType.pinyin",
    "productMainTitle.pinyin",
    "productSubTitle.pinyin",
    "arr.pinyin",
    "productFeatures.pinyin"
]
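
One way to choose between the two field lists automatically is to check whether the keyword contains any Latin letters. This is only a sketch of that rule under my own assumption that "contains a Latin letter" is a good enough test; the function name is hypothetical.

```python
import re

BASE_FIELDS = [
    "productType",
    "productMainTitle",
    "productSubTitle",
    "arr",
    "productFeatures",
]

_HAS_LATIN_LETTER = re.compile(r"[A-Za-z]")


def search_fields(keyword):
    """Use the .pinyin sub-fields when the keyword contains Latin letters."""
    if _HAS_LATIN_LETTER.search(keyword):
        return [field + ".pinyin" for field in BASE_FIELDS]
    return list(BASE_FIELDS)
```

A keyword such as "1日" contains no Latin letters, so it would still go to the ik-analyzed fields.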

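Pitfalls I ran into:

"mapping set to strict, dynamic introduction of [doc]" is not allowed: when the mapping is strict, a field that is not declared in the mapping cannot be introduced dynamically;

For term queries (exact matching), string fields must be mapped as keyword, not text;

If the keyword is Chinese, use the ik analyzer; if the keyword is pinyin or a mix of pinyin and Chinese, use the pinyin analyzer;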
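
To illustrate the second pitfall, here is a toy model of why a term query can miss on a text field. It is not how ES or the ik analyzer actually tokenizes; the lowercase-and-split analyzer below is a simplified stand-in, and all names are hypothetical.

```python
def analyze(text):
    """Toy stand-in for a text analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def indexed_terms(value, field_type):
    """Return the terms stored in the index for one field value."""
    if field_type == "keyword":
        return [value]        # keyword: stored verbatim as a single term
    return analyze(value)     # text: stored as analyzed tokens


def term_query_matches(query_value, terms):
    """A term query compares the raw query value to the stored terms."""
    return query_value in terms


value = "Theme Tour"
on_keyword = term_query_matches(value, indexed_terms(value, "keyword"))
on_text = term_query_matches(value, indexed_terms(value, "text"))
```

In this model `on_keyword` is true and `on_text` is false, because the text field only stores the tokens "theme" and "tour" and never the original string.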

posted on 2017-08-06 11:24 菜鸟Z