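Full-Text Search for Community Posts in Practice (with Elasticsearch)
We need to provide full-text search over the posts in a community app, and this post evaluates Elasticsearch (ES) for the job.
Installing ES itself is not covered here.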
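- Integrate a Chinese analyzer into ES (choose the plugin version that matches your ES version)
Download the source: https://github.com/medcl/elasticsearch-analysis-ik
Build it with Maven to get: elasticsearch-analysis-ik-1.9.5.zip
Create an ik directory under the plugins directory and unzip elasticsearch-analysis-ik-1.9.5.zip into it, then restart the node so the plugin is loaded.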
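After installing the plugin and restarting the node, one way to check that it loaded is to call the `_analyze` endpoint with `analyzer=ik_max_word`. The sketch below only builds that request URL. The class name `AnalyzeUrl` and the helper `build` are made up for illustration, the host `spark2:9200` is the one used elsewhere in this post, and it assumes the ES 2.x node accepts the query-parameter form of `_analyze`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds an _analyze URL for a quick analyzer check.
class AnalyzeUrl {

    static String build(String host, int port, String analyzer, String text) {
        try {
            // Chinese text must be percent-encoded as UTF-8 in the query string.
            String encoded = URLEncoder.encode(text, "UTF-8");
            return "http://" + host + ":" + port
                    + "/_analyze?analyzer=" + analyzer + "&text=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Host and port are assumptions taken from the cluster used in this post.
        System.out.println(build("spark2", 9200, "ik_max_word", "上海"));
    }
}
```

Fetching the printed URL with curl should return a list of tokens. If ES instead reports that the analyzer cannot be found, the plugin was most likely not picked up.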
- Create the index (settings, mapping)
Configuration, saved as post.json. The author and time fields are stored but not indexed ("index":"no"), so they come back with each hit but cannot be searched on.
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "post": {
      "dynamic": "strict",
      "properties": {
        "id":      {"type": "integer", "store": "yes"},
        "title":   {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "content": {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "author":  {"type": "string", "store": "yes", "index": "no"},
        "time":    {"type": "date", "store": "yes", "index": "no"}
      }
    }
  }
}
Run the following command to create the index:
curl -XPOST 'spark2:9200/community' -d @post.json
- Insert data
JAR dependencies required by the project
pom.xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.3</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.7</version>
</dependency>
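ES client utility class
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class EsClient {
    // One shared client for the whole application.
    private static TransportClient transportClient;

    static {
        // cluster.name must match the name configured on the ES nodes.
        Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
        try {
            // 9300 is the transport port, not the HTTP port (9200).
            transportClient = new TransportClient.Builder().settings(settings)
                    .build()
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
        } catch (UnknownHostException e) {
            throw new RuntimeException(e);
        }
    }

    public static TransportClient getInstance() {
        return transportClient;
    }
}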
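Inserting the data
import com.alibaba.fastjson.JSON;
import org.elasticsearch.client.transport.TransportClient;

import java.util.Date;

TransportClient client = EsClient.getInstance();
// Index 10,000 sample posts that differ only in their id.
for (int i = 0; i < 10000; i++) {
    Post post = new Post(i + "", "hll", "百度百科",
            "ES即etamsports ,全名上海英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松",
            new Date());
    // Index name "community", type "post"; the post id doubles as the document id.
    client.prepareIndex("community", "post", post.getId())
            .setSource(JSON.toJSONString(post))
            .execute()
            .actionGet();
}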
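The `Post` class used in the loop is not shown in this post. A minimal sketch, inferred from the constructor call and from the field names in the mapping (`id`, `author`, `title`, `content`, `time`), could look like the following. Its exact shape is an assumption. Because the mapping sets `"dynamic":"strict"`, the getters should expose exactly the mapped fields: fastjson derives JSON keys from getter names, and ES rejects a document that carries an unmapped field.

```java
import java.util.Date;

// Hypothetical POJO matching the "post" mapping. In a real project it would
// normally be declared public in its own Post.java file.
class Post {
    private final String id;
    private final String author;
    private final String title;
    private final String content;
    private final Date time;

    // Parameter order follows the constructor call in the indexing loop.
    Post(String id, String author, String title, String content, Date time) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.content = content;
        this.time = time;
    }

    // One getter per mapped field, and nothing else.
    public String getId() { return id; }
    public String getAuthor() { return author; }
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public Date getTime() { return time; }

    public static void main(String[] args) {
        Post post = new Post("1", "hll", "title", "content", new Date());
        System.out.println(post.getId() + " by " + post.getAuthor());
    }
}
```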
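- Query with highlighting
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.highlight.HighlightField;

import java.util.Map;

TransportClient client = EsClient.getInstance();
SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        // Match the keyword against both title and content.
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content"))
        .setFrom(0).setSize(10)
        // Highlight only the content field, wrapping matches in <red>...</red>.
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute()
        .actionGet();

SearchHits hits = response.getHits();
for (SearchHit hit : hits) {
    System.out.println(hit.getHighlightFields());
    Map<String, Object> source = hit.getSource();
    // A hit that matched only on title has no highlight entry for content.
    HighlightField highlighted = hit.getHighlightFields().get("content");
    if (highlighted != null) {
        StringBuilder sb = new StringBuilder();
        for (Text text : highlighted.getFragments()) {
            sb.append(text.string());
        }
        // Replace the raw content with its highlighted version.
        source.put("content", sb.toString());
    }
    System.out.println(source);
}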
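Highlighting is requested for `content` only, so a hit that matched on `title` alone has no `content` entry in its highlight map. The fallback for that case can be isolated in a small helper that needs no ES classes. `HighlightMerge` and `merge` are illustrative names, and the fragments are assumed to have been converted to plain strings already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins highlight fragments, or falls back to the
// original field value when the field was not highlighted.
class HighlightMerge {

    static String merge(List<String> fragments, String original) {
        if (fragments == null || fragments.isEmpty()) {
            return original;
        }
        StringBuilder sb = new StringBuilder();
        for (String fragment : fragments) {
            sb.append(fragment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Highlighted field: the fragments are concatenated in order.
        System.out.println(merge(Arrays.asList("全名<red>上海</red>", "英模特"), "全名上海英模特"));
        // No highlight for this field: the stored value is kept unchanged.
        System.out.println(merge(null, "全名上海英模特"));
    }
}
```

Keeping this logic out of the search loop makes it easy to unit test without a running cluster.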
Query result
{author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松}
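The `time` value in the result is an epoch timestamp in milliseconds rather than a formatted date, presumably because that is how the `Date` field was serialized to JSON before indexing. A small sketch using `java.time` (Java 8+) turns it back into a readable instant. The class name `TimeField` is illustrative, and the Asia/Shanghai zone is an assumption about where the app runs.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper for displaying the "time" field of a hit.
class TimeField {

    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        Instant instant = toInstant(1490165237878L);
        System.out.println(instant); // 2017-03-22T06:47:17.878Z

        // Assumed display zone; adjust to wherever the app's users are.
        ZonedDateTime local = instant.atZone(ZoneId.of("Asia/Shanghai"));
        System.out.println(local.toLocalDateTime()); // 2017-03-22T14:47:17.878
    }
}
```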