Full-Text Search for Community Posts in Practice (Based on Elasticsearch)

We need to provide full-text search over the posts in a community app. This post evaluates Elasticsearch (ES) for that purpose.

Installing ES itself is not covered here.

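  • Integrate a Chinese analyzer into ES (choose the plugin version that matches your ES version)

  Download the source: https://github.com/medcl/elasticsearch-analysis-ik
  Build it with Maven to get: elasticsearch-analysis-ik-1.9.5.zip

  Create an ik directory under the plugins directory and unzip elasticsearch-analysis-ik-1.9.5.zip into it. Restart ES afterwards so that the plugin is loaded.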

  • Create the index (settings, mapping)

  Configuration (saved as post.json):

{
    "settings":{
        "number_of_shards":5,
        "number_of_replicas":1
    },
    "mappings":{
        "post":{
            "dynamic":"strict",
            "properties":{
                "id":{"type":"integer","store":"yes"},
                "title":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                "content":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                "author":{"type":"string","store":"yes","index":"no"},
                "time":{"type":"date","store":"yes","index":"no"}
            }
        }
    }
}

  Run the following command to create the index:

  curl -XPOST 'spark2:9200/community' -d @post.json

  • Insert data

  Jar dependencies for the project code:

pom.xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>2.3.3</version>
</dependency>
<dependency>
  <groupId>com.alibaba</groupId>
  <artifactId>fastjson</artifactId>
  <version>1.2.7</version>
</dependency>

ES client utility class

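import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Holds one shared TransportClient for the whole application.
public class EsClient {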

  private static TransportClient transportClient;

  static {
    Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
    try {
      transportClient = new TransportClient.Builder().settings(settings)
          .build()
          .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
          .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
    } catch (UnknownHostException e) {
      throw new RuntimeException(e);
    }
  }

  public static TransportClient getInstance() {
    return transportClient;
  }
}

Insert the data

    TransportClient client = EsClient.getInstance();

    // Index 10,000 sample posts one at a time, using the post id as the document id.
    for (int i = 0; i < 10000; i++) {
      Post post = new Post(i + "", "hll", "百度百科", "ES即etamsports ,全名上海英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松", new Date());
      client.prepareIndex("community", "post", post.getId())
          .setSource(JSON.toJSONString(post))
          .execute()
          .actionGet();
    }
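
The `Post` class used in the loop is not shown in this write-up. A minimal sketch of what it might look like follows, assuming the constructor argument order (id, author, title, content, time) implied by the call above and by the field names in the mapping. fastjson's `JSON.toJSONString` reads bean properties through public getters, so each field gets one. The package-private visibility is only there to keep the sketch in one file; in a real project the class would normally be `public` in its own Post.java.

```java
import java.util.Date;

// Hypothetical sketch of the Post bean; the field names follow the index mapping.
class Post {

    private final String id;       // also used as the ES document id
    private final String author;
    private final String title;
    private final String content;
    private final Date time;

    // Argument order assumed from the call: new Post(id, author, title, content, time)
    public Post(String id, String author, String title, String content, Date time) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.content = content;
        this.time = time;
    }

    // Public getters so that a JSON serializer can discover the properties.
    public String getId() { return id; }
    public String getAuthor() { return author; }
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public Date getTime() { return time; }
}
```

Because `id` is a String here, `post.getId()` can be passed straight to `prepareIndex` as the document id, as the loop above does.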

  • Query with highlighting

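    TransportClient client = EsClient.getInstance();
    SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        // Match the keyword against both title and content.
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content"))
        .setFrom(0).setSize(10)
        // Wrap each matched term in content with <red>...</red>.
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute()
        .actionGet();

    SearchHits hits = response.getHits();
    for (SearchHit hit : hits) {
      System.out.println(hit.getHighlightFields());
      Map<String, Object> source = hit.getSource();
      // A hit that matched only on title has no highlight entry for content.
      HighlightField highlight = hit.getHighlightFields().get("content");
      if (highlight != null) {
        StringBuilder sb = new StringBuilder();
        for (Text text : highlight.getFragments()) {
          sb.append(text.string());
        }
        // Replace the original content with its highlighted version.
        source.put("content", sb.toString());
      }
      System.out.println(source);
    }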
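
The `setFrom(0).setSize(10)` call in the query returns the first page of ten hits. For later pages the offset has to be derived from the page number. The sketch below is one way to do that. `SearchPage`, `fromOffset`, and the 1-based page convention are hypothetical names and choices, not part of the ES API. It also rejects requests where from + size exceeds 10,000, which is the default `index.max_result_window` in ES 2.1 and later (assumed not to be overridden on this index).

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset
// that is passed to setFrom() on the search request.
final class SearchPage {

    // Default value of index.max_result_window; assumed unchanged for this index.
    static final int MAX_RESULT_WINDOW = 10_000;

    private SearchPage() {}

    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // Use long arithmetic so that large page numbers cannot overflow.
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException(
                "from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return (int) from;
    }
}
```

With this helper, page 3 at ten hits per page becomes `.setFrom(SearchPage.fromOffset(3, 10)).setSize(10)`, that is, an offset of 20.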

Query result


{author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松}

 
posted @ 2017-03-22 15:05  huangll99