Elasticsearch Study Notes

On macOS, install the Java JDK first.

Run brew install elasticsearch. After installation, remember to set the Elasticsearch service to launch at startup and to start it right away.
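A quick way to confirm the service is actually running is to ping its HTTP port. The sketch below assumes the default address localhost:9200 with no authentication; the helper name `elasticsearch_up?` is made up for illustration.

```ruby
require 'net/http'

# Hypothetical helper: returns true if something answers GET / with a 2xx
# status on the given host and port, false on any connection problem.
# Assumes the default Elasticsearch HTTP port (9200) and no authentication.
def elasticsearch_up?(host = "localhost", port = 9200)
  Net::HTTP.start(host, port, open_timeout: 2, read_timeout: 2) do |http|
    http.get("/").is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  # Connection refused, timeout, DNS failure and so on all mean "not up".
  false
end
```

Calling `elasticsearch_up?` from irb right after starting the service should return true once the node has finished booting.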

Following the elasticsearch-rails gem's instructions, add two gems to the project:

gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'

The product requirement: search for stadiums by address and name.

For text that is entirely Chinese, the default settings are good enough, so the search function is wrapped up as follows:

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    mapping do
    end
    
    def self.search(query)
      __elasticsearch__.search(
        {
          query: {
            multi_match: {
              query: query,
              fields: [ "name", "address" ]
            }
          }
        }
      )
    end
  end
end
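
To see exactly what is sent to Elasticsearch, the query hash can be built and printed on its own, without Rails. This is a minimal sketch; `multi_match_body` is a hypothetical helper that mirrors the hash passed to `__elasticsearch__.search` above.

```ruby
require 'json'

# Hypothetical helper: builds the same request body as the search method above.
def multi_match_body(query, fields = %w[name address])
  { query: { multi_match: { query: query, fields: fields } } }
end

# Print the JSON document that would be sent as the search request body.
puts JSON.pretty_generate(multi_match_body("Olympic"))
```

On a model that includes Searchable, calling search with a string returns an elasticsearch-model response object; `.records` loads the matching ActiveRecord rows and `.results` exposes the raw hits.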

The model is set up as follows:

require 'elasticsearch/model'

class Stadium < ActiveRecord::Base
  include Searchable

  has_many :fields

  belongs_to :city
  belongs_to :sport

  validates_presence_of :status_gap, :available
end

# Delete the previous index and recreate it every time
Stadium.__elasticsearch__.client.indices.delete index: Stadium.index_name rescue nil
Stadium.__elasticsearch__.client.indices.create \
  index: Stadium.index_name,
  body: { settings: Stadium.settings.to_hash, mappings: Stadium.mappings.to_hash  }


Stadium.import
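
The `rescue nil` modifier on the delete call above swallows every error, not only "index does not exist". A slightly safer sketch checks for the index first. The helper name `recreate_index` is hypothetical; it assumes a client whose `indices` object responds to `exists?`, `delete` and `create`, as the elasticsearch-ruby client does.

```ruby
# Hypothetical helper: drop the index only if it is really there, then create it.
# `client` is assumed to behave like Stadium.__elasticsearch__.client.
def recreate_index(client, index_name, settings: {}, mappings: {})
  indices = client.indices
  indices.delete(index: index_name) if indices.exists?(index: index_name)
  indices.create(index: index_name,
                 body: { settings: settings, mappings: mappings })
end
```

With the model above it would be called as `recreate_index(Stadium.__elasticsearch__.client, Stadium.index_name, settings: Stadium.settings.to_hash, mappings: Stadium.mappings.to_hash)`. The elasticsearch-model README also describes `Stadium.__elasticsearch__.create_index! force: true` and `Stadium.import force: true` for the same purpose, which may be simpler if the installed gem version supports them.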

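However, this does not work for partial matching. For example, given the values "15936525874" and "tom", searching for "to", "om", or "936" should return results, but the match fails.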
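The reason is that a match-style query looks up whole indexed terms, not substrings. The toy inverted index below illustrates the idea; it is a simplified model written for these notes, not Elasticsearch's actual implementation, and `build_index` and `lookup` are made-up names.

```ruby
# Toy inverted index. Assumption: each value is split into lowercase word
# tokens and every token is stored whole, roughly like a default analyzer.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |term| index[term] << id }
  end
  index
end

# A lookup only finds documents that contain the exact term.
def lookup(index, term)
  index.fetch(term.downcase, [])
end

index = build_index({ 1 => "tom", 2 => "15936525874" })
p lookup(index, "tom")  # => [1]   whole token, found
p lookup(index, "to")   # => []    fragment of a token, not found
p lookup(index, "936")  # => []    fragment of a token, not found
```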

Following the tutorial on the official site, I rewrote the search function with wildcards as follows:

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    # With wildcards it is best to map the fields as not_analyzed (index: "not_analyzed") to reduce overhead
    mapping do
      indexes :name, index: "not_analyzed"
      indexes :address, index: "not_analyzed"
      indexes :contact_phone, index: "not_analyzed"
    end
    
    def self.search(query)
      __elasticsearch__.search(
        {
          query: {
            query_string: {
              query: "*#{query}*",
              fields: [ "name", "address", "contact_phone" ]
            }
          }
        }
      )
    end
  end
end
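
One thing to watch with query_string is that user input is parsed as query syntax, so characters such as `:` or `(` in a search box can change the query or raise a parse error. The sketch below escapes the reserved characters listed in the Elasticsearch documentation before wrapping the input in wildcards; `escape_query_string` and `wildcard_body` are hypothetical helpers, not part of any gem.

```ruby
# Characters that query_string treats as operators (per the Elasticsearch docs).
RESERVED = /[+\-=&|!(){}\[\]^"~*?:\\\/]/

# Hypothetical helper: neutralise operators in raw user input.
# "<" and ">" cannot be escaped, so they are removed entirely.
def escape_query_string(input)
  input.delete("<>").gsub(RESERVED) { |char| "\\#{char}" }
end

# Mirrors the wildcard search body above, with the input escaped first.
def wildcard_body(query, fields = %w[name address contact_phone])
  { query: { query_string: { query: "*#{escape_query_string(query)}*",
                             fields: fields } } }
end
```

Inside the concern, the search method could then pass `wildcard_body(query)` to `__elasticsearch__.search` instead of building the hash inline.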

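However, patterns that begin with a wildcard are very resource-intensive and should be avoided. For now the priority is getting the feature working, so this will do for the time being.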
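
Why a leading wildcard is costly can be illustrated with a toy sorted term list. This is only a sketch of the general idea and not how Lucene stores its term dictionary; `prefix_matches` and `infix_matches` are made-up names.

```ruby
# Toy term dictionary, kept sorted like a dictionary of indexed terms.
TERMS = %w[alice bob tom tommy tony].sort

# Pattern "to*": binary-search to the first candidate, then read only
# the neighbouring terms that still share the prefix.
def prefix_matches(terms, prefix)
  start = terms.bsearch_index { |term| term >= prefix } || terms.size
  terms.drop(start).take_while { |term| term.start_with?(prefix) }
end

# Pattern "*om*": there is no prefix to seek on, so every term is tested.
def infix_matches(terms, fragment)
  terms.select { |term| term.include?(fragment) }
end

p prefix_matches(TERMS, "to")  # => ["tom", "tommy", "tony"]
p infix_matches(TERMS, "om")   # => ["tom", "tommy"]
```

With five terms the difference is invisible, but the second approach has to test every distinct term in the field, which grows with the size of the index.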

Fuzzy query

The basic format of a fuzzy query is:

"fuzzy" : {
        "price" : {
            "value" : 12,
            "fuzziness" : 2
        }
    }

When value is a number or a date, the query turns into a range query.

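For the code above, for example, it becomes a query for the range 10 <= price <= 14, that is, value minus fuzziness up to value plus fuzziness.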
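
The arithmetic can be sketched in a couple of lines. `fuzzy_numeric_range` is a hypothetical helper, and the bounds are treated as inclusive here, which is how the Elasticsearch documentation for numeric fuzziness describes them.

```ruby
# Hypothetical helper: the +/- range a numeric fuzzy query searches around value.
# Assumption: both bounds are inclusive.
def fuzzy_numeric_range(value, fuzziness)
  (value - fuzziness)..(value + fuzziness)
end

range = fuzzy_numeric_range(12, 2)
p range             # => 10..14
p range.cover?(10)  # => true
p range.cover?(15)  # => false
```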

When value is a string, a concept called "edit distance" comes into play; see this article for details.

For example, to search for a user named "tom", my settings are:

"fuzzy" : {
        "name" : {
            "value" : "to",
            "fuzziness" : 2
        }
    }

Because turning "to" into "tom" takes only a single edit, it matches even with fuzziness set to 1.

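But it cannot match phone numbers or longer user names. Getting from "to" to "tommy", for example, takes three insertions, which is more than the fuzziness of 2.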
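
The edit counts can be checked with a small Levenshtein distance function. This is a textbook dynamic-programming sketch written for these notes, not the code Elasticsearch runs internally.

```ruby
# Levenshtein distance: the minimum number of single-character insertions,
# deletions and substitutions needed to turn string a into string b.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]                         # distance from a[0, i] to ""
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1,            # delete char_a
               curr[j - 1] + 1,        # insert char_b
               prev[j - 1] + cost].min # substitute or keep
    end
    prev = curr
  end
  prev.last
end

p levenshtein("to", "tom")           # => 1
p levenshtein("to", "tommy")         # => 3
p levenshtein("936", "15936525874")  # => 8
```

A fragment of a phone number is therefore far outside what a fuzziness of 1 or 2 can bridge, which is why fuzzy queries do not solve the partial-match requirement.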

posted on 2015-09-09 10:30 by tomboy
