ElasticSearch Study Notes

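On macOS, install the Java JDK first.

Then run brew install elasticsearch. After installation, remember to set the Elasticsearch service to start automatically at boot and to start it right away.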

gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'

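Product requirement: search for stadiums by address and name.

When the content is entirely Chinese, the default settings are good enough, so the search function is wrapped as follows: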

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    mapping do
    end
    
    def self.search(query)
      __elasticsearch__.search(
        {
          query: {
            multi_match: {
              query: query,
              fields: [ "name", "address" ]
            }
          }
        }
      )
    end
  end
end
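
To see what this search actually sends to Elasticsearch, the request body can be built and printed on its own, without a running cluster. This is only a minimal sketch: `multi_match_body` is a hypothetical helper name, not part of elasticsearch-model, and the sample query string is made up.

```ruby
require 'json'

# Hypothetical helper: builds the same hash that Searchable.search passes
# to __elasticsearch__.search, so the JSON can be inspected directly.
def multi_match_body(query, fields = %w[name address])
  {
    query: {
      multi_match: {
        query: query,
        fields: fields
      }
    }
  }
end

# Print the JSON body for a sample (made-up) query string.
puts JSON.pretty_generate(multi_match_body('Olympic'))
```

Keeping the body in a separate method also makes it easy to check in a unit test before wiring it into the model.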

The model is set up as follows:

require 'elasticsearch/model'

class Stadium < ActiveRecord::Base
  include Searchable

  has_many :fields

  belongs_to :city
  belongs_to :sport

  validates_presence_of :status_gap, :available
end

# Delete the previous index and recreate it every time
Stadium.__elasticsearch__.client.indices.delete index: Stadium.index_name rescue nil
Stadium.__elasticsearch__.client.indices.create \
  index: Stadium.index_name,
  body: { settings: Stadium.settings.to_hash, mappings: Stadium.mappings.to_hash  }


Stadium.import

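However, this does not handle partial matching. For example, given the values "15936525874" and "tom",

a search for "to", "om", or "936" should return results, but it matches nothing.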

Following the tutorial on the official site, I rewrote the search function with wildcards:

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    # With wildcard queries it is best to mark the fields as not_analyzed to reduce overhead
    mapping do
      indexes :name, index: "not_analyzed"
      indexes :address, index: "not_analyzed"
      indexes :contact_phone, index: "not_analyzed"
    end
    
    def self.search(query)
      __elasticsearch__.search(
        {
          query: {
            query_string: {
              query: "*#{query}*",
              fields: [ "name", "address", "contact_phone" ]
            }
          }
        }
      )
    end
  end
end

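However, patterns that begin with a wildcard are very expensive and should be avoided. For now the priority is getting the feature working, so this will do for the time being.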
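
One more thing to watch for when interpolating user input into a query_string query: characters such as `:`, `+`, or `(` are parsed as operators. The sketch below escapes them before adding the wildcards. `escape_query_string` and `wildcard_body` are hypothetical helper names, and the character list is my reading of the reserved characters in the query_string documentation, so treat it as an assumption. `<` and `>` are simply dropped here, on the assumption that they cannot be escaped reliably.

```ruby
# Characters that query_string treats as syntax (assumed list, see lead-in).
RESERVED = /[+\-=&|!(){}\[\]^"~*?:\\\/]/

# Hypothetical helper: drop < and >, then backslash-escape the rest.
def escape_query_string(text)
  text.delete('<>').gsub(RESERVED) { |c| "\\#{c}" }
end

# Hypothetical helper: same body as the wildcard search, with escaped input.
def wildcard_body(query, fields = %w[name address contact_phone])
  {
    query: {
      query_string: {
        query: "*#{escape_query_string(query)}*",
        fields: fields
      }
    }
  }
end

puts wildcard_body('936')[:query][:query_string][:query]
puts escape_query_string('court (a):1')
```

The leading wildcard is still as expensive as before; escaping only keeps unexpected input from turning into a syntax error or a different query.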

Fuzzy query

The basic format of a fuzzy query is as follows:

"fuzzy" : {
        "price" : {
            "value" : 12,
            "fuzziness" : 2
        }
    }

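When value is a number or a date, the query turns into a range query.

For the code above, that means a search for prices in the range from 10 to 14 (12 ± 2).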
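
The arithmetic behind the numeric case is just value ± fuzziness. A minimal sketch, assuming the bounds are inclusive (which is how I read the Elasticsearch documentation); `fuzzy_numeric_range` is a hypothetical helper used only for illustration:

```ruby
# Hypothetical helper: the range a numeric fuzzy query is described as covering.
# Assumes both ends are inclusive.
def fuzzy_numeric_range(value, fuzziness)
  (value - fuzziness)..(value + fuzziness)
end

range = fuzzy_numeric_range(12, 2)
puts range              # 10..14
puts range.cover?(13)   # true
puts range.cover?(15)   # false
```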

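When value is a string, a concept called "edit distance" comes into play; see this article for details.

For example, to search for a user named "tom", I used the following settings: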

"fuzzy" : {
        "name" : {
            "value" : "to",
            "fuzziness" : 2
        }
    }

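Because turning the string "to" into "tom" takes only one edit, it matches even with fuzziness set to 1.

But phone numbers and longer user names (for example "tommy") do not match: going from "to" to "tommy" takes three edits, which exceeds the fuzziness of 2 set above.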
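
The edit counts above can be checked with a small script. This is a plain Levenshtein distance (insert, delete, substitute) written for illustration; Elasticsearch's own implementation may differ in details such as how it counts transpositions, so the numbers are a guide rather than an exact prediction of its behaviour.

```ruby
# Classic dynamic-programming Levenshtein distance between two strings.
def edit_distance(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # delete ca
               curr[j - 1] + 1,        # insert cb
               prev[j - 1] + cost].min # substitute (or keep if equal)
    end
    prev = curr
  end
  prev.last
end

puts edit_distance('to', 'tom')    # 1
puts edit_distance('to', 'tommy')  # 3
puts edit_distance('936', '15936525874')
```

A partial phone number is many edits away from the full number, which is why fuzzy matching is a poor fit for substring search.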

Posted on 2015-09-09 10:30 by tomboy
