Ruby's Louvre

Learning a little about algorithms every day

Implementing full-text search with acts_as_ferret

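acts_as_ferret is a Rails plugin for full-text search. It is built on Ferret, a Ruby port of Apache Lucene. There are plenty of introductions and tutorials for acts_as_ferret online, and it was the most important full-text search plugin of early Rails. Expecting its foreign authors to support Chinese search, however, is a fantasy. The write-ups on JavaEye about Chinese support are not satisfactory either, and all of them are old enough that they are steadily losing their value as references. So, before I give up on acts_as_ferret, here is a detailed walkthrough of how to implement Chinese full-text search with it, as a record in case it comes in handy someday.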

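Install the gems standalone

Do not install it as a plugin, and do not unpack it into your project either. I am not sure why, but this is a lesson I learned the hard way!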

gem install ferret -v=0.11.5 --platform mswin32
gem install acts_as_ferret
gem install rmmseg

Then copy ferret_ext.so from C:\ruby\lib\ruby\gems\1.8\gems\ferret-0.11.5-x86-mswin32\ext into C:\ruby\lib\ruby\gems\1.8\gems\ferret-0.11.5-x86-mswin32\lib!
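The copy step can also be scripted. The following is only a sketch: the helper name copy_ferret_ext is made up for illustration, and the gem directory you pass in is an assumption that has to match your own Ruby installation.

```ruby
require 'fileutils'

# Copy ferret_ext.so from a gem's ext/ directory into its lib/ directory.
# gem_dir is the root of the installed ferret gem (hypothetical helper;
# adjust the path to your own installation).
def copy_ferret_ext(gem_dir)
  source = File.join(gem_dir, 'ext', 'ferret_ext.so')
  target = File.join(gem_dir, 'lib', 'ferret_ext.so')
  raise ArgumentError, "not found: #{source}" unless File.exist?(source)
  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(source, target)
  target
end
```

With the paths from this post it would be called as copy_ferret_ext('C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.11.5-x86-mswin32').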

Declare the gems

Add the following to your project's environment.rb, inside the Rails::Initializer.run block:

  config.gem 'ferret'
  config.gem 'rmmseg'
  config.gem 'acts_as_ferret'

Add index support to the models


require 'rmmseg'
require 'rmmseg/ferret'
class Topic < ActiveRecord::Base
  #……………………………………………………other code………………………………………………………………
  # To rebuild the index, delete the corresponding index folder and restart the server, or call Model.rebuild_index
  #=======================Search=====================
  acts_as_ferret({
      :fields => {
        :title => {
          :store => :yes,
          :boost=> 20 # set the weight
        },
        :body => {
          :boost=> 1,
          :store => :yes,
          :term_vector => :with_positions_offsets
        },
        :author => {:store => :yes},
        :created_at_s => {:index => :untokenized,:store => :yes},
        :updated_at_s => {:index => :untokenized,:store => :yes}
      },
      :store_class_name=>true,
      :analyzer => RMMSeg::Ferret::Analyzer.new
    })
  def created_at_s
    created_at.to_s(:db)
  end

  def updated_at_s
    updated_at.to_s(:db)
  end

  def body
    first_post.body
  end
  #……………………………………………………other code………………………………………………………………
end
require 'rmmseg'
require 'rmmseg/ferret'
class Post < ActiveRecord::Base
  #……………………………………………………other code………………………………………………………………
  delegate :title, :to => :topic
  
  # To rebuild the index, delete the corresponding index folder and restart the server, or call Model.rebuild_index
  #=======================Search=====================
  acts_as_ferret({
      :fields => {
        :title => {
          :store => :yes,
          :boost=> 20 # set the weight
        },
        :body => {
          :boost=> 1,
          :store => :yes,
          :term_vector => :with_positions_offsets
        },
        :author => {:store => :yes},
        :created_at_s => {:index => :untokenized,:store => :yes},
        :updated_at_s => {:index => :untokenized,:store => :yes}
      },
      :store_class_name => true,
      :analyzer => RMMSeg::Ferret::Analyzer.new
    })

  def created_at_s
    created_at.to_s(:db)
  end

  def updated_at_s
    updated_at.to_s(:db)
  end
  #……………………………………………………other code………………………………………………………………
end

Here, :analyzer => RMMSeg::Ferret::Analyzer.new is what gives us Chinese word segmentation.
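To see why a segmenting analyzer matters: Chinese text has no spaces between words, so the indexer needs some way to cut a sentence into terms. The sketch below uses toy forward maximum matching against a tiny made-up dictionary. It is a simplified stand-in for illustration only, not RMMSeg's actual algorithm, and max_match and the dictionary are assumptions of this example.

```ruby
# Toy forward maximum matching: at each position take the longest dictionary
# word, falling back to a single character when nothing matches.
# A simplified illustration, NOT how RMMSeg is implemented.
def max_match(text, dictionary, max_len = 4)
  chars = text.chars.to_a
  tokens = []
  i = 0
  while i < chars.length
    len = [max_len, chars.length - i].min
    # Shrink the candidate until it is a dictionary word or one character.
    len -= 1 while len > 1 && !dictionary.include?(chars[i, len].join)
    tokens << chars[i, len].join
    i += len
  end
  tokens
end

# Hypothetical mini dictionary, just for the example.
dictionary = %w[全文 检索 全文检索 中文 支持]
puts max_match('支持中文全文检索', dictionary).join(' / ')
```

Words missing from the dictionary fall apart into single characters, which is why the quality of the analyzer's dictionary affects what a query can match.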

Create the search module

ruby script/generate controller search index

Add a routing rule.

 map.online '/search', :controller => 'search', :action => 'index'

Edit search_controller.

class SearchController < ApplicationController
  def index
    @class = params[:class] || "topic"
    @query = params[:query] || ''
    unless @query.blank?
      if @class == "topic"
        @results = Topic.find_with_ferret @query
      else
        @results = Post.find_with_ferret @query
      end
    end
  end
end
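The control flow of this action can be tried outside Rails. In the sketch below, resolve_search and the searchers hash are hypothetical names, and the two lambdas merely stand in for Topic.find_with_ferret and Post.find_with_ferret. It mirrors the action: default to "topic", skip the search for a blank query, and treat any other class value as "post".

```ruby
# Plain-Ruby sketch of the action's decision logic (hypothetical helper).
# searchers maps :topic / :post to callables that take the query string.
def resolve_search(params, searchers)
  kind  = params[:class] || 'topic'
  query = (params[:query] || '').strip
  # A blank query means no search at all, so results stay nil.
  return [kind, query, nil] if query.empty?
  searcher = kind == 'topic' ? searchers[:topic] : searchers[:post]
  [kind, query, searcher.call(query)]
end

# Stubs standing in for the real find_with_ferret calls.
searchers = {
  :topic => lambda { |q| ["topic hit for #{q}"] },
  :post  => lambda { |q| ["post hit for #{q}"] }
}
p resolve_search({ :class => 'post', :query => 'ferret' }, searchers)
```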

Edit the corresponding view:

<% form_tag '/search', :method => :get ,:style => "margin-left:40%" do %>
  <input type="radio" name="class" value = "topic" <%= @class == "topic"? 'checked="checked"':'' %>>Topics only
  <input type="radio" name="class" value = "post" <%= @class == "post"? 'checked="checked"':'' %>>All posts<br>
  <p>
    <%= text_field_tag :query, @query %>
    <%= submit_tag "Search", :name => nil %>
  </p>
<% end %>

<% if defined? @results %>
  <style type="text/css">
    .hilite{
      color:#0042BD;
      background:#F345CC;
    }
  </style>

  <div id="search_result">
    <% @results.each do |result| %>
      <h3>
        <%= result.highlight(@query,:field => :title,:pre_tag => "<span class='hilite'>",:post_tag => "</span>")%>
      </h3>
      <div><%= result.highlight(@query,:field => :body,:num_excerpts => 3,:excerpt_length => 250) %></div>
      <p>Author: <%=result.author %>  Posted at <%= result.created_at_s %></p>
    <% end %>
  </div>
<% end %>

Another way to highlight.

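  def hilight(a,b)
    # a is the string to highlight, b is the phrase to mark; matches are wrapped in the hilite style by default
    highlight a,b, '<span class="hilite">\1</span>'
  end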


  
<% @results.each do |result| %>

<%= hilight h(result.title),@query %>

<%= hilight simple_format(truncate(result.body,:length => 250)), @query %>

Author: <%= hilight h(result.author),@query %> Posted at <%= result.created_at_s %>

<% end %>
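For readers without a Rails console at hand, the idea behind this helper can be sketched in plain Ruby: escape the text first, then wrap every case-insensitive occurrence of the query in a span. The name hilite_text is made up for this example, and it is not the implementation of Rails' highlight helper.

```ruby
require 'erb'

# Escape the text, then wrap each case-insensitive match of the phrase in
# <span class="hilite">. A sketch of the idea, not Rails' own highlight.
def hilite_text(text, phrase, css_class = 'hilite')
  escaped = ERB::Util.html_escape(text)
  return escaped if phrase.to_s.empty?
  # Escape the phrase the same way so it can match inside the escaped text.
  pattern = Regexp.new(Regexp.escape(ERB::Util.html_escape(phrase)), Regexp::IGNORECASE)
  escaped.gsub(pattern) { |match| %(<span class="#{css_class}">#{match}</span>) }
end

puts hilite_text('Ruby 全文检索 <b>', '全文')
```

Escaping before highlighting keeps user-supplied text from injecting markup, which is the same reason the view above calls h(...) before hilight.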

Pagination

Add the following to application_controller.rb:

  def pages_for(result,options = {})
    page     = (options[:page] || 1).to_i   # params[:page] arrives as a String
    per_page = options[:per_page] || 30
    total    = result.total_hits || 0
    index    = (page - 1) * per_page
    returning WillPaginate::Collection.new(page, per_page, total) do |pager|
      # Fall back to an empty page when the requested page is out of range
      pager.replace(result[index, per_page] || [])
    end
  end
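The arithmetic inside pages_for is easy to check in isolation. The sketch below uses a plain array and a made-up helper name, page_slice; it only shows how the page number turns into an offset and what happens past the last page.

```ruby
# Return the items that belong on the given 1-based page (hypothetical helper).
def page_slice(items, page, per_page)
  page = [page.to_i, 1].max          # accepts "2", 2 or nil
  offset = (page - 1) * per_page     # index of the first item on the page
  items[offset, per_page] || []      # Array#[] returns nil past the end
end

hits = (1..7).to_a
p page_slice(hits, 1, 3)    # first page
p page_slice(hits, '3', 3)  # last, partial page
p page_slice(hits, 5, 3)    # out of range
```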

Edit the controller:

class SearchController < ApplicationController
  def index
    @class = params[:class] || "topic"
    @query = params[:query] || ''
    unless @query.blank?
      if @class == "topic"
        results = Topic.find_with_ferret @query
        @results = pages_for(results  ,:per_page => 3,:page=> (params[:page] || 1))
      else
        results = Post.find_with_ferret @query
        @results = pages_for(results  ,:per_page => 3,:page=> (params[:page] || 1))
      end
    end
  end
end

Add one line at the very bottom of the corresponding view (this uses the will_paginate plugin):

<% form_tag '/search', :method => :get ,:style => "margin-left:40%" do %>
  <input type="radio" name="class" value = "topic" <%= @class == "topic"? 'checked="checked"':'' %>>Topics only
  <input type="radio" name="class" value = "post" <%= @class == "post"? 'checked="checked"':'' %>>All posts<br>
  <p>
    <%= text_field_tag :query, @query %>
    <%= submit_tag "Search", :name => nil %>
  </p>
<% end %>

<% if defined? @results %>
  <style type="text/css">
    .hilite{
      color:#0042BD;
      background:#F345CC;
    }
  </style>

  <div id="search_result">
    <% @results.each do |result| %>
      <h3>
        <%= result.highlight(@query,:field => :title,:pre_tag => "<span class='hilite'>",:post_tag => "</span>")%>
      </h3>
      <div><%= result.highlight(@query,:field => :body,:num_excerpts => 3,:excerpt_length => 250) %></div>
      <p>Author: <%=result.author %>  Posted at <%= result.created_at_s %> Relevance <%= number_to_percentage result.ferret_score*100,:precision => 2 %></p>
    <% end %>
  </div>
  <%= will_paginate @results ,:class => "non_ajax"%>
<% end %>

Production environment

  • In the model, pass :remote => true to acts_as_ferret
  • Copy ferret_server.yml from vendor/plugins/acts_as_ferret/config/ to config/
  • Start the Ferret server: ruby script/runner vendor/plugins/acts_as_ferret/script/ferret_server -e production

Some useful links

http://www.pluitsolutions.com/2007/07/30/acts-as-ferret-drbserver-win32-service/

http://ferret.davebalmain.com/api/classes/Ferret/Index.html

posted on 2009-07-22 17:09  司徒正美