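Full-text search with acts_as_ferret
acts_as_ferret is a plugin that adds full-text search to Rails. It is built on Ferret, a Ruby port of Apache Lucene. Introductions and tutorials for acts_as_ferret are easy to find online, and it was the most important full-text search plugin in the early days of Rails. Expecting its non-Chinese authors to support Chinese search out of the box, however, is wishful thinking. The write-ups on JavaEye about adding Chinese support are not satisfactory either, and all of them are old enough that they are steadily losing their value as references. So, before I give up on acts_as_ferret, here is a detailed walkthrough of how to implement Chinese full-text search with it. Consider it a record for the future, in case I ever need it again.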
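Install the gems standalone
Do not install acts_as_ferret as a plugin, and do not unpack it into your project either. I am not sure why this matters, but it is a lesson I learned the hard way!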
gem install ferret -v=0.11.5 --platform mswin32
gem install acts_as_ferret
gem install rmmseg
Then copy ferret_ext.so from C:\ruby\lib\ruby\gems\1.8\gems\ferret-0.11.5-x86-mswin32\ext to C:\ruby\lib\ruby\gems\1.8\gems\ferret-0.11.5-x86-mswin32\lib.
Declare the gems
Add the following to your project's environment.rb:
config.gem 'ferret'
config.gem 'rmmseg'
config.gem 'acts_as_ferret'
Add index support to the models
require 'rmmseg'
require 'rmmseg/ferret'

class Topic < ActiveRecord::Base
  # ... other implementation ...

  # To rebuild the index, delete the corresponding index folder and restart
  # the server, or call Model.rebuild_index.
  # ======================= Search =======================
  acts_as_ferret({
    :fields => {
      :title => {
        :store => :yes,
        :boost => 20 # boost (weight) of title matches
      },
      :body => {
        :boost => 1,
        :store => :yes,
        :term_vector => :with_positions_offsets
      },
      :author => { :store => :yes },
      :created_at_s => { :index => :untokenized, :store => :yes },
      :updated_at_s => { :index => :untokenized, :store => :yes }
    },
    :store_class_name => true,
    :analyzer => RMMSeg::Ferret::Analyzer.new
  })

  # Timestamps formatted as strings so they can be indexed untokenized.
  def created_at_s
    created_at.to_s(:db)
  end

  def updated_at_s
    updated_at.to_s(:db)
  end

  # A topic's body is the body of its first post.
  def body
    first_post.body
  end

  # ... other implementation ...
end
require 'rmmseg'
require 'rmmseg/ferret'

class Post < ActiveRecord::Base
  # ... other implementation ...

  # A post takes its title from the topic it belongs to.
  delegate :title, :to => :topic

  # To rebuild the index, delete the corresponding index folder and restart
  # the server, or call Model.rebuild_index.
  # ======================= Search =======================
  acts_as_ferret({
    :fields => {
      :title => {
        :store => :yes,
        :boost => 20 # boost (weight) of title matches
      },
      :body => {
        :boost => 1,
        :store => :yes,
        :term_vector => :with_positions_offsets
      },
      :author => { :store => :yes },
      :created_at_s => { :index => :untokenized, :store => :yes },
      :updated_at_s => { :index => :untokenized, :store => :yes }
    },
    :store_class_name => true,
    :analyzer => RMMSeg::Ferret::Analyzer.new
  })

  # Timestamps formatted as strings so they can be indexed untokenized.
  def created_at_s
    created_at.to_s(:db)
  end

  def updated_at_s
    updated_at.to_s(:db)
  end

  # ... other implementation ...
end
The :analyzer => RMMSeg::Ferret::Analyzer.new option is what gives us Chinese word segmentation.
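Chinese text has no spaces between words, so an analyzer has to split a sentence into words before it can be indexed. To give a feel for what that step involves, here is a toy forward-maximum-matching segmenter. It is only an illustration and not RMMSeg's actual implementation; the dictionary, the constant names, and the toy_segment method are all made up for this sketch.

```ruby
require 'set'

# Made-up sample dictionary; a real segmenter ships a far larger one.
TOY_DICTIONARY = Set.new(%w[全文 检索 全文检索 中文 分词 支持]).freeze
MAX_WORD_LENGTH = TOY_DICTIONARY.map(&:length).max

# Hypothetical helper: greedily take the longest dictionary word at each
# position, falling back to a single character when nothing matches.
def toy_segment(text, dictionary = TOY_DICTIONARY, max_len = MAX_WORD_LENGTH)
  chars = text.chars
  tokens = []
  i = 0
  while i < chars.length
    # Start with the longest possible candidate and shrink until it is a
    # known word or only one character is left.
    len = [max_len, chars.length - i].min
    len -= 1 while len > 1 && !dictionary.include?(chars[i, len].join)
    tokens << chars[i, len].join
    i += len
  end
  tokens
end
```

With this sample dictionary, toy_segment("支持中文全文检索") splits the sentence into three tokens, and a query for 全文检索 can then match the last one as a whole word.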
Build the search module
ruby script/generate controller search index
Add a routing rule:
map.online '/search', :controller => 'search', :action => 'index'
Edit search_controller:
class SearchController < ApplicationController
  def index
    # Which model to search: "topic" (the default) or anything else for posts.
    @class = params[:class] || "topic"
    @query = params[:query] || ''
    unless @query.blank?
      if @class == "topic"
        @results = Topic.find_with_ferret @query
      else
        @results = Post.find_with_ferret @query
      end
    end
  end
end
Edit the corresponding view:
<% form_tag '/search', :method => :get, :style => "margin-left:40%" do %>
  <input type="radio" name="class" value="topic" <%= @class == "topic" ? 'checked="checked"' : '' %>>Topics only
  <input type="radio" name="class" value="post" <%= @class == "post" ? 'checked="checked"' : '' %>>All posts<br>
  <p>
    <%= text_field_tag :query, @query %>
    <%= submit_tag "Search", :name => nil %>
  </p>
<% end %>
<% if defined? @results %>
  <style type="text/css">
    .hilite {
      color: #0042BD;
      background: #F345CC;
    }
  </style>
  <div id="search_result">
    <% @results.each do |result| %>
      <h3>
        <%= result.highlight(@query, :field => :title, :pre_tag => "<span class='hilite'>", :post_tag => "</span>") %>
      </h3>
      <div><%= result.highlight(@query, :field => :body, :num_excerpts => 3, :excerpt_length => 250) %></div>
      <p>Author: <%= result.author %> Posted at <%= result.created_at_s %></p>
    <% end %>
  </div>
<% end %>
Pagination
Add the following to application_controller.rb:
# Wrap a Ferret result set in a WillPaginate::Collection for the given page.
def pages_for(result, options = {})
  page     = (options[:page] || 1).to_i # params[:page] arrives as a String
  per_page = options[:per_page] || 30
  total    = result.total_hits || 0
  index    = (page - 1) * per_page      # offset of the first hit on this page
  returning WillPaginate::Collection.new(page, per_page, total) do |pager|
    pager.replace(result[index, per_page] || [])
  end
end
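The offset arithmetic in pages_for can be tried outside Rails. The following is a minimal sketch of just that slicing step on a plain Array; page_window is a hypothetical name used for illustration and is not part of will_paginate or acts_as_ferret.

```ruby
# Hypothetical helper mirroring the slicing done inside pages_for.
def page_window(items, page, per_page)
  page = [page.to_i, 1].max        # nil or "0" falls back to the first page
  offset = (page - 1) * per_page   # index of the first item on this page
  items[offset, per_page] || []    # Array#[] returns nil far past the end
end
```

Converting the page number with to_i matters because request parameters are strings, and falling back to an empty array avoids handing nil to the pager when the requested page lies beyond the last hit.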
Update the controller:
class SearchController < ApplicationController
  def index
    @class = params[:class] || "topic"
    @query = params[:query] || ''
    unless @query.blank?
      if @class == "topic"
        results = Topic.find_with_ferret @query
        @results = pages_for(results, :per_page => 3, :page => (params[:page] || 1))
      else
        results = Post.find_with_ferret @query
        @results = pages_for(results, :per_page => 3, :page => (params[:page] || 1))
      end
    end
  end
end
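The two branches of the controller differ only in which model they search, so the choice could be moved into a small lookup table. The sketch below shows one way to do that; FakeTopic, FakePost, SEARCHABLE, and search_model_for are hypothetical names, and the two stub classes stand in for the real Topic and Post models only so that the example runs outside Rails.

```ruby
# Stand-ins for the real models (hypothetical, for illustration only).
class FakeTopic
  def self.find_with_ferret(query)
    ["topic hit for #{query}"]
  end
end

class FakePost
  def self.find_with_ferret(query)
    ["post hit for #{query}"]
  end
end

SEARCHABLE = { "topic" => FakeTopic, "post" => FakePost }.freeze

# Same fallback as the controller: a missing parameter means "topic",
# and any other unrecognised value searches posts.
def search_model_for(class_param, models = SEARCHABLE)
  key = class_param || "topic"
  models.fetch(key, models.fetch("post"))
end
```

In the controller this would reduce the if/else to a single call such as search_model_for(params[:class]).find_with_ferret(@query), with the real models in the table.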
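At the bottom of the corresponding view, add the will_paginate call (this requires the will_paginate plugin). The listing also shows each hit's relevance score: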
<% form_tag '/search', :method => :get, :style => "margin-left:40%" do %>
  <input type="radio" name="class" value="topic" <%= @class == "topic" ? 'checked="checked"' : '' %>>Topics only
  <input type="radio" name="class" value="post" <%= @class == "post" ? 'checked="checked"' : '' %>>All posts<br>
  <p>
    <%= text_field_tag :query, @query %>
    <%= submit_tag "Search", :name => nil %>
  </p>
<% end %>
<% if defined? @results %>
  <style type="text/css">
    .hilite {
      color: #0042BD;
      background: #F345CC;
    }
  </style>
  <div id="search_result">
    <% @results.each do |result| %>
      <h3>
        <%= result.highlight(@query, :field => :title, :pre_tag => "<span class='hilite'>", :post_tag => "</span>") %>
      </h3>
      <div><%= result.highlight(@query, :field => :body, :num_excerpts => 3, :excerpt_length => 250) %></div>
      <p>Author: <%= result.author %> Posted at <%= result.created_at_s %> Relevance <%= number_to_percentage result.ferret_score * 100, :precision => 2 %></p>
    <% end %>
  </div>
  <%= will_paginate @results, :class => "non_ajax" %>
<% end %>
Production environment
- In the model, pass :remote => true to acts_as_ferret
- Copy ferret_server.yml from vendor/plugins/acts_as_ferret/config/ to config/
- ruby script/runner vendor/plugins/acts_as_ferret/script/ferret_server -e production
Some useful links
http://www.pluitsolutions.com/2007/07/30/acts-as-ferret-drbserver-win32-service/
http://ferret.davebalmain.com/api/classes/Ferret/Index.html