Full-Text Indexing Tips for .NET Development: Solr

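    Foreword: Many of you have probably heard of full-text indexing in .NET built on Lucene.net. Solr is a high-performance full-text search server based on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and ships with a full-featured admin interface, which makes it an excellent full-text search engine. Here I will skip Lucene and go straight to using Solr. In short, Solr is more convenient and simpler to use than Lucene, is quick to pick up, and makes for efficient development.

   Where Solr fits: full-text search over large data sets. E-commerce platforms in particular, as well as the currently popular cloud computing and Internet of Things, all rest on large volumes of data, and Solr is well suited to searching that data. Solr is also free and open source, so the barrier to entry is low and a small investment pays off quickly. I won't repeat Solr's other advantages here; many experts in this community have already written technical posts about Solr, and this post is only meant to get the discussion started.

   Now let's begin the Solr journey.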

 

Steps for Solr Development on the .NET Platform

I. Set up the Solr server, as follows:

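   1. Install the JDK. Solr itself runs on Java even though our client is .NET. The JDK already bundles the JRE and the Java virtual machine, so nothing else has to be installed separately. In my case the installer took care of the environment setup and I did not have to configure environment variables by hand. I installed jdk1.7. Official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html

   2. Install Tomcat 8.0. Official site: http://tomcat.apache.org/download-80.cgi . After installation, start Monitor Tomcat and enter http://localhost:8080/ in the browser's address bar. If the page loads, the installation succeeded.

   3. Download Solr. I used Solr 4.4. After downloading, configure it as follows: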

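  (1) Unzip Solr 4.4 and create a Solr directory, for example D:/SorlServer/one. Copy all files from the example/solr folder of the unzipped Solr 4.4 into the directory you created.

  (2) Create the Solr web application. Copy solr-4.4.0.war from the dist directory of the unzipped Solr 4.4 into Tomcat's webapps directory, for example C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps (the sample paths in this post point at a Tomcat 7.0 install directory; substitute your own), and rename it to one.war. Tomcat unpacks the file automatically when it starts. Then go to D:\SorlServer\one\collection1\conf, open solrconfig.xml, find the <dataDir> node, and change it to <dataDir>${solr.data.dir:c:/SorlServer/one/data}</dataDir>

Note, this step is important: open web.xml under C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF, find the <env-entry> node, and uncomment it.

Change the env-entry-value to D:/SorlServer/one, as follows: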

<env-entry>       

      <env-entry-name>solr/home</env-entry-name>

      <env-entry-value>D:/SorlServer/one</env-entry-value>

      <env-entry-type>java.lang.String</env-entry-type>

 </env-entry>

   (3) Copy all jar files from /dist/solrj-lib in the unzipped Solr 4.4 to C:\Program Files\Apache Software Foundation\Tomcat 7.0\lib

  (4) Stop Tomcat and start it again, then visit http://localhost:8080/one to open the Solr admin page.

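Note: for an English-language site no third-party tokenizer is needed, because Solr has built-in support for English tokenization. For other languages (Japanese, Italian, French, and so on), check whether the analyzers bundled with Solr meet your needs, and look online for a dedicated tokenizer package if they do not. Here we use Chinese tokenization as the example, since most sites in China are primarily in Chinese.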

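   4. Configure Chinese tokenization. Tokenizers commonly used in China include Paoding (庖丁解牛), mmseg4j, and IKAnalyzer. I used IKAnalyzer; it is actively maintained, updated frequently, and works well. The steps are:

   (1) Copy the IKAnalyzer jar and IKAnalyzer.cfg.xml to C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\lib

   (2) Edit schema.xml under D:\SorlServer\one\collection1\conf and add the following:

      <!-- Tokenizer configuration -->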

 <fieldType name="text_IKFENCHI" class="solr.TextField"> 

     <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>

 </fieldType>

    (3) Stop Tomcat and start it again, then visit http://localhost:8080/one/#/collection1/analysis to test the tokenizer.

    That completes the configuration on the Solr server side.
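
The same tokenizer check can also be scripted rather than done in the admin page. The sketch below only builds the request URL. It assumes the core is reachable at http://localhost:8080/one/collection1 and that the /analysis/field handler from Solr's example solrconfig.xml is still enabled; AnalysisUrlSketch and BuildFieldAnalysisUrl are names made up for this illustration.

```csharp
using System;

public static class AnalysisUrlSketch
{
    // Builds a URL for Solr's field analysis handler.
    // coreUrl is an assumption, e.g. http://localhost:8080/one/collection1
    public static string BuildFieldAnalysisUrl(string coreUrl, string fieldType, string text)
    {
        return coreUrl.TrimEnd('/')
            + "/analysis/field?wt=json"
            + "&analysis.fieldtype=" + Uri.EscapeDataString(fieldType)
            + "&analysis.fieldvalue=" + Uri.EscapeDataString(text);
    }
}
```

Opening the resulting URL in a browser (or fetching it from code) should return the token stream produced for the text by the text_IKFENCHI field type, provided the handler is enabled.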

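II. Solr development on the .NET platform:

   1. Download a Solr client library. I used EasyNet.Solr by Terry from this community, hosted on Microsoft's open-source site: http://easynet.codeplex.com/

Terry has wrapped the Solr client very thoroughly, with many ready-made methods and parameter settings that we can use directly. We use EasyNet.Solr to create the index and then query it. Usage is as follows:

  (1) Download the EasyNet.Solr source and add it directly to your project, or build the source into a DLL and add it as a project reference. Including the source is the better option, because you can then adjust it to fit your own needs.

  (2) Create the index entity class, that is, the data we want to store in the index. For example, a product entity class: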

using System;
using System.Collections.Generic;

namespace Seek.SearchIndex
{
    public partial class IndexProductModel
    {
        public IndexProductModel()
        {
        }

        #region  Properties
        public int ID { get; set; }
        public int ProductID { get; set; }
        public string ClassPath { get; set; }
        public int ClassID1 { get; set; }
        public int ClassID2 { get; set; }
        public int ClassID3 { get; set; }
        public string Title { get; set; }
        public string Model { get; set; }
        public string PriceRange { get; set; }
        public string AttributeValues { get; set; }
        public string ProductImages { get; set; }
        public int MemberID { get; set; }
        public System.DateTime CreateDate { get; set; }
        public System.DateTime LastEditDate { get; set; }
        public string FileName { get; set; }
        public string ProductType { get; set; }
        public string Summary { get; set; }
        public string Details { get; set; }
        public string RelatedKeywords { get; set; }
        public int MemberGrade { get; set; }
        #endregion
    }
}

  (3) Configure the XML on the Solr server, that is, declare the fields of our index entity class to Solr. Go to D:\SorlServer\one\collection1\conf, open schema.xml, and configure it as follows:

 <field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="ProductID" type="int" indexed="true" stored="true"/>
   <!-- Settings for fast highlighting: termVectors="true" termPositions="true"  termOffsets="true" -->
   <field name="Title" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true"  termOffsets="true"/>
   <field name="Model" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true"  termOffsets="true"/>
   <field name="ClassPath" type="string" indexed="true" stored="true"/>
   <field name="ClassID1" type="int" indexed="true" stored="true"/>
   <field name="ClassID2" type="int" indexed="true" stored="true"/>
   <field name="ClassID3" type="int" indexed="true" stored="true"/>
   <field name="PriceRange" type="string" indexed="true" stored="true"/>
   <field name="AttributeValues" type="string" indexed="true" stored="true"/>
   <field name="ProductImages" type="string" indexed="true" stored="true"/>
   <field name="MemberID" type="int" indexed="true" stored="true"/>
   <field name="CreateDate" type="date" indexed="true" stored="true"/>
   <field name="LastEditDate" type="date" indexed="true" stored="true"/>
   <field name="FileName" type="string" indexed="true" stored="true"/>
   <field name="ProductType" type="string" indexed="true" stored="true"/>
   <field name="Summary" type="string" indexed="true" stored="false"/>
   <field name="Details" type="string" indexed="true" stored="false"/>
   <field name="RelatedKeywords" type="string" indexed="true" stored="true"/>
   <field name="MemberType" type="string" indexed="true" stored="true"/>
   <field name="MemberGrade" type="int" indexed="true" stored="true"/>
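
Keeping schema.xml in step with the entity class by hand is easy to get wrong. As a rough sketch, the field declarations could be generated from the class by reflection. SampleProduct, SchemaSketch and BuildFields are hypothetical names used only for this illustration, and the type mapping simply follows the int/string/date types that appear in the schema above. Analyzed text fields such as Title would still need their type adjusted by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical, trimmed-down entity used only for this illustration.
public class SampleProduct
{
    public int ProductID { get; set; }
    public string Title { get; set; }
    public DateTime CreateDate { get; set; }
}

public static class SchemaSketch
{
    // CLR type -> Solr field type name (assumed mapping, taken from the schema above).
    private static readonly Dictionary<Type, string> SolrTypes = new Dictionary<Type, string>
    {
        { typeof(int), "int" },
        { typeof(string), "string" },
        { typeof(DateTime), "date" }
    };

    // Builds one <field> element per public instance property whose type is in the mapping.
    public static List<XElement> BuildFields<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => SolrTypes.ContainsKey(p.PropertyType))
            .Select(p => new XElement("field",
                new XAttribute("name", p.Name),
                new XAttribute("type", SolrTypes[p.PropertyType]),
                new XAttribute("indexed", "true"),
                new XAttribute("stored", "true")))
            .ToList();
    }
}
```

Printing each element of SchemaSketch.BuildFields<SampleProduct>() with Console.WriteLine gives lines that can be pasted into schema.xml and then edited.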

  (4) Start building the index. It is best to write a client program that generates the index; here is the key code of my own indexer:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Seek.SearchIndex;
using System.Data;
using System.Threading;
using System.Configuration;
using System.Reflection;
using EasyNet.Solr;
using EasyNet.Solr.Impl;
using EasyNet.Solr.Commons;
using System.Xml.Linq;
using EasyNet.Solr.Commons.Params;
using System.Threading.Tasks;

namespace Seek.SearchIndex
{
    /// <summary>
    /// Indexer
    /// </summary>
    public class Indexer
    {
        private readonly static OptimizeOptions optimizeOptions = new OptimizeOptions();
        private readonly static CommitOptions commitOptions = new CommitOptions() { SoftCommit = true };
        private readonly static ISolrResponseParser<NamedList, EasyNet.Solr.ResponseHeader> binaryResponseHeaderParser = new BinaryResponseHeaderParser();
        private readonly static IUpdateParametersConvert<NamedList> updateParametersConvert = new BinaryUpdateParametersConvert();
        private readonly static ISolrQueryConnection<NamedList> connection = new SolrQueryConnection<NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"] };
        private readonly static ISolrUpdateConnection<NamedList, NamedList> solrUpdateConnection = new SolrUpdateConnection<NamedList, NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"], ContentType = "application/javabin" };
        private readonly static ISolrUpdateOperations<NamedList> solr = new SolrUpdateOperations<NamedList, NamedList>(solrUpdateConnection, updateParametersConvert) { ResponseWriter = "javabin" };
        private readonly static ISolrQueryOperations<NamedList> solrQuery = new SolrQueryOperations<NamedList>(connection) { ResponseWriter = "javabin" };

        public enum State
        {
            /// <summary>
            /// Running
            /// </summary>
            Runing,
            /// <summary>
            /// Stopped
            /// </summary>
            Stop,
            /// <summary>
            /// Interrupted (paused)
            /// </summary>
            Break
        }
        /// <summary>
        /// Main window
        /// </summary>
        private Main form;
        /// <summary>
        /// Worker thread
        /// </summary>
        public Thread t;
        /// <summary>
        /// Current state
        /// </summary>
        public State state = State.Stop;
        /// <summary>
        /// Number of documents indexed so far
        /// </summary>
        private long currentIndex = 0;

        public long CurrentIndex
        {
            get { return currentIndex; }
            set { currentIndex = value; }
        }

        private int _startId = AppCongfig.StartId;

        public int StartId
        {
            get { return _startId; }
            set { _startId = value; }
        }

        /// <summary>
        /// Total number of products
        /// </summary>
        private int productsCount = 0;
        /// <summary>
        /// Start time
        /// </summary>
        private DateTime startTime = DateTime.Now;
        /// <summary>
        /// End time
        /// </summary>
        private DateTime endTime = DateTime.MinValue;
        private static readonly object syncLock = new object();
        #region Singleton
        private static Indexer instance = null;

        private Indexer(Main _form)
        {
            form = _form;
            productsCount = DataAccess.GetCount(0);       // count the products
            form.fullerTsslMaxNum.Text = productsCount.ToString();
            form.fullerProgressBar.Minimum = 0;
            form.fullerProgressBar.Maximum = productsCount;
        }
        public static Indexer GetInstance(Main form)
        {
            if (instance == null)
            {
                lock (syncLock)
                {
                    if (instance == null)
                    {
                        instance = new Indexer(form);
                    }
                }
            }
            return instance;
        }
        #endregion

        /// <summary>
        /// Starts indexing on a worker thread
        /// </summary>
        public void Start()
        {
            ThreadStart ts = new ThreadStart(FullerRun);
            t = new Thread(ts);
            t.Start();
        }
        /// <summary>
        /// Stops indexing
        /// </summary>
        public void Stop()
        {
            state = State.Stop;
        }
        /// <summary>
        /// Interrupts (pauses) indexing
        /// </summary>
        public void Break()
        {
            state = State.Break;
        }


        /// <summary>
        /// Builds the index for one batch of rows (single-threaded)
        /// </summary>
        public void InitIndex(object data)
        {
            var docs = new List<SolrInputDocument>();
            DataTable list = data as DataTable;
            if (list == null)
            {
                return; // nothing to index
            }
            // Properties of the entity class; each one maps to a Solr field of the same name.
            PropertyInfo[] properites = typeof(IndexProductModel).GetProperties();
            foreach (DataRow pro in list.Rows)
            {
                var model = new SolrInputDocument();
                foreach (PropertyInfo propertyInfo in properites)
                {
                    object val = pro[propertyInfo.Name];
                    if (val != DBNull.Value) // skip NULL columns
                    {
                        model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                    }
                }
                docs.Add(model);

                StartId = Convert.ToInt32(pro["ID"]);
            }
            GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
            // Send the whole batch in one update request. No commit options are passed here,
            // so when the documents become searchable depends on the commit settings in solrconfig.xml.
            var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
        }

        /// <summary>
        /// Builds the index for one batch of rows (multi-threaded, one update request per row)
        /// </summary>
        public void CreateIndexer(DataTable dt)
        {
            GetStartId();
            PropertyInfo[] properites = typeof(IndexProductModel).GetProperties();
            int maxId = int.MinValue;
            Parallel.ForEach<DataRow>(dt.AsEnumerable(), (row) =>
            {
                // Map one database row to one Solr document
                if (row != null)
                {
                    var docs = new List<SolrInputDocument>();
                    var model = new SolrInputDocument();
                    foreach (PropertyInfo propertyInfo in properites)
                    {
                        object val = row[propertyInfo.Name];
                        if (val != DBNull.Value) // skip NULL columns
                        {
                            model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                        }
                    }
                    docs.Add(model);

                    int id = Convert.ToInt32(row["ID"]);
                    lock (syncLock)
                    {
                        // Rows finish in arbitrary order, so remember the largest ID seen.
                        if (id > maxId)
                        {
                            maxId = id;
                        }
                    }
                    var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                }
            });
            if (maxId != int.MinValue)
            {
                StartId = maxId;
            }

            //GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
        }

        /// <summary>
        /// Runs a full indexing pass
        /// </summary>
        public void FullerRun()
        {
            //GetStartId();
            //form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            DataTable dt = DataAccess.GetNextProductsInfo(StartId);
            StartId = AppCongfig.StartId;
            if (state == State.Break)
            {
                this.SendMesasge("Full indexing resumed, starting ID [" + StartId + "]...");
            }
            else
            {
                startTime = DateTime.Now;
                this.SendMesasge("Full indexing started, starting ID [" + StartId + "]...");
            }
            state = State.Runing;
            form.btnInitIndex.Enabled = false;
            form.btnSuspend.Enabled = true;
            form.btnStop.Enabled = true;

            while (dt != null && dt.Rows.Count > 0 && state == State.Runing)
            {
                try
                {
                    InitIndex(dt); // single-threaded
                    // CreateIndexer(dt); // multi-threaded
                }
                catch (Exception ex)
                {
                    state = State.Stop;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    GetStartId();
                    this.SendMesasge(ex.Message);
                }
                form.fullerTsslTimeSpan.Text = "Elapsed: " + GetTimeSpanShow(DateTime.Now - startTime) + ", estimated remaining: " + GetTimeSpanForecast();

                try
                {
                    dt = DataAccess.GetNextProductsInfo(StartId); // fetch the next batch of products
                }
                catch (Exception err)
                {
                    // Stop here; otherwise the loop would keep re-indexing the same batch.
                    if (state == State.Runing)
                    {
                        state = State.Stop;
                    }
                    this.SendMesasge("Failed to fetch the next batch of products, starting ID [" + StartId + "]: " + err.Message);
                }
            }
            if (state == State.Runing)
            {
                state = State.Stop;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing completed, total indexed [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Break)
            {
                GetStartId();
                state = State.Break;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing paused, current position [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Stop)
            {
                GetStartId();
                state = State.Stop;
                this.SendMesasge("Full indexing stopped, indexed so far [" + currentIndex + "], last product ID " + StartId);
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                productsCount = DataAccess.GetCount(StartId);       // count the products
                form.fullerTsslMaxNum.Text = productsCount.ToString();
                form.fullerProgressBar.Minimum = 0;
                form.fullerProgressBar.Maximum = productsCount;
            }
            endTime = DateTime.Now;
        }

        /// <summary>
        /// Entry point for building index data on a separate thread
        /// </summary>
        /// <param name="threadDataParam">The batch of rows to index (a DataTable)</param>
        public void MultiThreadCreateIndex(object threadDataParam)
        {
            InitIndex(threadDataParam);
        }

        /// <summary>
        /// Reads the largest indexed ProductID and the indexed document count from Solr
        /// </summary>
        private void GetStartId()
        {
            IDictionary<string, ICollection<string>> options = new Dictionary<string, ICollection<string>>();
            options[CommonParams.SORT] = new string[] { "ProductID DESC" };
            options[CommonParams.START] = new string[] { "0" };
            options[CommonParams.ROWS] = new string[] { "1" };
            // Note: this sets the highlighting field list; it does not restrict the returned fields.
            options[HighlightParams.FIELDS] = new string[] { "ProductID" };
            options[CommonParams.Q] = new string[] { "*:*" };
            var result = solrQuery.Query("/select", null, options);
            var solrDocumentList = (SolrDocumentList)result.Get("response");
            // Check for null before reading NumFound.
            currentIndex = solrDocumentList != null ? solrDocumentList.NumFound : 0;
            if (solrDocumentList != null && solrDocumentList.Count() > 0)
            {
                StartId = (int)solrDocumentList[0]["ProductID"];
                //AppCongfig.SetValue("StartId", solrDocumentList[0]["ProductID"].ToString());
            }
            else
            {
                StartId = 0;
                // AppCongfig.SetValue("StartId", "0");
            }
        }


        /// <summary>
        /// Optimizes the index
        /// </summary>
        public void Optimize()
        {
            this.SendMesasge("Optimizing the index, please wait...");
            var result = solr.Update("/update", new UpdateOptions() { OptimizeOptions = optimizeOptions });
            var header = binaryResponseHeaderParser.Parse(result);
            this.SendMesasge("Index optimization took " + header.QTime + " ms");
        }

        /// <summary>
        /// Sends a message to the UI
        /// </summary>
        /// <param name="message">The message to display</param>
        protected void SendMesasge(string message)
        {
            form.fullerDgvMessage.Rows.Add(form.fullerDgvMessage.Rows.Count + 1, message, DateTime.Now.ToString());
        }
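        /// <summary>
        /// Formats a time span for display
        /// </summary>
        /// <param name="ts">The time span</param>
        /// <returns>Text such as "1h 5m 30s"; zero-valued parts are omitted</returns>
        protected string GetTimeSpanShow(TimeSpan ts)
        {
            // Each part carries a unit suffix so that the parts cannot run together.
            string text = "";
            if (ts.Days > 0)
            {
                text += ts.Days + "d ";
            }
            if (ts.Hours > 0)
            {
                text += ts.Hours + "h ";
            }
            if (ts.Minutes > 0)
            {
                text += ts.Minutes + "m ";
            }
            if (ts.Seconds > 0)
            {
                text += ts.Seconds + "s";
            }
            return text.TrimEnd();
        }
        /// <summary>
        /// Estimates the remaining time
        /// </summary>
        /// <returns>The formatted estimate, or an empty string if nothing has been indexed yet</returns>
        protected string GetTimeSpanForecast()
        {
            if (currentIndex != 0)
            {
                // Average time per document so far, scaled to the total, minus the time already spent.
                TimeSpan tsed = DateTime.Now - startTime;
                double d = ((tsed.TotalMilliseconds / currentIndex) * productsCount) - tsed.TotalMilliseconds;
                return GetTimeSpanShow(TimeSpan.FromMilliseconds(d));
            }
            return "";
        }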
    }
}
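
The indexer walks through the products batch by batch, remembering the last ID so that a stopped or paused run can pick up where it left off. DataAccess is not shown in this post, so the following is only a minimal in-memory sketch of that loop, assuming that GetNextProductsInfo returns the next batch of rows whose ID is greater than the one passed in. BatchLoopSketch, NextBatch and Run are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchLoopSketch
{
    // Stand-in for the data-access call: the next batchSize IDs greater than afterId.
    public static List<int> NextBatch(IEnumerable<int> allIds, int afterId, int batchSize)
    {
        return allIds.Where(id => id > afterId).OrderBy(id => id).Take(batchSize).ToList();
    }

    // Feeds every batch to indexBatch and returns the last ID processed,
    // which is the value to persist as the next starting point.
    public static int Run(IEnumerable<int> allIds, int startId, int batchSize, Action<List<int>> indexBatch)
    {
        int lastId = startId;
        List<int> batch = NextBatch(allIds, lastId, batchSize);
        while (batch.Count > 0)
        {
            indexBatch(batch);
            lastId = batch[batch.Count - 1];
            batch = NextBatch(allIds, lastId, batchSize);
        }
        return lastId;
    }
}
```

Because each batch is selected by "ID greater than the last one seen", restarting with the persisted ID skips everything that was already handled, without needing row offsets.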

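  (5) Run the indexer to build the index. My indexer's interface is shown in the figure.

   It lets you track the progress of index generation at any time.

  (6) Once the index has been built, you can test it in the Solr admin interface at http://localhost:8080/one/#/collection1/query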
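
The "estimated remaining" figure in the progress display is a simple linear estimate: the average time per indexed document so far, multiplied by the total, minus the time already spent. Here is a stand-alone sketch of that arithmetic. EtaSketch and EstimateRemaining are hypothetical names, and clamping the result at zero is my own addition for the case where more documents are indexed than were counted at the start.

```csharp
using System;

public static class EtaSketch
{
    // remaining = (elapsed / done) * total - elapsed, never below zero.
    public static TimeSpan EstimateRemaining(TimeSpan elapsed, long done, long total)
    {
        if (done <= 0)
        {
            return TimeSpan.Zero; // nothing indexed yet, so there is no basis for an estimate
        }
        double remainingMs = (elapsed.TotalMilliseconds / done) * total - elapsed.TotalMilliseconds;
        return TimeSpan.FromMilliseconds(Math.Max(0, remainingMs));
    }
}
```

For example, after 10 seconds with 100 of 400 documents indexed, the estimate is 30 seconds.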
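
The query page in the admin interface ultimately issues plain HTTP requests, so a quick check can also be made from code without any client library. The sketch below only assembles the URL from Solr's standard q, start, rows, sort and wt parameters. The core URL http://localhost:8080/one/collection1 is an assumption based on the setup above, and SelectUrlSketch and BuildSelectUrl are made-up names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SelectUrlSketch
{
    // Builds a /select URL; every parameter value is URL-encoded.
    public static string BuildSelectUrl(string coreUrl, string q, int start, int rows, string sort)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", q),
            new KeyValuePair<string, string>("start", start.ToString(CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("rows", rows.ToString(CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("sort", sort),
            new KeyValuePair<string, string>("wt", "json")
        };
        string query = string.Join("&",
            parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
        return coreUrl.TrimEnd('/') + "/select?" + query;
    }
}
```

Calling BuildSelectUrl with q set to Title:phone and sort set to "ProductID desc" gives a URL that can be pasted into a browser to see the ten newest matching products as JSON.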

 

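That covers the groundwork for Solr: setting up the Solr server and generating the index from a client. A later post will cover client-side queries in detail. Coming up next:

1. Full-text search, tokenizer configuration, and keyword autocomplete similar to Google and Baidu

2. Facet queries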

 

 

     

 

 

 

posted @ 2013-11-28 10:45  曾俊杰