BlogEngine.NET: Search

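BlogEngine's search is really powerful, or maybe I am just easily impressed, heh. I once read an article by an expert and learned that it includes an OpenSearch feature.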

  [Figure A]          [Figure B]

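In figure A the blog is not open, so the browser's search engine list lacks the option to add "Name of the blog" that appears in figure B. Pretty neat, heh.
The only difference in B is that the page contains one extra tag: <link href="http://localhost:52457/BlogEngine.NET/opensearch.axd" title="Name of the blog" rel="search" type="application/opensearchdescription+xml">
Once "Name of the blog" has been added, you can pick it as the search engine, and a search then lands on the search page of our own blog, heh.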
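
To make the discovery mechanism concrete: the link tag points the browser at an OpenSearch description document. The sketch below builds a minimal one with LINQ to XML. It is not BlogEngine.NET's opensearch.axd handler; the class name, method name, and parameters are made up for illustration, and only the namespace and element names come from the OpenSearch 1.1 format.

```csharp
using System;
using System.Xml.Linq;

public static class OpenSearchSketch
{
    // Namespace of the OpenSearch 1.1 description format.
    private static readonly XNamespace Ns = "http://a9.com/-/spec/opensearch/1.1/";

    // blogName and searchPageUrl are illustrative parameters,
    // not part of BlogEngine.NET's actual API.
    public static XDocument BuildDescription(string blogName, string searchPageUrl)
    {
        return new XDocument(
            new XElement(Ns + "OpenSearchDescription",
                new XElement(Ns + "ShortName", blogName),
                new XElement(Ns + "Description", "Search " + blogName),
                new XElement(Ns + "Url",
                    new XAttribute("type", "text/html"),
                    // The browser substitutes {searchTerms} with the user's query.
                    new XAttribute("template", searchPageUrl + "?q={searchTerms}"))));
    }

    public static void Main()
    {
        var doc = BuildDescription(
            "Name of the blog",
            "http://localhost:52457/BlogEngine.NET/search.aspx");
        Console.WriteLine(doc);
    }
}
```

The template ends in ?q={searchTerms}, which is why a search started from the browser arrives at search.aspx with the query in the q parameter.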


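Let's look at how BlogEngine actually performs a search.
Debugging in the usual way: enter some text and click search, and the browser navigates to http://localhost:52457/BlogEngine.NET/search.aspx?q=1, where the value after q is the search term.
Stepping through it: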

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        rep.ItemDataBound += new RepeaterItemEventHandler(rep_ItemDataBound);

        var term = Request.QueryString["q"];
        if (!Utils.StringIsNullOrWhitespace(term))
        {
            bool includeComments = (Request.QueryString["comment"] == "true");

            var encodedTerm = Server.HtmlEncode(term);
            Page.Title = Server.HtmlEncode(Resources.labels.searchResultsFor) + " '" + encodedTerm + "'";
            h1Headline.InnerHtml = Resources.labels.searchResultsFor + " '" + encodedTerm + "'";

            Uri url;
            if (!Uri.TryCreate(term, UriKind.Absolute, out url))
            {
                List<IPublishable> list = Search.Hits(term, includeComments);
                BindSearchResult(list);
            }
            else
            {
                SearchByApml(url);
            }
        }
        else
        {
            Page.Title = Resources.labels.search;
            h1Headline.InnerHtml = Resources.labels.search;
        }
    }
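
The branch on Uri.TryCreate in OnLoad decides between a normal text search and SearchByApml. A minimal sketch of just that decision, assuming nothing beyond the framework's Uri class (the Classify helper and its return strings are hypothetical, purely for illustration):

```csharp
using System;

public static class SearchDispatchSketch
{
    // Returns "apml" when the term parses as an absolute URI (the
    // SearchByApml branch), otherwise "text" (the Search.Hits branch).
    public static string Classify(string term)
    {
        Uri url;
        return Uri.TryCreate(term, UriKind.Absolute, out url) ? "apml" : "text";
    }

    public static void Main()
    {
        Console.WriteLine(Classify("1"));
        Console.WriteLine(Classify("blog engine"));
        Console.WriteLine(Classify("http://example.com/apml.xml"));
    }
}
```

Ordinary keywords do not parse as absolute URIs, so they take the Search.Hits path; only a full URL such as the third input is routed to the APML branch.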

The line List<IPublishable> list = Search.Hits(term, includeComments); is the one to follow, so let's trace it:

    ///<summary>
    /// Searches all the posts and returns a ranked result set.
    ///</summary>
    ///<param name="searchTerm">The term to search for</param>
    ///<param name="includeComments">True to include a post's comments and their authors in search</param>
    ///<returns>A list of IPublishable.</returns>
    public static List<IPublishable> Hits(string searchTerm, bool includeComments)
    {
        lock (SyncRoot)
        {
            var results = BuildResultSet(searchTerm, includeComments);
            var items = results.ConvertAll(ResultToPost);
            results.Clear();
            OnSearcing(searchTerm);
            return items;
        }
    }

It searches all content and returns a ranked result set. From the code it is clear that we have to keep following, into BuildResultSet:

    ///<summary>
    /// Builds the results set and ranks it.
    ///</summary>
    ///<param name="searchTerm">
    /// The search Term.
    ///</param>
    ///<param name="includeComments">
    /// The include Comments.
    ///</param>
    private static List<Result> BuildResultSet(string searchTerm, bool includeComments)
    {
        var results = new List<Result>();
        var term = CleanContent(searchTerm.ToLowerInvariant().Trim(), false);
        // Split the cleaned term on spaces and OR the words together: "(word1|word2)".
        var terms = term.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var regex = string.Format(CultureInfo.InvariantCulture, "({0})", string.Join("|", terms));

        foreach (var entry in Catalog)
        {
            var result = new Result();
            if (!(entry.Item is Comment))
            {
                // Title matches weigh 20, content matches 1, description matches 2.
                var titleMatches = Regex.Matches(entry.Title, regex).Count;
                result.Rank = titleMatches * 20;

                var postMatches = Regex.Matches(entry.Content, regex).Count;
                result.Rank += postMatches;

                var descriptionMatches = Regex.Matches(entry.Item.Description, regex).Count;
                result.Rank += descriptionMatches * 2;
            }
            else if (includeComments)
            {
                var commentMatches = Regex.Matches(entry.Content + entry.Title, regex).Count;
                result.Rank += commentMatches;
            }

            // Only entries with at least one match make it into the result set.
            if (result.Rank > 0)
            {
                result.Item = entry.Item;
                results.Add(result);
            }
        }

        results.Sort();
        return results;
    }

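Leaving aside for now what Catalog actually is, all the matching here serves to assign a weight to result.Rank: the more matches, the higher the weight, and the earlier the item appears in the ordering. Results with a weight greater than 0 are added to the List<Result>, which is then ordered with Sort(). No comparer is passed, so the default comparison is used, and BlogEngine supplies its own CompareTo on Result for that: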

    ///<summary>
    /// Compares the current object with another object of the same type.
    ///</summary>
    ///<param name="other">
    /// An object to compare with this object.
    ///</param>
    ///<returns>
    /// A 32-bit signed integer that indicates the relative order of the objects being compared. The return value
    /// has the following meanings: Value Meaning Less than zero This object is less than the other parameter. Zero
    /// This object is equal to other. Greater than zero This object is greater than other.
    ///</returns>
    public int CompareTo(Result other)
    {
        return other.Rank.CompareTo(this.Rank);
    }
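
The weighting and ordering described above can be tried in isolation. The following is a simplified sketch, not BlogEngine's code: the Hit class stands in for Result, BuildPattern skips the CleanContent step, and the sample strings are invented. It assumes plain alphanumeric, non-empty search terms, since nothing here escapes regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RankSketch
{
    // Hypothetical stand-in for BlogEngine's Result class.
    public sealed class Hit : IComparable<Hit>
    {
        public string Title = "";
        public int Rank;

        // Comparing other to this (not this to other) sorts by descending rank.
        public int CompareTo(Hit other) { return other.Rank.CompareTo(Rank); }
    }

    // "Blog Engine" becomes "(blog|engine)"; assumes non-empty alphanumeric terms.
    public static string BuildPattern(string searchTerm)
    {
        var terms = searchTerm.ToLowerInvariant().Trim()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return "(" + string.Join("|", terms) + ")";
    }

    // Same weights as in BuildResultSet: title 20, content 1, description 2.
    public static int Score(string title, string content, string description, string pattern)
    {
        return Regex.Matches(title, pattern).Count * 20
             + Regex.Matches(content, pattern).Count
             + Regex.Matches(description, pattern).Count * 2;
    }

    public static void Main()
    {
        var pattern = BuildPattern("search");
        var hits = new List<Hit>
        {
            new Hit { Title = "content only", Rank = Score("a post", "search and search again", "", pattern) },
            new Hit { Title = "title match", Rank = Score("search tips", "nothing else here", "", pattern) }
        };

        hits.Sort(); // uses Hit.CompareTo because no comparer is given
        foreach (var hit in hits)
        {
            Console.WriteLine(hit.Rank + " " + hit.Title);
        }
    }
}
```

A single title match (20) outranks two content matches (2), so the "title match" entry sorts first even though it was added last.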

Finally, BuildResultSet returns the sorted List<Result>. So what is Catalog? It is the collection that gets searched, a Collection<Entry>. Here is the structure of Entry:

    ///<summary>
    /// A search optimized post object cleansed from HTML and stop words.
    ///</summary>
    internal struct Entry
    {
        #region Constants and Fields

        ///<summary>
        /// The content of the post cleansed for stop words and HTML
        ///</summary>
        internal string Content;

        ///<summary>
        /// The post object reference
        ///</summary>
        internal IPublishable Item;

        ///<summary>
        /// The title of the post cleansed for stop words
        ///</summary>
        internal string Title;

        #endregion
    }

Looking back at the matching logic in BuildResultSet, it now makes sense. We know there is a collection that gets searched, but how is it built?

    ///<summary>
    /// Initializes static members of the <see cref="Search"/> class.
    ///</summary>
    static Search()
    {
        BuildCatalog();
        Post.Saved += Post_Saved;
        Page.Saved += Page_Saved;
        BlogSettings.Changed += delegate { BuildCatalog(); };
        Post.CommentAdded += Post_CommentAdded;
        Post.CommentRemoved += delegate { BuildCatalog(); };
        Comment.Approved += Post_CommentAdded;
    }

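The static constructor calls BuildCatalog to build the search collection, and it also subscribes to events on Post, Page, and so on. In other words, whenever any of them changes, the catalog is updated. This also shows that the search collection holds several kinds of objects; what they have in common is that they all implement the IPublishable interface.
At this point we have the search keywords and the collection to search, so the search results can be returned.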
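
The pattern of building the catalog once in a static constructor and refreshing it from static events can be sketched on its own. Everything below is hypothetical: ArticleStore plays the role of a content type with a Saved event, and the catalog holds only lowercased titles rather than BlogEngine's Entry values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical content store that raises a static event after each save.
public static class ArticleStore
{
    public static readonly List<string> Titles = new List<string>();
    public static event EventHandler Saved;

    public static void Save(string title)
    {
        Titles.Add(title);
        var handler = Saved;
        if (handler != null) handler(null, EventArgs.Empty);
    }
}

public static class CatalogSketch
{
    private static readonly object SyncRoot = new object();
    private static List<string> catalog = new List<string>();

    // Runs once, on first use of the class: build the catalog,
    // then rebuild it whenever the store reports a change.
    static CatalogSketch()
    {
        BuildCatalog();
        ArticleStore.Saved += delegate { BuildCatalog(); };
    }

    public static int Count
    {
        get { lock (SyncRoot) { return catalog.Count; } }
    }

    private static void BuildCatalog()
    {
        lock (SyncRoot)
        {
            var fresh = new List<string>();
            foreach (var title in ArticleStore.Titles)
            {
                fresh.Add(title.ToLowerInvariant());
            }
            catalog = fresh;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CatalogSketch.Count); // catalog built from the empty store
        ArticleStore.Save("Hello World");
        Console.WriteLine(CatalogSketch.Count); // rebuilt by the Saved handler
    }
}
```

Rebuilding the whole catalog on every change is the simplest option; the named handlers in the constructor above (Post_Saved, Page_Saved, Post_CommentAdded) suggest that BlogEngine handles some events more selectively, though their bodies are not shown here.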
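This search reminds me of Lucene.NET, heh, which also has to deal with weighting. Lucene's tokenization is far more advanced, though. Here each space-separated term can only be matched as a whole: "ABC" finds only content that contains "ABC", not content that merely contains "A", "B", or "C".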
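
The whole-term behaviour is easy to confirm with the same kind of alternation pattern that BuildResultSet produces. This is a small sketch with a hypothetical CountMatches helper; it assumes plain, non-empty alphanumeric terms and does no content cleaning.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeTermSketch
{
    // Counts matches of any space-separated term from searchTerm in content.
    public static int CountMatches(string content, string searchTerm)
    {
        var terms = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var pattern = "(" + string.Join("|", terms) + ")";
        return Regex.Matches(content, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches("xabcx", "abc"));   // the whole term occurs once
        Console.WriteLine(CountMatches("a b c", "abc"));   // the pieces alone do not match
        Console.WriteLine(CountMatches("a b c", "a b c")); // three separate terms, each matches
    }
}
```

Only when the user separates the pieces with spaces are they searched independently, which is the gap a real tokenizer such as Lucene's would close.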











posted @ 2011-11-29 11:40  一文钱