Implementing the Boyer-Moore String Search Algorithm

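A while ago I came across an article on cnblogs explaining how the Boyer-Moore algorithm works, http://kb.cnblogs.com/page/176945/. It is very detailed, so I wrote my own C implementation here, mostly as practice.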

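The basic idea is to compare the pattern against the text starting from the pattern's rightmost character. When a mismatch occurs, the pattern can usually be shifted right by several positions at once, which is what makes the search fast. An array right[] drives the shift. It has 256 entries, one for each character of the extended ASCII table. right[c] is -1 if character c does not occur in the pattern; otherwise it is the index of the rightmost occurrence of c in the pattern. On a mismatch at pattern index j against text character c, the pattern moves right by j - right[c] positions, or by 1 if that value is less than 1. There should still be room for optimization, but I have not looked into it yet.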

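The code is split into two files, a .h file and a .c file. If the pattern is found, the function returns the position of its first occurrence; if it is not found, the function returns N, the length of the searched string.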

N: length of the string being searched.

M: length of the pattern.

　　strsearch.h :

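#ifndef STRSEARCH_H
#define STRSEARCH_H

/* Size of the lookup table: one entry per extended-ASCII character */
#define ASCII_LIST_LENGTH 256

#ifdef __cplusplus
extern "C" {
#endif

/*
 * Interface: returns the index of the first occurrence of pattern
 * in dest_str, or strlen(dest_str) if the pattern is not found.
 */
extern int BMSearch(char *dest_str, char *pattern);

#ifdef __cplusplus
}
#endif

#endif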

　　strsearch.c :

#include <stdio.h>
#include <string.h>
#include "strsearch.h"

#ifdef __cplusplus
extern "C" {
#endif

/*
 ****** Implementation of the Boyer-Moore Algorithm ******
 *
 * This function solves the string search problem: it finds
 * the position of the pattern string in the dest string
 * quickly.
 *
 * Copyright(c) 2013.9.6 xiaoh
 * All rights reserved.
 *
 *********************************************************
 */

/*
 * Build the jump table: right[c] is the index of the rightmost
 * occurrence of character c in the pattern, or -1 if c does not
 * occur in the pattern.
 */
static void BoyerMoore(const char *pattern, int right[])
{
    int M = (int)strlen(pattern);

    for (int c = 0; c < ASCII_LIST_LENGTH; c++)
        right[c] = -1;
    for (int j = 0; j < M; j++)
        right[(unsigned char)pattern[j]] = j; /* cast: plain char may be signed */
}

/*
 * Main function of the Boyer-Moore search algorithm
 */
int BMSearch(char *dest_str, char *pattern)
{
    /* right[c]: rightmost index of character c in the pattern, or -1 */
    int right[ASCII_LIST_LENGTH];

    BoyerMoore(pattern, right);

    int N = (int)strlen(dest_str);
    int M = (int)strlen(pattern);

    int skip; /* number of positions to shift the pattern */
    for (int i = 0; i <= N - M; i += skip)
    {
        skip = 0;
        /* compare from the right end of the pattern */
        for (int j = M - 1; j >= 0; j--)
        {
            if (pattern[j] != dest_str[i + j])
            {
                /* calculate the step to jump */
                skip = j - right[(unsigned char)dest_str[i + j]];
                if (skip < 1)
                    skip = 1; /* always move forward */
                break;
            }
        }
        if (skip == 0) /* no mismatch: full match at position i */
        {
            printf("Search finished successfully.\n");
            return i;
        }
    }
    printf("String cannot be found.\n");
    return N;
}

#ifdef __cplusplus
}
#endif
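Returning N on failure and printing on every call can be awkward for callers. As a sketch of one possible alternative, the hypothetical `bm_find` below runs the same right[] based search silently and returns a pointer to the match, or NULL if there is none, in the style of the standard `strstr`. The name and signature are assumptions made for illustration and are not part of strsearch.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the search above: same table-driven logic,
 * but silent, and returning NULL on failure the way strstr() does. */
const char *bm_find(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);

    int right[256];
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);

    /* rightmost index of each character in the pattern, -1 if absent */
    for (int c = 0; c < 256; c++)
        right[c] = -1;
    for (int j = 0; j < m; j++)
        right[(unsigned char)pattern[j]] = j;

    int skip;
    for (int i = 0; i <= n - m; i += skip) {
        skip = 0;
        for (int j = m - 1; j >= 0; j--) {
            if (pattern[j] != text[i + j]) {
                skip = j - right[(unsigned char)text[i + j]];
                if (skip < 1)
                    skip = 1;
                break;
            }
        }
        if (skip == 0)
            return text + i; /* full match at offset i */
    }
    return NULL; /* not found */
}
```

A caller can then test the result directly, for example `if (bm_find(text, "EXAMPLE") != NULL)`, without comparing the return value against the text length.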

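In the best case the search takes about O(N/M) comparisons (this is actually the case where the whole text is scanned without a match, and every alignment fails on its first comparison against a character that does not occur in the pattern). This has to be weighed against the O(M) cost of building the table: the larger M is, the smaller N/M becomes. In the worst case, a search driven only by the right[] table may check about N-M+1 alignments with up to M comparisons each, so the running time is O(N·M) rather than O(N); the good-suffix rule of the full Boyer-Moore algorithm is what improves on this. On typical text, the search is often faster than KMP in practice.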
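To see how strongly the comparison count depends on the input, here is a small instrumented sketch. The function `bm_count_comparisons` is hypothetical and exists only for this experiment; it assumes the same right[] table and shift rule as the implementation above, and simply counts character comparisons until the first match or the end of the text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instrumented search: returns how many character
 * comparisons the table-driven search performs on text/pattern. */
long bm_count_comparisons(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);

    int right[256];
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);
    long comparisons = 0;

    for (int c = 0; c < 256; c++)
        right[c] = -1;
    for (int j = 0; j < m; j++)
        right[(unsigned char)pattern[j]] = j;

    int skip;
    for (int i = 0; i <= n - m; i += skip) {
        skip = 0;
        for (int j = m - 1; j >= 0; j--) {
            comparisons++; /* one pattern/text character comparison */
            if (pattern[j] != text[i + j]) {
                skip = j - right[(unsigned char)text[i + j]];
                if (skip < 1)
                    skip = 1;
                break;
            }
        }
        if (skip == 0)
            break; /* first match found: stop counting */
    }
    return comparisons;
}
```

With a text of 16 'a' characters and the pattern "bbbb", every alignment fails on its first comparison and the pattern jumps 4 positions, so the count is 4, which is N/M. With the pattern "baaa" on the same text, each of the 13 alignments needs 4 comparisons before failing on the leftmost character and shifting by only 1, so the count is 52, which is (N-M+1)·M.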

posted @ 2013-09-08 03:27 XiaoH在博客园
