
String Pattern Matching: The Sunday Algorithm


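  The idea behind the Sunday algorithm is similar to the bad-character rule of the BM (Boyer-Moore) algorithm. The difference is that after a mismatch, Sunday takes the character in the target string immediately after the portion currently aligned with the pattern, and uses that character for the bad-character lookup.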

  Example:

  [Figure: example target string and pattern]

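  After b and x mismatch, BM takes b (index 1) as the bad character, looks for b in the pattern, aligns that occurrence with it, and continues matching, as shown in the figure below:

  [Figure: alignment after the BM shift]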

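  After the mismatch, Sunday takes the character in the target string right after the portion aligned with the pattern, here e, and uses e as the bad character. Since e does not occur in the pattern, the pattern moves past it and matching continues, as shown in the figure below:

  [Figure: alignment after the Sunday shift]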
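  The shift rule described above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name `sunday_shift`, its parameters, and the sample strings in the note below are made up for this example and are not taken from the article's figures.

```c
/* Hypothetical helper: how far Sunday's rule moves the pattern after a
 * mismatch at alignment `pos`. It looks only at text[pos + m], the
 * character just past the current window. Returns 0 when no such
 * character exists (the window already ends at the end of the text). */
int sunday_shift(const char *text, int n, const char *pat, int m, int pos)
{
    if (pos + m >= n) {
        return 0;
    }
    char next = text[pos + m];
    /* Scan the pattern from the right for the last occurrence of `next`. */
    for (int k = m - 1; k >= 0; k--) {
        if (pat[k] == next) {
            return m - k;   /* aligns pat[k] with text[pos + m] */
        }
    }
    return m + 1;           /* `next` is not in the pattern: jump past it */
}
```

  With the sample text "substring searching" (19 characters) and the pattern "search", the alignment at 0 gives a shift of 7 because the character after the window, i, is not in the pattern. The alignment at 7 gives a shift of 3 because the character after the window is r, whose last occurrence in the pattern is at index 3. A real implementation would precompute these shifts in a table instead of rescanning the pattern each time.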

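  As the example shows, Sunday can shift farther than BM's bad-character rule (up to m+1 positions for a pattern of length m, versus at most m), which is why it is often faster in practice. Its worst-case time complexity, however, is still O(n*m), where n is the length of the target string. Consider the target string baaaabaaaabaaaabaaaa and the pattern aaaaa, which has no match. With Sunday, the bad character is a most of the time, and the pattern consists entirely of a, so after most mismatches the pattern can only move right by 1. An improved KMP algorithm, by contrast, is guaranteed to finish in linear time.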
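  To see the small shifts on this input, the sketch below records every alignment that a Sunday search tries. The function name `sunday_trace` and its parameters are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical tracing helper: runs a Sunday search and stores each
 * alignment it tries in trace[] (at most `cap` entries are stored).
 * Returns the number of alignments tried; *found receives the match
 * index, or -1 if there is no match. */
int sunday_trace(const char *text, const char *pat,
                 int *trace, int cap, int *found)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pat);
    int last[256];
    int tried = 0;

    *found = -1;
    if (m == 0) {
        return 0;
    }
    /* last[c] = last index of byte c in the pattern, or -1 if absent. */
    for (int c = 0; c < 256; c++) {
        last[c] = -1;
    }
    for (int k = 0; k < m; k++) {
        last[(unsigned char)pat[k]] = k;
    }

    for (int pos = 0; pos + m <= n; ) {
        if (tried < cap) {
            trace[tried] = pos;
        }
        tried++;
        if (memcmp(text + pos, pat, (size_t)m) == 0) {
            *found = pos;
            break;
        }
        if (pos + m >= n) {
            break;              /* no character after the window */
        }
        pos += m - last[(unsigned char)text[pos + m]];
    }
    return tried;
}
```

  For the target baaaabaaaabaaaabaaaa and the pattern aaaaa, this sketch tries the alignments 0, 6, 7, 8, 9 and 10: the shift is 6 only when the character after the window is b, and 1 in every other case.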

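  Sunday does not require comparing strictly left to right or right to left, because after a mismatch the bad character is always the next target character that has not been compared yet, regardless of where the mismatch occurred. One option is to count how often each character occurs in the pattern and compare the position of the least frequent character first. That position is the most likely to mismatch, which can reduce the number of comparisons and speed up matching.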

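  Example:

  [Figure: example pattern]

  In the pattern, b occurs only once while a and c each occur twice, so the position of b is compared first (judging only by the characters in the pattern, b is the most likely to mismatch).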
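  One way to realize this comparison order is sketched below. The names `rare_first_order` and `sunday_search_ordered`, the 64-byte pattern limit, and the sample pattern "acbca" (one b, two a, two c) are all assumptions made for this illustration; the pattern in the article's figure is not reproduced here.

```c
/* Hypothetical helper: fills order[0..m-1] with the pattern positions,
 * sorted so that positions holding the pattern's rarest characters come
 * first. Ties keep their left-to-right order. */
void rare_first_order(const char *pat, int m, int *order)
{
    int freq[256] = {0};
    for (int k = 0; k < m; k++) {
        freq[(unsigned char)pat[k]]++;
    }
    /* Stable insertion sort of positions by character frequency. */
    for (int k = 0; k < m; k++) {
        int fk = freq[(unsigned char)pat[k]];
        int j = k;
        while (j > 0 && freq[(unsigned char)pat[order[j - 1]]] > fk) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = k;
    }
}

/* Sunday search that compares the window in rare-first order.
 * Assumption for this sketch: patterns of at most 64 bytes. */
int sunday_search_ordered(const char *text, int n, const char *pat, int m)
{
    int last[256];
    int order[64];

    if (m == 0 || m > 64) {
        return -1;
    }
    for (int c = 0; c < 256; c++) {
        last[c] = -1;
    }
    for (int k = 0; k < m; k++) {
        last[(unsigned char)pat[k]] = k;
    }
    rare_first_order(pat, m, order);

    for (int pos = 0; pos + m <= n; ) {
        int k = 0;
        while (k < m && text[pos + order[k]] == pat[order[k]]) {
            k++;
        }
        if (k == m) {
            return pos;
        }
        if (pos + m >= n) {
            return -1;          /* no character after the window */
        }
        /* The shift depends only on the character after the window,
         * so the comparison order does not affect correctness. */
        pos += m - last[(unsigned char)text[pos + m]];
    }
    return -1;
}
```

  For the sample pattern "acbca", the position of b (index 2) is compared first. Whether this ordering pays off depends on the text: it estimates mismatch likelihood from the pattern alone, as the article notes.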

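  In the best case, Sunday rejects each alignment after a single comparison and then shifts by m+1, which gives O(n/m) time for a target of length n and a pattern of length m. On random text it is often faster than many other matching algorithms.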
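  To make the best case concrete, the sketch below counts character comparisons during a Sunday search. The function name `sunday_count` and the sample input in the note are hypothetical and chosen only for this illustration.

```c
#include <string.h>

/* Hypothetical instrumented search: returns the match index (or -1) and
 * stores the number of character comparisons in *comparisons. */
int sunday_count(const char *text, const char *pat, long *comparisons)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pat);
    int last[256];

    *comparisons = 0;
    if (m == 0) {
        return -1;
    }
    for (int c = 0; c < 256; c++) {
        last[c] = -1;
    }
    for (int k = 0; k < m; k++) {
        last[(unsigned char)pat[k]] = k;
    }

    for (int pos = 0; pos + m <= n; ) {
        int j = 0;
        while (j < m) {
            (*comparisons)++;
            if (text[pos + j] != pat[j]) {
                break;
            }
            j++;
        }
        if (j == m) {
            return pos;
        }
        if (pos + m >= n) {
            return -1;          /* no character after the window */
        }
        pos += m - last[(unsigned char)text[pos + m]];
    }
    return -1;
}
```

  For a 30-character text consisting only of x and the pattern abcde, this sketch performs 5 comparisons, one at each of the alignments 0, 6, 12, 18 and 24, because no character of the text occurs in the pattern and every shift is m+1 = 6.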
  

  C implementation:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Fill pArray so that pArray[c] is the last index of byte c in pattern,
       or -1 if c does not occur. pArray must have at least 256 entries. */
    bool BadChar(const char *pattern, int nLen, int *pArray, int nArrayLen)
    {
        if (nArrayLen < 256)
        {
            return false;
        }
        for (int i = 0; i < 256; i++)
        {
            pArray[i] = -1;
        }
        for (int i = 0; i < nLen; i++)
        {
            /* Cast to unsigned char so that bytes >= 0x80 cannot
               produce a negative index when char is signed. */
            pArray[(unsigned char)pattern[i]] = i;
        }
        return true;
    }

    /* Return the index of the first occurrence of pattern in dest, or -1
       if there is none. An empty pattern also yields -1. */
    int SundaySearch(const char *dest, int nDLen,
                     const char *pattern, int nPLen,
                     int *pArray)
    {
        if (0 == nPLen)
        {
            return -1;
        }
        for (int nBegin = 0; nBegin <= nDLen - nPLen; )
        {
            int i = nBegin, j = 0;
            for ( ; j < nPLen && i < nDLen && dest[i] == pattern[j]; i++, j++)
            {
            }
            if (j == nPLen)
            {
                return nBegin;
            }
            /* No character follows the window, so this was the last
               possible alignment. Checking here also avoids reading
               dest[nDLen]. */
            if (nBegin + nPLen >= nDLen)
            {
                return -1;
            }
            /* Shift so that the character after the window lines up with
               its last occurrence in the pattern (or moves past it). */
            nBegin += nPLen - pArray[(unsigned char)dest[nBegin + nPLen]];
        }
        return -1;
    }

    void TestSundaySearch(void)
    {
        int nFind;
        int nBadArray[256] = {0};
        //                             1         2
        //                   0123456789012345678901234567
        const char dest[] = "abcxxxbaaaabaaaxbbaaabcdamno";
        const char pattern[][40] = {
            "a",
            "ab",
            "abc",
            "abcd",
            "x",
            "xx",
            "xxx",
            "ax",
            "axb",
            "xb",
            "b",
            "m",
            "mn",
            "mno",
            "no",
            "o",
            "",
            "aaabaaaab",
            "baaaabaaa",
            "aabaaaxbbaaabcd",
            "abcxxxbaaaabaaaxbbaaabcdamno",
        };

        for (size_t i = 0; i < sizeof(pattern) / sizeof(pattern[0]); i++)
        {
            int nPLen = (int)strlen(pattern[i]);
            BadChar(pattern[i], nPLen, nBadArray, 256);
            nFind = SundaySearch(dest, (int)strlen(dest), pattern[i], nPLen, nBadArray);
            if (-1 != nFind)
            {
                printf("Found \"%s\" at %d \t%s\n", pattern[i], nFind, dest + nFind);
            }
            else
            {
                printf("Found \"%s\" no result.\n", pattern[i]);
            }
        }
    }

    int main(void)
    {
        TestSundaySearch();
        return 0;
    }

  Output:

    Found "a" at 0    abcxxxbaaaabaaaxbbaaabcdamno
    Found "ab" at 0    abcxxxbaaaabaaaxbbaaabcdamno
    Found "abc" at 0    abcxxxbaaaabaaaxbbaaabcdamno
    Found "abcd" at 20    abcdamno
    Found "x" at 3    xxxbaaaabaaaxbbaaabcdamno
    Found "xx" at 3    xxxbaaaabaaaxbbaaabcdamno
    Found "xxx" at 3    xxxbaaaabaaaxbbaaabcdamno
    Found "ax" at 14    axbbaaabcdamno
    Found "axb" at 14    axbbaaabcdamno
    Found "xb" at 5    xbaaaabaaaxbbaaabcdamno
    Found "b" at 1    bcxxxbaaaabaaaxbbaaabcdamno
    Found "m" at 25    mno
    Found "mn" at 25    mno
    Found "mno" at 25    mno
    Found "no" at 26    no
    Found "o" at 27    o
    Found "" no result.
    Found "aaabaaaab" no result.
    Found "baaaabaaa" at 6    baaaabaaaxbbaaabcdamno
    Found "aabaaaxbbaaabcd" at 9    aabaaaxbbaaabcdamno
    Found "abcxxxbaaaabaaaxbbaaabcdamno" at 0    abcxxxbaaaabaaaxbbaaabcdamno

