Finding the Longest Repeated Substring with a Suffix Array

阿新 • Published: 2018-11-11


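Problem description
Given a string, find its longest repeated substring.
Example: abcdabcd
The longest repeated substring is abcd. The occurrences of a repeated substring are allowed to overlap.
Example: in abcdabcda the longest repeated substring is abcda, and its two occurrences share the middle a.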

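The straightforward approach first checks the substrings of length n - 1; if none of them repeats, it checks length n - 2, and so on down to length 1.
This takes O(N * N * N) time, with one factor each for the candidate lengths, the number of substrings of a given length, and the check of a single substring (assuming each check runs in O(N)).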
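The brute-force idea above can be sketched roughly as follows. This is only an illustration, not code from the original post: the function name longestRepeatedBruteForce is made up here, and std::string::find carries no linear-time guarantee, so this sketch may run slower than the O(N^3) estimate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): returns the longest substring
// of s that occurs at least twice; the occurrences may overlap.
std::string longestRepeatedBruteForce(const std::string& s) {
    const std::size_t n = s.size();
    // Try candidate lengths from longest (n - 1) down to 1.
    for (std::size_t len = n > 0 ? n - 1 : 0; len >= 1; --len) {
        // Try every start position for a candidate of this length.
        for (std::size_t i = 0; i + len <= n; ++i) {
            // Look for another occurrence that starts after position i.
            if (s.find(s.substr(i, len), i + 1) != std::string::npos) {
                return s.substr(i, len);
            }
        }
    }
    return "";  // no substring occurs twice
}
```

For the two examples above, longestRepeatedBruteForce("abcdabcd") returns "abcd" and longestRepeatedBruteForce("abcdabcda") returns "abcda".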

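An improved approach uses a suffix array.
A suffix array is a data structure that holds every suffix of a string. Build it for the input, sort it, and then compare each pair of suffixes that are adjacent in sorted order to find their common prefix.
The costs are: building the suffix array, O(N); sorting, O(N log N * N), where the final factor of N is there because a single string comparison is itself O(N);
scanning the adjacent pairs, O(N * N). The total is O(N^2 * log N) in the worst case, which beats the O(N^3) of the first approach.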
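As a rough sketch of the build, sort, and scan steps just described, the version below stores suffix start indices instead of pointers and sorts them with std::sort. The names commonPrefix and longestRepeatedBySuffixSort are assumptions made for this illustration, and the sort is the simple comparison-based one analysed above, not a specialised suffix array construction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: length of the common prefix of the suffixes of s
// that start at positions i and j.
static std::size_t commonPrefix(const std::string& s, std::size_t i, std::size_t j) {
    std::size_t k = 0;
    while (i + k < s.size() && j + k < s.size() && s[i + k] == s[j + k]) ++k;
    return k;
}

std::string longestRepeatedBySuffixSort(const std::string& s) {
    // sa[k] is the start index of a suffix; initially in text order.
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    // Sort the start indices by comparing the suffixes they denote.
    std::sort(sa.begin(), sa.end(), [&s](std::size_t x, std::size_t y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    // The longest common prefix of two adjacent suffixes is the answer.
    std::size_t bestLen = 0, bestPos = 0;
    for (std::size_t k = 0; k + 1 < sa.size(); ++k) {
        const std::size_t len = commonPrefix(s, sa[k], sa[k + 1]);
        if (len > bestLen) {
            bestLen = len;
            bestPos = sa[k];
        }
    }
    return s.substr(bestPos, bestLen);
}
```

With this sketch, longestRepeatedBySuffixSort("banana") returns "ana" and longestRepeatedBySuffixSort("abcdabcda") returns "abcda".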

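For problems such as finding the longest repeated substring in a given text, a "suffix array" does the job efficiently. A suffix array uses the text itself plus n additional pointers (an array of pointers into the text array) to represent every substring of the n input characters.
To begin with, if the input string is stored in c[0..n-1], every pair of substrings can be compared with code like the following: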

int main(void)
{
    int i, j, thislen, maxlen = -1;
    ...... ...... ......
    for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) {
            if ((thislen = comlen(&c[i], &c[j])) > maxlen) {
                maxlen = thislen;
                maxi = i;
                maxj = j;
            }
        }
    }
    ...... ...... ......
    return 0;
}

The comlen function returns the length of the prefix that its two argument strings have in common, counting from their first characters:

int comlen(char *p, char *q)
{
    int i = 0;
    while (*p && (*p++ == *q++))
        ++i;
    return i;
}

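Because this algorithm examines every pair of substrings, its running time is at least proportional to n squared. The solution below uses a "suffix array" instead.
Suppose the program handles at most MAXCHAR characters (including the terminating null character), stored in the array c: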

#define MAXCHAR 5000  // buffer size: up to MAXCHAR - 1 input characters plus the terminating '\0'
char c[MAXCHAR], *a[MAXCHAR];

While reading the input, initialize a so that each element points to the corresponding character of the input string:

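n = 0;
// ch should be declared as int so that EOF can be told apart from ordinary characters
while (n < MAXCHAR - 1 && (ch = getchar()) != '\n' && ch != EOF) {
    a[n] = &c[n];
    c[n++] = ch;
}
c[n] = '\0';  // set the element after the input to the null character, which terminates every suffix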

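Element a[0] now points to the whole string, the next element points to the suffix that starts at the second character, and so on. For the input string "banana", the array represents these suffixes: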
a[0]:banana
a[1]:anana
a[2]:nana
a[3]:ana
a[4]:na
a[5]:a
Because the pointers in array a together point to every suffix of the string, a is called a "suffix array".

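Second, sort the suffix array with qsort so that suffixes beginning with the same characters end up next to each other.
After qsort(a, n, sizeof(char*), pstrcmp) the array is: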
a[0]:a
a[1]:ana
a[2]:anana
a[3]:banana
a[4]:na
a[5]:nana
Third, scan the sorted array and use comlen to compare adjacent elements, looking for the longest repeated string:

for (i = 0; i < n - 1; ++i) {
    temp = comlen(a[i], a[i + 1]);
    if (temp > maxlen) {
        maxlen = temp;
        maxi = i;
    }
}
printf("%.*s\n", maxlen, a[maxi]);
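To make the scan concrete, the sketch below sorts the suffixes of a string and returns the comlen value of every adjacent pair. The comlen and pstrcmp logic follows the article; the wrapper adjacentCommonLengths and its use of std::vector are assumptions added only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

// Same logic as the article's comlen, with const-qualified parameters.
static int comlen(const char* p, const char* q) {
    int i = 0;
    while (*p && (*p++ == *q++)) ++i;
    return i;
}

// qsort comparator: each argument points to an element of the pointer array.
static int pstrcmp(const void* p1, const void* p2) {
    return std::strcmp(*static_cast<const char* const*>(p1),
                       *static_cast<const char* const*>(p2));
}

// Hypothetical helper: sorts the suffixes of text and returns the common
// prefix length of each adjacent pair, in sorted order.
std::vector<int> adjacentCommonLengths(const char* text) {
    std::vector<const char*> a;
    for (const char* p = text; *p; ++p) a.push_back(p);  // one pointer per suffix
    if (!a.empty()) {
        std::qsort(a.data(), a.size(), sizeof(const char*), pstrcmp);
    }
    std::vector<int> result;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        result.push_back(comlen(a[i], a[i + 1]));
    }
    return result;
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the function returns 1, 3, 0, 0, 2. The maximum, 3, comes from the pair ana / anana and corresponds to the answer "ana".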

The complete implementation is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXCHAR 5000  // buffer size, including the terminating '\0'

char c[MAXCHAR], *a[MAXCHAR];

// Length of the common prefix of p and q
int comlen(char *p, char *q)
{
    int i = 0;
    while (*p && (*p++ == *q++))
        ++i;
    return i;
}

// qsort comparator: each argument points to an element of a, that is, to a char*
int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(char* const *)p1, *(char* const *)p2);
}

int main(void)
{
    int ch;  // int, not char, so that EOF can be detected
    int n = 0;
    int i, temp;
    int maxlen = 0, maxi = 0;

    printf("Please input your string:\n");
    while (n < MAXCHAR - 1 && (ch = getchar()) != '\n' && ch != EOF) {
        a[n] = &c[n];
        c[n++] = ch;
    }
    c[n] = '\0';  // set the element after the input to the null character, which terminates every suffix

    qsort(a, n, sizeof(char*), pstrcmp);
    for (i = 0; i < n - 1; ++i) {
        temp = comlen(a[i], a[i + 1]);
        if (temp > maxlen) {
            maxlen = temp;
            maxi = i;
        }
    }
    printf("%.*s\n", maxlen, a[maxi]);
    return 0;
}

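Method 2: KMP
The properties of the KMP next array can also be used to find the longest repeated substring, although the time complexity is fairly high: a next array has to be built for every suffix of the input, which costs O(N) per suffix and therefore at least O(N^2) overall.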

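#include <cstring>
#include <iomanip>
#include <iostream>
using namespace std;

const int MAX = 100000;
int nxt[MAX];  // named nxt because the name next can clash with std::next
char str[MAX];

// Builds the (optimized) KMP next array of t into nxt[0..strlen(t)]
void GetNext(char *t)
{
    int len = strlen(t);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < len) {
        if (j == -1 || t[i] == t[j]) {
            i++;
            j++;
            if (t[i] != t[j])
                nxt[i] = j;
            else
                nxt[i] = nxt[j];
        } else
            j = nxt[j];
    }
}

int main(void)
{
    int i, j, index = 0, len = 0;
    cout << "Please input your string:" << endl;
    cin >> setw(MAX) >> str;  // setw keeps the read within the buffer
    char *s = str;
    // Treat every suffix of str as a pattern and build its next array
    for (i = 0; *s != '\0'; s++, ++i) {
        GetNext(s);
        int slen = strlen(s);  // computed once per suffix, not on every loop test
        for (j = 1; j <= slen; ++j) {
            if (nxt[j] > len) {
                len = nxt[j];
                index = i + j;  // index is one past the end of the repeated occurrence found in str
            }
        }
    }
    if (len > 0) {
        for (i = index - len; i < index; ++i)
            cout << str[i];
        cout << endl;
    } else
        cout << "none" << endl;
    return 0;
}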
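For comparison, here is a sketch of the same idea written with the plain (unoptimized) prefix function, where each value is directly the length of a prefix that occurs again later in the suffix. The names prefixFunction and longestRepeatedByPrefixFunction are assumptions for this illustration; the running time is O(N^2), since one prefix function is computed per suffix.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Standard prefix function: pi[k] is the length of the longest proper
// prefix of t[0..k] that is also a suffix of t[0..k].
static std::vector<int> prefixFunction(const std::string& t) {
    std::vector<int> pi(t.size(), 0);
    for (std::size_t k = 1; k < t.size(); ++k) {
        int j = pi[k - 1];
        while (j > 0 && t[k] != t[j]) j = pi[j - 1];
        if (t[k] == t[j]) ++j;
        pi[k] = j;
    }
    return pi;
}

std::string longestRepeatedByPrefixFunction(const std::string& s) {
    std::size_t bestLen = 0, bestPos = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // The largest value for the suffix at i is the longest prefix of
        // that suffix which occurs again further right (overlap allowed).
        const std::vector<int> pi = prefixFunction(s.substr(i));
        const int m = *std::max_element(pi.begin(), pi.end());
        if (static_cast<std::size_t>(m) > bestLen) {
            bestLen = static_cast<std::size_t>(m);
            bestPos = i;
        }
    }
    return s.substr(bestPos, bestLen);
}
```

With this sketch, longestRepeatedByPrefixFunction("abcdabcda") returns "abcda" and longestRepeatedByPrefixFunction("banana") returns "ana".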

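Problem: find the longest substring that contains no repeated character. For example, for abcdefgegcsgcasse the longest such substring is abcdefg, of length 7.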

#include <iostream>
using namespace std;

// Approach: cnt[] records how many times each letter occurs in the current window,
// and i and j walk through the string.
// If str[j] has not occurred yet, count it. If it has, count it, then move i to just past
// the earlier occurrence of that letter, decrementing the counts of the letters skipped.
// Continue until '\0'. Only lowercase letters 'a'..'z' are supported.
int find(char str[], char *output)
{
    int i = 0, j = 0;
    int cnt[26] = {0};
    int res = 0, temp = 0;
    char *out = output;
    int start = 0;  // start of the best window found so far
    while (str[j] != '\0') {
        if (cnt[str[j] - 'a'] == 0) {
            cnt[str[j] - 'a']++;
        } else {
            cnt[str[j] - 'a']++;
            while (str[i] != str[j]) {
                cnt[str[i] - 'a']--;
                i++;
            }
            cnt[str[i] - 'a']--;
            i++;
        }
        j++;
        temp = j - i;
        if (temp > res) {
            res = temp;
            start = i;
        }
    }
    // Copy the result into output
    for (i = 0; i < res; ++i)
        *out++ = str[start++];
    *out = '\0';
    return res;
}

int main(void)
{
    char a[] = "abcdefgegcsgcasse";
    char b[100];
    int max = find(a, b);
    cout << b << endl;
    cout << max << endl;
    return 0;
}
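A common variant of this sliding window remembers the last position of each character instead of a count, so the window start can jump directly rather than step forward. The sketch below is an illustration under that assumption, not the author's code; the name longestDistinctRun is made up, and it accepts any byte value, not only lowercase letters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::string longestDistinctRun(const std::string& s) {
    // lastSeen[c] is one past the most recent index of byte c, or 0 if unseen.
    std::vector<std::size_t> lastSeen(256, 0);
    std::size_t start = 0, bestStart = 0, bestLen = 0;
    for (std::size_t j = 0; j < s.size(); ++j) {
        const unsigned char c = static_cast<unsigned char>(s[j]);
        // If c already occurs inside the window, move the start just past it.
        if (lastSeen[c] > start) start = lastSeen[c];
        lastSeen[c] = j + 1;
        // The window s[start..j] now has no repeated character.
        if (j + 1 - start > bestLen) {
            bestLen = j + 1 - start;
            bestStart = start;
        }
    }
    return s.substr(bestStart, bestLen);
}
```

For the example above, longestDistinctRun("abcdefgegcsgcasse") returns "abcdefg". Each character is visited once, so the running time is O(N).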
