HDU 4644 BWT (KMP)

BWT

Time Limit: 12000/6000 MS (Java/Others) Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 775 Accepted Submission(s): 242


Problem Description
When the problem of matching a string S in a string T comes up, people usually propose KMP, Aho-Corasick, or suffix arrays. But Mr Liu tells Canoe that there is an algorithm called the Burrows–Wheeler Transform (BWT) that solves the problem remarkably efficiently.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append '$' to the end of T; for convenience, we still call the new string T. Then, for every suffix of T starting at position i, we append the prefix of T ending at position (i - 1) to its end. Second, we sort these new strings in dictionary order, and we call the matrix formed by the sorted strings the Burrows–Wheeler Matrix. Third, we take the characters of the last column to form a new string, which we call BWT(T). The example below gives more detail.

[Figure: example Burrows–Wheeler Matrix]
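The three construction steps can be sketched directly in code. This is a naive illustration, not part of the original solution: the function name `bwt_naive` is hypothetical, and it assumes T contains only lowercase letters, so that '$' sorts before every other character in ASCII.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive BWT: sort all rotations of T + '$' and read off the last column.
// Assumes T holds only 'a'..'z', so '$' is the smallest character.
std::string bwt_naive(std::string t) {
    t.push_back('$');                                   // step 1: append '$'
    const int n = static_cast<int>(t.size());

    // Rotation i is "suffix starting at i" + "prefix ending at i - 1".
    std::vector<int> start(n);
    std::iota(start.begin(), start.end(), 0);

    // Step 2: sort the rotations in dictionary order.
    std::sort(start.begin(), start.end(), [&](int a, int b) {
        for (int k = 0; k < n; ++k) {
            char ca = t[(a + k) % n];
            char cb = t[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;                                   // identical rotations
    });

    // Step 3: the last character of rotation i is t[i - 1] (cyclically).
    std::string last;
    last.reserve(n);
    for (int s : start) last.push_back(t[(s + n - 1) % n]);
    return last;
}
```

For T = "acaacg", `bwt_naive` returns "gc$aaac", which is the BWT(T) string used in the sample input. Each comparison can take up to length(T) steps, so this sketch is only suitable for small inputs.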


Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the matching problem. But can it really work, and how? Mr Liu smiles and says yes: knowing only BWT(T), we can determine whether a string S such as "aac" is a substring of a string T such as "acaacg". What an amazing algorithm BWT is! But Canoe is puzzled by the tricky matching method. Would you please help Canoe find it? Given BWT(T) and a string S, can you help Canoe figure out whether S is a substring of T?

Input
There are multiple test cases.
First line: the string BWT(T) (1 <= length(BWT(T)) <= 100086).
Second line: an integer n (1 <= n <= 10086), the number of S strings.
Then n lines follow, each containing one string S (n * length(S) is less than 2000000, and all characters of S are lowercase).

Output
For every S, print "YES" on its own line if S is a substring of T; otherwise print "NO".

Sample Input
gc$aaac
2
aac
gc

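Sample Output
YES
NO

Analysis:
The idea is to convert the transformed string back into the original string and then run KMP. The conversion works as follows.
First, number the characters of the transformed string:
gc$aaac
0123456
Then sort the characters in dictionary order; when two characters are equal, the one with the smaller original number comes first (a stable sort):
$aaaccg
2345160
Finally, walk through the sorted characters, using each number as the next index. The number of '$' is 2, and the character at index 2 is 'a', giving "a"; the number of that 'a' is 4, and the character at index 4 is 'c', giving "ac"; the number of that 'c' is 1, and the character at index 1 is 'a', giving "aca"; continuing in this way yields "acaacg".
Then run KMP on the recovered string. The code is as follows: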
#include <cstdio>
#include <iostream>
#include <cstring>
#include <vector>
#include <algorithm>
using namespace std;
typedef long long ll;
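// One character of BWT(T) together with its original position.
struct node
{
    int id;   // index of the character in BWT(T)
    char r;   // the character itself
}str[100186];
char s[100186];      // BWT(T) as read from input
char str2[100186];   // recovered original string T (without '$')
char T[2000100];     // current query string S (the KMP pattern)
int Next[2000100];   // KMP failure table for the pattern
int tlen;            // length of the pattern
// Order by character only; stable_sort keeps equal characters in input order.
bool cmp(node x,node y)
{
    return x.r<y.r;
}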
void getNext()
{
    int j, k;
    j = 0; k = -1; Next[0] = -1;
    while(j < tlen)
        if(k == -1 || T[j] == T[k])
            Next[++j] = ++k;
        else
            k = Next[k];

}

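// Returns true if pattern T (length tlen) occurs in text S (length slen).
bool KMP_Index(char S[],int slen)
{
    int i = 0, j = 0;   // i scans the text, j scans the pattern
    getNext();

    while(i < slen && j < tlen)
    {
        if(j == -1 || S[i] == T[j])
        {
            i++; j++;
        }
        else
            j = Next[j];   // fall back in the pattern
    }
    return j == tlen;   // the whole pattern was matched
}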
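int main()
{
    int n;
    while(scanf("%s",s)!=EOF)
    {
        int len=strlen(s);
        // Pair every character of BWT(T) with its position.
        for(int i=0;i<len;i++)
        {
            str[i].id=i;
            str[i].r=s[i];
        }
        // A stable sort by character yields the first column of the matrix.
        stable_sort(str,str+len,cmp);
        // Follow the stored positions, starting from '$', to rebuild T.
        int now=0;
        for(int i=0;i<len-1;i++)
        {
            now=str[now].id;
            str2[i]=str[now].r;
        }
        len=len-1;   // T has one character fewer than BWT(T) (no '$')
        str2[len]=0;
        scanf("%d",&n);
        while(n--)
        {
            scanf("%s",T);
            tlen=strlen(T);
            // KMP_Index builds the failure table itself.
            if(KMP_Index(str2,len))puts("YES");
            else puts("NO");
        }
    }
    return 0;
}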
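The solution above rebuilds T and scans it once per query. As an alternative that is not what this post's code does, the matching can also be done on BWT(T) directly with the technique usually called backward search. The sketch below is a minimal illustration under assumptions: the class name `BwtMatcher` and its members are hypothetical, and the input is assumed to contain only '$' and lowercase letters.

```cpp
#include <array>
#include <string>
#include <vector>

// Backward search over BWT(T); an illustrative sketch, not the post's method.
// Assumes the alphabet is '$' plus 'a'..'z'.
class BwtMatcher {
    static constexpr int kSigma = 27;
    static int code(char ch) { return ch == '$' ? 0 : ch - 'a' + 1; }

    int n_;
    std::vector<std::array<int, kSigma>> occ_;  // occ_[i][c]: count of c in bwt[0..i)
    std::array<int, kSigma> less_{};            // less_[c]: count of characters smaller than c

public:
    explicit BwtMatcher(const std::string& bwt)
        : n_(static_cast<int>(bwt.size())), occ_(bwt.size() + 1) {
        occ_[0].fill(0);
        for (int i = 0; i < n_; ++i) {
            occ_[i + 1] = occ_[i];
            ++occ_[i + 1][code(bwt[i])];
        }
        int sum = 0;
        for (int c = 0; c < kSigma; ++c) {
            less_[c] = sum;
            sum += occ_[n_][c];
        }
    }

    // True if s occurs in T. The pattern is consumed from its last character
    // to its first, narrowing a half-open range [lo, hi) of sorted rows.
    bool contains(const std::string& s) const {
        int lo = 0, hi = n_;
        for (auto it = s.rbegin(); it != s.rend(); ++it) {
            int c = code(*it);
            lo = less_[c] + occ_[lo][c];
            hi = less_[c] + occ_[hi][c];
            if (lo >= hi) return false;         // no row starts with this suffix of s
        }
        return true;
    }
};
```

With `BwtMatcher m("gc$aaac")`, `m.contains("aac")` is true and `m.contains("gc")` is false, matching the sample output. After the table is built, each query costs time proportional to length(S) only, whereas the KMP approach also scans the recovered T for every query; the trade-off is the extra memory for the occurrence table.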
