
Long Long Message (Suffix Array)

The little cat is majoring in physics in the capital of Byterland. A piece of sad news comes to him these days: his mother is getting ill. Being worried about spending so much on railway tickets (Byterland is such a big country, and he has to spend 16 hours on train to his hometown), he decided only to send SMS with his mother.

The little cat lives in an unrich family, so he frequently comes to the mobile service center, to check how much money he has spent on SMS. Yesterday, the computer of service center was broken, and printed two very long messages. The brilliant little cat soon found out: 

1. All characters in messages are lowercase Latin letters, without punctuations and spaces. 
2. All SMS has been appended to each other – (i+1)-th SMS comes directly after the i-th one – that is why those two messages are quite long. 
3. His own SMS has been appended together, but possibly a great many redundancy characters appear leftwards and rightwards due to the broken computer. 
E.g: if his SMS is “motheriloveyou”, either long message printed by that machine, would possibly be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc. 
4. For these broken issues, the little cat has printed his original text twice (so there appears two very long messages). Even though the original text remains the same in two printed messages, the redundancy characters on both sides would be possibly different. 

You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat. 

Background: 
The SMS in Byterland mobile service are charging in dollars-per-byte. That is why the little cat is worrying about how long could the longest original text be. 

Why ask you to write a program? There are four reasons:
1. The little cat is so busy these days with physics lessons; 
2. The little cat wants to keep what he said to his mother secret;
3. POJ is such a great Online Judge; 
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :( 

Input

Two strings with lowercase letters on two of the input lines individually. Number of characters in each one will never exceed 100000.

Output

A single line with a single integer number – what is the maximum length of the original text written by the little cat.

Sample Input

yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother

Sample Output

27

Problem: find the length of the longest common substring (a contiguous run of characters) of the two strings.
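For small inputs, a quadratic dynamic-programming baseline is a handy way to cross-check the answer. The sketch below is not part of the original solution: the name `lcsBrute` is made up for illustration, and at O(l1*l2) it is far too slow for the 100000-character limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): length of the longest
// common substring of a and b, using the classic O(|a|*|b|) DP where
// cur[j] = length of the longest common suffix of a[0..i) and b[0..j).
int lcsBrute(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Extend the match ending at (i-1, j-1), or reset to 0 on a mismatch.
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);  // cur[0] and prev[0] both stay 0
    }
    return best;
}
```

On the sample input this returns 27, the length of "howmuchiloveyoumydearmother".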

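Approach: since we are looking for a contiguous common substring, we can concatenate the two strings into one and use a basic property of suffixes (every prefix of a suffix is a substring of the original string, and every substring is a prefix of some suffix) to find the longest repeated substring. We only have to make sure that, of the two occurrences found, one lies in the first string and the other lies in the second string.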
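The idea can be sketched without the suffix array template by sorting the suffixes naively and comparing neighbours in sorted order. This is only an illustration under stated assumptions: the name `lcsBySortedSuffixes` is hypothetical, `std::sort` on suffixes is much slower than the doubling algorithm used in the full solution, and the inputs are assumed to contain only lowercase letters so that `'}'` can serve as a separator.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: longest common substring of a and b via naively
// sorted suffixes of the concatenation a + '}' + b.
int lcsBySortedSuffixes(const std::string& a, const std::string& b) {
    const std::string s = a + '}' + b;          // '}' is ASCII 125, larger than 'z'
    const int n = static_cast<int>(s.size());
    const int l1 = static_cast<int>(a.size());  // index of the separator in s

    // Sort suffix start positions by comparing the suffixes lexicographically.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });

    int best = 0;
    for (int i = 1; i < n; ++i) {
        const int x = sa[i - 1], y = sa[i];
        // Only neighbours that start in different input strings count.
        if ((x < l1) == (y < l1)) continue;
        int k = 0;  // common prefix length of the two neighbouring suffixes
        while (x + k < n && y + k < n && s[x + k] == s[y + k]) ++k;
        best = std::max(best, k);
    }
    return best;
}
```

Checking only adjacent suffixes is enough: the common prefix of two non-adjacent suffixes is the minimum over the adjacent pairs between them, and somewhere between a suffix of the first string and a suffix of the second there is an adjacent pair that also comes from different strings.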

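Technique: the suffix array template, Orz... Special handling: when concatenating, append one extra separator character after the first string that is larger than every lowercase letter (the code uses 125, i.e. '}'). Because the separator occurs in neither input string, a common prefix of two suffixes can never extend across the boundary between the two strings, which makes the longest common substring easy to read off after sorting.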
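A tiny example shows what the separator protects against. The helper `commonPrefix` below is hypothetical and only measures how far two suffixes of one string agree; the inputs "ab" and "cabc" are chosen for illustration.

```cpp
#include <string>

// Hypothetical helper: length of the common prefix of the suffixes of s
// that start at positions x and y.
int commonPrefix(const std::string& s, int x, int y) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (x + k < n && y + k < n && s[x + k] == s[y + k]) ++k;
    return k;
}
```

For the inputs "ab" and "cabc", the true answer is 2 ("ab"). Without a separator the concatenation is "abcabc", and `commonPrefix("abcabc", 0, 3)` is 3, because the match starting in the first string runs on into the second. With the separator the concatenation is "ab}cabc", and `commonPrefix("ab}cabc", 0, 4)` is 2, since the comparison stops at '}'.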

The code is as follows:

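/*
    Find the longest common (contiguous) substring of two strings.
    Idea: concatenate the two strings (len=l1+l2+1, including the separator), then build the suffix array.
    Every prefix of a suffix is a substring, so we look for the longest common prefix
    between a suffix starting in the first string and a suffix starting in the second string.
*/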
#include<cstdio>
#include<cstring>
#include<algorithm>
#define N 200010
#define inf 0x3f3f3f3f
using namespace std;
char str[N],s[N],t[N];
int wa[N],wb[N],wv[N],ws[N];
int cmp(int *r,int a,int b,int l)
{
    return r[a]==r[b]&&r[a+l]==r[b+l];
}
void da(const char r[],int sa[],int n,int m)
{
    int i,j,p,k;
    int *x=wa,*y=wb,*t;
    for(i=0; i<m; i++)ws[i]=0;
    for(i=0; i<n; i++)ws[x[i]=r[i]]++;
    for(i=1; i<m; i++)ws[i]+=ws[i-1];
    for(i=n-1; i>=0; i--)sa[--ws[x[i]]]=i;

    for(j=1,p=1; p<n; j*=2,m=p)
    {
        for(p=0,i=n-j; i<n; i++)y[p++]=i;
        for(i=0; i<n; i++)if(sa[i]>=j)y[p++]=sa[i]-j;
        for(i=0; i<n; i++)wv[i]=x[y[i]];
        for(i=0; i<m; i++)ws[i]=0;
        for(i=0; i<n; i++)ws[wv[i]]++;
        for(i=1; i<m; i++)ws[i]+=ws[i-1];
        for(i=n-1; i>=0; i--)sa[--ws[wv[i]]]=y[i];
        t=x;x=y;y=t;
        for(p=1,x[sa[0]]=0,i=1; i<n; i++)
            x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
    }
    return;
}
int sa[N],Rank[N],height[N];
void calheight(const char r[],int sa[],int n)
{
    int i,j,k=0;
    for(i=1; i<=n; i++) Rank[sa[i]]=i;
    for(i=0; i<n; height[Rank[i++]]=k)
        for(k?k--:0,j=sa[Rank[i]-1]; r[i+k]==r[j+k]; k++);
    for(i=n; i>=1; --i) ++sa[i],Rank[i]=Rank[i-1];
}
int main()
{
    int n;
    while(~scanf("%s%s",s,t))
    {
        int l1=strlen(s);
        int l2=strlen(t);
        int len=0;
        for(int i=0; i<l1; i++)
            str[len++]=s[i];
        str[len++]=125;   // Separator larger than any lowercase letter, appended after the first string; it marks the boundary between the two strings
        for(int i=0; i<l2; i++)
            str[len++]=t[i];
        str[len]=0;
        da(str,sa,len+1,130);
        calheight(str,sa,len);
        /*   // Debug output for the suffix array template, handy for inspecting the values
        puts("--------------All Suffix--------------");
        for(int i=1; i<=len; ++i)
        {
            printf("%d:\t",i);
            for(int j=i-1; j<len; ++j)
                printf("%c",str[j]);
            puts("");
        }
        puts("");
        puts("-------------After sort---------------");
        for(int i=1; i<=len; ++i)
        {
            printf("sa[%2d ] = %2d\t",i,sa[i]);
            for(int j=sa[i]-1; j<len; ++j)
                printf("%c",str[j]);
            puts("");
        }
        puts("");
        puts("---------------Height-----------------");
        for(int i=1; i<=len; ++i)
            printf("height[%2d ]=%2d \n",i,height[i]);
        puts("");
        puts("----------------Rank------------------");
        for(int i=1; i<=len; ++i)
            printf("Rank[%2d ] = %2d\n",i,Rank[i]);
        puts("------------------END-----------------");
        */
        int maxl=0;
        for(int i=2; i<len; i++)
        {
            if(height[i]>maxl)
            {
                int a=sa[i-1];
                int b=sa[i];
                if(a>0&&a<=l1&&b>l1)
                {
                    if(height[i]>maxl)maxl=height[i];
                }
                else if(b>0&&b<=l1&&a>l1)
                {
                    if(height[i]>maxl)maxl=height[i];
                }
            }
        }
        printf("%d\n",maxl);
    }
}
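The `calheight` routine above is very compact. A more readable sketch of the same linear-time idea (usually attributed to Kasai et al.) is given below, under different conventions from the original: arrays are 0-indexed, there is no trailing sentinel, and bounds checks replace the sentinel trick. The name `buildHeight` is made up for illustration, and `sa` is assumed to be a valid suffix array of `s`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: height[i] = length of the longest common prefix of the
// suffixes starting at sa[i-1] and sa[i]; height[0] is 0 by convention.
std::vector<int> buildHeight(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), height(n, 0);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;  // inverse of sa

    int k = 0;  // common prefix length carried over from the previous suffix
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { k = 0; continue; }    // smallest suffix has no predecessor
        const int j = sa[rank[i] - 1];            // suffix just before i in sorted order
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        height[rank[i]] = k;
        if (k > 0) --k;  // dropping the first character loses at most one match
    }
    return height;
}
```

For "banana" with suffix array {5, 3, 1, 0, 4, 2}, this yields {0, 1, 3, 0, 0, 2}. The key observation is that moving from the suffix at `i` to the suffix at `i + 1` can shrink the matched length by at most one, so `k` is increased O(n) times in total.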