
POJ 1200 Crazy Search (Hashing) 【Template】

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle. 
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text. 

As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5. 
Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.

Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input
3 4
daababac

Sample Output
5

Hint
Huge input, scanf is recommended.

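【Solution】

The task is straightforward: given a text over an alphabet of NC distinct characters, count how many distinct substrings of length N it contains. The problem guarantees that the number of possible substrings, NC^N, is at most 16,000,000.

Analysis: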

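First, note that the input is large. Some people online report that an array of 12,000,000 entries is enough to pass, but that is still on the order of 10^7 values, so a naive approach will not make it and a different algorithm is needed. My first attempt used a map of key-value pairs, but unfortunately it also timed out (the code is included further down, since it is still a valid approach). After thinking it over for a while I tried hashing, and it passed, and quickly too: only 63 ms. Hashing really is powerful. The idea is to assign a number to every character that occurs in the text, replacing letters with digits: for example, a can be represented by 0, b by 1, and so on.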
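The character numbering described above can be sketched as follows. This is only an illustration: the helper name `assignDigits` and its signature are made up for this example, and it assumes the text consists of single-byte characters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the original solution): gives each distinct
// character of `text` a digit 0, 1, 2, ... in order of first appearance.
// Characters that never occur keep the sentinel value -1.
// `distinct` receives the number of different characters found.
std::vector<int> assignDigits(const std::string& text, int& distinct) {
    std::vector<int> digit(256, -1);
    distinct = 0;
    for (unsigned char c : text) {
        if (digit[c] == -1) digit[c] = distinct++;
    }
    return digit;
}
```

For the sample text "daababac" this assigns d to 0, a to 1, b to 2 and c to 3. Using -1 rather than 0 as the "not seen yet" marker keeps digit 0 usable for a real character.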

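Next, go through every substring of length N and turn it into a number by reading its digits in base NC (in other words, represent that short run of N characters as a single integer). Because every digit is smaller than the base, two different substrings always produce different numbers, so substrings and numbers are in one-to-one correspondence. Base 10 would also work, but only when there are at most 10 distinct characters. Finally, count how many different numbers occur; that count is the answer.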
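As a rough sketch of that conversion, the function below reads one window of the text as a number in a given base. The name `encode` and its parameters are assumptions made for this illustration, and `digit` is expected to hold the per-character numbering (0 for the first distinct character, 1 for the next, and so on).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: value of text[start .. start+len) read as a number in
// base `base`, where digit[c] is the digit assigned to character c.
// Assumes base^len fits in an int, which the problem's 16 million bound ensures.
int encode(const std::string& text, int start, int len,
           const std::vector<int>& digit, int base) {
    int value = 0;
    for (int j = 0; j < len; ++j) {
        value = value * base + digit[(unsigned char)text[start + j]];
    }
    return value;
}
```

With d=0, a=1, b=2, c=3 and base 4, the windows of "daababac" become daa = 5, aab = 22, aba = 25, bab = 38 and bac = 39. The window aba occurs twice and maps to 25 both times, so five different numbers remain, matching the sample answer.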

【Key-value (map) approach: TLE code】

#include<iostream>
#include<map>
#include<string>
using namespace std;
map<string,int> Map;
string strText;
int N;
void Hash()
{
	int i;
	Map.clear();
	for(i=0;i<(int)strText.size()-N+1;++i)
	{
		string Temp(strText,i,N);
		Map[Temp]=i; // the stored value is only a placeholder; only the keys matter
	}
	cout<<Map.size()<<endl;
}

int main()
{
	int NC;
	// Input format from the problem statement: N and NC, then the text
	// (there is no leading test-case count)
	while(cin>>N>>NC>>strText)
	{
		Hash();
	}
	return 0;
}


【AC code】

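#include<cstdio>
#include<cstring>
const int N=16000005;
int m,n;        // m: substring length, n: number of distinct characters (NC)
char str[N];
bool seen[N];   // seen[v]: a substring whose value is v has already appeared
                // (named seen rather than hash to avoid a clash with std::hash)
int vis[256];   // vis[c]: digit assigned to character c, -1 if not assigned yet

int main()
{
    while(scanf("%d%d%s",&m,&n,str)==3)
    {
        int num=0;
        int len=strlen(str);
        memset(vis,-1,sizeof(vis));   // reset the numbering for every input
        memset(seen,0,sizeof(seen));
        for(int i=0;i<len;++i)        // go through every character of the text
        {
            unsigned char c=str[i];
            if(vis[c]==-1)            // first time this character appears
                vis[c]=num++;         // give it the next digit, starting from 0
        }
        int ans=0;
        for(int i=0;i+m<=len;++i)     // go through every substring of length m
        {
            int sum=0;
            for(int j=0;j<m;++j)
            {
                // read the substring as a number in base num (num<=n)
                sum=sum*num+vis[(unsigned char)str[i+j]];
            }
            if(!seen[sum])            // first occurrence of this substring
            {
                seen[sum]=true;
                ans++;
            }
        }
        printf("%d\n",ans);
    }
    return 0;
}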
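
The accepted code recomputes each window's value from scratch, which costs one inner loop per window. A possible refinement, not what the solution above does, is to update the value as the window slides: drop the leading digit, then append the new one. The sketch below is only an illustration under stated assumptions: `countDistinct` is a made-up name, the text is assumed to use single-byte characters, and base^n is assumed to stay within the 16 million bound from the problem statement.

```cpp
#include <string>
#include <vector>

// Hypothetical function: counts the distinct substrings of length n in `text`,
// updating the window value in O(1) per step instead of recomputing it.
int countDistinct(const std::string& text, int n) {
    int len = (int)text.size();
    if (n <= 0 || n > len) return 0;

    // Number the characters 0, 1, 2, ... in order of first appearance.
    std::vector<int> digit(256, -1);
    int base = 0;
    for (unsigned char c : text) {
        if (digit[c] == -1) digit[c] = base++;
    }

    // top = base^(n-1) is the weight of the window's leading digit.
    long long top = 1;
    for (int j = 0; j + 1 < n; ++j) top *= base;
    std::vector<bool> seen(top * base, false);  // one flag per possible value

    long long value = 0;
    int ans = 0;
    for (int i = 0; i < len; ++i) {
        if (i >= n) {
            // Remove the character that just left the window.
            value -= digit[(unsigned char)text[i - n]] * top;
        }
        // Shift the remaining digits and append the new character.
        value = value * base + digit[(unsigned char)text[i]];
        if (i >= n - 1 && !seen[value]) {
            seen[value] = true;
            ++ans;
        }
    }
    return ans;
}
```

Calling `countDistinct("daababac", 3)` gives 5, the same as the sample output. Whether this is noticeably faster on the judge depends on N; for small N the difference is minor.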