Algorithms #15: Substring Search Algorithms, Summary and Code Walkthrough

阿新 · Published: 2019-01-07

1. Algorithm Summary

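First, here is a summary table. This article covers each algorithm in it in detail. The code and explanations are fairly long, so feel free to skip around using the section headings.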

2. Brute-Force Algorithm

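Check for a match at every position in the text where one could start. The idea is simple enough that the code explains itself: for each alignment i, compare the pattern with the text character by character and stop at the first mismatch. In the worst case the number of character comparisons is proportional to M·N.

Implementation code: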


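// Brute-force substring search
public class ViolenceSubStringSearch 
{
    // Returns the index of the first occurrence of pat in txt, or N (txt.length()) if there is none
    public static int search(String pat, String txt)
    {
        int M = pat.length();
        int N = txt.length();
        for(int i = 0; i <= N-M; i++)      // try every alignment i
        {
            int j;
            for(j = 0; j < M; j++)
            {
                if(txt.charAt(i + j) != pat.charAt(j))
                    break;                 // mismatch: move on to the next alignment
            }
            if(j == M)
            {
                return i;   // match found
            }
        }
        return N;           // no match
    }
}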

Execution trace:
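The trace figure from the original post is not preserved. As a rough stand-in, the sketch below runs the same brute-force loop and prints, for each alignment i, how many pattern characters matched before the first mismatch. The class name BruteForceTrace and the sample strings are illustrative choices, not taken from the original.

```java
// Hypothetical helper: brute-force search that prints one trace line per alignment i
public class BruteForceTrace {
    // Returns the index of the first occurrence of pat in txt, or txt.length() if none
    public static int search(String pat, String txt) {
        int m = pat.length();
        int n = txt.length();
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            // advance j while the characters at this alignment keep matching
            while (j < m && txt.charAt(i + j) == pat.charAt(j)) j++;
            System.out.println("i=" + i + " matched " + j + " of " + m);
            if (j == m) return i;   // full match at alignment i
        }
        return n;                   // not found
    }

    public static void main(String[] args) {
        int offset = search("ABRA", "ABACADABRAC");
        System.out.println("offset=" + offset);
    }
}
```

For ABRA in ABACADABRAC it reports partial matches of 2, 0, 1, 0, 1 and 0 characters at alignments 0 through 5, and a full match at alignment 6.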

3. KMP Algorithm

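The basic idea of KMP is that whenever a mismatch occurs, we already know some of the text characters, because they matched the pattern just before the mismatch. We can use this information to avoid backing the text pointer up over all of those known characters, which is what brute force does.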
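The sketch below makes "backing up" concrete. It is brute-force search written as a single loop with an explicit backup of the text pointer (i -= j), which is the step KMP eliminates. The class name BruteForceBackup and the counter totalBackup are hypothetical. They exist only to show how far i moves backward.

```java
// Hypothetical class: brute-force search with an explicit text-pointer backup
public class BruteForceBackup {
    static int totalBackup; // characters the text pointer i moved backward during the last search

    // Returns the index of the first occurrence of pat in txt, or txt.length() if none
    public static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        int i, j;
        totalBackup = 0;
        for (i = 0, j = 0; i < n && j < m; i++) {
            if (txt.charAt(i) == pat.charAt(j)) {
                j++;                 // characters match: advance both pointers
            } else {
                totalBackup += j;    // i goes back over the j characters already matched
                i -= j;              // after i++, the next alignment starts one position further right
                j = 0;               // restart at the beginning of the pattern
            }
        }
        return (j == m) ? i - m : n; // n signals "not found"
    }

    public static void main(String[] args) {
        int offset = search("ABRA", "ABACADABRAC");
        System.out.println("offset=" + offset + " backup=" + totalBackup);
    }
}
```

For ABRA in ABACADABRAC it finds offset 6 after backing i up 4 characters in total. KMP finds the same offset without ever decrementing i.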

The key insight of KMP is to decide in advance how to restart the search. That decision depends only on the pattern itself.

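In KMP substring search, the text pointer i is never backed up. Instead, an array dfa[][] records how far the pattern pointer j should fall back after a mismatch: dfa[c][j] is the state (pattern position) to move to after reading text character c while in state j. The array dfa[][] represents a deterministic finite-state automaton (DFA).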

How do we construct the DFA? In other words, how should the DFA handle the next character?

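On a mismatch it behaves exactly as it would after a full restart. The exception is a match with pat.charAt(j), in which case the DFA advances to state j+1. For example, take the pattern ABABAC and consider what the DFA should do after a mismatch at j=5. A restart would rescan BABA (pattern characters 1 through 4) from state 0, and the DFA tells us that this ends in state 3. So we copy dfa[][3] into dfa[][5] and then set the entry for C to 6. Computing state j only requires knowing how the DFA handles the first j-1 characters (pat[1..j-1]). That means the needed information is always available in the partially built DFA.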

One last key detail is how to maintain the restart state X. Because X < j, the part of the DFA already built is enough: the next value of X is dfa[pat.charAt(j)][X].
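The following sketch shows the restart state in action. It uses the same construction loop as the KMP class below and records the value of X used when each column j of the DFA for ABABAC is built. The class name RestartStateTrace is hypothetical.

```java
// Hypothetical demo: records the restart state X used by KMP's DFA construction at each j
public class RestartStateTrace {
    // Returns xs where xs[j] is the restart state X used to build column j (xs[0] is unused)
    public static int[] restartStates(String pat, int R) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        int[] xs = new int[m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            xs[j] = x;
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // mismatch: behave like state X
            dfa[pat.charAt(j)][j] = j + 1;                      // match: advance to j+1
            x = dfa[pat.charAt(j)][x];                          // X tracks pat[1..j] run through the DFA
        }
        return xs;
    }

    public static void main(String[] args) {
        int[] xs = restartStates("ABABAC", 256);
        for (int j = 1; j < xs.length; j++) System.out.println("j=" + j + " X=" + xs[j]);
    }
}
```

It yields X = 0, 0, 1, 2, 3 for j = 1 through 5. In particular X = 3 at j = 5, which is why dfa[][3] is copied into dfa[][5] in the ABABAC example.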

In summary, for each j the DFA construction:

copies dfa[][X] to dfa[][j] (for the mismatch cases);
sets dfa[pat.charAt(j)][j] to j+1 (for the match case);
updates X.

As shown in the figure below:
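The figure is not preserved here. As a substitute, this sketch builds the DFA for ABABAC with the same loop used in the KMP class and prints the rows for A, B and C. Every other character maps to state 0 in every column. The class name PrintDFA is hypothetical.

```java
// Hypothetical demo: builds the KMP DFA and prints the rows for characters A, B and C
public class PrintDFA {
    // Same construction as the KMP(String) constructor: dfa[c][j] = next state from state j on char c
    public static int[][] build(String pat, int R) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;                      // set match case
            x = dfa[pat.charAt(j)][x];                          // update restart state
        }
        return dfa;
    }

    public static void main(String[] args) {
        String pat = "ABABAC";
        int[][] dfa = build(pat, 256);
        for (char c : "ABC".toCharArray()) {
            StringBuilder row = new StringBuilder(c + ":");
            for (int j = 0; j < pat.length(); j++) row.append(' ').append(dfa[c][j]);
            System.out.println(row);
        }
    }
}
```

Expected output:

A: 1 1 3 1 5 1
B: 0 2 0 4 0 4
C: 0 0 0 0 0 6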

Implementation code:


//KMP子字串查詢
public class KMP 
{
    private final int R;       // the radix
    private int[][] dfa;       // the KMP automaton

    private char[] pattern;    // either the character array for the pattern
    private String pat;        // or the pattern string

    /**
     * Preprocesses the pattern string.
     *
     * @param pat the pattern string
     */
    public KMP(String pat) 
    {
        this.R = 256;
        this.pat = pat;

        // build DFA from pattern
        int m = pat.length();
        dfa = new int[R][m]; 
        dfa[pat.charAt(0)][0] = 1; 
        for (int x = 0, j = 1; j < m; j++) 
        {
            for (int c = 0; c < R; c++) 
            {
                dfa[c][j] = dfa[c][x];     // Copy mismatch cases. 
            }
            dfa[pat.charAt(j)][j] = j+1;   // Set match case. 
            x = dfa[pat.charAt(j)][x];     // Update restart state. 
        } 
    } 

    /**
     * Preprocesses the pattern string.
     *
     * @param pattern the pattern string
     * @param R the alphabet size
     */
    public KMP(char[] pattern, int R) 
    {
        this.R = R;
        this.pattern = new char[pattern.length];
        for (int j = 0; j < pattern.length; j++)
        {
            this.pattern[j] = pattern[j];
        }

        // build DFA from pattern
        int m = pattern.length;
        dfa = new int[R][m]; 
        dfa[pattern[0]][0] = 1; 
        for (int x = 0, j = 1; j < m; j++) 
        {
            for (int c = 0; c < R; c++) 
            {
                dfa[c][j] = dfa[c][x];     // Copy mismatch cases. 
            }
            dfa[pattern[j]][j] = j+1;      // Set match case. 
            x = dfa[pattern[j]][x];        // Update restart state. 
        } 
    } 

    /**
     * Returns the index of the first occurrence of the pattern string
     * in the text string.
     *
     * @param  txt the text string
     * @return the index of the first occurrence of the pattern string
     *         in the text string; N if no such match
     */
    public int search(String txt) 
    {
        // simulate operation of DFA on text
        int m = pat.length();
        int n = txt.length();
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++) 
        {
            j = dfa[txt.charAt(i)][j];
        }
        if (j == m) return i - m;    // found
        return n;                    // not found
    }

    /**
     * Returns the index of the first occurrence of the pattern string
     * in the text string.
     *
     * @param  text the text string
     * @return the index of the first occurrence of the pattern string
     *         in the text string; N if no such match
     */
    public int search(char[] text) 
    {
        // simulate operation of DFA on text
        int m = pattern.length;
        int n = text.length;
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++) 
        {
            j = dfa[text[i]][j];
        }
        if (j == m) return i - m;    // found
        return n;                    // not found
    }


    /** 
     * Searches for a hard-coded pattern string in a hard-coded text string,
     * once through the String API and once through the char[] API, and prints
     * the pattern aligned under its first occurrence in the text.
     *
     * @param args the command-line arguments (unused)
     */
    public static void main(String[] args) 
    {
        String pat = "AACAA";
        String txt = "AABRAACADABRAACAADABRA";
        char[] pattern = pat.toCharArray();
        char[] text    = txt.toCharArray();

        KMP kmp1 = new KMP(pat);
        int offset1 = kmp1.search(txt);

        KMP kmp2 = new KMP(pattern, 256);
        int offset2 = kmp2.search(text);

        // print results
        System.out.println("text:    " + txt);

        System.out.print("pattern: ");
        for (int i = 0; i < offset1; i++)
            System.out.print(" ");
        System.out.println(pat);

        System.out.print("pattern: ");
        for (int i = 0; i < offset2; i++)
            System.out.print(" ");
        System.out.println(pat);
    }
}

Output:

text:    AABRAACADABRAACAADABRA
pattern:             AACAA
pattern:             AACAA

4. Boyer-Moore Algorithm

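Boyer-Moore scans the pattern from right to left and skips ahead through the text when it hits a mismatch. The skip table right[] stores, for each character, the index of its rightmost occurrence in the pattern, or -1 if the character does not occur in the pattern.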
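Here is a minimal sketch of building right[]. Later positions overwrite earlier ones, so each character ends up with its rightmost index. The class name RightTable and the example pattern NEEDLE are illustrative choices.

```java
import java.util.Arrays;

// Hypothetical demo: the bad-character table right[] for a sample pattern
public class RightTable {
    public static int[] build(String pat, int R) {
        int[] right = new int[R];
        Arrays.fill(right, -1);                        // characters absent from the pattern
        for (int j = 0; j < pat.length(); j++) {
            right[pat.charAt(j)] = j;                  // a later j overwrites: rightmost occurrence wins
        }
        return right;
    }

    public static void main(String[] args) {
        int[] right = build("NEEDLE", 256);
        for (char c : "NEDLX".toCharArray()) System.out.print(c + "=" + right[c] + " ");
        System.out.println();
    }
}
```

For NEEDLE this gives N=0, E=5 (not 1 or 2), D=3, L=4, and -1 for a character such as X that does not occur.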

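When a mismatch occurs at pattern position j, there are three cases:

If the mismatched text character does not occur in the pattern, shift the pattern j+1 positions to the right (increase i by j+1).
If the mismatched text character does occur in the pattern, use right[] to align the text with the pattern so that this character lines up with its rightmost occurrence in the pattern.
If that would not increase i (the rightmost occurrence lies to the right of j), increase i by 1 so that the pattern moves at least one position to the right.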
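All three cases collapse into the single expression used in the search loop, skip = Math.max(1, j - right[c]):

* When c is absent, right[c] = -1 gives j+1.
* When c last occurs to the left of j, the difference is positive.
* When its rightmost occurrence is at or to the right of j, the difference is zero or negative, and Math.max forces a shift of 1.

The sketch below checks each case. The class name BadCharSkip and the pattern NEEDLE are illustrative choices.

```java
import java.util.Arrays;

// Hypothetical demo of the bad-character skip rule
public class BadCharSkip {
    // right[c] = rightmost index of c in pat, or -1 if c does not occur
    static int[] buildRight(String pat, int R) {
        int[] right = new int[R];
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++) right[pat.charAt(j)] = j;
        return right;
    }

    // j: pattern index where the right-to-left scan found a mismatch; c: text character there
    static int skip(int[] right, int j, char c) {
        return Math.max(1, j - right[c]);
    }

    public static void main(String[] args) {
        int[] right = buildRight("NEEDLE", 256);
        System.out.println("case 1 (X absent):       " + skip(right, 5, 'X')); // j + 1 = 6
        System.out.println("case 2 (N at index 0):   " + skip(right, 5, 'N')); // 5 - 0 = 5
        System.out.println("case 3 (E at index 5>3): " + skip(right, 3, 'E')); // 3 - 5 < 0, so 1
    }
}
```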

Implementation code:

// Boyer-Moore substring search (bad-character heuristic for mismatches)
public class BoyerMoore 
{
    private final int R;     // the radix
    private int[] right;     // the bad-character skip array

    private char[] pattern;  // store the pattern as a character array
    private String pat;      // or as a string

    /**
     * Preprocesses the pattern string.
     *
     * @param pat the pattern string
     */
    public BoyerMoore(String pat) 
    {
        this.R = 256;
        this.pat = pat;

        // position of rightmost occurrence of c in the pattern
        right = new int[R];
        for (int c = 0; c < R; c++)
        {
            right[c] = -1;          // characters not in the pattern get -1
        }
        for (int j = 0; j < pat.length(); j++)
        {// characters in the pattern get the index of their rightmost occurrence
            right[pat.charAt(j)] = j;
        }
    }

    /**
     * Preprocesses the pattern string.
     *
     * @param pattern the pattern string
     * @param R the alphabet size
     */
    public BoyerMoore(char[] pattern, int R) 
    {
        this.R = R;
        this.pattern = new char[pattern.length];
        for (int j = 0; j < pattern.length; j++)
        {
            this.pattern[j] = pattern[j];
        }

        // position of rightmost occurrence of c in the pattern
        right = new int[R];
        for (int c = 0; c < R; c++)
        {
            right[c] = -1;
        }
        for (int j = 0; j < pattern.length; j++)
        {
            right[pattern[j]] = j;
        }
    }

    /**
     * Returns the index of the first occurrence of the pattern string
     * in the text string.
     *
     * @param  txt the text string
     * @return the index of the first occurrence of the pattern string
     *         in the text string; n if no such match
     */
    public int search(String txt) 
    {
        int m = pat.length();
        int n = txt.length();
        int skip;
        for (int i = 0; i <= n - m; i += skip) 
        {
            skip = 0;
            for (int j = m-1; j >= 0; j--) 
            {
                if (pat.charAt(j) != txt.charAt(i+j)) 
                {
                    skip = Math.max(1, j - right[txt.charAt(i+j)]);
                    break;
                }
            }
            if (skip == 0) return i;    // found
        }
        return n;                       // not found
    }


    /**
     * Returns the index of the first occurrence of the pattern string
     * in the text string.
     *
     * @param  text the text string
     * @return the index of the first occurrence of the pattern string
     *         in the text string; n if no such match
     */
    public int search(char[] text) 
    {
        int m = pattern.length;
        int n = text.length;
        int skip;
        for (int i = 0; i <= n - m; i += skip) 
        {
            skip = 0;
            for (int j = m-1; j >= 0; j--) 
            {
                if (pattern[j] != text[i+j]) 
                {
                    skip = Math.max(1, j - right[text[i+j]]);
                    break;
                }
            }
            if (skip == 0) return i;    // found
        }
        return n;                       // not found
    }


    /**
     * Searches for a hard-coded pattern string in a hard-coded text string,
     * once through the String API and once through the char[] API, and prints
     * the pattern aligned under its first occurrence in the text.
     *
     * @param args the command-line arguments (unused)
     */
    public static void main(String[] args) 
    {
        String pat = "AACAA";
        String txt = "AABRAACADABRAACAADABRA";
        char[] pattern = pat.toCharArray();
        char[] text    = txt.toCharArray();

        BoyerMoore boyermoore1 = new BoyerMoore(pat);
        BoyerMoore boyermoore2 = new BoyerMoore(pattern, 256);
        int offset1 = boyermoore1.search(txt);
        int offset2 = boyermoore2.search(text);

        // print results
        System.out.println("text:    " + txt);

        System.out.print("pattern: ");
        for (int i = 0; i < offset1; i++)
            System.out.print(" ");
        System.out.println(pat);

        System.out.print("pattern: ");
        for (int i = 0; i < offset2; i++)
            System.out.print(" ");
        System.out.println(pat);
    }
}

Output:

text:    AABRAACADABRAACAADABRA
pattern:             AACAA
pattern:             AACAA

5. Rabin-Karp Algorithm

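Compute a hash of the pattern. Then use the same hash function to compute the hash of every M-character substring of the text, and look for one equal to the pattern's hash. What makes this efficient is that each substring's hash can be updated in constant time from the previous one (a rolling hash): remove the contribution of the leading character and add the trailing one, all modulo a prime q.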
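To see the rolling update at work, here is a small sketch on decimal digits (R = 10) with the small modulus Q = 997. Both values, the class name RollingHashDemo and the sample strings are illustrative assumptions. The RabinKarp class uses R = 256 and a random 31-bit prime instead. At each step the sketch also recomputes the window hash from scratch with Horner's method and checks that the rolling value agrees.

```java
// Hypothetical demo: Rabin-Karp rolling hash on decimal digits (R = 10, Q = 997)
public class RollingHashDemo {
    static final int R = 10;
    static final long Q = 997;   // small prime for readability; real use needs a much larger prime

    // Horner's method: hash of the first m digits of key, modulo Q
    static long hash(String key, int m) {
        long h = 0;
        for (int j = 0; j < m; j++) h = (R * h + (key.charAt(j) - '0')) % Q;
        return h;
    }

    // Returns the first offset whose window hash equals the pattern hash, or txt.length()
    static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        if (n < m) return n;
        long RM = 1;
        for (int i = 1; i < m; i++) RM = (R * RM) % Q;      // R^(m-1) mod Q
        long patHash = hash(pat, m);
        long txtHash = hash(txt, m);
        if (txtHash == patHash) return 0;
        for (int i = m; i < n; i++) {
            txtHash = (txtHash + Q - RM * (txt.charAt(i - m) - '0') % Q) % Q; // drop leading digit
            txtHash = (txtHash * R + (txt.charAt(i) - '0')) % Q;              // append trailing digit
            int offset = i - m + 1;
            // the rolling value must equal a fresh Horner computation of the same window
            if (txtHash != hash(txt.substring(offset), m)) throw new AssertionError("rolling hash mismatch");
            if (txtHash == patHash) return offset;  // Monte Carlo style: no character-by-character check
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(hash("26535", 5));
        System.out.println(search("26535", "3141592653589793"));
    }
}
```

The pattern 26535 hashes to 613 (26535 mod 997). The first window of 3141592653589793 with the same hash starts at offset 6. Like the Monte Carlo variant, this sketch trusts a hash match without comparing characters.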

Implementation code:


import java.math.BigInteger;
import java.util.Random;
// Rabin-Karp fingerprint substring search
public class RabinKarp 
{
    private String pat;      // the pattern  // needed only for Las Vegas
    private long patHash;    // pattern hash value
    private int m;           // pattern length
    private long q;          // a large prime, small enough to avoid long overflow
    private int R;           // radix
    private long RM;         // R^(m-1) % q

    /**
     * Preprocesses the pattern string.
     *
     * @param pattern the pattern string
     * @param R the alphabet size
     */
    public RabinKarp(char[] pattern, int R) 
    {
        throw new UnsupportedOperationException("Operation not supported yet");
    }

    /**
     * Preprocesses the pattern string.
     *
     * @param pat the pattern string
     */
    public RabinKarp(String pat) 
    {
        this.pat = pat;      // save pattern (needed only for Las Vegas)
        R = 256;
        m = pat.length();
        q = longRandomPrime();

        // precompute R^(m-1) % q for use in removing leading digit
        RM = 1;
        for (int i = 1; i <= m-1; i++)
        {
            RM = (R * RM) % q;
        }
        patHash = hash(pat, m);
    } 

    // Compute hash for key[0..m-1]. 
    private long hash(String key, int m) 
    { 
        long h = 0; 
        for (int j = 0; j < m; j++) 
        {
            h = (R * h + key.charAt(j)) % q;
        }
        return h;
    }

    // Las Vegas version: does pat[] match txt[i..i+m-1] ?
    private boolean check(String txt, int i) 
    {
        for (int j = 0; j < m; j++) 
        {
            if (pat.charAt(j) != txt.charAt(i + j)) 
            {
                return false;
            }
        }
        return true;
    }

    // Monte Carlo version: always return true
    @SuppressWarnings("unused")
    private boolean check(int i) 
    {
        return true;
    }

    /**
     * Returns the index of the first occurrence of the pattern string
     * in the text string.
     *
     * @param  txt the text string
     * @return the index of the first occurrence of the pattern string
     *         in the text string; n if no such match
     */
    public int search(String txt) 
    {
        int n = txt.length(); 
        if (n < m) return n;
        long txtHash = hash(txt, m); 

        // check for match at offset 0
        if ((patHash == txtHash) && check(txt, 0))
        {
            return 0;
        }

        // check for hash match; if hash match, check for exact match
        for (int i = m; i < n; i++) 
        {
            // Remove leading digit, add trailing digit, check for match. 
            txtHash = (txtHash + q - RM*txt.charAt(i-m) % q) % q; 
            txtHash = (txtHash*R + txt.charAt(i)) % q; 

            // match
            int offset = i - m + 1;
            if ((patHash == txtHash) && check(txt, offset))
            {
                return offset;
            }
        }

        // no match
        return n;
    }


    // a random 31-bit prime
    private static long longRandomPrime() 
    {
        BigInteger prime = BigInteger.probablePrime(31, new Random());
        return prime.longValue();
    }

    /** 
     * Searches for a hard-coded pattern string in a hard-coded text string
     * and prints the pattern aligned under its first occurrence in the text.
     *
     * @param args the command-line arguments (unused)
     */
    public static void main(String[] args) 
    {
        String pat = "AACAA";
        String txt = "AABRAACADABRAACAADABRA";

        RabinKarp searcher = new RabinKarp(pat);
        int offset = searcher.search(txt);

        // print results
        System.out.println("text:    " + txt);

        // print the pattern aligned under its first occurrence
        System.out.print("pattern: ");
        for (int i = 0; i < offset; i++)
            System.out.print(" ");
        System.out.println(pat);
    }
}
