
Notes on the AC Automaton Algorithm

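  The AC algorithm is a classic multi-pattern matching algorithm by Alfred V. Aho (co-author of Compilers: Principles, Techniques, and Tools, the "Dragon Book") and Margaret J. Corasick, published in 1975, around the same time as the KMP algorithm. Given a text of length n and a pattern set P = {p1, p2, ..., pm}, once the automaton has been built it finds all occurrences of the patterns in a single scan of the text in O(n) time (plus the time needed to report the matches), independent of the number m of patterns.
  To some extent, the AC algorithm can be seen as an extension of the KMP algorithm to the multi-pattern setting.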

A brief overview of the KMP algorithm

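  A prefix of the pattern may also occur inside the pattern as a substring that is not a prefix. KMP looks for the longest such prefix; the same substring may also end with several shorter prefixes.
  KMP keeps an array called the prefix array, also known as the next array. When a mismatch occurs, this array gives the position in the pattern from which matching against the target string resumes, that is, how far the pattern can shift after the mismatch. In the examples below, next[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. The array therefore also describes how self-similar the pattern is: the higher the value, the longer the prefix that reappears at that position, and the more of the previous partial match can be reused.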

Example 1

index    0  1  2  3  4  5  6  7  8  9
pattern  a  b  c  a  b  c  a  c  a  b
next     0  0  0  1  2  3  4  0  1  2

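Example 2

index    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
pattern  a  g  c  t  a  g  c  a  g  c  t  a  g  c  t  g
next     0  0  0  0  1  2  3  1  2  3  4  5  6  7  4  0

In Example 2, the prefix a g c t a g c (matched again at indices 7 to 13) contains two nested prefixes: itself and the shorter a g c at its end. The t at index 14 does not extend the 7-character match, so its next value falls back to the shorter prefix and is smaller than the next value of the c before it (4 versus 7).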
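
The two tables above can be reproduced with a short Python sketch. The names `prefix_function` and `kmp_search` are illustrative choices, not taken from the original notes, and the sketch assumes the convention used in the tables (next[i] is a border length, not a jump target).

```python
def prefix_function(pattern):
    """next[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    nxt = [0] * len(pattern)
    k = 0  # length of the border matched so far
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter border.
        while k > 0 and pattern[i] != pattern[k]:
            k = nxt[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        nxt[i] = k
    return nxt


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    nxt = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = nxt[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = nxt[k - 1]  # keep going to find overlapping matches
    return hits


print(prefix_function("abcabcacab"))        # Example 1
print(prefix_function("agctagcagctagctg"))  # Example 2
print(kmp_search("abcabcabcacab", "abcabcacab"))
```

The text pointer `i` never moves backwards; only `k` falls back through the next array, which is what keeps the scan linear in the length of the text.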

The AC automaton algorithm

  An AC automaton is determined by three functions: the goto function, the failure function, and the output function.

Keyword Tree

A keyword tree (or trie) for a set of patterns P is a rooted tree K such that

  1. each edge of K is labeled by a character;
  2. any two edges out of a node have different labels;
    Define the label of a node v as the concatenation of the edge labels on the path from the root to v, and denote it by L(v).
  3. for each p ∈ P there is a node v with L(v) = p; and
  4. the label L(v) of any leaf v equals some p ∈ P.

[Figure: a keyword tree for P = {he, she, his, hers}]
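
As a minimal sketch of the definition, the keyword tree for P = {he, she, his, hers} can be built in Python as follows. The function name `build_keyword_tree` and the list-of-dicts representation are assumptions made for illustration. Node numbers depend on insertion order; inserting the patterns in the order listed happens to give the state numbers 0 to 9 used in the rest of these notes.

```python
def build_keyword_tree(patterns):
    """Build a keyword tree (trie).

    children[v] maps an edge label (one character) to a child node,
    label[v] is L(v), the concatenation of edge labels from the root to v.
    Node 0 is the root.
    """
    children = [{}]
    label = [""]
    for p in patterns:
        v = 0
        for ch in p:
            if ch not in children[v]:
                # Create a new node; a dict key is unique, so two edges
                # out of the same node never carry the same label.
                children.append({})
                label.append(label[v] + ch)
                children[v][ch] = len(children) - 1
            v = children[v][ch]
    return children, label


children, label = build_keyword_tree(["he", "she", "his", "hers"])
for v, name in enumerate(label):
    print(v, repr(name), "leaf" if not children[v] else "")
```

Node 2 (label `he`) is a pattern but not a leaf, because `hers` continues through it; this is allowed by the definition, which only requires every leaf label to be a pattern.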

goto function

States: the nodes of the keyword tree
Initial state: 0 = the root
The goto function g(q, a) gives the state entered from the current state q by matching the target character a:

  1. if the edge (q, v) is labeled by a, then g(q, a) = v;
  2. g(0, a) = 0 for each a that does not label an edge out of the root, so the automaton stays in the initial state while scanning non-matching characters;
  3. otherwise g(q, a) = ∅, that is, the transition fails.
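
The three rules can be sketched directly on top of a trie stored as a list of dicts. The names `build_trie`, `goto`, and the `FAIL` sentinel are hypothetical; Python's `None` is assumed here to stand for the empty result ∅.

```python
FAIL = None  # stands for the "no transition" result of rule 3


def build_trie(patterns):
    """trie[q] maps a character to the child state; state 0 is the root."""
    trie = [{}]
    for p in patterns:
        q = 0
        for a in p:
            if a not in trie[q]:
                trie.append({})
                trie[q][a] = len(trie) - 1
            q = trie[q][a]
    return trie


def goto(trie, q, a):
    if a in trie[q]:   # rule 1: follow the edge labeled a
        return trie[q][a]
    if q == 0:         # rule 2: the root loops on every other character
        return 0
    return FAIL        # rule 3: no transition from a non-root state


trie = build_trie(["he", "she", "his", "hers"])
print(goto(trie, 0, "h"))  # edge out of the root
print(goto(trie, 0, "x"))  # root self-loop
print(goto(trie, 1, "x"))  # fails
```

Because of rule 2 the goto function never fails at the root, which is what guarantees that following failure links always terminates.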

failure function

The failure function f(q), for q ≠ 0, gives the state entered at a mismatch.
f(q) is the node labeled by the longest proper suffix w of L(q) such that w is a prefix of some pattern. Because w is the longest such suffix, a fail transition does not miss any potential occurrence.

f(q) is always defined, since L(0) = ε is a prefix of any pattern.

[Figure: the keyword tree with fail transitions drawn as dashed arrows]

q            1  2  3  4  5  6  7  8  9
edge into q  h  e  s  h  e  i  s  r  s
f(q)         0  0  0  1  2  0  3  0  3
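
One common way to compute f is a breadth-first traversal of the keyword tree, so that f is already known for every shallower state when a state is processed. The sketch below follows that approach; the names `build_trie` and `build_failure` are illustrative, and the state numbering assumes the patterns are inserted in the order he, she, his, hers.

```python
from collections import deque


def build_trie(patterns):
    """trie[q] maps a character to the child state; state 0 is the root."""
    trie = [{}]
    for p in patterns:
        q = 0
        for a in p:
            if a not in trie[q]:
                trie.append({})
                trie[q][a] = len(trie) - 1
            q = trie[q][a]
    return trie


def build_failure(trie):
    """Compute f(q) for every state by breadth-first traversal."""
    fail = [0] * len(trie)
    # States at depth 1 fail to the root: their only proper suffix is ε.
    queue = deque(trie[0].values())
    while queue:
        q = queue.popleft()
        for a, child in trie[q].items():
            # Walk the failure chain of q until some state has an a-edge.
            f = fail[q]
            while f != 0 and a not in trie[f]:
                f = fail[f]
            fail[child] = trie[f].get(a, 0)
            queue.append(child)
    return fail


trie = build_trie(["he", "she", "his", "hers"])
fail = build_failure(trie)
for q in range(1, len(trie)):
    print(q, fail[q])
```

For example, state 5 has L(5) = she, whose longest proper suffix that is also a pattern prefix is he, so f(5) = 2.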

output function

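The output function out(q) gives the set of patterns recognized when entering state q. State 5 reports he as well as she because he is a suffix of she: out(q) includes the outputs of f(q).

q   out(q)
2   {he}
5   {she, he}
7   {his}
9   {hers}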
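
Putting the three functions together gives a complete matcher. The following is a minimal sketch under the same assumptions as before (list-of-dicts trie, patterns inserted in the order given); `build_automaton` and `search` are hypothetical names, and out(q) is stored as a list rather than a set so that the reporting order is predictable.

```python
from collections import deque


def build_automaton(patterns):
    """Return (trie, fail, out) for the given patterns."""
    trie, out = [{}], [[]]
    for p in patterns:
        q = 0
        for a in p:
            if a not in trie[q]:
                trie.append({})
                out.append([])
                trie[q][a] = len(trie) - 1
            q = trie[q][a]
        out[q].append(p)  # p is recognized when its last state is entered

    fail = [0] * len(trie)
    queue = deque(trie[0].values())
    while queue:
        q = queue.popleft()
        for a, child in trie[q].items():
            f = fail[q]
            while f != 0 and a not in trie[f]:
                f = fail[f]
            fail[child] = trie[f].get(a, 0)
            # Inherit the outputs of the failure state (already final,
            # because it is shallower and was processed earlier).
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return trie, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    trie, fail, out = build_automaton(patterns)
    hits = []
    q = 0
    for i, a in enumerate(text):
        while q != 0 and a not in trie[q]:
            q = fail[q]          # fail transition
        q = trie[q].get(a, 0)    # goto transition (root loops on a miss)
        for p in out[q]:
            hits.append((i - len(p) + 1, p))
    return hits


print(search("ushers", ["he", "she", "his", "hers"]))
```

Scanning `ushers` enters state 5 after `she` (reporting she and he), fails to state 2 on `r`, and continues to state 9, reporting hers.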