Implementing the Minimum Edit Distance Algorithm

阿新 • Published: 2017-09-02

1. Algorithm introduction

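The first week of the CS124 course covers an algorithm for measuring the similarity of two strings: Minimum Edit Distance. The algorithm is also used in NLP (natural language processing).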

How is similarity defined? Given any two strings X and Y, convert X into Y using the following three operations: ① insertion (insert); ② deletion (delete); ③ substitution (substitute).

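For example, take X = "intention" and Y = "execution". The conversion of X into Y is shown in the figure below:

[Figure: step-by-step conversion of "intention" into "execution"]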

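Definition: an insertion costs 1, a deletion costs 1, and a substitution costs 2 (referred to here as Levenshtein distance; note that the more common definition of Levenshtein distance charges 1 per substitution). Converting "intention" into "execution" then takes three substitutions, one deletion, and one insertion, so the total cost is 3×2 + 1 + 1 = 8.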

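This cost is also called the edit distance, and it measures how similar two strings are. Clearly, the more similar two strings are, the fewer operations it takes to turn one into the other.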
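The total of 8 can be checked by applying one concrete edit script to "intention". The sketch below is only an illustration: the class name `EditScriptDemo`, the one-letter script encoding (`d`, `i`, `s`, `m`), and the particular script are assumptions, chosen to match the operation counts given above (three substitutions, one deletion, one insertion).

```java
// Hypothetical demo: applies one possible edit script to a source string and sums the costs.
final class EditScriptDemo {
    // Costs as defined in the text: insert = 1, delete = 1, substitute = 2.
    static final int INSERT_COST = 1;
    static final int DELETE_COST = 1;
    static final int SUBSTITUTE_COST = 2;

    /**
     * Applies the script to source and returns its total cost.
     * 'd' = delete, 'i' = insert, 's' = substitute, 'm' = keep a matching character.
     * Throws if the script does not turn source into target.
     */
    static int apply(String source, String target, String script) {
        StringBuilder out = new StringBuilder();
        int i = 0; // next unread character of source
        int j = 0; // next character of target to produce
        int cost = 0;
        for (char op : script.toCharArray()) {
            switch (op) {
                case 'd': // drop source[i]
                    i++;
                    cost += DELETE_COST;
                    break;
                case 'i': // emit target[j] without consuming source
                    out.append(target.charAt(j++));
                    cost += INSERT_COST;
                    break;
                case 's': // replace source[i] by target[j]
                    out.append(target.charAt(j++));
                    i++;
                    cost += SUBSTITUTE_COST;
                    break;
                case 'm': // characters must already be equal; costs nothing
                    if (source.charAt(i) != target.charAt(j)) {
                        throw new IllegalArgumentException("not a match at source index " + i);
                    }
                    out.append(source.charAt(i));
                    i++;
                    j++;
                    break;
                default:
                    throw new IllegalArgumentException("unknown operation: " + op);
            }
        }
        if (i != source.length() || !out.toString().equals(target)) {
            throw new IllegalArgumentException("script does not produce the target");
        }
        return cost;
    }

    public static void main(String[] args) {
        // delete i, n->e, t->x, keep e, insert c, n->u, keep t, i, o, n
        System.out.println(apply("intention", "execution", "dssmismmmm")); // 8
    }
}
```

This only shows that a script of cost 8 exists; showing that no cheaper script exists is what the dynamic programming in the next section is for.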

2. Solving minimum edit distance with dynamic programming

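Why can dynamic programming be used here? ① The problem can be decomposed into a number of subproblems; ② the subproblems overlap, so their results can be looked up in a table. For details, see the dynamic programming examples (Example 1, Example 2) linked in the original post.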
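To make the overlap concrete, the sketch below solves the problem by plain recursion with no table and counts the calls. It assumes the costs from section 1 and the three-way decomposition described later in this section; the class name `OverlapDemo` and its call counter are hypothetical.

```java
// Hypothetical illustration: naive recursion without a table, counting how many calls it makes.
final class OverlapDemo {
    static long calls = 0;

    // Cost of converting the first i characters of s into the first j characters of t.
    static int naive(String s, String t, int i, int j) {
        calls++;
        if (i == 0) return j; // nothing left in s: insert the remaining j characters
        if (j == 0) return i; // nothing left in t: delete the remaining i characters
        int substituteCost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 2;
        int viaSubstitute = naive(s, t, i - 1, j - 1) + substituteCost;
        int viaDelete = naive(s, t, i - 1, j) + 1;
        int viaInsert = naive(s, t, i, j - 1) + 1;
        return Math.min(viaSubstitute, Math.min(viaDelete, viaInsert));
    }

    public static void main(String[] args) {
        String s = "intention";
        String t = "execution";
        int distance = naive(s, t, s.length(), t.length());
        long distinct = (long) (s.length() + 1) * (t.length() + 1);
        System.out.println("distance = " + distance);
        System.out.println("recursive calls = " + calls);
        System.out.println("distinct subproblems = " + distinct);
    }
}
```

There are only (n+1)×(m+1) distinct (i, j) pairs, yet the naive recursion visits them far more often than that. Storing each result once in a table removes the repeated work.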

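Let string X have length n and string Y have length m, and let d[n][m] denote the minimum edit distance for converting X into Y.

Define d[i][j] as the minimum edit distance for converting the prefix X[1...i] of X into the prefix Y[1...j] of Y (indices start at 1 here, not 0). This gives the following dynamic programming formula: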

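$$
d[i][j] = \min
\begin{cases}
d[i-1][j] + \text{del-cost}(\text{source}[i]) \\
d[i][j-1] + \text{ins-cost}(\text{target}[j]) \\
d[i-1][j-1] + \text{sub-cost}(\text{source}[i], \text{target}[j])
\end{cases}
$$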

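There are three ways to convert a source string X of length i into a target string Y of length j:

① First convert the first i-1 characters of the source, X[1...i-1], into the target Y[1...j], then delete the i-th character of X, source[i].

② First convert the source X[1...i] into the target Y[1...j-1], then insert the j-th character of Y, target[j].

③ First convert the source X[1...i-1] into the target Y[1...j-1], then substitute the i-th character of the source, X[i], with the j-th character of the target, Y[j].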

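Why only these three ways?

Because we solve the original problem by decomposing it into subproblems, each of which is smaller than the original problem by 1. The original problem is converting X[1...i] into Y[1...j]. For example, one subproblem is: first convert X[1...i-1] into Y[1...j], ...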

Combined with the operation costs defined earlier (1 for deletion and insertion, 2 for substitution), this gives the following formula:

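$$
d[i][j] = \min
\begin{cases}
d[i-1][j] + 1 \\
d[i][j-1] + 1 \\
d[i-1][j-1] +
\begin{cases}
2 & \text{if } \text{source}[i] \neq \text{target}[j] \\
0 & \text{if } \text{source}[i] = \text{target}[j]
\end{cases}
\end{cases}
$$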
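The recurrence also needs boundary values. They are not spelled out in the text, but the Java implementation in section 3 sets them in its initialisation loops. Written in the same notation, and assuming the unit insertion and deletion costs defined earlier, they would be:

$$
d[0][0] = 0, \qquad d[i][0] = i \quad (1 \le i \le n), \qquad d[0][j] = j \quad (1 \le j \le m)
$$

Converting a prefix of length i into the empty string takes i deletions, and building a prefix of length j from the empty string takes j insertions.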

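Why is the substitution cost 0 when source[i] = target[j]? Because source[i] = target[j] means that the i-th character of X and the j-th character of Y are the same.

Consider converting X[1...i] into Y[1...j] in the third way: first convert the source X[1...i-1] into the target Y[1...j-1]. Since the i-th character of X and the j-th character of Y are the same, the substitution would replace a character with itself, so no substitution is needed. This is the logic behind the following code: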

                if (source.charAt(i-1) == target.charAt(j-1)) {
                    dp[i][j] = dp[i - 1][j - 1]; // characters match: substitution costs 0
                }
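The three cases and the zero-cost match can also be written as a recursion that caches each d[i][j] the first time it is computed. This is a sketch of a top-down variant, not the approach the post implements; the class name `MemoizedEditDistance`, the `compute` helper, and the use of -1 as a "not computed" marker are assumptions.

```java
import java.util.Arrays;

// Hypothetical top-down variant: same recurrence, computed by memoised recursion.
final class MemoizedEditDistance {
    private final String source;
    private final String target;
    private final int[][] memo; // memo[i][j] caches d[i][j]; -1 means "not computed yet"

    MemoizedEditDistance(String source, String target) {
        this.source = source;
        this.target = target;
        this.memo = new int[source.length() + 1][target.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
    }

    // Cost of converting the first i characters of source into the first j characters of target.
    int distance(int i, int j) {
        if (i == 0) return j; // empty source prefix: insert j characters
        if (j == 0) return i; // empty target prefix: delete i characters
        if (memo[i][j] != -1) return memo[i][j]; // table lookup
        // Strings are 0-indexed in Java, so the i-th character is charAt(i - 1).
        int substituteCost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 2;
        int viaDelete = distance(i - 1, j) + 1;
        int viaInsert = distance(i, j - 1) + 1;
        int viaSubstitute = distance(i - 1, j - 1) + substituteCost;
        memo[i][j] = Math.min(viaDelete, Math.min(viaInsert, viaSubstitute));
        return memo[i][j];
    }

    static int compute(String source, String target) {
        return new MemoizedEditDistance(source, target)
                .distance(source.length(), target.length());
    }

    public static void main(String[] args) {
        System.out.println(compute("intention", "execution")); // 8
    }
}
```

Each (i, j) pair is computed at most once, so the work is proportional to the size of the table.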

3. Code implementation

The pseudocode is as follows:

[Figure: pseudocode for the minimum edit distance algorithm]

Java implementation:

public class MinimumEditDistance {

    public static void main(String[] args) {
        MinimumEditDistance med = new MinimumEditDistance();
        // Same pair as in section 1; the costs are symmetric, so the order does not change the result.
        String source = "intention";
        String target = "execution";
        int result = med.similarDegree(source, target);
        System.out.println(result); // prints 8
    }

    /**
     * Returns the minimum edit distance from source to target,
     * with insertion = 1, deletion = 1 and substitution = 2.
     */
    public int similarDegree(String source, String target) {
        if (source == null || target == null) {
            throw new IllegalArgumentException("illegal input String");
        }

        int sourceLen = source.length();
        int targetLen = target.length();

        // dp[i][j]: cost of converting the first i chars of source into the first j chars of target
        int[][] dp = new int[sourceLen + 1][targetLen + 1];

        // init: converting to or from the empty string takes only deletions or only insertions
        dp[0][0] = 0;
        for (int i = 1; i <= sourceLen; i++) {
            dp[i][0] = i;
        }
        for (int j = 1; j <= targetLen; j++) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= sourceLen; i++) {
            for (int j = 1; j <= targetLen; j++) {
                // Java strings are 0-indexed, so the i-th character is charAt(i - 1)
                if (source.charAt(i - 1) == target.charAt(j - 1)) {
                    // characters match: no operation needed
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    // source[1..i] to target[1..j-1], then insert target[j]
                    int insert = dp[i][j - 1] + 1;
                    // source[1..i-1] to target[1..j], then delete source[i]
                    int delete = dp[i - 1][j] + 1;
                    // source[1..i-1] to target[1..j-1], then substitute source[i] by target[j]
                    int substitute = dp[i - 1][j - 1] + 2;

                    dp[i][j] = min(insert, delete, substitute);
                }
            }
        }
        return dp[sourceLen][targetLen];
    }

    private int min(int insert, int delete, int substitute) {
        return Math.min(insert, Math.min(delete, substitute));
    }
}
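To see what the table looks like once it is filled, the sketch below returns the whole dp array for a small input instead of only the bottom-right cell. The class name `EditDistanceTable` and the `fill` helper are hypothetical. Unlike the implementation above, it takes the minimum over all three cases with a substitution cost of 0 on a match, which yields the same values.

```java
import java.util.Arrays;

// Hypothetical helper: fills the dp table for the same costs and returns it whole.
final class EditDistanceTable {
    static int[][] fill(String source, String target) {
        int[][] dp = new int[source.length() + 1][target.length() + 1];
        for (int i = 1; i <= source.length(); i++) {
            dp[i][0] = i; // i deletions
        }
        for (int j = 1; j <= target.length(); j++) {
            dp[0][j] = j; // j insertions
        }
        for (int i = 1; i <= source.length(); i++) {
            for (int j = 1; j <= target.length(); j++) {
                int substituteCost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 2;
                int substitute = dp[i - 1][j - 1] + substituteCost;
                int insert = dp[i][j - 1] + 1;
                int delete = dp[i - 1][j] + 1;
                dp[i][j] = Math.min(substitute, Math.min(insert, delete));
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        // Rows correspond to prefixes of "ab", columns to prefixes of "ac".
        for (int[] row : fill("ab", "ac")) {
            System.out.println(Arrays.toString(row));
        }
        // Output:
        // [0, 1, 2]
        // [1, 0, 1]
        // [2, 1, 2]
    }
}
```

The bottom-right entry, 2, is the distance from "ab" to "ac": delete b and insert c, or substitute b by c, both at cost 2.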

Reference: Stanford CS124 course

Original post: http://www.cnblogs.com/hapjin/p/7467035.html
