
Implementing Edit Distance in MATLAB and Computing Similarity

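Edit distance, also known as Levenshtein distance, is the minimum number of edit operations needed to transform one string into another. The permitted operations are substituting one character for another, inserting a character, and deleting a character.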
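The definition above is usually evaluated with a dynamic-programming recurrence. As a sketch, write $D(i,j)$ for the distance between the first $i$ characters of $s_1$ and the first $j$ characters of $s_2$ (this notation is introduced here for illustration and is not part of the original post):

```latex
D(i,0) = i, \qquad D(0,j) = j,
\qquad
D(i,j) =
\begin{cases}
D(i-1,\,j-1) & \text{if } s_1[i] = s_2[j],\\[4pt]
1 + \min\bigl\{\, D(i-1,\,j),\; D(i,\,j-1),\; D(i-1,\,j-1) \,\bigr\} & \text{otherwise.}
\end{cases}
```

The three terms inside the minimum correspond to deleting a character, inserting a character, and substituting a character, and the final answer is $D(m,n)$ for strings of lengths $m$ and $n$.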

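R2018a seems to ship with a built-in edit distance API, edr, with the following syntax:

dist = edr(x,y,tol)
[dist,ix,iy] = edr(x,y,tol)
[___] = edr(x,y,maxsamp)
[___] = edr(___,metric)
edr(___)

Judging by the tolerance argument tol, edr appears to be aimed at numeric signals rather than character strings. In any case, I don't have 2018a installed, so I can't use it and had to write the function by hand.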

The code is as follows:

function [V,m,n] = EditDist(string1,string2)
% Edit Distance is a standard Dynamic Programming problem. Given two strings
% s1 and s2, the edit distance between s1 and s2 is the minimum number of
% operations required to convert string s1 to s2. The following operations
% are typically used:
%   Replacing one character of the string by another character.
%   Deleting a character from the string.
%   Adding a character to the string.
% Example:
%   s1 = 'article';
%   s2 = 'ardipo';
%   EditDist(s1,s2)
%   > 4
% You need 4 actions to convert s1 to s2:
%   replace(t,d), replace(c,p), replace(l,o), delete(e)
% The additional outputs m and n are the lengths of string1 and string2.
%
% by: Reza Ahmadzadeh
% 14-11-2012

m = length(string1);
n = length(string2);

% v(i+1,j+1) holds the distance between the first i characters of string1
% and the first j characters of string2.
v = zeros(m+1,n+1);

% First column: deleting i characters turns a prefix into the empty string.
for i = 1:1:m
    v(i+1,1) = i;
end

% First row: inserting j characters builds a prefix from the empty string.
for j = 1:1:n
    v(1,j+1) = j;
end

for i = 1:m
    for j = 1:n
        if (string1(i) == string2(j))
            % Characters match: no extra cost.
            v(i+1,j+1) = v(i,j);
        else
            % One edit plus the cheapest of insert, delete, or replace.
            v(i+1,j+1) = 1 + min(min(v(i+1,j),v(i,j+1)),v(i,j));
        end
    end
end

V = v(m+1,n+1);
end

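How do we turn the minimum edit distance into a similarity score for the two strings? One simple normalization divides the distance by the length of the longer string and subtracts the result from 1: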

[mindist, m, n] = EditDist(final_code, final_code2);
% The similarity is fractional, so print it with %.4f rather than %d.
fprintf('the similarity is : %.4f\n', 1 - mindist/max(m, n));
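As a worked example of this normalization, the script below recomputes the distance inline, so it runs without EditDist.m on the path, using the two strings from the function's comment. The variable names s1, s2, D, cost, and sim are illustrative. Returning a similarity of 1 for two empty strings is an assumption added here to avoid dividing by zero; the original code does not handle that case.

```matlab
% Sketch: edit distance and normalized similarity for two example strings.
s1 = 'article';
s2 = 'ardipo';

m = length(s1);
n = length(s2);

% D(i+1,j+1) is the distance between the first i chars of s1 and first j of s2.
D = zeros(m+1, n+1);
D(:,1) = (0:m)';   % delete i characters to reach the empty string
D(1,:) = 0:n;      % insert j characters starting from the empty string

for i = 1:m
    for j = 1:n
        cost = double(s1(i) ~= s2(j));        % 0 if characters match, else 1
        D(i+1,j+1) = min([D(i,j+1) + 1, ...   % deletion
                          D(i+1,j) + 1, ...   % insertion
                          D(i,j) + cost]);    % match or substitution
    end
end
dist = D(m+1, n+1);

% Normalize by the longer length; guard the empty-vs-empty case (assumption).
if max(m, n) == 0
    sim = 1;
else
    sim = 1 - dist / max(m, n);
end

fprintf('distance = %d, similarity = %.4f\n', dist, sim);
```

For these strings the distance is 4 and the longer string has 7 characters, so the similarity is 1 - 4/7, or about 0.4286. A value of 1 means the strings are identical, and 0 means every position of the longer string had to be edited.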