A Simple Implementation of the Edit Distance Algorithm

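I have recently been studying NLP for work, and as an exercise I implemented an algorithm that computes Edit Distance, that is, the cost of transforming one string into another. It uses the Levenshtein variant in which an insertion or a deletion costs 1 and a substitution costs 2.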

The basic logic:

For a string X of length M and a string Y of length N,

Initialization:

  D(i,0)=i

  D(0,j)=j

Recurrence Relation:

  for each i=1...M

    for each j=1...N

      D(i,j)=Min(D(i-1,j)+1,D(i,j-1)+1,X(i)==Y(j)?D(i-1,j-1):D(i-1,j-1)+2)

Termination:

  D(M,N) is the distance
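To make the recurrence concrete, here is a small sketch that fills D and renders it as text so the table can be checked by hand. The class name EditDistanceTableDemo and the methods BuildTable and Format are names I made up for this illustration, and the indices are shifted by one because C# strings are 0-based.

```csharp
using System;
using System.Text;

public static class EditDistanceTableDemo
{
    // Fills D following the recurrence above:
    // insertion and deletion cost 1, substitution costs 2.
    public static int[,] BuildTable(string x, string y)
    {
        int m = x.Length, n = y.Length;
        int[,] d = new int[m + 1, n + 1];

        for (int i = 0; i <= m; i++) d[i, 0] = i;   // D(i,0) = i
        for (int j = 0; j <= n; j++) d[0, j] = j;   // D(0,j) = j

        for (int i = 1; i <= m; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                // x[i - 1] and y[j - 1] are X(i) and Y(j) in the 1-based notation.
                int diag = d[i - 1, j - 1] + (x[i - 1] == y[j - 1] ? 0 : 2);
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), diag);
            }
        }
        return d;
    }

    // Renders the table one row per line, values separated by single spaces.
    public static string Format(int[,] d)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < d.GetLength(0); i++)
        {
            for (int j = 0; j < d.GetLength(1); j++)
            {
                if (j > 0) sb.Append(' ');
                sb.Append(d[i, j]);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

For X = "ab" and Y = "ac", Console.Write(EditDistanceTableDemo.Format(EditDistanceTableDemo.BuildTable("ab", "ac"))) prints the rows "0 1 2", "1 0 1" and "2 1 2". The bottom-right cell, D(2,2) = 2, is the distance: replacing b with c costs 2, the same as one deletion plus one insertion.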

public static int EditDistance(string str1, string str2)
{
    int len1 = str1.Length;
    int len2 = str2.Length;

    // table[i, j] is the cost of turning the first i characters of str1
    // into the first j characters of str2.
    int[,] table = new int[len1 + 1, len2 + 1];

    // Initialization: D(i,0) = i (i deletions) and D(0,j) = j (j insertions).
    for (int i = 1; i <= len1; i++)
    {
        table[i, 0] = i;
    }
    for (int j = 1; j <= len2; j++)
    {
        table[0, j] = j;
    }

    // Recurrence: every cell depends only on its left, upper and upper-left neighbours.
    for (int i = 1; i <= len1; i++)
    {
        for (int j = 1; j <= len2; j++)
        {
            // Matching characters cost nothing; a substitution costs 2.
            int temp = (str1[i - 1] == str2[j - 1]) ? table[i - 1, j - 1] : table[i - 1, j - 1] + 2;
            table[i, j] = Min(table[i, j - 1] + 1, table[i - 1, j] + 1, temp);
        }
    }
    return table[len1, len2];
}
public static int Min(int val1, int val2, int val3)
{
    // Smallest of three values.
    int smaller = val1 < val2 ? val1 : val2;
    return smaller < val3 ? smaller : val3;
}
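Each row of the table only reads from the row directly above it, so the full (len1+1) x (len2+1) array is not strictly needed. The following is a minimal sketch of that idea using two rows; it is a variant I am adding for illustration, not part of the code above, and the names EditDistanceTwoRows and Compute are made up.

```csharp
using System;

public static class EditDistanceTwoRows
{
    // Same costs as above: insertion and deletion 1, substitution 2.
    public static int Compute(string str1, string str2)
    {
        int len2 = str2.Length;
        int[] prev = new int[len2 + 1];   // row i - 1 of the table
        int[] curr = new int[len2 + 1];   // row i of the table

        // Row 0: D(0,j) = j.
        for (int j = 0; j <= len2; j++) prev[j] = j;

        for (int i = 1; i <= str1.Length; i++)
        {
            curr[0] = i;                  // D(i,0) = i
            for (int j = 1; j <= len2; j++)
            {
                int diag = prev[j - 1] + (str1[i - 1] == str2[j - 1] ? 0 : 2);
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), diag);
            }

            // The row just computed becomes the "previous" row for the next pass.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // After the final swap, prev holds the last computed row.
        return prev[len2];
    }
}
```

With these costs, EditDistanceTwoRows.Compute("kitten", "sitting") returns 5 (two substitutions at 2 each plus one insertion). The trade-off is that the full table is no longer available afterwards, so this form only gives the distance, not the sequence of edits.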

Recursive version:

public static int EditDistanceD(string str1, string str2, int len1, int len2)
{
    // If either prefix is empty, the cost is the length of the other one
    // (all insertions or all deletions).
    if (len1 == 0 || len2 == 0)
    {
        return Max(len1, len2);
    }

    // len1 and len2 already mark the prefixes being compared, so no Substring calls are needed.
    // Matching last characters cost nothing; a substitution costs 2.
    int substitution = EditDistanceD(str1, str2, len1 - 1, len2 - 1) + (str1[len1 - 1] == str2[len2 - 1] ? 0 : 2);
    int deletion = EditDistanceD(str1, str2, len1 - 1, len2) + 1;
    int insertion = EditDistanceD(str1, str2, len1, len2 - 1) + 1;

    // Note: the same subproblems are recomputed many times, so this takes exponential time.
    return Min(substitution, deletion, insertion);
}
public static int Max(int val1, int val2)
{
    return val1 > val2 ? val1 : val2;
}
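The plain recursion solves the same (len1, len2) pair over and over. One way to keep the recursive shape while avoiding that is to cache each result; the sketch below is my own addition, with the made-up names EditDistanceMemo, Compute and Solve, and it assumes C# 7 or later for the tuple dictionary key.

```csharp
using System;
using System.Collections.Generic;

public static class EditDistanceMemo
{
    public static int Compute(string str1, string str2)
    {
        // Maps (len1, len2) to the distance between the corresponding prefixes.
        var cache = new Dictionary<(int, int), int>();
        return Solve(str1, str2, str1.Length, str2.Length, cache);
    }

    private static int Solve(string s1, string s2, int len1, int len2,
                             Dictionary<(int, int), int> cache)
    {
        // Base case: one prefix is empty.
        if (len1 == 0 || len2 == 0) return Math.Max(len1, len2);

        // Reuse a previously computed answer if there is one.
        if (cache.TryGetValue((len1, len2), out int known)) return known;

        int substitution = Solve(s1, s2, len1 - 1, len2 - 1, cache)
                           + (s1[len1 - 1] == s2[len2 - 1] ? 0 : 2);
        int deletion = Solve(s1, s2, len1 - 1, len2, cache) + 1;
        int insertion = Solve(s1, s2, len1, len2 - 1, cache) + 1;

        int best = Math.Min(Math.Min(substitution, deletion), insertion);
        cache[(len1, len2)] = best;
        return best;
    }
}
```

Each pair is now computed at most once, so the work is proportional to len1 * len2, the same as the table version. Very long inputs could still hit the recursion depth limit, which the iterative version avoids.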

For a detailed explanation, see:

http://blog.csdn.net/huaweidong2011/article/details/7727482

 

Posted on 2016-03-10 16:26 by 高山漏水