Edit Distance and Longest Common Substring

Edit distance and longest common substring are both classic dynamic programming (DP) problems. Let's look at edit distance first:

 

Problem description

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character

 

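Approach

This is a classic dynamic programming problem. Build a two-dimensional array dp[][] in which dp[i][j] records the minimum edit distance between the first i characters of s1 and the first j characters of s2. The recurrence is:

(1) When s1.charAt(i - 1) == s2.charAt(j - 1), dp[i][j] = dp[i - 1][j - 1];

(2) Otherwise, dp[i][j] = Math.min(dp[i - 1][j - 1], Math.min(dp[i][j - 1], dp[i - 1][j])) + 1, where the three terms correspond to replace, insert and delete, and the + 1 is the cost of that operation.

The initial conditions are: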

dp[i][0] = i, dp[0][j] = j;
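
The same recurrence can also be read top-down. The following is a minimal sketch, not the author's code: the class name EditDistanceMemoSketch and the helper solve are hypothetical names chosen for illustration. It memoizes each (i, j) pair so every subproblem is solved once.

```java
// Hypothetical sketch: a top-down (memoized) reading of the edit distance recurrence.
class EditDistanceMemoSketch {
    static int minDistance(String s1, String s2) {
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            java.util.Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return solve(s1, s2, s1.length(), s2.length(), memo);
    }

    // Edit distance between the first i characters of s1 and the first j characters of s2.
    private static int solve(String s1, String s2, int i, int j, int[][] memo) {
        if (i == 0) {
            return j; // insert the remaining j characters
        }
        if (j == 0) {
            return i; // delete the remaining i characters
        }
        if (memo[i][j] != -1) {
            return memo[i][j];
        }
        int result;
        if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
            // last characters match: no operation needed for them
            result = solve(s1, s2, i - 1, j - 1, memo);
        } else {
            int replace = solve(s1, s2, i - 1, j - 1, memo);
            int insert = solve(s1, s2, i, j - 1, memo);
            int delete = solve(s1, s2, i - 1, j, memo);
            result = 1 + Math.min(replace, Math.min(insert, delete));
        }
        memo[i][j] = result;
        return result;
    }
}
```

For instance, EditDistanceMemoSketch.minDistance("horse", "ros") evaluates to 3 (replace h with r, delete r, delete e). The bottom-up table fills the same cells in a fixed order and avoids recursion depth limits on long inputs.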

 

Code

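int getMinEditLen(String s1, String s2) {
    // Treat null as the empty string so the length checks below cannot throw
    // when only one of the arguments is null.
    if (s1 == null) {
        s1 = "";
    }
    if (s2 == null) {
        s2 = "";
    }

    if (s1.length() == 0) {
        return s2.length();
    }
    if (s2.length() == 0) {
        return s1.length();
    }

    int len1 = s1.length();
    int len2 = s2.length();

    // dp[i][j] = edit distance between the first i chars of s1 and the first j chars of s2
    int[][] dp = new int[len1 + 1][len2 + 1];
    // initialize: converting to or from the empty prefix costs one step per character
    for (int i = 0; i < dp.length; i++) {
        dp[i][0] = i;
    }
    for (int j = 0; j < dp[0].length; j++) {
        dp[0][j] = j;
    }

    for (int i = 1; i < dp.length; i++) {
        for (int j = 1; j < dp[0].length; j++) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                // replace, delete, insert: take the cheapest and pay one step
                dp[i][j] = Math.min(dp[i - 1][j - 1],
                        Math.min(dp[i - 1][j], dp[i][j - 1])) + 1;
            }
        }
    }

    return dp[len1][len2];
}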

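A common mistake

It is tempting to add a whole-prefix check like the one below. It is not part of the recurrence: the match case only needs to compare the last characters of the two prefixes, while comparing whole prefixes costs O(n) per cell and only fires when the two prefixes are identical.

if (word1.substring(0, i).equals(word2.substring(0, j))) {
        dp[i][j] = 0;
}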

 

 

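Longest common substring

Problem description

A substring is defined like a subsequence, except that its characters must appear contiguously in the original string. For example, for the input strings BDCABA and ABCBDAB, the longest common substrings are BD and AB, each of length 2.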

 

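Approach

(1) Recursion;

(2) DP: let dp[i][j] be the length of the longest common suffix of the first i characters of s1 and the first j characters of s2. A common substring can end anywhere, so the answer is the maximum over all dp[i][j], not dp[len1][len2].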

 

Code

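// rec
// Note: recursing only on "last characters match, otherwise drop one character"
// computes the longest common subsequence. For substrings, the recursion must
// also carry the length of the current run of matching characters.
int getLCS(String s1, String s2) {
    return getLCS(s1, s2, s1.length(), s2.length(), 0);
}

// run = length of the matching run that starts right after
// the prefixes s1[0..len1) and s2[0..len2)
int getLCS(String s1, String s2, int len1, int len2, int run) {
    if (len1 == 0 || len2 == 0) {
        return run;
    }

    int best = run;
    if (s1.charAt(len1 - 1) == s2.charAt(len2 - 1)) {
        // extend the current run by one matching character
        best = getLCS(s1, s2, len1 - 1, len2 - 1, run + 1);
    }

    // or end the run here and restart from a shorter prefix of either string
    return Math.max(best, Math.max(
            getLCS(s1, s2, len1, len2 - 1, 0),
            getLCS(s1, s2, len1 - 1, len2, 0)));
}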

 

// dp
int getLCS(String s1, String s2) {
    if (s1.length() == 0 || s2.length() == 0) {
        return 0;
    }

    int len1 = s1.length();
    int len2 = s2.length();

    // dp[i][j] = length of the longest common suffix of s1[0..i) and s2[0..j)
    int[][] dp = new int[len1 + 1][len2 + 1];
    int max = 0;
    for (int i = 1; i < dp.length; i++) {
        for (int j = 1; j < dp[0].length; j++) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
                // the longest common substring can end at any cell, so track the best one
                max = Math.max(max, dp[i][j]);
            } else {
                dp[i][j] = 0;
            }
        }
    }

    return max;
}
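
To recover the substring itself rather than only its length, one option is to remember where the best run ends. The following is a minimal sketch under that assumption; the class name LongestCommonSubstringSketch and the method name longestCommonSubstring are hypothetical, and when several substrings tie it returns the one found first in row-major order.

```java
// Hypothetical sketch: return one longest common substring, not just its length.
class LongestCommonSubstringSketch {
    static String longestCommonSubstring(String s1, String s2) {
        // dp[i][j] = length of the longest common suffix of s1[0..i) and s2[0..j)
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best run inside s1

        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    // strict comparison keeps the first of several equally long runs
                    if (dp[i][j] > bestLen) {
                        bestLen = dp[i][j];
                        bestEnd = i;
                    }
                }
                // on a mismatch dp[i][j] stays 0 (the array default)
            }
        }
        return s1.substring(bestEnd - bestLen, bestEnd);
    }
}
```

With the strings from the problem description, LongestCommonSubstringSketch.longestCommonSubstring("BDCABA", "ABCBDAB") returns "BD"; "AB" is equally long but is found later in the scan.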

 

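Extension: the longest common subsequence problem.

For example, with s1 = "abc" and s2 = "asbvcd", the longest common subsequence of s1 and s2 is "abc", which has length 3.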
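
To make the difference between a subsequence and a substring concrete, here is a small sketch with a hypothetical helper isSubsequence (not part of the original post). It checks with two pointers whether one string appears in another in order, with gaps allowed.

```java
// Hypothetical sketch: a subsequence keeps the order of characters but allows gaps.
class SubsequenceCheckSketch {
    static boolean isSubsequence(String candidate, String text) {
        int matched = 0; // number of characters of candidate matched so far
        for (int k = 0; k < text.length() && matched < candidate.length(); k++) {
            if (text.charAt(k) == candidate.charAt(matched)) {
                matched++;
            }
        }
        return matched == candidate.length();
    }
}
```

Here SubsequenceCheckSketch.isSubsequence("abc", "asbvcd") is true, while "asbvcd".contains("abc") is false, because the three characters are not contiguous in s2.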

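public class LongestCommonSequence {
    // Length of the longest common subsequence of s1 and s2.
    public int getLCSeqLen(String s1, String s2) {
        if (s1 == null || s2 == null || s1.length() == 0 || s2.length() == 0) {
            return 0;
        }

        int len1 = s1.length(), len2 = s2.length();
        // dp[i][j] = LCS length of the first i chars of s1 and the first j chars of s2
        int[][] dp = new int[len1 + 1][len2 + 1];

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        return dp[len1][len2];
    }

    // One longest common subsequence of s1 and s2 (there may be several).
    public String getLCSeq(String s1, String s2) {
        if (s1 == null || s2 == null || s1.length() == 0 || s2.length() == 0) {
            return "";
        }

        int len1 = s1.length(), len2 = s2.length();
        int[][] dp = new int[len1 + 1][len2 + 1];

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Appending every matching pair while filling the table would collect
        // characters that are not on any optimal path, so walk back from
        // dp[len1][len2] and collect only the characters on one optimal path.
        StringBuilder lcs = new StringBuilder();
        int i = len1, j = len2;
        while (i > 0 && j > 0) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                lcs.append(s1.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }

        // characters were collected from the end, so reverse them
        return lcs.reverse().toString();
    }
}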

  

Posted by Chapter