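Edit Distance and Longest Common Substring
Edit distance and longest common substring are both classic DP problems. Let's look at edit distance first:
Problem description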
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
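Approach
This is a classic dynamic programming problem. Build a two-dimensional array dp[][] in which dp[i][j] holds the minimum edit distance between the first i characters of s1 and the first j characters of s2. The recurrence is:
(1) if s1.charAt(i - 1) == s2.charAt(j - 1), then dp[i][j] = dp[i - 1][j - 1];
(2) otherwise, dp[i][j] = Math.min(dp[i - 1][j - 1], Math.min(dp[i][j - 1], dp[i - 1][j])) + 1, where the three terms correspond to a replace, an insert, and a delete, and the + 1 counts that operation;
The initial conditions are:
dp[i][0] = i, dp[0][j] = j;
Code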
int getMinEditLen(String s1, String s2) {
    // Treat null as the empty string so that a single null argument does not throw.
    if (s1 == null) {
        s1 = "";
    }
    if (s2 == null) {
        s2 = "";
    }

    int len1 = s1.length();
    int len2 = s2.length();
    // dp[i][j] = edit distance between the first i chars of s1 and the first j chars of s2
    int[][] dp = new int[len1 + 1][len2 + 1];

    // initialize: turning a prefix into the empty string (or back) costs its length
    for (int i = 0; i < dp.length; i++) {
        dp[i][0] = i;
    }
    for (int j = 0; j < dp[0].length; j++) {
        dp[0][j] = j;
    }

    for (int i = 1; i < dp.length; i++) {
        for (int j = 1; j < dp[0].length; j++) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                // replace, delete, insert: take the cheapest and count one step
                dp[i][j] = Math.min(dp[i - 1][j - 1], Math.min(dp[i - 1][j], dp[i][j - 1])) + 1;
            }
        }
    }

    return dp[len1][len2];
}
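Each cell of the table only reads the current row and the row above it, so the full table is not strictly needed. The following is a minimal sketch of that idea, assuming non-null inputs; the class name EditDistanceTwoRows and the method name minEditLen are made up for this example and are not part of the original code.

```java
// Sketch: edit distance with two rolling rows instead of the full dp table.
// EditDistanceTwoRows and minEditLen are hypothetical names for illustration.
class EditDistanceTwoRows {

    static int minEditLen(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        int[] prev = new int[len2 + 1]; // row i - 1 of the dp table
        int[] curr = new int[len2 + 1]; // row i of the dp table

        for (int j = 0; j <= len2; j++) {
            prev[j] = j; // dp[0][j] = j
        }

        for (int i = 1; i <= len1; i++) {
            curr[0] = i; // dp[i][0] = i
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    curr[j] = prev[j - 1];
                } else {
                    curr[j] = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1])) + 1;
                }
            }
            // The row just filled becomes the "previous" row for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[len2];
    }

    public static void main(String[] args) {
        System.out.println(minEditLen("horse", "ros"));      // prints 3
        System.out.println(minEditLen("kitten", "sitting")); // prints 3
    }
}
```

This keeps the running time at O(len1 * len2) and reduces the extra space to O(len2). The trade-off is that the sequence of edits can no longer be recovered by walking back through the table.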
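Common mistake
A check like the following is easy to write by mistake:
if (word1.substring(0, i).equals(word2.substring(0, j))) {
    dp[i][j] = 0;
}
It compares whole prefixes instead of the current characters. It only fires when the two prefixes are identical, so it misses the case where the last characters match but the earlier ones differ, and every comparison costs time linear in the prefix length. Comparing only the current characters, s1.charAt(i - 1) and s2.charAt(j - 1), as the code above does, is sufficient.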
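Longest common substring problem
Problem description
A substring is defined much like a subsequence, except that its characters must appear contiguously in the other string. For example, for the two input strings BDCABA and ABCBDAB, the longest common substrings are BD and AB, both of length 2.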
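To make the definition concrete, here is a brute-force sketch that enumerates every substring of the first string and keeps the longest ones that also occur in the second. It is meant only as a check of the example above, not as an efficient solution; the class name CommonSubstringBruteForce and the method name longestCommonSubstrings are hypothetical.

```java
// Sketch: brute-force enumeration of all longest common substrings.
// CommonSubstringBruteForce and longestCommonSubstrings are hypothetical names.
class CommonSubstringBruteForce {

    // Returns every longest common substring of s1 and s2, sorted alphabetically.
    static java.util.TreeSet<String> longestCommonSubstrings(String s1, String s2) {
        java.util.TreeSet<String> best = new java.util.TreeSet<>();
        int bestLen = 0;
        for (int start = 0; start < s1.length(); start++) {
            for (int end = start + 1; end <= s1.length(); end++) {
                String candidate = s1.substring(start, end);
                if (!s2.contains(candidate)) {
                    break; // a longer candidate from this start cannot occur in s2 either
                }
                if (candidate.length() > bestLen) {
                    bestLen = candidate.length();
                    best.clear(); // shorter results are no longer the longest
                }
                if (candidate.length() == bestLen) {
                    best.add(candidate);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstrings("BDCABA", "ABCBDAB")); // prints [AB, BD]
    }
}
```

Because String.contains only matches contiguous runs, a pair such as BCBA, which is a common subsequence of the two strings, is not reported here.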
Approach
(1) recursion;
(2) dynamic programming;
Code
// recursive version
int getLCS(String s1, String s2) {
    if (s1.length() == 0 || s2.length() == 0) {
        return 0;
    }

    int len1 = s1.length();
    int len2 = s2.length();

    // Length of the common suffix: the longest common substring that ends
    // at the last character of both strings.
    int suffix = 0;
    while (suffix < len1 && suffix < len2
            && s1.charAt(len1 - 1 - suffix) == s2.charAt(len2 - 1 - suffix)) {
        suffix++;
    }

    // Any other common substring does not end at the last character of s1 or of s2,
    // so it is found after dropping the last character of one of them.
    return Math.max(suffix, Math.max(
            getLCS(s1, s2.substring(0, len2 - 1)),
            getLCS(s1.substring(0, len1 - 1), s2)));
}
// dp
int getLCS(String s1, String s2) {
    if (s1.length() == 0 || s2.length() == 0) {
        return 0;
    }

    int len1 = s1.length();
    int len2 = s2.length();
    // dp[i][j] = length of the longest common suffix of the first i chars of s1
    // and the first j chars of s2
    int[][] dp = new int[len1 + 1][len2 + 1];
    int max = 0;

    for (int i = 1; i < dp.length; i++) {
        for (int j = 1; j < dp[0].length; j++) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
                max = Math.max(max, dp[i][j]);
            } else {
                dp[i][j] = 0;
            }
        }
    }

    // The longest common substring can end anywhere, so return the largest cell,
    // not dp[len1][len2].
    return max;
}
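The same table can also yield the substring itself rather than only its length, by remembering where the best match ends. The following is a sketch under the assumption that both inputs are non-null; the class name LongestCommonSubstringSketch and the method name longestCommonSubstring are hypothetical, and when several substrings tie for the longest, this version returns the one found first while scanning s1 from left to right.

```java
// Sketch: recover one longest common substring by tracking its end index in s1.
// LongestCommonSubstringSketch and longestCommonSubstring are hypothetical names.
class LongestCommonSubstringSketch {

    static String longestCommonSubstring(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        // dp[i][j] = length of the common suffix of s1[0..i) and s2[0..j)
        int[][] dp = new int[len1 + 1][len2 + 1];
        int maxLen = 0;
        int endInS1 = 0; // exclusive end index in s1 of the best match so far

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > maxLen) {
                        maxLen = dp[i][j];
                        endInS1 = i;
                    }
                }
                // On a mismatch dp[i][j] stays 0, the default value of a new int array.
            }
        }
        return s1.substring(endInS1 - maxLen, endInS1);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("BDCABA", "ABCBDAB")); // prints BD
    }
}
```

For BDCABA and ABCBDAB both BD and AB have length 2; BD is returned because the strict comparison dp[i][j] > maxLen keeps the earliest match.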
Extension: the longest common subsequence problem.
For example, with s1 = "abc" and s2 = "asbvcd", the longest common subsequence of s1 and s2 is "abc", of length 3.
public class LongestCommonSequence {

    // Returns the length of the longest common subsequence of s1 and s2.
    public int getLCSeqLen(String s1, String s2) {
        if (s1 == null || s2 == null || s1.length() == 0 || s2.length() == 0) {
            return 0;
        }

        int len1 = s1.length(), len2 = s2.length();
        // dp[i][j] = LCS length of the first i chars of s1 and the first j chars of s2
        int[][] dp = new int[len1 + 1][len2 + 1];

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        return dp[len1][len2];
    }

    // Returns one longest common subsequence of s1 and s2.
    public String getLCSeq(String s1, String s2) {
        if (s1 == null || s2 == null || s1.length() == 0 || s2.length() == 0) {
            return "";
        }

        int len1 = s1.length(), len2 = s2.length();
        int[][] dp = new int[len1 + 1][len2 + 1];

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Walk back from dp[len1][len2], collecting matched characters in reverse order.
        // Appending on every match while filling the table would add characters
        // that are not part of a single subsequence.
        StringBuilder lcs = new StringBuilder();
        int i = len1, j = len2;
        while (i > 0 && j > 0) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                lcs.append(s1.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }

        return lcs.reverse().toString();
    }
}
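The recursion and DP approaches mentioned earlier can be combined: the subsequence recurrence can be evaluated top-down, with a memo table so that each pair of prefix lengths is solved only once. The following is a sketch of that idea for the length only, assuming non-null inputs; the class name LcsMemoSketch and the method names lcsLength and solve are hypothetical.

```java
// Sketch: top-down (memoized) longest common subsequence length.
// LcsMemoSketch, lcsLength and solve are hypothetical names for illustration.
class LcsMemoSketch {

    static int lcsLength(String s1, String s2) {
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            java.util.Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return solve(s1, s2, s1.length(), s2.length(), memo);
    }

    // LCS length of the first i chars of s1 and the first j chars of s2.
    private static int solve(String s1, String s2, int i, int j, int[][] memo) {
        if (i == 0 || j == 0) {
            return 0; // an empty prefix has no common subsequence with anything
        }
        if (memo[i][j] != -1) {
            return memo[i][j];
        }
        int result;
        if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
            result = solve(s1, s2, i - 1, j - 1, memo) + 1;
        } else {
            result = Math.max(solve(s1, s2, i - 1, j, memo),
                              solve(s1, s2, i, j - 1, memo));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("abc", "asbvcd")); // prints 3
    }
}
```

Passing indices instead of calling substring avoids copying strings on every call, and the memo table bounds the work at O(len1 * len2). For very long inputs the recursion depth can reach len1 + len2, so the bottom-up table may be the safer choice there.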