Edit Distance
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
编辑距离,非常经典的二维DP题目,也是传说中的双序列问题。
首先定义最优解的表示,f[i][j]表示将word1前i 个字符转化为word2前j个字符需要的step数目(注意为了方便处理初始值,f[0][0] = 0表示空串的最小编辑距离,之后定义最优解之间的转换状态, 即:
f[i][j] = min(f[i][j-1]+1,f[i-1][j]+1,f[i-1][j-1]) (word1[i-1]==word2[j-1])
f[i][j] = min(f[i][j-1]+1,f[i-1][j]+1,f[i-1][j-1]+1) (word1[i-1]!=word2[j-1])
注意在f[i][j-1],f[i-1][j],f[i-1][j-1]都是可以经过1步或者0步到达f[i][j]的子状态(增加word2[j-1],删除word1[i-1],替换(或者不替换)得到f[i][j]。
这种直接的解法代码如下:
class Solution(object): def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ if not word1 and not word2: return 0 l1 = len(word1) l2 = len(word2) res = [[0 for i in xrange(l2+1)] for j in xrange(l1+1)] for i in xrange(1,l1+1): res[i][0] = i for j in xrange(1,l2+1): res[0][j] = j for i in xrange(1,l1+1): for j in xrange(1,l2+1): res[i][j] = min(res[i-1][j],res[i][j-1])+1 if word1[i-1] == word2[j-1]: res[i][j] = min(res[i][j],res[i-1][j-1]) else: res[i][j] = min(res[i][j],res[i-1][j-1]+1) return res[l1][l2]
可以发现这种解法时间复杂度是O(mn),空间复杂度也为O(mn),但是实际计算转换状态时,只需要f[i-1][j],f[i][j-1],f[i-1][j-1]三个值,可以使用之前多次使用的滑动数组(滚动数组来解决)。因为在矩阵每一行从左到右处理,所以如果用一个数组res表示处理到j时,res[j-1]已经更新为f[i][j-1],res[j]还没更新为f[i-1][j-1],唯一需要解决的是res[i-1][j-1]。一个好的办法是使用一个单独的变量pre保存res[i-1][j-1]。在每次更新res[j]时先把 res[j]的历史值存储下来,作为下一次更新要使用的f[i-1][j-1]。另外为了减小空间复杂度可以使res为比较短的单词的长度加1,另外上述代码的三次min可以减少为1次min,即提前对f[i-1][j-1]的情况进行处理,代码如下:
class Solution(object): def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ if not word1 and not word2: return 0 l1 = len(word1) l2 = len(word2) if l1 <= l2: #word1 have the max len word1,word2 = word2,word1 l1,l2 = l2,l1 res = range(l2+1) for i in xrange(1,l1+1): pre = res[0] res[0] = i for j in xrange(1,l2+1): tmp = res[j] #缓存f[i-1][j-1] if word1[i-1] != word2[j-1]: #提前处理 pre += 1 res[j] = min(res[j-1]+1,res[j]+1,pre) pre = tmp return res[l2]
这道题不少人直接使用f[i][j]=f[i-1][j-1](word1[i-1][j-1])和f[i][j]= min(f[i-1][j],f[i][j-1],f[i-1][j-1]+1)这种格式,但是前者的正确性证明我还没有看到特别好的解释。需要后续再想想。
简易版本滚动数组优化:
class Solution(object): def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ if len(word1) == 0 or len(word2) == 0: return max(len(word1), len(word2)) m = len(word1) n = len(word2) dp = [[0] * (n+1) for i in xrange(2)] for i in xrange(1, n+1): dp[0][i] = i for i in xrange(1, m+1): dp[i%2][0] = i for j in xrange(1, n+1): if word1[i-1] == word2[j-1]: dp[i%2][j] = min([dp[(i-1)%2][j]+1, dp[i%2][j-1]+1, dp[(i-1)%2][j-1]]) else: dp[i%2][j] = min([dp[(i-1)%2][j]+1, dp[i%2][j-1]+1, dp[(i-1)%2][j-1]+1]) return dp[m%2][n]
posted on 2016-05-24 12:13 Sheryl Wang 阅读(211) 评论(0) 编辑 收藏 举报