[LeetCode] 72. Edit Distance (编辑距离)
- Difficulty: Hard
- Related Topics: String, Dynamic Programming
- Link: https://leetcode.com/problems/edit-distance/
Description
Given two strings word1
and word2
, return the minimum number of operations required to convert word1
to word2
.
给定两字符串 word1
和 word2
,返回将 word1
转化为 word2
所需的最少操作数。
You have the following three operations permitted on a word:
你可以对字符串进行以下三种操作:
- Insert a character
插入一个字符 - Delete a character
删除一个字符 - Replace a character
替换一个字符
Examples
Example 1
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Example 2
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
Constraints
0 <= word1.length, word2.length <= 500
word1
andword2
consist of lowercase English letters.
Solution
虽然这题在《算法导论》上说是 DNA 对齐排列问题的一般化,不过实际解题过程更像是求最长公共子序列。DNA 对齐排列问题可以参考文后给出的一篇博客,我们先看这个问题。为了解题方便,这里所有字符串的下标都从 1 开始。定义 dp(i, j)
为字符串 s[1..i]
到字符串 t[1..j]
的编辑距离,则有以下几种可能:
-
s
为空串,则dp(i, j) = j
(相当于把串t
复制一份) -
t
为空串,则dp(i, j) = i
(相当于把串s
清空) -
都不为空串,则需要处理
s[i]
和t[j]
,若它们相同,不需要进行操作(dp(i, j) = dp(i - 1, j - 1)
);若不同,则又有以下三种操作方案:- 替换(把
s[i]
替换成t[j]
):此时dp(i, j) = dp(i - 1, j - 1) + 1
; - 增加(把
t[j]
加入结果集):此时dp(i, j) = dp(i, j - 1) + 1
; - 删除(把
s[i]
删去):此时dp(i, j) = dp(i - 1, j) - 1
;
以上三者取最小值
- 替换(把
最后的代码如下:
class Solution {
fun minDistance(word1: String, word2: String): Int {
val dp = Array(word1.length + 1) { IntArray(word2.length + 1) }
word1.indices.forEach { dp[it + 1][0] = it + 1 }
word2.indices.forEach { dp[0][it + 1] = it + 1 }
for (i in 1..word1.length) {
for (j in 1..word2.length) {
if (word1[i - 1] == word2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1]
} else {
dp[i][j] = minOf(dp[i - 1][j - 1], dp[i][j - 1], dp[i - 1][j]) + 1
}
}
}
return dp.last().last()
}
}
那么如何得出转化方案呢?我们看一下 Example 1 对应的 dp
数组:
r o s
0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3
不难看出,从起始到最后的路径不止一条。我们从右下角出发,往左上角延伸,每次都选择左上角三个数中最小的,即可确定一条可行解,每次数值发生变化的地方即为我们的操作点。
对于任意 dp[i][j]
,若行进过程中发生数值变化,则根据发生变化的位置不同,有:
-
dp[i - 1][j - 1]
:替换; -
dp[i - 1][j]
:删除; -
dp[i][j - 1]
:增加。