Reading Notes on *The Beauty of Programming* (《编程之美》) 11: 3.3 Computing the Similarity of Strings
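This is a classic problem that can be solved with dynamic programming, and it is similar to computing the longest common subsequence of two strings.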
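Let Ai be the first i characters of the string A = a1a2a3 … am (that is, a1, a2, a3, …, ai).
Let Bj be the first j characters of the string B = b1b2b3 … bn (that is, b1, b2, b3, …, bj).
Let L(i, j) be the minimum number of operations needed to make Ai and Bj equal.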
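When ai equals bj, clearly L(i, j) = L(i-1, j-1).
When ai does not equal bj, there are three choices, each costing one operation:
- modify ai and bj so that they are equal; the two strings then still need L(i-1, j-1) operations;
- delete ai, or append ai to the end of Bj; the two strings then still need L(i-1, j) operations;
- delete bj, or append bj to the end of Ai; the two strings then still need L(i, j-1) operations.
So in this case L(i, j) = min( L(i-1, j-1), L(i-1, j), L(i, j-1) ) + 1.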
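The boundary values are clearly L(i, 0) = i and L(0, j) = j. Starting from them, the recurrence above gives every L(i, j) directly, and L(m, n) is the distance between A and B.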
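As a small worked example (the strings are chosen here only for illustration), take A = "ab" and B = "ba". Filling the table row by row with the boundary values and the recurrence gives:

```latex
\begin{array}{c|ccc}
L(i,j) & j=0 & j=1\ (b_1=\text{b}) & j=2\ (b_2=\text{a}) \\ \hline
i=0 & 0 & 1 & 2 \\
i=1\ (a_1=\text{a}) & 1 & 1 & 1 \\
i=2\ (a_2=\text{b}) & 2 & 1 & 2
\end{array}
```

For instance, L(1, 2) = L(0, 1) = 1 because a1 = b2 = 'a', while L(2, 2) = min(L(1, 1), L(1, 2), L(2, 1)) + 1 = min(1, 1, 1) + 1 = 2 because a2 = 'b' differs from b2 = 'a'. The distance is therefore 2 (for example, two modifications).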
To stay consistent with the code in the book, the function parameters below are of type string rather than char*.
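The following is a minimal C++ sketch of the full-table version of the recurrence, not the author's original listing. It assumes `std::string` parameters as described above; the function name `editDistance` is hypothetical. The table `L` has (m+1) × (n+1) entries, so time and space are both O(m·n).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name. L[i][j] = minimum number of operations needed to make
// the first i characters of a equal to the first j characters of b.
int editDistance(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> L(m + 1, std::vector<int>(n + 1));
    for (std::size_t i = 0; i <= m; ++i) L[i][0] = static_cast<int>(i);  // L(i, 0) = i
    for (std::size_t j = 0; j <= n; ++j) L[0][j] = static_cast<int>(j);  // L(0, j) = j
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (a[i - 1] == b[j - 1]) {
                L[i][j] = L[i - 1][j - 1];  // ai == bj: no extra operation
            } else {
                L[i][j] = std::min({L[i - 1][j - 1],  // modify ai / bj
                                    L[i - 1][j],      // delete ai (or append ai to Bj)
                                    L[i][j - 1]})     // delete bj (or append bj to Ai)
                          + 1;
            }
        }
    }
    return L[m][n];
}
```

With this sketch, `editDistance("ab", "ba")` evaluates to 2, matching the table above, and `editDistance("kitten", "sitting")` evaluates to 3.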
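Since only the distance between the two strings is required, and each step of the computation uses only two columns of the table, the code can be optimized further to save space.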
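A minimal sketch of this space optimization, again assuming C++ with `std::string` parameters (the name `editDistanceTwoLines` is hypothetical). Depending on how the table is drawn, the author's "two columns" are the two lines L(i-1, ·) and L(i, ·) kept here as `prev` and `cur`, which reduces extra space from O(m·n) to O(n). In this version the current line is copied into `prev` after each outer iteration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name. Same recurrence as the full-table version, but only
// L(i-1, .) and L(i, .) are stored, each as a vector of n + 1 ints.
int editDistanceTwoLines(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> prev(n + 1), cur(n + 1);
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);  // L(0, j) = j
    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);  // L(i, 0) = i
        for (std::size_t j = 1; j <= n; ++j) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1];
            else
                cur[j] = std::min({prev[j - 1], prev[j], cur[j - 1]}) + 1;
        }
        prev = cur;  // memory copy: the current line becomes the previous one
    }
    return prev[n];  // holds L(m, n); when m == 0 it is L(0, n) = n
}
```

Passing the shorter string as `b` keeps the two lines as short as possible.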
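The code above can be optimized further, for example by accessing memory through pointers rather than array names. If memory is plentiful, one can allocate extra space and, on each iteration, change the starting position of the stored data instead of copying it.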
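One simple way to avoid the per-iteration copy, sketched here under my own assumptions and not necessarily what the author had in mind, is to keep both lines in a single buffer of 2 × (n + 1) ints and swap two pointers after each outer iteration, so only the starting addresses change. The name `editDistancePtrSwap` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name. Both lines live in one buffer; prev and cur point
// into it, and swapping them replaces the memory copy of the line.
int editDistancePtrSwap(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> buf(2 * (n + 1));
    int* prev = buf.data();            // L(i-1, .)
    int* cur = buf.data() + (n + 1);   // L(i, .)
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);  // L(0, j) = j
    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);  // L(i, 0) = i
        for (std::size_t j = 1; j <= n; ++j) {
            cur[j] = (a[i - 1] == b[j - 1])
                         ? prev[j - 1]
                         : std::min({prev[j - 1], prev[j], cur[j - 1]}) + 1;
        }
        std::swap(prev, cur);  // O(1): only the two pointers change
    }
    return prev[n];  // after the final swap, prev points at L(m, .)
}
```

The old contents of the line that `cur` now points to are simply overwritten in the next iteration, so no initialization beyond `cur[0]` is needed.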
Addendum: computing the similarity of strings amounts to computing the edit distance.
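Author: flyinghearts
Source: http://www.cnblogs.com/flyinghearts/
This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland License. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be given in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.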