PHP通过KMP算法实现strpos
起因
昨天看了阮一峰老师的一篇博客《字符串匹配的KMP算法》,讲的非常棒。这篇文章也是解决了:
有一个字符串"BBC ABCDAB ABCDABCDABDE",里面是否包含另一个字符串"ABCDABD"?
后来发现,其实这不是就PHP自带函数strpos的功能吗?于是突发奇想,自己写个类,实现一下这个算法。
代码
<?php
class KMP
{
public $haystack;
public $needle;
private $_haystackLen;
private $_needleLen;
private $_matchTable;
private $_isMatch;
//构造函数
function __construct($haystack,$needle)
{
$this->haystack = $haystack;
$this->needle = $needle;
}
//初始化一些参数
private function init(){
$this->_haystackLen = $this->getLen($this->haystack);
$this->_needleLen = $this->getLen($this->needle);
$this->_matchTable = $this->getMatchTable();
$this->_isMatch = false;
}
//类似strpos函数功能
public function strpos()
{
$this->init();
//haystack
$haystackIdx = $matchNum = 0;
while ($haystackIdx <= $this->_haystackLen - $this->_needleLen){
//needle
$needIdx = 0;
for (; $needIdx < $this->_needleLen; $needIdx++){
if (strcmp($this->haystack[$haystackIdx],$this->needle[$needIdx]) <> 0){
if ($matchNum > 0){
$lastMatchValue = $this->getLastMatchValue($needIdx-1);
$haystackIdx += $this->getStep($matchNum,$lastMatchValue);
$matchNum = 0;
} else {
$haystackIdx++;
}
break;
} else {
$haystackIdx++;
$matchNum++;
if ($matchNum == $this->_needleLen){
$this->_isMatch = true;
break;
}
}
}
if($this->_isMatch == true){
break;
}
}
return $this->_isMatch ? $haystackIdx - $this->_needleLen : false;
}
//获取字符长度
private function getLen($str)
{
return mb_strlen($str,'utf-8');
}
//获取部分匹配表
private function getMatchTable()
{
$matchTable = [];
for ($i=0; $i < $this->_needleLen; $i++){
$intersectLen = 0;
$nowStr = mb_substr($this->needle,0,$i + 1,'utf-8');
$preFixArr = $this->getPreFix($nowStr);
$sufFixArr = $this->getSufFix($nowStr);
if($preFixArr && $sufFixArr){
$intersectArr = array_intersect($preFixArr,$sufFixArr);
if (!empty($intersectArr)){
$intersect = array_pop($intersectArr);
$intersectLen = mb_strlen($intersect,'utf-8');
}
}
$matchTable[$i] = $intersectLen;
}
return $matchTable;
}
//获取前缀数组
private function getPreFix($str)
{
$outArr = [];
$strLen = $this->getLen($str);
if ($strLen > 1){
for ($i = 1;$i < $strLen; $i++){
$outArr[] = mb_substr($str,0,$i,'utf-8');
}
}
return $outArr;
}
//获取后缀数组
private function getSufFix($str)
{
$outArr = [];
$strLen = $this->getLen($str);
if ($strLen > 1){
for ($i = 1;$i < $strLen; $i++){
$outArr[] = mb_substr($str,$i,null,'utf-8');
}
}
return $outArr;
}
//计算步长
private function getStep($matchNum,$lastMatchValue)
{
return $matchNum - $lastMatchValue;
}
//获取最后匹配值
private function getLastMatchValue($index)
{
return isset($this->_matchTable[$index]) ? $this->_matchTable[$index] : 0;
}
}
$str = 'BBC ABCDAB ABCDABCDABDE';
$subStr = 'ABCDABD';
$kmp = new KMP($str,$subStr);
var_dump($kmp->strpos());
$kmp->haystack = 'i believe';
$kmp->needle = 'lie';
var_dump($kmp->strpos());
$kmp->haystack = 'i love u';
$kmp->needle = 'hate';
var_dump($kmp->strpos());
结论
以上代码就是我看完这篇文章最想做的事-实践。虽然代码不是最好的,但是基本功能实现啦。