Notes on parsing scraped content and handling whitespace in strings

$str ='人民网  
									3小时前
';
$str1 = '拍照的阿步  
                                                2019年08月09日 21:56
';

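Output to the browser, the gap in these strings shows up as &nbsp; (a non-breaking space), and at the time an ordinary replace simply would not remove it. The following regex is needed: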

  $str = preg_replace("/(\xc2\xa0)/", " ", trim($str));

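Copy the regex /(\xc2\xa0)/ above as is; do not retype it yourself. \xc2\xa0 is the UTF-8 byte sequence of the non-breaking space U+00A0.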
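Why the ordinary replace fails can be checked directly. The sketch below is only an illustration, assuming the scraped text is UTF-8; the sample string and variable names are made up for the example. It shows that the gap is the two bytes C2 A0, that searching for the literal text "&nbsp;" finds nothing, and that matching the code point with the u modifier is an equivalent alternative to the byte pattern.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes C2 A0.
$nbsp = "\xc2\xa0";

// Hypothetical sample: author and time separated by two non-breaking spaces.
$raw = "人民网" . $nbsp . $nbsp . "3小时前";

// The literal text "&nbsp;" is not in this string, so str_replace changes nothing.
$afterStrReplace = str_replace('&nbsp;', ' ', $raw);

// Matching the raw bytes works (this is the pattern from the note above).
$byBytes = preg_replace("/(\xc2\xa0)/", ' ', $raw);

// Alternative: match the code point itself in UTF-8 mode.
$byCodePoint = preg_replace('/\x{00A0}/u', ' ', $raw);

// Decoding the HTML entity yields exactly those two bytes.
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

echo bin2hex($nbsp), PHP_EOL;            // c2a0
var_dump($afterStrReplace === $raw);     // bool(true)
var_dump($byBytes === $byCodePoint);     // bool(true)
var_dump($decoded === $nbsp);            // bool(true)
```

In valid UTF-8 the byte C2 only ever appears as a lead byte, so the byte pattern cannot accidentally hit the middle of another character.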

    /**
     * Parse the author and time data
     * @param $str
     * @return array
     */
    public static function doAuthorAndTime($str)
    {
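        // Normalize every kind of space first and trim last, so that a leading
        // or trailing &nbsp; / U+00A0 cannot leave an empty first element.
        $str = str_replace("&nbsp;", ' ', $str);
        $str = preg_replace("/(\xc2\xa0)/", " ", $str);
        $str = trim(preg_replace("/\s+/", ' ', $str));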
        $author = '';
        $release_time = '';
        if ($str) {
            $tmp_arr = explode(' ', $str);
            $author = trim($tmp_arr[0] ?? '');
            $release_time = trim(($tmp_arr[1] ?? '') . ($tmp_arr[2] ?? ''));
        }
        return [$author, $release_time];
    }
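To see what comes out for the two sample strings at the top, here is a minimal standalone sketch. The function name parseAuthorAndTime is hypothetical and the body only follows the same steps as doAuthorAndTime (normalize spaces, collapse whitespace, split on a single space); it is not the original class method.

```php
<?php
// Standalone illustration; the name parseAuthorAndTime is made up for this example.
function parseAuthorAndTime(string $str): array
{
    $str = str_replace('&nbsp;', ' ', $str);          // entity written out as text
    $str = preg_replace("/\xc2\xa0/", ' ', $str);     // decoded U+00A0 bytes
    $str = trim(preg_replace('/\s+/', ' ', $str));    // collapse whitespace runs
    if ($str === '') {
        return ['', ''];
    }
    $parts = explode(' ', $str);
    // The second and third pieces are joined without a separator.
    return [$parts[0], ($parts[1] ?? '') . ($parts[2] ?? '')];
}

$a = parseAuthorAndTime("人民网  \n\t\t\t3小时前\n");
$b = parseAuthorAndTime("拍照的阿步  \n        2019年08月09日 21:56\n");

echo $a[0], ' | ', $a[1], PHP_EOL; // 人民网 | 3小时前
echo $b[0], ' | ', $b[1], PHP_EOL; // 拍照的阿步 | 2019年08月09日21:56
```

Note that the date and the time end up concatenated with no space between them, and that an author name containing a space would be split at that space.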

    /**
     * Remove all whitespace from a string
     * @param $str
     * @return mixed|null|string|string[]
     */
    public static function stripSpacing($str)
    {
        $str = str_replace("&nbsp;", '', trim($str));
        $str = preg_replace("/(\xc2\xa0)/", '', trim($str));
        $str = preg_replace("/\s+/", '', $str);
        return $str;
    }
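A short usage sketch of the same three steps, written as a standalone function with a hypothetical name (removeAllSpacing). The input is invented for the example and mixes an ordinary space, the literal text "&nbsp;", a real U+00A0, a tab and a newline.

```php
<?php
// Hypothetical standalone version of the same three steps.
function removeAllSpacing(string $str): string
{
    $str = str_replace('&nbsp;', '', $str);        // entity written out as text
    $str = preg_replace("/\xc2\xa0/", '', $str);   // decoded U+00A0 bytes
    return preg_replace('/\s+/', '', $str);        // spaces, tabs, newlines
}

$out = removeAllSpacing(" 2019年08月09日&nbsp;\xc2\xa0\t21:56\n");
echo $out, PHP_EOL; // 2019年08月09日21:56
```

Because whitespace inside the text is removed as well, this is suited to fields where no space carries meaning.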

    /**
     * Remove the given tags together with their contents
     * @param array $tags tag names, e.g. ['span','p']
     * @param string $str the HTML content to process
     * @return null|string|string[]
     */
    public static function stripHtmlTags($tags, $str)
    {
        $html = [];
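        foreach ($tags as $tag) {
            // \b keeps 'p' from also matching <pre>; [^>]* lets attributes span lines
            $name = preg_quote($tag, '/');
            $html[] = '/<' . $name . '\b[^>]*>[\s\S]*?<\/' . $name . '>/';
        }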
        return preg_replace($html, '', $str);
    }
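One thing to watch with this kind of pattern is where the tag name ends. The sketch below compares two hypothetical helpers (stripLoose and stripStrict, names invented here): a loose pattern that accepts anything after "<p", and a stricter one that requires a word boundary after the tag name. The HTML input is made up for the example.

```php
<?php
// Loose pattern: "<p" followed by anything, so "<pre>" also counts as "p".
function stripLoose(array $tags, string $html): string
{
    foreach ($tags as $tag) {
        $html = preg_replace('/<' . $tag . '.*?>[\s\S]*?<\/' . $tag . '>/', '', $html);
    }
    return $html;
}

// Stricter pattern: \b requires the tag name to end right there.
function stripStrict(array $tags, string $html): string
{
    foreach ($tags as $tag) {
        $name = preg_quote($tag, '/');
        $html = preg_replace('/<' . $name . '\b[^>]*>[\s\S]*?<\/' . $name . '>/', '', $html);
    }
    return $html;
}

$html = "<pre>keep</pre><p class=\"x\">drop</p><span>drop\ntoo</span>tail";

echo stripLoose(['p', 'span'], $html), PHP_EOL;  // tail
echo stripStrict(['p', 'span'], $html), PHP_EOL; // <pre>keep</pre>tail
```

Both variants stop at the first closing tag, so nested tags of the same name are not handled; for that case an HTML parser is probably the safer tool.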

    /**
     * Remove HTML comments from the content
     * @param $content
     * @return string
     */
    public static function stripAnnotation($content)
    {
        return preg_replace('#<!--[^\!\[]*?(?<!\/\/)-->#', '', $content);
    }
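What this pattern removes and what it leaves alone is easier to see on a few inputs. The sketch wraps the same regex in a hypothetical standalone function (removeHtmlComments); the sample strings are invented for the example.

```php
<?php
// Same pattern as stripAnnotation, wrapped in a hypothetical standalone function.
function removeHtmlComments(string $content): string
{
    return preg_replace('#<!--[^\!\[]*?(?<!\/\/)-->#', '', $content);
}

$plain       = "a<!-- note\nover two lines -->b";
$conditional = "<!--[if IE]><p>old</p><![endif]-->";
$scriptGuard = "<script><!--\nvar x = 1;\n//--></script>";
$withBang    = "x<!-- wow! -->y";

echo removeHtmlComments($plain), PHP_EOL;                        // ab
var_dump(removeHtmlComments($conditional) === $conditional);     // bool(true)
var_dump(removeHtmlComments($scriptGuard) === $scriptGuard);     // bool(true)
var_dump(removeHtmlComments($withBang) === $withBang);           // bool(true)
```

Ordinary comments are removed, while conditional comments and script blocks closed with //--> are kept. A side effect of excluding "!" and "[" is that a comment whose text contains either character also stays in place.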

posted @ 2019-08-20 12:11 ncsb
