PHP preg_replace

preg_replace

(PHP 3 >= 3.0.9, PHP 4, PHP 5)

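preg_replace -- Perform a regular expression search and replace

Description

mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )

Searches subject for matches to pattern and replaces them with replacement. If limit is specified, then only limit matches will be replaced; if limit is omitted or is -1, then all matches are replaced.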
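
As a minimal sketch of the limit parameter (the sample string and variable names are invented for illustration), the calls below replace at most two matches, and then all of them:

```php
<?php
// Hypothetical sample input, not taken from the manual text.
$subject = 'a-b-c-d';

// Replace at most two matches.
$firstTwo = preg_replace('/-/', '+', $subject, 2);

// Omitting limit, or passing -1, replaces every match.
$all      = preg_replace('/-/', '+', $subject);
$allAgain = preg_replace('/-/', '+', $subject, -1);

print $firstTwo . "\n";  // a+b+c-d
print $all . "\n";       // a+b+c+d
```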

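replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference will be replaced by the text captured by the n'th parenthesized subpattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern.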
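
The two reference notations can be compared side by side. This is only a sketch: the input string and variable names are made up for the illustration.

```php
<?php
// Hypothetical sample input.
$subject = 'user@example';

// $1 and $2 are the first and second parenthesized subpatterns;
// $0 is the text matched by the whole pattern.
$dollarForm = preg_replace('/(\w+)@(\w+)/', '$2 at $1 [$0]', $subject);

// The older backslash notation refers to the same subpatterns.
$backslashForm = preg_replace('/(\w+)@(\w+)/', '\\2 at \\1 [\\0]', $subject);

print $dollarForm . "\n";  // example at user [user@example]
```

Both calls should give the same result; the $n form is the one the text above recommends.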

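When working with a replacement pattern where a backreference is immediately followed by another number (i.e. a literal digit placed right after a matched subpattern), you cannot use the familiar \\1 notation for the backreference. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, leaving the other 1 as a literal.

Example 1. Using backreferences followed by numeric literals

<?php
$string = "April 15, 2003";
$pattern = "/(\w+) (\d+), (\d+)/i";
$replacement = "\${1}1,\$3";
print preg_replace($pattern, $replacement, $string);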

/* Output
   ======

April1,2003

*/
?>

If matches are found, the new subject will be returned; otherwise subject will be returned unchanged.
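
A short sketch of the no-match case, with an invented input string: when the pattern does not match, the returned value is identical to the subject.

```php
<?php
// Hypothetical input that contains no digits.
$subject = 'no digits here';

// The pattern cannot match, so nothing is replaced.
$result = preg_replace('/\d+/', '#', $subject);

var_dump($result === $subject);  // bool(true)
```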

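Every parameter to preg_replace() (except limit) can be an array. If both pattern and replacement are arrays, they are processed in the order in which their keys appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, you should call ksort() on each array before calling preg_replace().

Example 2. Using indexed arrays with preg_replace()

<?php
$string = "The quick brown fox jumped over the lazy dog.";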

$patterns[0] = "/quick/";
$patterns[1] = "/brown/";
$patterns[2] = "/fox/";

$replacements[2] = "bear";
$replacements[1] = "black";
$replacements[0] = "slow";

print preg_replace($patterns, $replacements, $string);

/* Output
   ======

The bear black slow jumped over the lazy dog.

*/

/* By ksorting patterns and replacements,
   we should get what we wanted. */

ksort($patterns);
ksort($replacements);

print preg_replace($patterns, $replacements, $string);

/* Output
   ======

The slow black bear jumped over the lazy dog.

*/

?>

If subject is an array, then the search and replace is performed on every entry of subject, and the return value is an array as well.
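
A minimal sketch of an array subject, using made-up strings: the same pattern is applied to each entry and an array comes back.

```php
<?php
// Hypothetical list of subjects.
$subjects = array('apple pie', 'banana split', 'cherry tart');

// Strip the last word (and the whitespace before it) from every entry.
$result = preg_replace('/\s+\w+$/', '', $subjects);

print_r($result);  // the entries become apple, banana, cherry
```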

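If pattern and replacement are both arrays, then preg_replace() takes a value from each array in turn and uses them to search and replace on subject. If replacement has fewer values than pattern, then an empty string is used for the remaining replacement values. If pattern is an array and replacement is a string, then this replacement string is used for every value of pattern. The converse would not make sense, though.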
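
The two array cases can be sketched as follows. The patterns reuse the words from Example 2; the shortened subject and the variable names are assumptions made for the illustration.

```php
<?php
$patterns = array('/quick/', '/brown/', '/fox/');
$subject  = 'The quick brown fox';

// Fewer replacements than patterns: the missing one acts as an empty string,
// so "fox" is simply removed.
$fewer = preg_replace($patterns, array('slow', 'black'), $subject);

// A single string replacement is used for every pattern.
$single = preg_replace($patterns, 'X', $subject);

print "[" . $fewer . "]\n";   // [The slow black ]
print "[" . $single . "]\n";  // [The X X X]
```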

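The /e modifier makes preg_replace() treat the replacement parameter as PHP code after the appropriate backreference substitutions are done. Tip: make sure that replacement constitutes a valid PHP code string, otherwise PHP will report a parse error at the line containing preg_replace().

Example 3. Replacing several values

<?php
$patterns = array ("/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/",
                   "/^\s*{(\w+)}\s*=/");
$replace = array ("\\3/\\4/\\1\\2", "$\\1 =");
print preg_replace ($patterns, $replace, "{startDate} = 1999-5-27");
?>

This example will output: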

$startDate = 5/27/1999

 

Example 4. Using the /e modifier

<?php
preg_replace ("/(<\/?)(\w+)([^>]*>)/e",
              "'\\1'.strtoupper('\\2').'\\3'",
              $html_body);
?>

This will convert all HTML tag names in the input string to upper case.
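
The /e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. On those versions the same effect can be sketched with preg_replace_callback() and an anonymous function (which needs PHP 5.3 or later). The sample $html_body value below is an assumption; the pattern is the one from Example 4 without the /e modifier.

```php
<?php
// Hypothetical input; Example 4 leaves $html_body undefined.
$html_body = '<p>Hello <b class="x">world</b></p>';

// The callback receives the match array: $m[1] is "<" or "</",
// $m[2] is the tag name, $m[3] is the rest of the tag up to ">".
$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

print $result . "\n";  // <P>Hello <B class="x">world</B></P>
```

Because the captured text reaches the callback as ordinary strings, no quote escaping of the kind discussed in the notes below is involved.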

 

Example 5. Convert HTML to text

<?php
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.

$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",           // Strip out HTML tags
                 "'([\r\n])[\s]+'",                 // Strip out white space
                 "'&(quot|#34);'i",                 // Replace HTML entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                    // Evaluate as PHP code

$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");

$text = preg_replace ($search, $replace, $document);
?>

Note: The limit parameter was added after PHP 4.0.1pl2.

See also preg_match(), preg_match_all(), and preg_split().

 

User Contributed Notes
Sune Rievers
25-May-2006 01:58
Updated version of the link script, since the other version didn't work with links at the beginning of a line, links without http://, or emails. Oh, and bf2:// detection too, for all you gamers ;)

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}
sep16 at psu dot edu
19-May-2006 11:28
Re: preg_replace() with the /e modifier; handling escaped quotes.

I was writing a replacement pattern to parse HTML text which sometimes contained PHP variable-like strings. Various initial solutions yielded either escaped quotes or fatal errors due to these variable-like strings being interpreted as poorly formed variables.

"Tim K." and "steven -a-t- acko dot net" provide some detailed discussion of preg_replace's quote escaping in the comments below, including the use of str_replace() to remove the preg_replace added slash-quote.  However, this suggestion is applied to the entire text AFTER the preg_match.  This isn't a robust solution in that it is conceivable that the text unaffected by the preg_replace() may contain the string \\" which should not be fixed.  Furthermore, the addition of escaped quotes within preg_replaces with multiple patterns/replacements (with arrays) may break one of the following patterns.

The solution, then, must fix the quote-escaped text BEFORE replacing it in the target, and possibly before it is passed to a function within the replacement code.  Since the replacement string is interpreted as PHP code, just use str_replace('\\"','"','$1') where you need an unadulterated $1 to appear.  The key is to properly escape the necessary characters.  Three variations appear in the examples below, as well as a set of incorrect examples.  I haven't seen this solution posted before, so hopefully this will be helpful rather than covering old ground.

Try this example code:

<?php
/*
   Using preg_replace with the /e modifier on ANY text, regardless of single
   quotes, double quotes, dollar signs, backslashes, and variable interpolation.

   Tested on PHP 5.0.4 (cli), PHP 5.1.2-1+b1 (cli), and PHP 5.1.2 for Win32.

   Solution?
       1.  Use single quotes for the replacement string.
       2.  Use escaped single quotes around the captured text variable (\'$1\').
       3.  Use str_replace() to remove the escaped double quotes
           from within the replacement code (\" -> ").
*/

function _prc_function1($var1,$var2,$match) {
   $match = str_replace('\\"','"',$match);
   // ... do other stuff ...
   return $var1.$match.$var2;
}
function _prc_function2($var1,$var2,$match) {
   // ... do other stuff ...
   return $var1.$match.$var2;
}

$v1 = '(';
$v2 = ')';
// Lots of tricky characters:
$text = "<xxx>...\$varlike_text[_'\\\"\"\"'...</xxx>";
$pattern = '/<xxx>(.*?)<\/xxx>/e';

echo $text . " Original.<br>\n";

// Example #1 - Processing in place.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
   $pattern,
   '$v1 . str_replace(\'\\"\',\'"\',\'$1\') . $v2',
   $text) . " Escaped double quotes replaced with str_replace. (Good.)<br>\n";

// Example #2 - Processing within a function.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
   $pattern,
   '_prc_function1($v1,$v2,\'$1\')',
   $text) . " Escaped double quotes replaced in a function. (Good.)<br>\n";

// Example #3 - Preprocessing before a function.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
   $pattern,
   '_prc_function2($v1,$v2,str_replace(\'\\"\',\'"\',\'$1\'))',
   $text) . " Escaped double quotes replaced with str_replace before sending match to a function. (Good.)<br>\n";

// Example #4 - INCORRECT implementations
//  a. returns (...$varlike_text[_'\\\\""\\"'...)
//  b. returns (...$varlike_text[_'"\\\\""\\'...)
//  c. returns (...$varlike_text[_'\\\\""\\"'...)
//  d. Causes a syntax+fatal error, unexpected T_BAD_CHARACTER...
echo preg_replace( $pattern, "\$v1 . '$1' . \$v2", $text)," Enclosed in double/single quotes. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, "\$v1 . '$1' . \$v2", $text)," Enclosed in double/single quotes, $ escaped. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . \'$1\' . $v2', $text)," Enclosed in single/single quotes. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . "$1" . $v2', $text)," Enclosed in single quotes. (Wrong!  Dollar sign in text is interpreted as variable interpolation.)<br>\n";

?>
klemens at ull dot at
16-May-2006 05:24
See as well the excellent tutorial at http://www.tote-taste.de/X-Project/regex/index.php

;-) Klemens
robvdl at gmail dot com
21-Apr-2006 08:15
For those of you that have ever had the problem where clients paste text from MS Word into a CMS, where Word has placed all those fancy quotes throughout the text, breaking the XHTML validator... I have created a regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.

Note that most user examples on php.net I have read only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.

It took me ages to get it down to just two regular expressions, but it handles all high-level characters properly.

$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
heppa(at)web(dot)de
20-Apr-2006 11:37
I just wanted to give an example for people whose match is taking away too much of the string.

I wanted a function that removes only certain parameters from an HTTP query string, and it had to be flexible: 'updateItem=1' should be removed, as well as 'updateCategory=1'. But I sometimes ended up with too much removed from the query.

Example:

My query string: 'updateItem=1&itemID=14'

ended up as this query string: '4', which was not really the plan ;)

I was using this regexp:

preg_replace('/&?update.*=1&?/','',$query_string);

I discovered that preg_replace matches the longest possible string, which means that it replaces everything from the first u up to the 1 after itemID=

I had assumed that it would take the shortest possible match.
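
The behaviour described in this note comes from quantifiers such as .* being greedy by default. As a rough sketch using the query string from the note, either a lazy quantifier (.*?) or a negated character class keeps the match from running past the first parameter. The variable names are only illustrative.

```php
<?php
$query_string = 'updateItem=1&itemID=14';

// Greedy: .* runs to the last "=1" it can find, leaving only "4".
$greedy = preg_replace('/&?update.*=1&?/', '', $query_string);

// Lazy: .*? stops at the first "=1".
$lazy = preg_replace('/&?update.*?=1&?/', '', $query_string);

// Negated class: the parameter name may not cross "&" or "=".
$strict = preg_replace('/&?update[^&=]*=1&?/', '', $query_string);

print $greedy . "\n";  // 4
print $lazy . "\n";    // itemID=14
print $strict . "\n";  // itemID=14
```

The negated class is the stricter choice: a lazy .*? can still stretch across several parameters when the first one does not end in "=1".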
Ritter
19-Apr-2006 05:08
for those of you with multiline woes like I was having, try:

$str = preg_replace('/<tag[^>](.*)>(.*)<\/tag>/ims','<!-- edited -->', $str);
Eric
10-Apr-2006 02:54
Recently I needed a way to replace links (<a href="blah.com/blah.php">Blah</a>) with their anchor text, in this case Blah. It might seem simple enough for some, or most, but for the benefit of others:

<?php

$value = '<a href="http://www.domain.com/123.html">123</a>';

echo preg_replace('/<a href="(.*?)">(.*?)<\\/a>/i', '$2', $value);

//Output
// 123

?>
sesha_srinivas at yahoo dot com
08-Apr-2006 04:13
If you have a form element that displays amounts using "$" and ",", you can strip those characters before posting the value to the db:

$search = array('/,/','/\$/');

$replace = array('','');

$data['amount_limit'] = preg_replace($search,$replace,$data['amount_limit']);
ciprian dot amariei Mtaiil gmail * com
06-Apr-2006 01:21
I found some situations where my function below doesn't
perform as expected. Here is the new version.

<?php
function make_links_blank( $text )
{
 return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\'])((?:https?|ftp):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">'
       ),
       $text
   );
}

?>

This function replaces links (http(s)://, ftp://) with respective html anchor tag, and also makes all anchors open in a new window.
ae at instinctive dot de
28-Mar-2006 11:40
Something innovative for a change ;-) For a news system, I have a special format for links:

"Go to the [Blender3D Homepage|http://www.blender3d.org] for more Details"

To get this into a link, use:

$new = preg_replace('/\[(.*?)\|(.*?)\]/', '<a href="$2" target="_blank">$1</a>', $new);
c_stewart0a at yahoo dot com
18-Mar-2006 06:35
In response to elaineseery at hotmail dot com

[quote]if you're new to this function, and getting an error like  'delimiter must not alphanumeric backslash ...[/quote]

Note that if you use arrays for search and replace then you will want to quote your searching expression with / or you will get this error.

However, if you use a single string to search and replace then you will not receive this error if you do not quote your regular expression in /
Graham Dawson <graham at imdanet dot com>
16-Mar-2006 06:46
I said there was a better way. There is!

The regexp is essentially the same but now I deal with problems that it couldn't handle, such as urls, which tended to screw things up, and the odd placement of a : or ; in the body text, by using functions. This makes it easier to expand to take account of all the things I know I've not taken account of. But here it is in its essential glory. Or mediocrity. Take your pick.

<?php

define('PARSER_ALLOWED_STYLES_',
'text-align,font-family,font-size,text-decoration');

function strip_styles($source=NULL) {
  $exceptions = str_replace(',', '|', @constant('PARSER_ALLOWED_STYLES_'));

  /* First we want to fix anything that might potentially break the style stripper, so we try to replace
   * in-text instances of : with its html entity replacement.
   */

  function Replacer($text) {
   $check = array (
       '@:@s',
   );
   $replace = array(
       '&#58;',
   );

   return preg_replace($check, $replace, $text[0]);
  }

  $source = preg_replace_callback('@>(.*)<@Us', 'Replacer', $source);

  $regexp =

'@([^;"]+)?(?<!'. $exceptions. ')(?<!\>\w):(?!\/\/(.+?)\/|<|>)((.*?)[^;"]+)(;)?@is';

  $source = preg_replace($regexp, '', $source);

  $source = preg_replace('@[a-z]*=""@is', '', $source);

  return $source;
}

?>
rybasso
16-Mar-2006 05:33
A "Document contains no data" message in FF and 'This page could not be found' in IE occurs when you pass too long a subject string to preg_replace() with the default limit.

Increase the limit to be sure it's larger than the subject length.
Ciprian Amariei
16-Mar-2006 06:50
Here is a function that replaces the links (http(s)://, ftp://) with respective html anchor, and also makes all anchors open in a new window.

function make_links_blank( $text )
{
 
 return  preg_replace( array(
               "/[^\"'=]((http|ftp|https):\/\/[^\s\"']+)/i",
               "/<a([^>]*)target=\"?[^\"']+\"?/i",
               "/<a([^>]+)>/i"
       ),
         array(
               "<a href=\"\\1\">\\1</a>",
               "<a\\1",
               "<a\\1 target=\"_blank\" >"
           ),
       $text
       );
}
felipensp at gmail dot com
13-Mar-2006 01:02
Sorry, I don't know English.

Replacing the letters of a bad word with a given character.
See the example:

<?php

function censured($string, $aBadWords, $sChrReplace) {

   foreach ($aBadWords as $key => $word) {

       // Case-insensitive regexp; the /e modifier lets the replacement call functions
       $aBadWords[$key] = "/({$word})/ie";

   }

   // Substitute each bad word with the given character
   return preg_replace($aBadWords,
                       "str_repeat('{$sChrReplace}', strlen('\\1'))",
                       $string
                       );

}

// To show modifications
print censured('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
               array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
               '*'
               );
  
?>
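
On PHP versions without the /e modifier (it was removed in PHP 7.0.0), the same idea can be sketched with preg_replace_callback(). The function name censor_words() and the use of preg_quote() are assumptions added for this illustration; they are not part of the note above.

```php
<?php
// Hypothetical helper: masks every occurrence of the listed words.
function censor_words($string, array $badWords, $replaceChar = '*')
{
    $patterns = array();
    foreach ($badWords as $word) {
        // preg_quote() keeps characters such as "." in a word from
        // being read as regex syntax.
        $patterns[] = '/' . preg_quote($word, '/') . '/i';
    }

    // The callback receives the matched word in $m[0].
    return preg_replace_callback(
        $patterns,
        function ($m) use ($replaceChar) {
            return str_repeat($replaceChar, strlen($m[0]));
        },
        $string
    );
}

$censored = censor_words(
    'The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
    array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS')
);

print $censored . "\n";
// The nick of my friends are ****, ********, ****, ******.
```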
Graham Dawson graham_at_imdanet_dot_com
07-Mar-2006 05:32
Inspired by the query-string cleaner from greenthumb at 4point-webdesign dot com and istvan dot csiszar at weblab dot hu. This little bit of code cleans up any "style" attributes in your tags, leaving behind only styles that you have specifically allowed. Also conveniently strips out nonsense styles. I've not fully tested it yet so I'm not sure if it'll handle features like url(), but that shouldn't be a difficulty.

<?php

/* The string would normally be a form-submitted html file or text string */

$string = '<span style="font-family:arial; font-size:20pt; text-decoration:underline; sausage:bueberry;" width="200">Hello there</span> This is some <div style="display:inline;">test text</div>';

/* Array of styles to allow. */
$except = array('font-family', 'text-decoration');
$allow = implode('|', $except);

/* The monster beast regexp. I was up all night trying to figure this one out. */

$regexp = '@([^;"]+)?(?<!'.$allow.'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';
print str_replace('<', '&lt;', $regexp).'<br/><br/>';

$out = preg_replace($regexp, '', $string);

/* Now lets get rid of any unwanted empty style attributes */

$out = preg_replace('@[a-z]*=""@is', '', $out);

print $out;

?>

This should produce the following:

<span style="font-family:arial; text-decoration:underline;" width="200">Hello there</span> This is some <div >test text</div>

Now, I'm a relative newbie at this so I'm sure there's a better way to do it. There's *always* a better way.
elaineseery at hotmail dot com
15-Feb-2006 10:44
if you're new to this function, and getting an error like
'delimiter must not alphanumeric backslash ...

note that whatever is in $pattern (and only $pattern, not $string, or $replacement) must be enclosed by '/  /' (note the forward slashes)

e.g.
$pattern = '/and/';
$replacement = 'sandy';
$string = 'me and mine';

generates 'me sandy mine'

seems to be obvious to everyone else, but took me a while to figure out!!
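
The delimiter does not have to be a forward slash; other non-alphanumeric characters such as # work as well, which is handy when the pattern itself contains slashes. A small sketch (the URL is a made-up example):

```php
<?php
// The slash-delimited pattern from the note above.
$withSlashes = preg_replace('/and/', 'sandy', 'me and mine');

// "#" as the delimiter avoids escaping the slashes inside the pattern.
$withHashes = preg_replace('#http://#', 'https://', 'http://example.com/');

print $withSlashes . "\n";  // me sandy mine
print $withHashes . "\n";   // https://example.com/
```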
jsirovic at gmale dot com
08-Feb-2006 01:23
If the lack of &$count is aggravating in PHP 4.x, try this:

$replaces = 0;

$return .= preg_replace('/(\b' . $substr . ')/ie', '"<$tag>$1<$end_tag>" . (substr($replaces++,0,0))', $s2, $limit);
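
From PHP 5.1.0 on, preg_replace() accepts a fifth argument that is filled with the number of replacements performed, so the workaround above is only needed on older versions. A minimal sketch with an invented input string:

```php
<?php
// $count is passed by reference and receives the number of replacements.
$result = preg_replace('/o/', '0', 'foo boo', -1, $count);

print $result . "\n";  // f00 b00
print $count . "\n";   // 4
```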
no-spam@idiot^org^ru
05-Feb-2006 04:21
Decodes the result of IE's escape()

<?php

function unicode_unescape(&$var, $convert_to_cp1251 = false){
   $var = preg_replace(
       '#%u([\da-fA-F]{4})#mse',
       $convert_to_cp1251 ? '@iconv("utf-16","windows-1251",pack("H*","\1"))' : 'pack("H*","\1")',
       $var
   );
}

//

$str = 'to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430';

unicode_unescape($str, true);

echo $str;

?>
leandro[--]ico[at]gm[--]ail[dot]com
05-Feb-2006 01:40
I've found out a really odd error.

When I try to use the 'empty' function in the replacement string (when using the 'e' modifier, of course), the regexp interpreter gets stuck at that point.

An example of this failure:

<?php
echo $test = preg_replace( "/(bla)/e", "empty(123)", "bla bla ble" );

# it should print something like:
# "1 1 ble"
?>

Very odd, huh?
04-Feb-2006 12:00
fairly useful script to replace normal html entities with ordinal-value entities.  Useful for writing to xml documents where entities aren't defined.
<?php
$p='#(\&[\w]+;)#e';
$r="'&#'.ord(html_entity_decode('$1')).';'";
$text=preg_replace($p,$r,$_POST['data']);
?>
Rebort
03-Feb-2006 03:51
Following up on pietjeprik at gmail dot com's great string to parse [url] bbcode:
<?php
$url = '[url=http://www.foo.org]The link[/url]';
$text = preg_replace("/\[url=(\W?)(.*?)(\W?)\](.*?)\[\/url\]/", '<a href="$2">$4</a>', $url);
?>

This allows for the user to enter variations:

[url=http://www.foo.org]The link[/url]
[url="http://www.foo.org"]The link[/url]
[url='http://www.foo.org']The link[/url]

or even

[url=#http://www.foo.org#]The link[/url]
[url=!http://www.foo.org!]The link[/url]
01-Feb-2006 02:23
Uh-oh. When I looked at the text in the preview, I had to double the number of backslashes to make it look right.
I'll try again with my original text:

$full_text = preg_replace('/\[p=(\d+)\]/e',
  "\"<a href=\\\"./test.php?person=$1\\\">\"
   .get_name($1).\"</a>\"",
   $short_text);

I hope that it comes out correctly this time :-)
leif at solumslekt dot org
01-Feb-2006 12:24
I've found a use for preg_replace. If you've got e.g. a database with persons associated with numbers, you may want to input links in a kind of shorthand, like [p=12345], and have it expanded to a full url with a name in it.

This is my solution:

$expanded_text = preg_replace('/\\[p=(\d+)\\]/e',
   "\\"<a href=\\\\\\"./test.php?person=$1\\\\\\">\\".get_name($1).\\"</a>\\"",
       $short_text);

It took me some time to work out the proper number of quotes and backslashes.

regards, Leif.
SG_01
20-Jan-2006 08:43
Re: wcc at techmonkeys dot org

You could put this in 1 replace for faster execution as well:

<?php

/*
 * Removes all blank lines from a string.
 */
function removeEmptyLines($string)
{
   return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}

?>
05-Jan-2006 06:09
First, I have no idea about regexp; everything I did was through trial and error.
I wrote this function, which tries to clean crappy MS Word HTML. I use it to clean code that users paste from MS Word into online WYSIWYG editors.
There's huge room for improvement. I post it here because after searching I could not find any pure PHP solution. The best alternative is tidy, but for those of us who are still using PHP 4 and do not have access to the server, this could be an option. Use it at your own risk... once again, it was a quickie and I know there can be much better ways to do this:

function decraper($htm, $delstyles=false) {
   $commoncrap = array('&quot;'
   ,'font-weight: normal;'
   ,'font-style: normal;'
   ,'line-height: normal;'
   ,'font-size-adjust: none;'
   ,'font-stretch: normal;');
   $replace = array("'");
   $htm = str_replace($commoncrap, $replace, $htm);
     $pat = array();
   $rep = array();
   $pat[0] = '/(<table\s.*)(width=)(\d+%)(\D)/i';
   $pat[1] = '/(<td\s.*)(width=)(\d+%)(\D)/i';
   $pat[2] = '/(<th\s.*)(width=)(\d+%)(\D)/i';
   $pat[3] = '/<td( colspan="[0-9]+")?( rowspan="[0-9]+")?( width="[0-9]+")?( height="[0-9]+")?.*?>/i';
   $pat[4] = '/<tr.*?>/i';
   $pat[5] = '/<\/st1:address>(<\/st1:\w*>)?<\/p>[\n\r\s]*<p[\s\w="\']*>/i';
   $pat[6] = '/<o:p.*?>/i';
   $pat[7] = '/<\/o:p>/i';
   $pat[8] = '/<o:SmartTagType[^>]*>/i';
   $pat[9] = '/<st1:[\w\s"=]*>/i';
   $pat[10] = '/<\/st1:\w*>/i';
   $pat[11] = '/<p[^>]*>(.*?)<\/p>/i';
   $pat[12] = '/ style="margin-top: 0cm;"/i';
   $pat[13] = '/<(\w[^>]*) class=([^ |>]*)([^>]*)/i';
   $pat[14] = '/<ul(.*?)>/i';
   $pat[15] = '/<ol(.*?)>/i';
   $pat[17] = '/<br \/>&nbsp;<br \/>/i';
   $pat[18] = '/&nbsp;<br \/>/i';
   $pat[19] = '/<!-.*?>/';
   $pat[20] = '/\s*style=(""|\'\')/';
   $pat[21] = '/ style=[\'"]tab-interval:[^\'"]*[\'"]/i';
   $pat[22] = '/behavior:[^;\'"]*;*(\n|\r)*/i';
   $pat[23] = '/mso-[^:]*:"[^"]*";/i';
   $pat[24] = '/mso-[^;\'"]*;*(\n|\r)*/i';
   $pat[25] = '/\s*font-family:[^;"]*;?/i';
   $pat[26] = '/margin[^"\';]*;?/i';
   $pat[27] = '/text-indent[^"\';]*;?/i';
   $pat[28] = '/tab-stops:[^\'";]*;?/i';
   $pat[29] = '/border-color: *([^;\'"]*)/i';
   $pat[30] = '/border-collapse: *([^;\'"]*)/i';
   $pat[31] = '/page-break-before: *([^;\'"]*)/i';
   $pat[32] = '/font-variant: *([^;\'"]*)/i';
   $pat[33] = '/<span [^>]*><br \/><\/span><br \/>/i';
   $pat[34] = '/" "/';
   $pat[35] = '/[\t\r\n]/';
   $pat[36] = '/\s\s/s';
   $pat[37] = '/ style=""/';
   $pat[38] = '/<span>(.*?)<\/span>/i';
//empty (no attribs) spans
   $pat[39] = '/<span>(.*?)<\/span>/i';
//twice, nested spans
   $pat[40] = '/(;\s|\s;)/';
   $pat[41] = '/;;/';
   $pat[42] = '/";/';
   $pat[43] = '/<li(.*?)>/i';
   $pat[44] = '/(<\/b><b>|<\/i><i>|<\/em><em>|<\/u><u>|<\/strong><strong>)/i';
   $rep[0] = '$1$2"$3"$4';
   $rep[1] = '$1$2"$3"$4';
   $rep[2] = '$1$2"$3"$4';
   $rep[3] = '<td$1$2$3$4>';
   $rep[4] = '<tr>';
   $rep[5] = '<br />';
   $rep[6] = '';
   $rep[7] = '<br />';
   $rep[8] = '';
   $rep[9] = '';
   $rep[10] = '';
   $rep[11] = '$1<br />';
   $rep[12] = '';
   $rep[13] = '<$1$3';
   $rep[14] = '<ul>';
   $rep[15] = '<ol>';
   $rep[17] = '<br />';
   $rep[18] = '<br />';
   $rep[19] = '';
   $rep[20] = '';
   $rep[21] = '';
   $rep[22] = '';
   $rep[23] = '';
   $rep[24] = '';
   $rep[25] = '';
   $rep[26] = '';
   $rep[27] = '';
   $rep[28] = '';
   $rep[29] = '';
   $rep[30] = '';
   $rep[31] = '';
   $rep[32] = '';
   $rep[33] = '<br />';
   $rep[34] = '""';
   $rep[35] = '';
   $rep[36] = '';
   $rep[37] = '';
   $rep[38] = '$1';
   $rep[39] = '$1';
   $rep[40] = ';';
   $rep[41] = ';';
   $rep[42] = '"';
   $rep[43] = '<li>';
   $rep[44] = '';
   if($delstyles===true){
       $pat[50] = '/ style=".*?"/';
       $rep[50] = '';
   }
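   ksort($pat);
   ksort($rep);
   // Apply the patterns; without this call the arrays built above are never used.
   $htm = preg_replace($pat, $rep, $htm);
   return $htm;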
}

Hope it helps; criticism is more than welcome.
kyle at vivahate dot com
23-Dec-2005 04:08
Here is a regular expression to "slashdotify" html links.  This has worked well for me, but if anyone spots errors, feel free to make corrections.

<?php
$url = '<a attr="garbage" href="http://us3.php.net/preg_replace">preg_replace - php.net</a>';
$url = preg_replace( '/<.*href="?(.*:\/\/)?([^ \/]*)([^ >"]*)"?[^>]*>(.*)(<\/a>)/', '<a href="$1$2$3">$4</a> [$2]', $url );
?>

Will output:

<a href="http://us3.php.net/preg_replace">preg_replace - php.net</a> [us3.php.net]
istvan dot csiszar at weblab dot hu
21-Dec-2005 05:53
This is an addition to the previously sent removeEvilTags function. If you don't want to remove the style tag entirely, just certain style attributes within that, then you might find this piece of code useful:

<?php

function removeEvilStyles($tagSource)
{
   // this will leave everything else, but:
   $evilStyles = array('font', 'font-family', 'font-face', 'font-size', 'font-size-adjust', 'font-stretch', 'font-variant');

   $find = array();
   $replace = array();
  
   foreach ($evilStyles as $v)
   {
       $find[]    = "/$v:.*?;/";
       $replace[] = '';
   }
  
   return preg_replace($find, $replace, $tagSource);
}

function removeEvilTags($source)
{
   $allowedTags = '<h1><h2><h3><h4><h5><a><img><label>'.
       '<p><br><span><sup><sub><ul><li><ol>'.
       '<table><tr><td><th><tbody><div><hr><em><b><i>';
   $source = strip_tags(stripslashes($source), $allowedTags);
   return trim(preg_replace('/<(.*?)>/ie', "'<'.removeEvilStyles('\\1').'>'", $source));
}

?>
triphere
18-Dec-2005 01:13
to remove Bulletin Board Code (remove bbcode)

$body = preg_replace("[\[(.*?)\]]", "", $body);
jcheger at acytec dot com
09-Dec-2005 04:16
Escaping quotes may be very tricky. Magic quotes and preg_quote are not protected against double escaping. This means that an escaped quote will get a double backslash, or even more. preg_quote ("I\'m using regex") will return "I\\'m using regex".

The following example escapes only unescaped single quotes:

<?php
$a = "I'm using regex";
$b = "I\'m using regex";

$patt = "/(?<!\\\)\'/";
$repl = "\\'";

print "a:  ".preg_replace ($patt, $repl, $a)."\n";
print "b:  ".preg_replace ($patt, $repl, $b)."\n";
?>

and prints:
a:  I\'m using regex
b:  I\'m using regex

Remark: matching a backslash requires a triple backslash (\\\).
urbanheroes {at} gmail {dot} com
16-Aug-2005 04:00
Here are two functions to trim a string down to a certain size.

"wordLimit" trims a string down to a certain number of words, and adds an ellipsis after the last word, or returns the string if the limit is larger than the number of words in the string.

"stringLimit" trims a string down to a certain number of characters, and adds an ellipsis after the last word, without truncating any words in the middle (it will instead leave it out), or returns the string if the limit is larger than the string size. The length of a string will INCLUDE the length of the ellipsis.

<?php

function wordLimit($string, $length = 50, $ellipsis = '...') {
   return count($words = preg_split('/\s+/', ltrim($string), $length + 1)) > $length ?
       rtrim(substr($string, 0, strlen($string) - strlen(end($words)))) . $ellipsis :
       $string;
}

function stringLimit($string, $length = 50, $ellipsis = '...') {
   return strlen($fragment = substr($string, 0, $length + 1 - strlen($ellipsis))) < strlen($string) + 1 ?
       preg_replace('/\s*\S*$/', '', $fragment) . $ellipsis : $string;
}

echo wordLimit('  You can limit a string to only so many words.', 6);
// Output: "You can limit a string to..."
echo stringLimit('Or you can limit a string to a certain amount of characters.', 32);
// Output: "Or you can limit a string to..."

?>
avizion at relay dot dk
25-Apr-2005 03:04
Just a note for all FreeBSD users wondering why this function is not present after installing php / mod_php (4 and 5) from ports.

Remember to install:

/usr/ports/devel/php4-pcre (or 5 for -- 5 ;)

That's all... enjoy - and save 30 mins. like I could have used :D
jhm at cotren dot net
19-Feb-2005 06:04
It took me a while to figure this one out, but here is a nice way to use preg_replace to convert a hex encoded string back to clear text

<?php
   $text = "PHP rocks!";
   $encoded = preg_replace(
           "'(.)'e"
         ,"dechex(ord('\\1'))"
         ,$text
   );
   print "ENCODED: $encoded\n";
?>
ENCODED: 50485020726f636b7321
<?php
   print "DECODED: ".preg_replace(
       "'([\S,\d]{2})'e"
     ,"chr(hexdec('\\1'))"
     ,$encoded)."\n";
?>
DECODED: PHP rocks!
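
For comparison, the same round trip can be sketched with the built-in bin2hex() and pack() functions, without any regular expression. Note that dechex(ord(...)) yields a single digit for bytes below 0x10, so the preg_replace version above cannot be decoded unambiguously for such input, whereas bin2hex() always emits two digits per byte.

```php
<?php
$text = "PHP rocks!";

// bin2hex() produces two lowercase hex digits for every byte.
$encoded = bin2hex($text);

// pack('H*', ...) turns a hex string back into the original bytes.
$decoded = pack('H*', $encoded);

print "ENCODED: $encoded\n";  // ENCODED: 50485020726f636b7321
print "DECODED: $decoded\n";  // DECODED: PHP rocks!
```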
gbaatard at iinet dot net dot au
15-Feb-2005 01:56
on the topic of implementing forum code ([b][/b] to <b></b> etc), i found this worked well...

<?php
$body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
$body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
?>

First line replaces [b] [B] [i] [I] [u] [U] with the appropriate html tags(<b>, <i>, <u>)

Second one does the same for closing tags...

For urls, I use...

<?php
$body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
?>

and for urls starting with www., i use...

<?php
$body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
?>

Pop all these lines into a function that receives and returns the text you want 'forum coded' and away you go:)
tash at quakersnet dot com
30-Jan-2005 08:25
A better way for link & email conversion, I think. :)

<?php
function change_string($str)
   {
     $str = trim($str);
     $str = htmlspecialchars($str);
     $str = preg_replace('#(.*)\@(.*)\.(.*)#','<a href="mailto:\\1@\\2.\\3">Send email</a>',$str);
     $str = preg_replace('=([^\s]*)(www.)([^\s]*)=','<a href="http://\\2\\3" target=\'_new\'>\\2\\3</a>',$str);
     return $str;
   }
?>
jw-php at valleyfree dot com
26-Jan-2005 12:28
Note that if you want to replace all backslashes in a string with double backslashes (like addslashes() does but just for backslashes and not quotes, etc), you'll need the following:

$new = preg_replace('/\\\\/','\\\\\\\\',$old);

Note that the pattern uses 4 backslashes and the replacement uses 8! The reason for 4 slashes in the pattern part has already been explained on this page, but nobody has yet mentioned the need for the same logic in the replacement part, in which backslashes are also doubly parsed, once by PHP and once by the PCRE extension. So the eight slashes break down to four slashes sent to PCRE, then two slashes put in the final output.
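
The call above can be checked with a short sketch; the sample path is invented. For a fixed string like a single backslash, str_replace() gives the same result with only one level of escaping to think about.

```php
<?php
// Single-quoted, so this is C:\dir\file with two single backslashes.
$old = 'C:\dir\file';

// Pattern: PHP turns '\\\\' into \\, which PCRE reads as one literal backslash.
// Replacement: PHP turns eight backslashes into four, which become two in the output.
$viaRegex = preg_replace('/\\\\/', '\\\\\\\\', $old);

// Same result without the regex layer.
$viaStrReplace = str_replace('\\', '\\\\', $old);

print $viaRegex . "\n";  // C:\\dir\\file
```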
Nick
21-Jan-2005 07:05
Here is a more secure version of the link conversion code which hopefully makes cross site scripting attacks more difficult.

<?php
function convert_links($str) {
       $replace = <<<EOPHP
'<a href="'.htmlentities('\\1').htmlentities('\\2').//remove line break
'">'.htmlentities('\\1').htmlentities('\\2').'</a>'
EOPHP;
   $str = preg_replace('#(http://)([^\s]*)#e', $replace, $str);
   return $str;
}
?>
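
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so that function no longer runs on current versions. A rough equivalent using preg_replace_callback() might look like the sketch below; the name convert_links_cb() is made up for illustration.

```php
<?php
// Hypothetical replacement for convert_links(), without the /e modifier.
function convert_links_cb($str)
{
    return preg_replace_callback(
        '#(http://)(\S*)#',
        function ($m) {
            // $m[0] is the whole matched URL; encode it once and reuse it.
            $url = htmlentities($m[0], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $str
    );
}

echo convert_links_cb('see http://example.com/?a=1&b=2 now');
// see <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> now
```

The callback receives the raw match as an array, so no PHP code is assembled from user input.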
ignacio paz posse
22-Oct-2004 04:22
I needed to shorten only long URLs and leave shorter ones alone, since my client preferred to have those addresses displayed in full. Here's the function I ended up with:

<?php
function auto_url($txt){

  # (1) catch URLs with at least 71 characters after the host name
  $pat = '/(http|ftp)+(?:s)?:(\\/\\/)'
       .'((\\w|\\.)+)(\\/)?(\\S){71,}/i';
  $txt = preg_replace($pat, "<a href=\"\\0\" target=\"_blank\">$1$2$3/...</a>",
$txt);

  # (2) replace the other, shorter URLs, provided that they are not already inside an HTML tag
  $pat = '/(?<!href=\")(http|ftp)+(s)?:'
     .'(\\/\\/)((\\w|\\.)+)(\\/)?(\\S)*/i';
  $txt = preg_replace($pat,"<a href=\"$0\" target=\"_blank\">$0</a> ",
  $txt);

  return $txt;
}
?>
Note the negative lookbehind expression added in the second pattern. It exempts URLs that are preceded by ' href=" ' (meaning that they were already put inside the appropriate HTML tags by the previous expression).
gabe at mudbuginfo dot com
19-Oct-2004 04:39
It is useful to note that the 'limit' parameter, when 'pattern' and 'replacement' are arrays, applies to each individual pattern in the pattern array, and not to the array as a whole.
<?php

$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

echo preg_replace($pattern, $replace, $subject, 1);
?>

If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three

However, in reality this will actually return:
test uno, one dos, one two three
silasjpalmer at optusnet dot com dot au
19-Mar-2004 10:00
Using preg_replace() to return extracts without breaking in the middle of a word
(useful for search results)

<?php
$string = "Don't split words";
echo substr($string, 0, 10); // Returns "Don't spli"

$pattern = "/(^.{0,10})(\W+.*$)/";
$replacement = "\${1}";
echo preg_replace($pattern, $replacement, $string); // Returns "Don't"
?>
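
One caveat with the pattern above: it needs at least one non-word character to split on, so a string that is already shorter than the limit (for example "Hi there") is still cut back to "Hi". A sketch of a wrapper that checks the length first; the name excerpt() is made up for illustration, and lengths are counted in bytes.

```php
<?php
// Hypothetical helper: cut $text to at most $max bytes without splitting a word.
function excerpt($text, $max)
{
    if (strlen($text) <= $max) {
        return $text;               // short enough already, leave untouched
    }
    // Greedy .{0,max} backs off until the next character is a non-word character.
    $pattern = '/^(.{0,' . (int) $max . '})\W.*$/s';
    return preg_replace($pattern, '$1', $text);
}

echo excerpt("Don't split words", 10), "\n";   // Don't
echo excerpt("Hi there", 10), "\n";            // Hi there
```

If the text contains no non-word character at all, the pattern does not match and the text is returned unchanged.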
j-AT-jcornelius-DOT-com
25-Feb-2004 05:02
I noticed that a lot of talk here is about parsing URLs. Try the
parse_url() function in PHP to make things easier.

http://www.php.net/manual/en/function.parse-url.php

- J.
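
For reference, a small sketch of what parse_url() returns; the URL is a made-up example.

```php
<?php
$parts = parse_url('http://www.example.com:8080/path/page.php?id=5#top');

echo $parts['scheme'], "\n";    // http
echo $parts['host'], "\n";      // www.example.com
echo $parts['port'], "\n";      // 8080
echo $parts['path'], "\n";      // /path/page.php
echo $parts['query'], "\n";     // id=5
echo $parts['fragment'], "\n";  // top
```

parse_url() only splits a string that is already known to be a URL; finding URLs inside free text still needs a pattern like the ones discussed above.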
steven -a-t- acko dot net
09-Feb-2004 01:45
People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.

The example in the docs for /e suffers from this mistake in fact.

With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes, or otherwise it would be interpreted as PHP code. Like the example from the manual for preg_replace:

preg_replace("/(<\/?)(\w+)([^>]*>)/e",
             "'\\1'.strtoupper('\\2').'\\3'",
             $html_body);

To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string

 He said: "You're here"

It would become:

 He said: \"You\'re here\"

...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:

 print ' He said: \"You\'re here\" ';
 Output: He said: \"You're here\"

This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.

Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.

The 'solution' is to manually fix it in your expression. It is easiest to use a separate processing function, and do the replacing there (i.e. use "my_processing_function('\\1')" or something similar as replacement expression, and do the fixing in that function).

If you surrounded your backreference with single quotes, the double quotes in the data are left with stray backslashes, which you can strip like this:
$text = str_replace('\"', '"', $text);

People using preg_replace with /e should at least be aware of this.

I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
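
On current PHP versions the question is moot: /e was deprecated in PHP 5.5 and removed in PHP 7.0. With preg_replace_callback() the matched text reaches the callback untouched, with no addslashes() step. A sketch of the manual's tag-uppercasing example rewritten that way; the sample HTML is made up for illustration.

```php
<?php
$html_body = '<p class="x">He said: "You\'re here"</p>';

// Same pattern as the manual's /e example, minus the e modifier.
$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        // $m[1] = "<" or "</", $m[2] = tag name, $m[3] = rest of the tag
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $result;   // <P class="x">He said: "You're here"</P>
```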
Peter
02-Nov-2003 09:00
Suppose you want to match '\n' (that's backslash-n, not newline). The pattern you want is not /\\n/ but /\\\\n/. The reason for this is that before the regex engine can interpret the \\ as \, PHP interprets it. Thus, if you write the first, the regex engine sees \n, which it reads as a newline. You therefore have to escape your backslashes twice: once for PHP, and once for the regex engine.
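
A short sketch showing the difference, using a made-up string that contains both a literal backslash-n and a real newline:

```php
<?php
// 'a\nb' in single quotes is a, backslash, n, b; "\n" is a real newline.
$text = 'a\nb' . "\n" . 'c';

// Two backslashes in source: PCRE sees \n and matches the newline character.
$newlines = preg_match_all('/\\n/', $text);
echo $newlines, "\n";   // 1

// Four backslashes in source: PCRE sees \\n and matches backslash followed by "n".
$marked = preg_replace('/\\\\n/', '[BS-N]', $text);
echo $marked, "\n";     // a[BS-N]b, then a newline, then c
```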
Travis
18-Oct-2003 06:37
I spent some time fighting with this, so hopefully this will help someone else.

Escaping a backslash (\) really involves not two, not three, but four backslashes to work properly.

So to match a single backslash, one should use:

preg_replace('/(\\\\)/', ...);

or to, say, escape single quotes not already escaped, one could write:

preg_replace("/([^\\\\])'/", "\$1\'", ...);

Anything else, such as the seemingly correct

preg_replace("/([^\\])'/", "\$1\'", ...);

is evaluated as escaping the ], resulting in an unterminated character class.

I'm not exactly clear on this issue of backslash proliferation, but it seems to involve the combination of PHP string processing and PCRE processing.
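
A sketch that runs the patterns above on made-up sample strings. The lookbehind variant at the end is an addition, not from the comment; it also covers a quote at the very start of the string, which ([^\\\\])' cannot match because it requires a preceding character.

```php
<?php
$in = "it's and it\\'s";    // the text: it's and it\'s

// Four backslashes in source: PCRE sees [^\\], "any character except a backslash".
$out = preg_replace("/([^\\\\])'/", "\$1\'", $in);
echo $out, "\n";            // it\'s and it\'s

// Two backslashes in source: PCRE sees [^\] and the class never closes.
// preg_replace() emits a warning (suppressed here with @) and returns NULL.
$bad = @preg_replace("/([^\\])'/", "\$1\'", $in);
var_dump($bad);             // NULL

// Alternative (an assumption, not from the comment): a negative lookbehind.
$out2 = preg_replace("/(?<!\\\\)'/", "\\\\'", "'quoted'");
echo $out2, "\n";           // \'quoted\'
```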
posted @ 2013-07-20 18:06  凌少