PHP regular expressions for filtering various HTML tags, worth bookmarking
$str = preg_replace("/<\s*img\s+[^>]*?src\s*=\s*(\'|\")(.*?)\\1[^>]*?\/?\s*>/i", " ", $str); // replace img tags with a space
$str = preg_replace("/\s+/", " ", $str); // collapse runs of whitespace, including line breaks, into one space
$str = preg_replace("/<[ ]+/si", "<", $str); // remove spaces that directly follow "<"
$str = preg_replace("/<\!--.*?-->/si", "", $str); // remove comments
$str = preg_replace("/<(\!.*?)>/si", "", $str); // remove DOCTYPE
$str = preg_replace("/<(\/?html.*?)>/si", "", $str); // remove html tags
$str = preg_replace("/<(\/?head.*?)>/si", "", $str); // remove head tags
$str = preg_replace("/<(\/?meta.*?)>/si", "", $str); // remove meta tags
$str = preg_replace("/<(\/?body.*?)>/si", "", $str); // remove body tags
$str = preg_replace("/<(\/?link.*?)>/si", "", $str); // remove link tags
$str = preg_replace("/<(\/?form.*?)>/si", "", $str); // remove form tags
$str = preg_replace("/cookie/si", "COOKIE", $str); // rewrite "cookie" as "COOKIE"
$str = preg_replace("/<(applet.*?)>(.*?)<(\/applet.*?)>/si", "", $str); // remove applet elements together with their content
$str = preg_replace("/<(\/?applet.*?)>/si", "", $str); // remove any leftover applet tags
$str = preg_replace("/<(style.*?)>(.*?)<(\/style.*?)>/si", "", $str); // remove style elements together with their content
$str = preg_replace("/<(\/?style.*?)>/si", "", $str); // remove any leftover style tags
$str = preg_replace("/<(title.*?)>(.*?)<(\/title.*?)>/si", "", $str); // remove title elements together with their content
$str = preg_replace("/<(\/?title.*?)>/si", "", $str); // remove any leftover title tags
$str = preg_replace("/<(object.*?)>(.*?)<(\/object.*?)>/si", "", $str); // remove object elements together with their content
$str = preg_replace("/<(\/?object.*?)>/si", "", $str); // remove any leftover object tags
$str = preg_replace("/<(noframes.*?)>(.*?)<(\/noframes.*?)>/si", "", $str); // remove noframes elements together with their content
$str = preg_replace("/<(\/?noframes.*?)>/si", "", $str); // remove any leftover noframes tags
$str = preg_replace("/<(i?frame.*?)>(.*?)<(\/i?frame.*?)>/si", "", $str); // remove frame and iframe elements together with their content
$str = preg_replace("/<(\/?i?frame.*?)>/si", "", $str); // remove any leftover frame and iframe tags
$str = preg_replace("/<(script.*?)>(.*?)<(\/script.*?)>/si", "", $str); // remove script elements together with their content
$str = preg_replace("/<(\/?script.*?)>/si", "", $str); // remove any leftover script tags
$str = preg_replace("/javascript/si", "Javascript", $str); // rewrite "javascript" as "Javascript"
$str = preg_replace("/vbscript/si", "Vbscript", $str); // rewrite "vbscript" as "Vbscript"
$str = preg_replace("/on([a-z]+)\s*=/si", "On\\1=", $str); // rewrite event-handler attributes, e.g. onclick= becomes Onclick=
$str = preg_replace("/&#/si", "&#", $str); // targets numeric character references; as written here the replacement equals the match, so this line changes nothing
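The chain above calls preg_replace() once per pattern. As a rough sketch of an alternative layout, preg_replace() also accepts an array of patterns with a single replacement string, so several removals can share one call. The function name strip_block_tags and the choice of three patterns are illustrative assumptions, not part of the original snippet.

```php
<?php
// Hypothetical helper (not from the original post): removes HTML comments
// plus script and style elements, including their content, in one call.
function strip_block_tags(string $html): string
{
    $patterns = [
        '/<!--.*?-->/si',                       // HTML comments
        '/<script\b[^>]*>.*?<\/script\s*>/si',  // script elements and content
        '/<style\b[^>]*>.*?<\/style\s*>/si',    // style elements and content
    ];

    // Patterns are applied in array order; the same replacement ('') is used for each.
    $result = preg_replace($patterns, '', $html);

    // preg_replace() returns null if a PCRE error occurs.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $result;
}

echo strip_block_tags('<p>Hi</p><script>alert(1)</script><!-- note -->'), "\n";
// prints: <p>Hi</p>
```

Keeping the patterns in an array makes it easier to add or drop a rule without touching the surrounding code, at the cost of losing the per-line comments of the original chain.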
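If all you need is to strip every HTML tag from a whole HTML document, including document-level tags such as head, you can use PHP's strip_tags() function, which removes all HTML and PHP tags.
Note that strip_tags() does not really validate the markup: if a tag is incomplete or broken (for example, a missing ">"), it may strip more content than you expect.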
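A minimal sketch of the strip_tags() behavior described above. The sample strings are made up for illustration. Note that only the tags are removed, so the text inside title stays in the output.

```php
<?php
// Illustrative input, not taken from the original post.
$html = '<html><head><title>Demo</title></head><body><p>Hello <b>World</b></p></body></html>';

// Remove every tag; the text between tags is kept.
echo strip_tags($html), "\n";
// prints: DemoHello World

// The optional second argument lists tags to keep.
echo strip_tags($html, '<b>'), "\n";
// prints: DemoHello <b>World</b>

// A stray "<" that is never closed is treated as the start of a tag,
// so the text after it may be dropped. Inspect the result before relying on it.
var_dump(strip_tags('if a<b then stop'));
```

If the text inside elements such as title, script, or style must also go, strip_tags() alone is not enough, and patterns like those in the snippet above have to run first.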