Python development: the html module (HTML processing)

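'''
   In Python 3.3 the html module provides only one function (html.unescape() was added in Python 3.4):
   html.escape(s, quote=True)
       It converts the special characters in HTML text (&, < and >, plus " and ' when quote is true) into HTML-safe sequences.
'''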
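A quick check of what the quote parameter changes. With the default quote=True, double and single quotes are escaped as well; with quote=False only &, < and > are touched. The sample string is made up for illustration.

```python
import html

# Made-up sample containing all five characters html.escape() handles
sample = '<a href="x">Tom & Jerry\'s</a>'

# Default quote=True: &, <, >, " and ' are all escaped
print(html.escape(sample))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

# quote=False: quotes are left as-is, only &, < and > are escaped
print(html.escape(sample, quote=False))
# &lt;a href="x"&gt;Tom &amp; Jerry's&lt;/a&gt;
```

Keep the default quote=True whenever the escaped text may end up inside an attribute value such as href="...".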

Below is a small demo I wrote:

Output:

Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
Source HTML file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
 <head>
  <title> Python Html module </title>
  <meta name="Generator" content="EditPlus">
  <meta name="Author" content="Hongten">
  <meta name="Keywords" content="hongten,python">
  <meta name="Description" content="this blogs is about python">
 </head>

 <body>
    <table border = "1">
        <tr>
            <td>
            Author
            </td>
            <td>
            Hongten
            </td>
            <td>
            Mail
            </td>
            <td>
            hongtenzone@foxmail.com
            </td>
        </tr>
        <tr>
            <td>
            Blos
            </td>
            <td>
            <a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
            </td>
            <td>
            QQ
            </td>
            <td>
            648719819
            </td>
        </tr>
    </table>
 </body>
</html>
    
##################################################
Escaped HTML file:
&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01 Transitional//EN&quot; &quot;http://www.w3.org/TR/html4/loose.dtd&quot;&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt; Python Html module &lt;/title&gt;
  &lt;meta name=&quot;Generator&quot; content=&quot;EditPlus&quot;&gt;
  &lt;meta name=&quot;Author&quot; content=&quot;Hongten&quot;&gt;
  &lt;meta name=&quot;Keywords&quot; content=&quot;hongten,python&quot;&gt;
  &lt;meta name=&quot;Description&quot; content=&quot;this blogs is about python&quot;&gt;
 &lt;/head&gt;

 &lt;body&gt;
    &lt;table border = &quot;1&quot;&gt;
        &lt;tr&gt;
            &lt;td&gt;
            Author
            &lt;/td&gt;
            &lt;td&gt;
            Hongten
            &lt;/td&gt;
            &lt;td&gt;
            Mail
            &lt;/td&gt;
            &lt;td&gt;
            hongtenzone@foxmail.com
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            Blos
            &lt;/td&gt;
            &lt;td&gt;
            &lt;a href=&quot;http://www.blogs.com/hongten&quot;&gt;http://www.blogs.com/hongten&lt;/a&gt;
            &lt;/td&gt;
            &lt;td&gt;
            QQ
            &lt;/td&gt;
            &lt;td&gt;
            648719819
            &lt;/td&gt;
        &lt;/tr&gt;
    &lt;/table&gt;
 &lt;/body&gt;
&lt;/html&gt;
    
>>> 

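Comparing the source with the escaped output shows what html.escape() does: every <, > and " has been replaced by &lt;, &gt; and &quot;, so a browser displays the markup as plain text instead of interpreting it.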
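One detail worth knowing is that & has to be replaced first; otherwise the & in a freshly produced &lt; would itself be escaped again. The sketch below is a hypothetical hand-written equivalent (my_escape is a name made up here, not part of the html module), checked against html.escape(). It also shows that escaping is not idempotent: escaping already escaped text escapes the & a second time.

```python
import html

def my_escape(s, quote=True):
    """Hypothetical re-implementation of html.escape(), for illustration only."""
    # '&' must be replaced first, or the '&' in '&lt;' etc. would be escaped again
    s = s.replace('&', '&amp;')
    s = s.replace('<', '&lt;')
    s = s.replace('>', '&gt;')
    if quote:
        s = s.replace('"', '&quot;')
        s = s.replace("'", '&#x27;')
    return s

text = '<td>"Tom" & \'Jerry\'</td>'
print(my_escape(text) == html.escape(text))   # True
# Escaping twice escapes the '&' of '&lt;' again
print(html.escape(html.escape('<')))          # &amp;lt;
```

So escape each piece of text exactly once, right before it is written into the HTML.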

================================================

Code:

================================================

# python html

# Author  : Hongten
# Mailto  : hongtenzone@foxmail.com
# Blog    : http://www.cnblogs.com/hongten
# QQ      : 648719819
# Create  : 2013-08-26
# Version : 1.0

import html

'''
   In Python 3.3 the html module provides only one function:
   html.escape(s, quote=True)
       It converts the special characters in HTML text (&, < and >, plus " and ' when quote is true) into HTML-safe sequences.
'''

# global var
# content of the source HTML file
HTML_STR = ''

def html_escape(html_str):
    '''Escape the special characters &, <, >, " and '.'''
    return html.escape(html_str)

def init():
    # Fill the global HTML_STR with the sample page
    global HTML_STR
    HTML_STR = '''
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
 <head>
  <title> Python Html module </title>
  <meta name="Generator" content="EditPlus">
  <meta name="Author" content="Hongten">
  <meta name="Keywords" content="hongten,python">
  <meta name="Description" content="this blogs is about python">
 </head>

 <body>
    <table border = "1">
        <tr>
            <td>
            Author
            </td>
            <td>
            Hongten
            </td>
            <td>
            Mail
            </td>
            <td>
            hongtenzone@foxmail.com
            </td>
        </tr>
        <tr>
            <td>
            Blos
            </td>
            <td>
            <a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
            </td>
            <td>
            QQ
            </td>
            <td>
            648719819
            </td>
        </tr>
    </table>
 </body>
</html>
    '''

def main():
    init()
    print('Source HTML file:{}'.format(HTML_STR))
    print('#' * 50)
    # Escaped copy of the source HTML
    escaped_str = html_escape(HTML_STR)
    print('Escaped HTML file:{}'.format(escaped_str))

if __name__ == '__main__':
    main()
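The program above hard-codes the page in a global variable. A sketch of the same idea that reads an HTML file from disk and returns the escaped text; the helper name escape_html_file and the UTF-8 default encoding are assumptions for illustration, not part of the original program.

```python
import html

def escape_html_file(path, encoding='utf-8'):
    """Read an HTML file and return its content with &, <, >, " and ' escaped.

    escape_html_file is a hypothetical helper; UTF-8 is assumed as the file encoding.
    """
    with open(path, encoding=encoding) as f:
        return html.escape(f.read())
```

Usage: print(escape_html_file('page.html')), where page.html is any local HTML file. Passing the path in as a parameter avoids the global state and makes the function easy to reuse.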

 

posted @ 2013-08-26 16:20  Hongten