Python Development_html_HTML Processing
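''' As of Python 3.3, the html module provides only one function: html.escape(s, quote=True). It converts the special characters &, < and > in a string to HTML-safe sequences; when quote is true (the default), " and ' are converted as well. '''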
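To make the role of the quote parameter concrete, here is a minimal sketch. The sample string is made up for this illustration and is not part of the demo.

```python
import html

# Sample text invented for illustration; it contains all five special characters.
sample = '<a href="x">Tom & Jerry\'s</a>'

# quote=True (the default): &, < and > are escaped, and so are " and '.
escaped_all = html.escape(sample)

# quote=False: only &, < and > are escaped; both quote characters are left alone.
escaped_basic = html.escape(sample, quote=False)

print(escaped_all)
print(escaped_basic)
```

Leaving quote at its default is the safer choice when the escaped text may end up inside an attribute value.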
Below is a demo I wrote:
Output:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
Source HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <title> Python Html module </title>
    <meta name="Generator" content="EditPlus">
    <meta name="Author" content="Hongten">
    <meta name="Keywords" content="hongten,python">
    <meta name="Description" content="this blogs is about python">
</head>

<body>
    <table border = "1">
        <tr>
            <td>
                Author
            </td>
            <td>
                Hongten
            </td>
            <td>
                Mail
            </td>
            <td>
                hongtenzone@foxmail.com
            </td>
        </tr>
        <tr>
            <td>
                Blos
            </td>
            <td>
                <a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
            </td>
            <td>
                QQ
            </td>
            <td>
                648719819
            </td>
        </tr>
    </table>
</body>
</html>

##################################################
Escaped HTML:
&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01 Transitional//EN&quot; &quot;http://www.w3.org/TR/html4/loose.dtd&quot;&gt;
&lt;html&gt;
&lt;head&gt;
    &lt;title&gt; Python Html module &lt;/title&gt;
    &lt;meta name=&quot;Generator&quot; content=&quot;EditPlus&quot;&gt;
    &lt;meta name=&quot;Author&quot; content=&quot;Hongten&quot;&gt;
    &lt;meta name=&quot;Keywords&quot; content=&quot;hongten,python&quot;&gt;
    &lt;meta name=&quot;Description&quot; content=&quot;this blogs is about python&quot;&gt;
&lt;/head&gt;

&lt;body&gt;
    &lt;table border = &quot;1&quot;&gt;
        &lt;tr&gt;
            &lt;td&gt;
                Author
            &lt;/td&gt;
            &lt;td&gt;
                Hongten
            &lt;/td&gt;
            &lt;td&gt;
                Mail
            &lt;/td&gt;
            &lt;td&gt;
                hongtenzone@foxmail.com
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
                Blos
            &lt;/td&gt;
            &lt;td&gt;
                &lt;a href=&quot;http://www.blogs.com/hongten&quot;&gt;http://www.blogs.com/hongten&lt;/a&gt;
            &lt;/td&gt;
            &lt;td&gt;
                QQ
            &lt;/td&gt;
            &lt;td&gt;
                648719819
            &lt;/td&gt;
        &lt;/tr&gt;
    &lt;/table&gt;
&lt;/body&gt;
&lt;/html&gt;

>>>
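Comparing the source with the converted output makes the effect of html.escape() clear: the markup characters <, > and " are replaced by the entity references &lt;, &gt; and &quot;, so the text can be displayed literally instead of being interpreted as tags.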
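A typical place where this matters is inserting untrusted text into a page. The sketch below is only an illustration: render_comment is a hypothetical helper invented here, not something from the html module or from the demo.

```python
import html

def render_comment(user_text):
    """Hypothetical helper: wrap untrusted text in a <p> element."""
    # Escape first, so any markup in user_text is shown as text, not executed.
    return '<p>{}</p>'.format(html.escape(user_text))

rendered = render_comment('<script>alert("hi")</script>')
print(rendered)
```

Only the user-supplied part is escaped; the surrounding <p> tags are trusted markup and stay as they are.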
================================================
Code:
================================================
#python html

#Author : Hongten
#Mailto : hongtenzone@foxmail.com
#Blog : http://www.cnblogs.com/hongten
#QQ : 648719819
#Create : 2013-08-26
#Version : 1.0

import html

'''
    As of Python 3.3, the html module provides only one function:
        html.escape(s, quote=True)
    It converts the special characters &, < and > to HTML-safe sequences;
    when quote is true (the default), " and ' are converted as well.
'''

#global var
#contents of the source HTML file
HTML_STR = ''

def html_escape(html_str):
    '''Convert special characters to HTML-safe sequences.'''
    return html.escape(html_str)

def init():
    '''Fill HTML_STR with the sample document used by the demo.'''
    global HTML_STR
    HTML_STR = '''
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <title> Python Html module </title>
    <meta name="Generator" content="EditPlus">
    <meta name="Author" content="Hongten">
    <meta name="Keywords" content="hongten,python">
    <meta name="Description" content="this blogs is about python">
</head>

<body>
    <table border = "1">
        <tr>
            <td>
                Author
            </td>
            <td>
                Hongten
            </td>
            <td>
                Mail
            </td>
            <td>
                hongtenzone@foxmail.com
            </td>
        </tr>
        <tr>
            <td>
                Blos
            </td>
            <td>
                <a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
            </td>
            <td>
                QQ
            </td>
            <td>
                648719819
            </td>
        </tr>
    </table>
</body>
</html>
'''

def main():
    init()
    print('Source HTML:{}'.format(HTML_STR))
    print('#' * 50)
    # The result is the escaped text, so name it accordingly.
    escaped_str = html_escape(HTML_STR)
    print('Escaped HTML:{}'.format(escaped_str))

if __name__ == '__main__':
    main()