Strip HTML tags using Python

We often need to strip HTML tags from a string (or from HTML source). I usually do it with a simple regular expression in Python. Here is my function to strip HTML tags:

import re

def remove_html_tags(data):
    # Non-greedy match: remove everything from '<' up to the nearest '>'
    p = re.compile(r'<.*?>')
    return p.sub('', data)
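A quick usage sketch of the function above. One caveat worth knowing: by default `.` in a Python regex does not match a newline, so a tag whose attributes are split across lines is left partly in place. The variant `remove_html_tags_dotall` is a hypothetical name used only for illustration; it adds the standard `re.DOTALL` flag so the pattern can span lines.

```python
import re

def remove_html_tags(data):
    # Same pattern as above: non-greedy match from '<' to the nearest '>'
    p = re.compile(r'<.*?>')
    return p.sub('', data)

def remove_html_tags_dotall(data):
    # Hypothetical variant: re.DOTALL lets '.' match newlines as well,
    # so tags split across several lines are also removed
    p = re.compile(r'<.*?>', re.DOTALL)
    return p.sub('', data)

html = '<p class="intro">Hello, <b>world</b>!</p>'
print(remove_html_tags(html))  # Hello, world!

multiline = '<a\n  href="x">link</a>'
print(repr(remove_html_tags(multiline)))  # '<a\n  href="x">link' (opening tag kept)
print(remove_html_tags_dotall(multiline))  # link
```

The non-greedy `.*?` matters here: a greedy `<.*>` would remove everything from the first `<` to the last `>` on the line, swallowing the text between tags.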
Here is another function that collapses each run of consecutive whitespace into a single space:

import re

def remove_extra_spaces(data):
    # \s+ matches any run of whitespace (spaces, tabs, newlines)
    p = re.compile(r'\s+')
    return p.sub(' ', data)
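The two functions combine naturally: stripping tags from indented markup leaves behind newlines and indentation, which `remove_extra_spaces` then collapses. The helper name `html_to_text` is hypothetical, and the final `.strip()` is an assumption added to trim the single leading and trailing space that collapsing leaves behind.

```python
import re

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

def remove_extra_spaces(data):
    p = re.compile(r'\s+')
    return p.sub(' ', data)

def html_to_text(data):
    # Hypothetical helper: strip tags first, then collapse the leftover
    # whitespace, then trim the ends
    return remove_extra_spaces(remove_html_tags(data)).strip()

page = """
<ul>
    <li>First item</li>
    <li>Second   item</li>
</ul>
"""
print(html_to_text(page))  # First item Second item
```

The order matters: collapsing whitespace before removing tags would leave the gaps where tags used to be.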
Note that the re module must be imported (import re) to use regular expressions.
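A regular expression is easy to fool when a `>` appears inside a quoted attribute value, and it leaves character entities such as `&amp;` untouched. As an alternative (not the method described above), the standard library's `html.parser.HTMLParser` can collect only the text content. The names `TextExtractor` and `strip_tags_parser` are hypothetical; this is a minimal sketch that, for example, still keeps the contents of `<script>` and `<style>` elements as text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Hypothetical helper: collects text nodes and ignores the tags themselves
    def __init__(self):
        # convert_charrefs=True decodes entities like &amp; into '&'
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags_parser(data):
    parser = TextExtractor()
    parser.feed(data)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)

tricky = '<a title="a>b">link</a> &amp; more'
print(strip_tags_parser(tricky))  # link & more
```

For comparison, `remove_html_tags(tricky)` returns `'b">link &amp; more'`, because the non-greedy match stops at the `>` inside the `title` value.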

Posted on 2013-01-15 16:29 by misoag