htm2txt

1. Install BeautifulSoup

pip install beautifulsoup4

2. Read the htm file

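from bs4 import BeautifulSoup

# path is the location of the htm file; the with block closes it after reading
with open(path, 'r') as f:
    htmcontent = f.read()

# name the parser explicitly so bs4 does not guess one and emit a warning
soup = BeautifulSoup(htmcontent, 'html.parser')

htmcontent = soup.get_text()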
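
The same steps can be wrapped in a small function. This is only a minimal sketch: the name htm_to_text and the sample string are made up for illustration, and removing the script and style tags is an extra step that the original snippet does not perform. It assumes the built-in 'html.parser' backend is good enough for the input.

```python
from bs4 import BeautifulSoup


def htm_to_text(html: str) -> str:
    """Return the visible text of an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop <script> and <style> elements so their contents do not end up in the text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the text fragments with newlines and strip whitespace around each one.
    return soup.get_text(separator="\n", strip=True)


# Illustrative input only; in practice this would be the content read from the htm file.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(htm_to_text(sample))
```

Passing separator and strip to get_text() keeps text from adjacent tags from running together, which a plain get_text() call does not guarantee.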

posted @ 2016-12-23 16:11  levyleo