Python: extracting the text from HTML (used to count the characters in rich text)
https://exceptionshub.com/python-code-to-remove-html-tags-from-a-string-duplicate.html
https://stackoverflow.com/questions/9662346/python-code-to-remove-html-tags-from-a-string
https://tutorialedge.net/python/removing-html-from-string/
https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python (best)
The following uses only the Python standard library:
from io import StringIO
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs = True
        self.text = StringIO()

    def handle_data(self, d):
        self.text.write(d)

    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()
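Since the goal of this note is counting the characters in rich text, here is a minimal sketch of how the same standard-library approach could be wrapped into a counter. The names `TextExtractor` and `count_text_chars` and the `ignore_whitespace` parameter are hypothetical and not part of the linked answers. Skipping the contents of `<script>` and `<style>` and dropping whitespace before counting are assumptions about what should count as visible text.

```python
from html.parser import HTMLParser
from io import StringIO


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into "&"
        super().__init__(convert_charrefs=True)
        self._buf = StringIO()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element
        if not self._skip_depth:
            self._buf.write(data)

    def get_text(self):
        return self._buf.getvalue()


def count_text_chars(html, ignore_whitespace=True):
    """Return the number of characters of text in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = parser.get_text()
    if ignore_whitespace:
        # Assumption: spaces and newlines do not count as characters
        text = "".join(text.split())
    return len(text)


if __name__ == "__main__":
    print(count_text_chars("<p>Hello <b>world</b>!</p>"))
    print(count_text_chars("<p>你好,<i>世界</i></p>"))
```

Calling `parser.close()` matters when the input ends with plain text rather than a tag, because with `convert_charrefs=True` the parser may hold trailing text back until it is closed.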