Filtering special tags from file content with Python
Beautiful Soup
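Beautiful Soup is an HTML/XML parser written in Python. It copes well with malformed markup and builds a parse tree from it. It provides simple, commonly needed operations for navigating, searching, and modifying that tree, which can save a great deal of programming time. For Ruby, use Rubyful Soup.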
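The following is a minimal sketch of the three kinds of operation described above: navigating, searching and modifying. It assumes the `beautifulsoup4` package is installed. The sample HTML is made up for illustration and deliberately leaves a `<b>` tag unclosed.

```python
from bs4 import BeautifulSoup

# Sample input (made up for illustration); the <b> tag is never closed.
html = "<div><p class='intro'>Hello <b>world</p><p>Bye</p></div>"

# "html.parser" is the parser that ships with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

# Navigating: go from the first <div> to the first <p> inside it.
first_p = soup.div.p
intro_text = first_p.get_text()
print(intro_text)

# Searching: collect every <p> element in the tree.
paragraphs = soup.find_all("p")
print(len(paragraphs))

# Modifying: delete the first paragraph (and its children) from the tree.
first_p.decompose()
print(soup)
```

`decompose()` is the same call the view below uses to drop unwanted tags.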
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
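from bs4 import BeautifulSoup
from django.http import HttpResponse
from django.shortcuts import render

from . import models  # assumption: the app's models module defines Artical and ArticalDetail


# Add an article and filter its content
def add_artical(request, username):
    if request.method == "POST":
        user = request.user
        artical_title = request.POST.get("artical_title")
        artical_content = request.POST.get("artical_content")
        # desc = artical_content[0:150]
        # Parse the HTML; "html.parser" is the parser from the Python standard library
        bs = BeautifulSoup(artical_content, "html.parser")
        # Filter out illegal tags
        for tag in bs.find_all(["script", "link"]):
            # Remove the illegal tag (and everything inside it) from the tree
            tag.decompose()
        # Build the summary only after filtering, so script contents do not leak into it
        desc = bs.text[0:150] + "..."
        # Debug output, e.g. "123 <class 'bs4.BeautifulSoup'>"
        print(bs, type(bs))
        try:
            artical_obj = models.Artical.objects.create(user=user, desc=desc, title=artical_title)
            models.ArticalDetail.objects.create(content=str(bs), artical=artical_obj)
        except Exception:
            return HttpResponse("Failed to save the article")
        return HttpResponse("Article added successfully")
    return render(request, "add_artical.html")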
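The filtering logic can also be pulled out of the view so it can be tested without Django. The sketch below uses the same `script`/`link` blacklist and builds the summary after filtering. `clean_article` is a hypothetical helper name. The step that strips `on*` event-handler attributes is an extra assumption not present in the original view. It is included because a tag blacklist alone does not catch attributes such as `onclick`.

```python
from bs4 import BeautifulSoup

# Tags the original view treats as illegal; extend the list as needed.
BLOCKED_TAGS = ["script", "link"]


def clean_article(html, desc_len=150):
    """Hypothetical helper: return (cleaned_html, desc) for an article body."""
    soup = BeautifulSoup(html, "html.parser")

    # find_all accepts a list of names, so only the blocked tags are visited.
    for tag in soup.find_all(BLOCKED_TAGS):
        tag.decompose()  # remove the tag and everything inside it

    # Assumption beyond the original: drop inline event handlers such as
    # onclick/onerror from every remaining tag.
    for tag in soup.find_all(True):
        for attr in list(tag.attrs):
            if attr.lower().startswith("on"):
                del tag[attr]

    # Summary is taken from the already-filtered text, as in the view above.
    desc = soup.get_text()[:desc_len] + "..."
    return str(soup), desc


cleaned, desc = clean_article(
    '<p onclick="steal()">Hi</p><script>alert(1)</script><link rel="x">'
)
print(cleaned)
print(desc)
```

A blacklist only removes what it names. For untrusted input, a whitelist of allowed tags and attributes is generally the safer design.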