Using bs4

beautifulsoup4 is a library for parsing HTML.

Install: pip install beautifulsoup4

Here is a sample document to work with:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

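The page data we fetch usually arrives as one plain string, like the one above, so the first step is to parse that string with BeautifulSoup. Parsing returns a BeautifulSoup object, and every later operation (searching, navigating, extracting text) goes through that object.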

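Usage:
Specify a parser: "lxml" is the usual choice. It has to be installed first (pip install lxml). Once installed, just pass its name as a string; you do not need to import lxml yourself.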

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "lxml")
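As a rough, self-contained sketch of the parsing step: the snippet below uses a shortened stand-in for html_doc (the name sample is made up for illustration) and the "html.parser" parser that ships with Python, which is an assumption made here so the example runs without lxml. With lxml installed, passing "lxml" instead should work the same way for this input.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for html_doc (hypothetical; any HTML string is handled the same way).
sample = (
    "<html><head><title>The Dormouse's story</title></head>"
    '<body><p class="title"><b>The Dormouse\'s story</b></p></body></html>'
)

# "html.parser" is bundled with Python; replace it with "lxml" once lxml is installed.
soup = BeautifulSoup(sample, "html.parser")

# The parsed document is a BeautifulSoup object; tags are reachable as attributes.
soup_type = type(soup).__name__
title_text = soup.title.string

print(soup_type)
print(title_text)
```

The parser is selected by name only, which is why no import of lxml appears in user code.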

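Node objects:
BeautifulSoup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.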
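To make the four kinds concrete, here is a small sketch that pulls one object of each kind out of a parsed fragment. The fragment and the variable names are made up for illustration, and "html.parser" is assumed so the code runs without extra packages.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical fragment: one tag with text, followed by an HTML comment.
sample = '<p class="title"><b>The Dormouse\'s story</b><!-- a note --></p>'
soup = BeautifulSoup(sample, "html.parser")

b_tag = soup.b                     # a Tag
text_node = b_tag.string           # a NavigableString (the text inside <b>)
comment_node = soup.p.contents[1]  # a Comment (second child of <p>)

for node in (soup, b_tag, text_node, comment_node):
    print(type(node).__name__)
```

Comment is a subclass of NavigableString, so a check for NavigableString also matches comments; test for Comment first when the two need to be told apart.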

Tag

A Tag object corresponds to one HTML tag (element) in the original document.
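A short sketch of what a Tag object exposes, using one of the links from the sample document as input. The variable names are made up for illustration, and "html.parser" is assumed in place of lxml so the snippet needs no extra install.

```python
from bs4 import BeautifulSoup

# One <a> element taken from the sample document above.
sample = '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
soup = BeautifulSoup(sample, "html.parser")

link = soup.a                # first <a> tag in the document
tag_name = link.name         # the tag's name
href = link["href"]          # attributes are read like dictionary keys
classes = link["class"]      # class is multi-valued, so this is a list
link_id = link.get("id")     # .get() returns None if the attribute is missing
link_text = link.get_text()  # the text inside the tag

print(tag_name, href, classes, link_id, link_text)
```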

posted @ 2018-11-13 07:51  kanglun