Using bs4

beautifulsoup4: an HTML parsing library

Installation: pip install beautifulsoup4

Here is a piece of example code:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

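The web page data we fetch usually arrives as one plain string like the one above, so the first step is to parse that string with BeautifulSoup. Parsing returns a BeautifulSoup object, and all further operations on the document go through that object.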
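A minimal sketch of that parse-then-query workflow, using the sample document above. It assumes Python's built-in "html.parser" so it runs without installing lxml; the variable names title and links are only for illustration.

```python
from bs4 import BeautifulSoup

# The sample document from above (blank lines dropped)
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Parse the raw string into a BeautifulSoup object (built-in parser, no extra install)
soup = BeautifulSoup(html_doc, "html.parser")

# Text inside the <title> tag
title = soup.title.string
# The href attribute of every <a> tag, in document order
links = [a["href"] for a in soup.find_all("a")]

print(title)  # The Dormouse's story
print(links)
```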

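Usage:
Specifying a parser: "lxml" is the usual choice. It has to be installed first with pip install lxml; after that, just pass its name to BeautifulSoup. No import is needed.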

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "lxml")
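If lxml is not installed, passing "lxml" raises bs4.FeatureNotFound. One possible pattern, sketched below, is to try lxml first and fall back to Python's built-in "html.parser". The helper name make_soup is hypothetical, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html_doc = "<html><head><title>The Dormouse's story</title></head><body><p class='title'><b>The Dormouse's story</b></p></body></html>"


def make_soup(markup):
    """Hypothetical helper: prefer lxml, fall back to the built-in html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is missing; html.parser ships with Python and needs no install
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(html_doc)
print(soup.title.string)  # The Dormouse's story
```

For well-formed markup like this, both parsers build the same tree; they can differ on broken HTML, which is one reason the parser is worth naming explicitly.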

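Node objects:
BeautifulSoup converts a complex HTML document into a complex tree structure in which every node is a Python object. All of these objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.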
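A small sketch that produces one object of each of the four kinds, assuming a tiny made-up snippet that contains an HTML comment and the built-in "html.parser".

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Made-up snippet: one tag with text and an HTML comment inside it
markup = "<p class='title'><b>The Dormouse's story</b><!-- a comment --></p>"
soup = BeautifulSoup(markup, "html.parser")

p = soup.p               # Tag: an element in the tree
text = p.b.string        # NavigableString: the text inside <b>
comment = p.contents[1]  # Comment: the <!-- ... --> node after <b>

print(type(soup).__name__)     # BeautifulSoup
print(type(p).__name__)        # Tag
print(type(text).__name__)     # NavigableString
print(type(comment).__name__)  # Comment
```

Comment is a subclass of NavigableString, and BeautifulSoup is a subclass of Tag, so isinstance checks against the base class also match them.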

Tag

A Tag is simply an HTML tag, that is, an element such as <title> or <a> in the original document.
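A sketch of common ways to read a Tag, using a few lines from the sample document and the built-in "html.parser" (an assumption so it runs without lxml).

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">...</p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.p           # dot access returns only the FIRST matching tag
print(tag.name)        # p
print(tag["class"])    # ['title']  (class is multi-valued, so it is a list)

a = soup.a
print(a.attrs)         # all attributes as a dict
print(a.get("id"))     # link1
print(a.get("title"))  # None: .get() returns None for a missing attribute

# find_all returns every matching tag, not just the first
print(len(soup.find_all("p")))  # 2
```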

posted @ 2018-11-13 07:51 kanglun
