BeautifulSoup: Installation and Usage
Linux environment
1. Installation
Method 1:
Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/
Extract: tar -xzvf beautifulsoup4-4.2.0.tar.gz
Install: change into the extracted directory and run
python setup.py build
sudo python setup.py install
Method 2 (quick install)
(Ubuntu) sudo apt-get install python-bs4
or
pip install beautifulsoup4
or
easy_install beautifulsoup4
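Whichever method is used, it can help to confirm that the package is importable before going further. The following is a minimal sketch written for Python 3; the helper name `bs4_status` is made up for this example, while `bs4.__version__` is the version string the package itself exposes.

```python
import importlib.util


def bs4_status():
    """Return a short message saying whether the bs4 package can be found."""
    # find_spec returns None when the top-level package is not installed
    if importlib.util.find_spec("bs4") is None:
        return "bs4 is not installed"
    import bs4
    return "bs4 version " + bs4.__version__


print(bs4_status())
```

If the message reports that bs4 is not installed, the install was probably made for a different Python interpreter than the one running the check.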
2. Import (in Python)
from bs4 import BeautifulSoup
3. Usage
Example
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
Getting started
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
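The call above leaves the choice of parser to Beautiful Soup, so the resulting tree can differ depending on whether a third-party parser such as lxml happens to be installed. One way to make the result predictable is to name the parser explicitly. A sketch, assuming Python 3 and the standard library's `html.parser`; the fragment and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# A small fragment, not a complete HTML document
fragment = "<p class='title'><b>The Dormouse's story</b></p>"

# Naming the parser avoids relying on whichever one is installed
fragment_soup = BeautifulSoup(fragment, "html.parser")

title_text = fragment_soup.b.string
print(title_text)

# html.parser does not wrap a fragment in <html>/<body>,
# so there is no body tag to find here
print(fragment_soup.body)
```

With a parser that does add the missing wrappers, such as lxml, `fragment_soup.body` would be a tag rather than `None`, which is why pinning the parser matters for code that relies on `soup.body`.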
>>> soup.head()
[<title>The Dormouse's story</title>]
>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.string
u"The Dormouse's story"
>>> soup.body.b
<b>The Dormouse's story</b>
>>> soup.body.b.string
u"The Dormouse's story"
>>> soup.a
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
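As the session shows, accessing a tag name as an attribute returns only the first matching tag. It behaves like `find()`, and it gives `None` when no such tag exists, so chained access on a missing tag raises an `AttributeError`. The sketch below, for Python 3, illustrates this; the helper `text_or_default` is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

html = (
    '<p><a href="http://example.com/elsie" id="link1">Elsie</a> '
    '<a href="http://example.com/lacie" id="link2">Lacie</a></p>'
)
nav_soup = BeautifulSoup(html, "html.parser")

first_link = nav_soup.a          # first <a> in the document
same_link = nav_soup.find("a")   # equivalent lookup via find()
missing = nav_soup.table         # no <table> in the fragment, so None


def text_or_default(tag, default=""):
    """Return the tag's text, or a default when the lookup found nothing."""
    return tag.get_text() if tag is not None else default


print(first_link["id"])
print(text_or_default(missing, "n/a"))
```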
Find all <a> tags
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
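`find_all` can also narrow the search instead of returning every tag with a given name. The following sketch, for Python 3, uses the `class_`, `id` and `limit` keyword arguments; the sample markup, including the extra `nav` link, is invented for illustration.

```python
from bs4 import BeautifulSoup

sample = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/about" class="nav" id="link3">About</a>
</p>
"""
filter_soup = BeautifulSoup(sample, "html.parser")

# class_ has a trailing underscore because "class" is a Python keyword
sisters = filter_soup.find_all("a", class_="sister")

# find() returns the first match only; here the tag with id="link2"
lacie = filter_soup.find(id="link2")

# limit stops the search after the given number of matches
first_two = filter_soup.find_all("a", limit=2)

print([a.get_text() for a in sisters])
print(lacie["href"])
print(len(first_two))
```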
Print the class and href of each <a> tag
>>> for key in soup.find_all('a'):
...     print key.get('class'), key.get("href")
...
['sister'] http://example.com/elsie
['sister'] http://example.com/lacie
['sister'] http://example.com/tillie
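The loop above uses `get()` rather than indexing, which matters when an attribute may be absent: `get()` returns `None` (or a supplied default), whereas `tag["href"]` raises `KeyError`. A small sketch for Python 3 that collects the same information into a list of dictionaries; the second link without an `href` is an invented case to show the difference.

```python
from bs4 import BeautifulSoup

sample = (
    '<p><a href="http://example.com/elsie" class="sister">Elsie</a> '
    '<a class="sister">No link</a></p>'
)
link_soup = BeautifulSoup(sample, "html.parser")

rows = []
for a in link_soup.find_all("a"):
    rows.append({
        "text": a.get_text(),
        "href": a.get("href"),           # None when the attribute is missing
        "classes": a.get("class", []),   # class is multi-valued, so a list
    })

# Indexing a missing attribute raises KeyError instead of returning None
second = link_soup.find_all("a")[1]
try:
    second["href"]
    raised = False
except KeyError:
    raised = True

for row in rows:
    print(row)
```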
References