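Two commonly used BeautifulSoup methods are findAll and find. Make sure you understand how they relate and how each is used: find returns the first matching tag, while findAll returns every match.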
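To make the relationship concrete, here is a minimal sketch that runs on an inline HTML string instead of a live page. The markup and the names SAMPLE_HTML, all_divs, first_div and missing are made up for illustration. It uses find_all, the current spelling of the method; findAll is the older alias of the same method.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML so the example runs without network access.
SAMPLE_HTML = """
<div class="pl2"><a href="/book/1" title="Book One">One</a></div>
<div class="pl2"><a href="/book/2" title="Book Two">Two</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all (older alias: findAll) returns a list-like ResultSet of every match.
all_divs = soup.find_all("div", attrs={"class": "pl2"})

# find returns only the first match, or None when nothing matches.
first_div = soup.find("div", attrs={"class": "pl2"})
missing = soup.find("table")

print(len(all_divs))
print(first_div == all_divs[0])
print(missing)
```

Because find can return None, check its result before chaining further calls on a page whose structure you do not control.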
There are also navigation attributes, such as
.children for child tags
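from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
# Name the parser explicitly to avoid the "no parser specified" warning
bsObj = BeautifulSoup(html, "html.parser")
# Iterate over the direct children of the giftList table
for child in bsObj.find("table", {"id": "giftList"}).children:
    print(child)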
Sibling tags: .next_siblings (an attribute, not a method, so it takes no parentheses)
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")
# .tr is the first row; next_siblings yields everything after it
for sibling in bsObj.find("table", {"id": "giftList"}).tr.next_siblings:
    print(sibling)
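The difference between .children and .next_siblings is easier to see on a small table. The sketch below uses a hypothetical stand-in for the giftList table (the rows and the names SAMPLE_HTML, child_rows and sibling_rows are assumptions, not the real page content) so it runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the giftList table. There is no whitespace
# between tags, so no whitespace-only text nodes appear among the children.
SAMPLE_HTML = (
    '<table id="giftList">'
    "<tr><th>Item</th></tr>"
    "<tr><td>Vegetable Basket</td></tr>"
    "<tr><td>Russian Nesting Dolls</td></tr>"
    "</table>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"id": "giftList"})

# .children yields the direct children only: all three rows.
child_rows = [child.get_text() for child in table.children]

# .tr is the first row; .next_siblings yields the rows after it,
# so the header row itself is skipped.
sibling_rows = [sib.get_text() for sib in table.tr.next_siblings]

print(child_rows)
print(sibling_rows)
```

On a real page the rows are usually separated by newlines, so both iterators will also yield whitespace text nodes between the tr tags.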
Here the methods above are used to find the title of the a tag inside each div with class=pl2:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://book.douban.com/top250?start=0")
bsObj = BeautifulSoup(html, "html.parser")
# find returns the first a tag inside each matching div
for link in bsObj.findAll("div", attrs={"class": "pl2"}):
    name = link.find("a")
    print(name.get('title'))
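If the loop is changed to
for link in bsObj.findAll("div", attrs={"class": "pl2"}):
    name = link.findAll("a")
    print(name[0].get('title'))
the result is the same, because findAll returns a list whose first element is the tag that find returns.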
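You can also get the text content of the a tag through name.text.
Attributes are read with
.get('href'), name['href'], or name.attrs for all of them as a dict.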
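A short sketch of these accessors, again on hypothetical inline HTML (the markup and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like one entry of the book list.
SAMPLE_HTML = '<div class="pl2"><a href="/book/1" title="Book One">One</a></div>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
anchor = soup.find("div", attrs={"class": "pl2"}).find("a")

link_text = anchor.text        # text inside the a tag
href = anchor.get("href")      # attribute value, or None if absent
title = anchor["title"]        # same lookup, but raises KeyError if absent
missing = anchor.get("id")     # the tag has no id attribute
all_attrs = anchor.attrs       # dict of every attribute on the tag

print(link_text, href, title, missing)
print(all_attrs)
```

Prefer .get() when an attribute may be missing, since indexing with square brackets raises KeyError in that case.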