Python web scraping: BeautifulSoup

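The two BeautifulSoup methods you will use most are find and findAll, so it pays to be clear about how they relate: find(name, attrs) returns the first matching tag (or None if nothing matches), while findAll(name, attrs) returns a list of every matching tag. (In bs4 the preferred spelling is find_all; findAll still works as an alias.)
Besides searching, you can also navigate the tree, for example with
.children (the direct child tags of a tag):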

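from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
# .children yields only the direct children of the table (rows plus whitespace text nodes)
for child in bsObj.find("table", {"id": "giftList"}).children:
    print(child)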
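To see what .children actually yields without hitting the network, here is a minimal sketch on a small inline table. The markup is hypothetical and only imitates the shape of a giftList table. It shows that .children returns direct children only, including newline text nodes, so filtering on Tag keeps just the rows; .descendants is the attribute that walks the whole subtree.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup imitating a giftList-style table (not the real page3.html)
html = """
<table id="giftList">
<tr><th>Item Title</th><th>Price</th></tr>
<tr class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr class="gift"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""
bsObj = BeautifulSoup(html, "html.parser")
table = bsObj.find("table", {"id": "giftList"})

# .children: direct children only, whitespace strings included
all_children = list(table.children)

# Keep only the Tag objects, i.e. the <tr> rows
rows = [child for child in table.children if isinstance(child, Tag)]

# .descendants: every node in the subtree, so the <td> cells are reachable too
td_count = sum(1 for d in table.descendants if isinstance(d, Tag) and d.name == "td")

print(len(rows))   # 3
print(td_count)    # 4
```

If you only need tags, filtering with isinstance avoids printing the blank lines that show up when iterating .children directly.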

Sibling tags: next_siblings (an attribute you iterate over, not a method call)

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")
# Start from the first <tr> and walk the rows that follow it at the same level
for sibling in bsObj.find("table", {"id": "giftList"}).tr.next_siblings:
    print(sibling)
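A useful property of next_siblings is that it starts after the tag you call it on, so starting from the first row skips a header row. The sketch below shows this on hypothetical inline markup (not the real page); the Tag filter drops the newline strings between rows.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table: one header row followed by data rows
html = """<table id="giftList">
<tr><th>Item Title</th><th>Price</th></tr>
<tr class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr class="gift"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>"""
bsObj = BeautifulSoup(html, "html.parser")

header = bsObj.find("table", {"id": "giftList"}).tr  # first <tr>: the header row

# next_siblings excludes the header itself; keep only Tag siblings (the data rows)
data_rows = [s for s in header.next_siblings if isinstance(s, Tag)]
titles = [row.td.get_text() for row in data_rows]

print(titles)  # ['Vegetable Basket', 'Russian Nesting Dolls']
```

bs4 also offers find_next_siblings("tr"), which does the tag-name filtering for you; previous_siblings walks in the other direction.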



Using these methods, the following script extracts the title attribute of the a tag inside each div with class="pl2":

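from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Douban may reject urllib's default User-Agent, so send a browser-like one
req = Request("https://book.douban.com/top250?start=0",
              headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req)
bsObj = BeautifulSoup(html, "html.parser")

# Each book title is the title attribute of the first <a> inside <div class="pl2">
for link in bsObj.findAll("div", attrs={"class": "pl2"}):
    name = link.find("a")
    print(name.get('title'))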
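The same extraction can be tried offline. The sketch below runs the loop on a hypothetical, heavily simplified snippet; the real Douban markup has more around each link and may change. It also guards against a div with no a tag, and shows that class_="pl2" is an equivalent way to filter on class.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real Douban page structure may differ
html = """
<div class="pl2"><a href="#1" title="Book One">Book One</a></div>
<div class="pl2"><a href="#2" title="Book Two">Book Two</a></div>
<div class="other"><a href="#3" title="Not a book">x</a></div>
"""
bsObj = BeautifulSoup(html, "html.parser")

titles = []
for div in bsObj.findAll("div", attrs={"class": "pl2"}):
    a = div.find("a")  # first <a> inside this div, or None
    if a is not None:
        titles.append(a.get("title"))

# class_ (with trailing underscore, since class is a Python keyword) filters the same way
same = [div.find("a").get("title") for div in bsObj.find_all("div", class_="pl2")]

print(titles)  # ['Book One', 'Book Two']
```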

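If you change the loop to

for link in bsObj.findAll("div", attrs={"class": "pl2"}):
    name = link.findAll("a")
    print(name[0].get('title'))

the result is the same, because find returns the first matching tag, which is exactly findAll(...)[0]. The difference shows up when nothing matches: find returns None, while findAll returns an empty list, so indexing it with [0] raises IndexError.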
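A minimal sketch of that equivalence and of the no-match case, using two small helper functions. The helper names and the inline markup are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

def first_title_find(div):
    # find: the first matching <a>, or None when there is none
    a = div.find("a")
    return a.get("title") if a is not None else None

def first_title_findall(div):
    # findAll: a list of all matching <a> tags, possibly empty
    links = div.findAll("a")
    return links[0].get("title") if links else None

full = BeautifulSoup('<div class="pl2"><a title="first">1</a><a title="second">2</a></div>',
                     "html.parser").div
empty = BeautifulSoup('<div class="pl2"></div>', "html.parser").div

print(first_title_find(full), first_title_findall(full))    # first first
print(first_title_find(empty), first_title_findall(empty))  # None None
print(empty.find("a"), len(empty.findAll("a")))             # None 0
```

Both helpers need an explicit guard; without it, find fails later with AttributeError on None, and findAll(...)[0] fails immediately with IndexError.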

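You can also use name.text to get the text inside the a tag, name.get('href') to get the link URL, and more generally name.get() with any attribute name (or the name.attrs dictionary) to read other attributes.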
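Here is a short sketch of these accessors on a single hypothetical a tag. It shows that .text keeps surrounding whitespace while get_text(strip=True) trims it, that .get() returns None for a missing attribute whereas tag["name"] would raise KeyError, and that class is returned as a list because it is a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical link markup for illustration
a = BeautifulSoup('<a href="https://example.com/book" title="Some Book" class="nbg cover">'
                  '  Some Book </a>', "html.parser").a

text = a.text                     # raw text, whitespace kept
clean = a.get_text(strip=True)    # text with surrounding whitespace removed
href = a.get("href")              # attribute value, or None if absent
title = a["title"]                # subscript access raises KeyError if absent
missing = a.get("data-id")        # None: this attribute does not exist
attrs = a.attrs                   # dict of all attributes

print(clean)           # Some Book
print(attrs["class"])  # ['nbg', 'cover']
```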

posted @ 2016-11-07 19:50 by 进击的大乐
