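03. Bookstore Treasure Hunt (Part 2)
Task: scrape three pieces of information (title, rating, and price) for every book in the Travel category of the online bookstore Books to Scrape, and print what you extract.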
# 3. Bookstore Treasure Hunt (Part 2)
# Task: scrape the title, rating, and price of every book in the Travel category of the online bookstore Books to Scrape, and print the extracted information.
# Page URL: http://books.toscrape.com/catalogue/category/books/travel_2/index.html

import requests
from bs4 import BeautifulSoup

res = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
# Decode the page as UTF-8 explicitly; otherwise '£' shows up as 'Â£' in res.text.
res.encoding = 'utf-8'
html = res.text
soup = BeautifulSoup(html, 'html.parser')
# Each book sits in its own <article class="product_pod"> element.
items = soup.find_all('article', class_='product_pod')
for item in items:
    # Title: the title attribute of the <a> inside <h3>.
    # Rating: the second class value of the first <p> (e.g. "Two").
    # Price: the text of <p class="price_color">.
    print(item.find('h3').find('a')['title']+'\t'+item.find('p')['class'][1],'\t',item.find('p',class_='price_color').text)
    # print(item.find('h3').find('a')['title'])
    # print(item.find('p')['class'][1])
    # print(item.find('p',class_='price_color').text)


'''
Output:
It's Only the Himalayas Two £45.17
Full Moon over Noah’s Ark: An Odyssey to Mount Ararat and Beyond Four £49.43
See America: A Celebration of Our National Parks & Treasured Sites Three £48.87
Vagabonding: An Uncommon Guide to the Art of Long-Term World Travel Two £36.94
Under the Tuscan Sun Three £37.33
A Summer In Europe Two £44.34
The Great Railway Bazaar One £30.54
A Year in Provence (Provence #1) Four £56.88
The Road to Little Dribbling: Adventures of an American in Britain (Notes From a Small Island #2) One £23.21
Neither Here nor There: Travels in Europe Three £38.95
1,000 Places to See Before You Die Five £26.08
'''

'''
The teacher's code

import requests
from bs4 import BeautifulSoup

res_bookstore = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
bs_bookstore = BeautifulSoup(res_bookstore.text,'html.parser')
list_books = bs_bookstore.find_all(class_='product_pod')
for tag_books in list_books:
    # Reaching the <a> tag takes two find() calls
    tag_name = tag_books.find('h3').find('a')
    # This <p> tag's class attribute has two values: "star-rating" and the specific rating, e.g. "Two". We select by the value every book shares: "star-rating"
    list_star = tag_books.find('p',class_="star-rating")
    # The price is easy to find: match on the attribute alone, or on the tag and the attribute together
    tag_price = tag_books.find('p',class_="price_color")
    # tag['attribute_name'] extracts an attribute's value
    print(tag_name['title'])
    # Again, extract the attribute value by attribute name
    print('star-rating:',list_star['class'][1])
    # list_star['class'] returns a list of two values, e.g. "['star-rating', 'Two']"; what we want in the end is the element at index 1: "Two".
    # Why a list? Because the class attribute has two values here. In effect, we use the first class value to find the tag and then read the second.
    # When printing, I add a separator line so the records are clearly delimited; you can leave it out.
    print('Price:',tag_price.text, end='\n'+'------'+'\n')
'''
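The comments in the teacher's code point out that the class attribute of the rating tag holds two values and comes back as a list. A minimal sketch of that behaviour, run against a hypothetical one-tag snippet modelled on the page's markup rather than the live site:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the rating markup of one book.
snippet = '<p class="star-rating Two"></p>'

# Matching on one class value is enough to find a tag that has several.
tag = BeautifulSoup(snippet, 'html.parser').find('p', class_='star-rating')

classes = tag['class']  # class is multi-valued, so this is a list
rating = classes[1]     # the second value carries the rating word

print(classes, rating)
```

Indexing with [1] assumes the rating word always comes second in the attribute, which holds for the markup shown in this exercise.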
Each Tag in items has the following content:
<article class="product_pod">
    <div class="image_container">
        <a href="../../../its-only-the-himalayas_981/index.html"><img alt="It's Only the Himalayas" class="thumbnail"
                src="../../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg" /></a>
    </div>
    <p class="star-rating Two">
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
    </p>
    <h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the
            Himalayas</a></h3>
    <div class="product_price">
        <p class="price_color">£45.17</p>
        <p class="instock availability">
            <i class="icon-ok"></i>
            In stock
        </p>
        <form>
            <button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
        </form>
    </div>
</article>
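The rating and the price both come out of this markup as strings: a word such as "Two" and a price with a currency symbol. If numbers are needed later (for sorting or averaging, say), a conversion along the following lines might help. The names RATING_WORDS, rating_to_int, and price_to_float are hypothetical helpers introduced for this sketch, and the mapping assumes the site only uses the words One through Five.

```python
import re

# Assumed mapping from the rating word in the class attribute to a number.
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def rating_to_int(word):
    """Turn a rating word such as 'Two' into the integer 2."""
    return RATING_WORDS[word]


def price_to_float(text):
    """Pull the first decimal number out of a price string such as '£45.17'."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        raise ValueError('no number found in ' + repr(text))
    return float(match.group())


print(rating_to_int('Two'), price_to_float('£45.17'))
```

Searching for the digits instead of stripping a fixed prefix means the conversion still works if the currency symbol was decoded incorrectly.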