Scraping Sina News with requests and BeautifulSoup
###1. Scraping the titles, times, and links of news items on the Sina News front page
    import requests
    from bs4 import BeautifulSoup

    # Download the listing page and decode it as UTF-8
    res = requests.get('http://news.sina.com.cn/china')
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')

    for news in soup.select('.news-item'):
        # Skip items that have no <h2> headline
        if len(news.select('h2')) > 0:
            h2 = news.select('h2')[0].text
            time = news.select('.time')[0].text
            a = news.select('a')[0]['href']
            print(time, h2, a)
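The loop above can also be wrapped in a function that returns structured records instead of printing them. The following is a minimal sketch, not the author's code: the function name `parse_news_items` and the `SAMPLE_HTML` markup are hypothetical, and the sample only mimics the `.news-item` / `h2` / `.time` / `a` structure that the selectors above assume. The real page's HTML may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure the selectors expect;
# it is an assumption for illustration, not a copy of the real page.
SAMPLE_HTML = """
<div class="news-item">
  <h2><a href="http://example.com/a.shtml">First headline</a></h2>
  <div class="time">09:30</div>
</div>
<div class="news-item"></div>
<div class="news-item">
  <h2><a href="http://example.com/b.shtml">Second headline</a></h2>
  <div class="time">10:15</div>
</div>
"""


def parse_news_items(html):
    """Return a list of dicts with the title, time, and link of each news item."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for news in soup.select('.news-item'):
        h2 = news.select_one('h2')
        time_tag = news.select_one('.time')
        link = news.select_one('a')
        # Skip placeholder items that lack any of the three pieces
        if h2 is None or time_tag is None or link is None:
            continue
        items.append({
            'title': h2.get_text(strip=True),
            'time': time_tag.get_text(strip=True),
            'link': link.get('href'),
        })
    return items


items = parse_news_items(SAMPLE_HTML)
for item in items:
    print(item['time'], item['title'], item['link'])
```

Separating parsing from downloading makes it possible to test the selectors against saved HTML without hitting the network; with a live page, `parse_news_items(res.text)` would be the equivalent call.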
- Fetching the article body
    res = requests.get('http://news.sina.com.cn/o/2017-09-26/doc-ifymenmt7129299.shtml')
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
Extracting the news title
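One possible way to pull the title out of a parsed article is sketched below. The selector is an assumption: the class name `main-title`, the helper name `extract_title`, and the `SAMPLE_ARTICLE` markup are all hypothetical, so inspect the real article page and substitute whatever element actually holds the headline. The sketch falls back to the page's `<title>` element when the selector matches nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical article markup; the class name 'main-title' is an assumption
# made for illustration and may not match the real page.
SAMPLE_ARTICLE = """
<html>
  <head><title>Sample headline | Example News</title></head>
  <body>
    <h1 class="main-title"> Sample headline </h1>
    <p>Body text.</p>
  </body>
</html>
"""


def extract_title(html, selector='.main-title'):
    """Return the headline text, or None if no title can be found."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(selector)
    if tag is not None:
        return tag.get_text(strip=True)
    # Fall back to the <title> element when the selector matches nothing
    if soup.title is not None and soup.title.string:
        return soup.title.string.strip()
    return None


title = extract_title(SAMPLE_ARTICLE)
fallback = extract_title(SAMPLE_ARTICLE, selector='.no-such-class')
print(title)
print(fallback)
```

With the `soup` object built from the live article, the same lookup would be `soup.select_one(selector)`, using a selector confirmed in the browser's developer tools.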