Scraping Page Content with Python

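import urllib.request
from bs4 import BeautifulSoup

url = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
# Send a browser-like User-Agent so the server does not reject the request
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
data = opener.open(url).read()
# The page is GB2312-encoded; decode as GBK (a superset of GB2312) so rarer characters do not raise errors
content = data.decode('gbk')
soup = BeautifulSoup(content, 'html.parser')
print(soup.find_all('a'))

# Print the href and visible text of every <a> tag
for link in soup.find_all('a'):
    print('url:', link.get('href'))          # .get() returns None instead of raising KeyError when href is missing
    print('text:', link.get_text(strip=True))  # get_text()'s first argument is a separator, not an attribute name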
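If the page's hrefs are relative (for example a hypothetical `01/120101.html`), printing them as-is does not give a URL you can fetch directly. The following is a minimal sketch, using only the standard library (`html.parser.HTMLParser` and `urllib.parse.urljoin`) rather than BeautifulSoup, that collects each link's text and resolves its href against the page URL. The class `LinkCollector`, the helper `extract_links`, the `gbk` default encoding, and the sample markup are illustrative assumptions, not taken from the original page; `<a>` tags without an `href` are skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (absolute_url, text) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # collected (absolute_url, text) tuples
        self._href = None  # resolved href of the <a> currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Resolve relative links against the page URL; skip <a> tags without href
            self._href = urljoin(self.base_url, href) if href else None
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(raw_bytes, base_url, encoding="gbk"):
    """Decode the raw response body (GBK assumed) and return its links."""
    parser = LinkCollector(base_url)
    parser.feed(raw_bytes.decode(encoding))
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Made-up sample markup, encoded as GBK to mimic a raw response body
    sample = '<td><a href="01/120101.html">和平区</a></td>'.encode("gbk")
    base = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
    for link_url, text in extract_links(sample, base):
        print("url:", link_url)
        print("text:", text)
```

Passing the `data` bytes and `url` from the script above to `extract_links` should yield roughly the same url/text pairs as the BeautifulSoup loop, with relative links turned into absolute URLs.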

Posted @ 2019-09-27 15:10 by 微客鸟窝
