python实现网页爬虫示例
用python里面的 requests 与 BeautifulSoup 结合,实现网页爬虫示例。
示例一:抓取中国省份:
import requests from bs4 import BeautifulSoup page = requests.get('http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2021/index.html') # Getting page HTML through request soup = BeautifulSoup(page.content, 'html.parser') # Parsing content using beautifulsoup links = soup.select("table tbody tr.provincetr td a") # Selecting all of the anchors with titles first10 = links # Keep only the first 10 anchors for anchor in first10: print(anchor.text) # Display the innerText of each anchor