[Python] Parsing HTML pages with BeautifulSoup + requests + lxml
1. Official documentation
https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/
2. Installation
pip install beautifulsoup4
pip install lxml
pip install requests
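To confirm that the three packages actually landed in the active environment, a small check such as the following can help. This is only a sketch: the helper name `installed_version` is made up for illustration, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI. The import name for
# beautifulsoup4 is bs4, but the distribution is called beautifulsoup4.
for name in ("beautifulsoup4", "lxml", "requests"):
    print(name, installed_version(name) or "NOT INSTALLED")
```

If any line prints NOT INSTALLED, rerun the matching pip command in the same interpreter environment.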
3. Parsing script
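import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36'
}
download_url = 'https://registry.npmmirror.com/binary.html?path=chromedriver/'

# Fetch the page, then print the HTTP status code and the raw HTML
rsp = requests.get(download_url, headers=headers)
print(rsp.status_code)
print(rsp.text)

# Parse the HTML with the lxml parser, then print the parsed tree
# and the text of the first <script> element
soup = BeautifulSoup(rsp.text, 'lxml')
print(soup)
print(soup.find("script").text)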
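The script above assumes the request succeeds and that the page contains a script tag. A slightly more defensive variant might look like the sketch below. The helper names `fetch_html` and `first_script_text`, the 10-second timeout, and the sample HTML string are all assumptions added for illustration and are not part of the original script. It reads the tag's `.string` rather than `.text`, because the way `.text` and `get_text()` treat script contents has changed between Beautiful Soup releases.

```python
import requests
from bs4 import BeautifulSoup, FeatureNotFound


def fetch_html(url, headers=None, timeout=10):
    """Download a page and return its text; raise on HTTP 4xx/5xx."""
    rsp = requests.get(url, headers=headers, timeout=timeout)
    rsp.raise_for_status()
    return rsp.text


def first_script_text(html):
    """Return the inline content of the first <script> tag, or None."""
    try:
        soup = BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed: fall back to the standard-library parser.
        soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script")
    if tag is None:
        # The page has no <script> element at all.
        return None
    # .string is the tag's single text child; it is None for an empty script.
    return tag.string


# Offline demonstration on a hypothetical sample document (no network needed).
sample = "<html><head><script>var x = 1;</script></head><body><p>hi</p></body></html>"
print(first_script_text(sample))
print(first_script_text("<p>no script here</p>"))
```

With the variables from the script above, the live page could then be processed with `first_script_text(fetch_html(download_url, headers=headers))`.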
4. Output: the script element
<script>
// Forked from https://chromedriver.storage.googleapis.com/index.html
// Split a string in 2 parts. The first is the leading number, if any,
// the second is the string following the numbers.
function splitNum(s) {
  var results = new Array();
  results[0] = 'None';
  for (var i = 0; i < s.length; i++) {
    var substr = s.substr(0, i+1)
    if (isNaN(substr)) {
      // Not a number anymore.
      results[1] = s.substr(i)
      ......