[Python] Parsing HTML pages with BeautifulSoup + requests + lxml

1. Official documentation (Chinese translation)

https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

2. Installation

pip install beautifulsoup4
pip install lxml
pip install requests

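The pip package name and the import name differ for Beautiful Soup: it is installed as beautifulsoup4 but imported as bs4. A minimal sketch, using only the standard library's importlib.util.find_spec, that reports which of the three dependencies are still missing. REQUIRED and missing_packages are hypothetical names chosen for this illustration.

```python
import importlib.util

# Hypothetical mapping: pip package name -> module name used in "import"
REQUIRED = {
    "beautifulsoup4": "bs4",  # installed as beautifulsoup4, imported as bs4
    "lxml": "lxml",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found by this interpreter."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("beautifulsoup4, lxml and requests are all installed")
```

find_spec only locates a module without importing it, so the check stays cheap even for large packages.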
3. Parsing script

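import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent, so the request looks like it comes from a normal browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36'}
download_url = 'https://registry.npmmirror.com/binary.html?path=chromedriver/'

# Fetch the page; the timeout (seconds) keeps a stalled connection from hanging the script
rsp = requests.get(download_url, headers=headers, timeout=10)

print(rsp.status_code)
rsp.raise_for_status()  # raise requests.HTTPError on a 4xx/5xx response
print(rsp.text)  # raw HTML as returned by the server

# Parse the HTML with the lxml parser
soup = BeautifulSoup(rsp.text, 'lxml')
print(soup)

# find() returns the first matching tag (None if the page has no <script>);
# .text returns the JavaScript inside it
print(soup.find("script").text)
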
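Because soup.find("script") returns None when a page has no <script> tag, print(soup.find("script").text) fails with AttributeError on such pages. Keeping the parsing in its own function also lets it be tried on an HTML string without any network access. A minimal sketch under those assumptions; first_script_text and inline_scripts are hypothetical helper names, not part of Beautiful Soup. It reads tag.string, which is the tag's single text child, and is None for an empty tag such as <script src="...">.

```python
from bs4 import BeautifulSoup


def first_script_text(html, parser="lxml"):
    """Return the code inside the first <script> tag, or None if there is none."""
    soup = BeautifulSoup(html, parser)
    tag = soup.find("script")  # None when the page has no <script> tag
    if tag is None:
        return None
    # .string is None for an empty <script src="...">, so fall back to ""
    return str(tag.string) if tag.string is not None else ""


def inline_scripts(html, parser="lxml"):
    """Return the code of every <script> tag that contains inline JavaScript."""
    soup = BeautifulSoup(html, parser)
    # Skip tags with no text child (e.g. external scripts loaded via src=)
    return [str(tag.string) for tag in soup.find_all("script") if tag.string]


if __name__ == "__main__":
    sample = (
        "<html><head><script>var x = 1;</script>"
        "<script src='app.js'></script></head><body><p>hi</p></body></html>"
    )
    print(first_script_text(sample))  # var x = 1;
    print(inline_scripts(sample))     # ['var x = 1;']
```

In the script above, first_script_text(rsp.text) could take the place of the last print line; the parser argument also accepts "html.parser" when lxml is not installed.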
4. The extracted script (output truncated)

<script>
      // Forked from https://chromedriver.storage.googleapis.com/index.html
      // Split a string in 2 parts. The first is the leading number, if any,
      // the second is the string following the numbers.
      function splitNum(s) {
        var results = new Array();
        results[0] = 'None';
        for (var i = 0; i < s.length; i++) {
          var substr = s.substr(0, i+1)
          if (isNaN(substr)) {
            // Not a number anymore.
            results[1] = s.substr(i)
......

posted @ 2022-08-19 13:50  代码诠释的世界