[Python Web Scraping] Sending requests with headers copied from Chrome
Code:
import requests
import json

# The text inside the triple quotes was copied in one piece from the Network tab
# of Chrome DevTools: refresh the page, select the first request, and copy its headers.
# "马赛克" ("mosaic") marks values that the author redacted.
headers = """
GET /wui/index.html HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Cookie:马赛克
If-Modified-Since: Sat, 30 Jul 2022 08:07:38 GMT
If-None-Match: "8Jo马赛克X"
Referer: http://马赛克/index.html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 Edg/104.0.1293.63
"""

# Strip leading and trailing whitespace, then split on newlines
headers = headers.strip().split('\n')
# Build the dict with a comprehension. Splitting on the first colon only keeps
# values that contain colons (URLs, timestamps such as 08:07:38) intact.
# Lines without a colon, such as the request line "GET ... HTTP/1.1", are skipped
# because they are not headers.
headers = {x.split(':', 1)[0].strip(): x.split(':', 1)[1].strip() for x in headers if ':' in x}
# print(json.dumps(headers, indent=4))

url = 'http://马赛克/api'
res = requests.get(url, headers=headers).text
print(res)
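The parsing step above can also be wrapped in a small reusable function. The following is only a sketch under a few assumptions: the name `parse_raw_headers` is made up for illustration, `example.com` stands in for the redacted host, and the input is assumed to be a header block pasted from the browser's developer tools, possibly including a request line or HTTP/2 pseudo-headers such as `:authority`.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn a header block copied from browser DevTools into a dict.

    Hypothetical helper for illustration; not part of requests.
    """
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only, so colons inside the value survive.
        name, sep, value = line.strip().partition(":")
        name = name.strip()
        # Skip lines that are not ordinary headers:
        #   no colon at all      -> request line such as "GET /path HTTP/1.1"
        #   empty name           -> HTTP/2 pseudo-header such as ":authority: host"
        #   whitespace in a name -> request line containing a full URL
        if not sep or not name or " " in name:
            continue
        headers[name] = value.strip()
    return headers


# Placeholder data: example.com replaces the redacted host.
sample = """
GET /wui/index.html HTTP/1.1
Accept-Encoding: gzip, deflate
If-Modified-Since: Sat, 30 Jul 2022 08:07:38 GMT
Referer: http://example.com/index.html
"""
parsed = parse_raw_headers(sample)
print(parsed)
```

The resulting dict can be passed as `requests.get(url, headers=parsed)`. Note that conditional headers such as `If-Modified-Since` and `If-None-Match` may cause the server to answer `304 Not Modified` with an empty body, so it may be worth removing them from the dict when the full response is wanted.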
References: https://baijiahao.baidu.com/s?id=1729201298117686282&wfr=spider&for=pc
https://blog.csdn.net/kylner/article/details/125299214