Python crawler raises ValueError: Invalid header name b':authority'

I. Crawling a fairly authoritative website

1. The following error appears: ValueError: Invalid header name b':authority'

2. This means the request headers contain a name the client does not recognize, so the headers cannot be processed.

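3. These headers come from an HTTP/2 request. As the HTTP RFCs describe, an HTTP/1.x header name cannot begin with a colon; names such as :authority, :method, :path and :scheme are HTTP/2 pseudo-header fields.

  Install hyper to handle them, because hyper understands headers of this form: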

pip install hyper
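The rejection described in steps 2 and 3 can be reproduced without any network traffic. The sketch below assumes the error originates in the standard library's http.client, which validates header names before anything is sent; the helper name try_header and the host example.com are made up for illustration.

```python
import http.client


def try_header(name, value):
    """Return the error message if http.client rejects the header name, else None.

    try_header is a hypothetical helper; nothing is sent over the network
    because the connection is never opened.
    """
    conn = http.client.HTTPConnection("example.com")
    conn.putrequest("POST", "/")  # only buffers the request line
    try:
        conn.putheader(name, value)  # validates the header name
    except ValueError as exc:
        return str(exc)
    return None


# A pseudo-header name starting with ':' is rejected by the HTTP/1.1 client
print(try_header(":authority", "example.com"))

# An ordinary header name is accepted
print(try_header("accept", "*/*"))
```

The first call reports the invalid header name, while the second returns None, which is why the same dictionary works once an HTTP/2-aware adapter handles the pseudo-headers.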

4. Modify the code:

import json
import requests
from hyper.contrib import HTTP20Adapter

url = 'https://www.qcc.com/api/bigsearch/judgementList'
headers = {
    ':authority': 'www.a.a',
    ':method': 'POST',
    ':path': '/api/bigsearch/judgementList',
    ':scheme': 'https',
    'accept': 'aa/pa*',
    'accept-encoding': 'ae, br',
    'accept-language': 'a0.9',
    'content-length': '294',
    'content-type': 'at=UTF-8',
    'cookie': 'aa',
    'origin': 'xx',
    'referer': 'xx',
    'sec-fetch-dest': 'x',
    'sec-fetch-mode': 'x',
    'sec-fetch-site': 'x-x',
    'user-agent': 'xxx',
    'x-requested-with': 'xx'
}
payload = {
    "caseName": "",
    "caseNo": "",
    "caseReason": "",
    "content": "",
    "courtName": "",
    "involvedAmtMax": "",
    "involvedAmtMin": "",
    "isExactlySearch": "",
    "judgeDateBegin": "",
    "judgeDateEnd": "",
    "pageSize": "20",
    "party": "xxx",
    "publishDateBegin": "",
    "publishDateEnd": "",
    "searchKey": ""
}
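# Serialize the payload to a JSON string for the request body
data = json.dumps(payload)

# Mount hyper's HTTP/2 adapter. The prefix must match the start of the
# target URL ('https://xxxxx' is the author's placeholder).
sessions = requests.session()
sessions.mount('https://xxxxx', HTTP20Adapter())
res = sessions.post(url, headers=headers, data=data)
print(res.text)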

5. Finally, send the request with the HTTP method given in :method (POST in this example).
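Choosing the method from :method can also be done programmatically. The following is only a sketch under assumptions: split_pseudo_headers and build_request_args are hypothetical helper names, and the sample dictionary stands in for headers copied from the browser's developer tools.

```python
def split_pseudo_headers(headers):
    """Separate HTTP/2 pseudo-headers (names starting with ':') from regular ones."""
    pseudo = {k: v for k, v in headers.items() if k.startswith(":")}
    regular = {k: v for k, v in headers.items() if not k.startswith(":")}
    return pseudo, regular


def build_request_args(headers, default_method="GET"):
    """Pick the HTTP method from ':method' and return it with the regular headers."""
    pseudo, regular = split_pseudo_headers(headers)
    method = pseudo.get(":method", default_method).upper()
    return method, regular


# Placeholder headers, shaped like the ones copied in step 4
copied = {
    ":authority": "www.a.a",
    ":method": "POST",
    ":path": "/api/bigsearch/judgementList",
    ":scheme": "https",
    "accept": "*/*",
}

method, regular = build_request_args(copied)
print(method)
print(regular)
```

The returned method can then be passed to something like sessions.request(method, url, headers=regular, data=data). Whether a request without the pseudo-headers is accepted over HTTP/1.1 depends on the target site, so this does not replace the hyper adapter approach above.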

posted @ 2020-09-11 14:39  Mr-刘