python爬虫之伪装请求头

python本身也是通过向浏览器发送请求获取数据的,存在请求头,如果不进行伪装,会被对方服务器识别从而爬取失败

def askURL(url):
    data = bytes(urllib.parse.urlencode({
        "setAction": "classroomQuery",
        "PageAction": "Query",
        "day_time_text": "0011000000000",
        "school_area_code": "1",
        "building": "13",
        "week_no": "256"
        , "day_no": "2",
        "day_time1": "ON",
        "B1": "查询"}), encoding="utf-8")
    headers = {  # 模拟浏览器头部信息,向浏览器发送消息
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0"
    }
#用户代理,表示告诉服务器,我么是什么类型的机器,本质上是告诉浏览器,我们可以接受什么类型的内容
    request=urllib.request.Request(url,headers=headers,data=data,method="POST")
    html=""
    try:
        resonse = urllib.request.urlopen(request)
        html=resonse.read().decode("utf-8")
        #print(html)
    except urllib.error.URLError as e:
        if hasattr(e,"code"):
            print(e.code)
        if hasattr(e,"reason"):
            print(e.reason)
    return html

伪装请求头headers

打开任意网站打开控制台网络,复制请求头即可

 

posted @ 2022-04-21 14:47  山海自有归期  阅读(362)  评论(0编辑  收藏  举报