Understanding request and response

Learning web scraping

Commonly used requests methods

requests.get() The main method for fetching an HTML page; corresponds to HTTP GET
requests.post() Submits a POST request to an HTML page; corresponds to HTTP POST
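
To make the POST side a bit more concrete, here is a minimal sketch that builds a POST request without sending it, so the method and the encoded form body can be inspected offline. The httpbin.org URL and the form fields are placeholders made up for illustration; they are not from the original notes.

```python
import requests

# Hypothetical form data and endpoint, used only for illustration.
form = {"username": "demo", "keyword": "requests"}

# Request(...).prepare() builds the request exactly as it would be sent,
# but nothing goes over the network.
prepared = requests.Request("POST", "https://httpbin.org/post", data=form).prepare()

print(prepared.method)                   # the HTTP verb
print(prepared.body)                     # the dict, form-encoded into the body
print(prepared.headers["Content-Type"])  # set automatically for form data
```

In everyday use, requests.post(url, data=form) does the preparing and the sending in one call.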

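The get method takes one required argument, the URL. It builds a request object that is sent to the target server and returns a response object containing the server's resource. The usual form is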

response = requests.get("url")

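The get method also accepts other parameters, such as params (a Python dict) and headers. There are many more, for example timeout, which sets how long to wait for the server before giving up.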

import requests

url = "https://www.baidu.com"

# Query-string parameters; requests URL-encodes them and appends them to the URL.
params = {
    'wd': '百度贴吧'
}

# Send a browser-like User-Agent in place of the requests default.
headers = {
    'User-Agent':
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; InfoPath.3)'
}

response = requests.get(url, params=params, headers=headers)
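
To see what params actually does to the URL, the sketch below prepares the same request without sending it and prints the final URL. Using Request(...).prepare() here is only a way to inspect the result offline; it is an illustration added to these notes, not a required step.

```python
from urllib.parse import parse_qs, urlsplit

import requests

url = "https://www.baidu.com"
params = {'wd': '百度贴吧'}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request("GET", url, params=params).prepare()

# The dict has been percent-encoded (as UTF-8) into the query string.
print(prepared.url)

# Decoding the query string gives back the original value.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

After a real requests.get call, the same final URL is available as response.url.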

Attributes of the response object

response.text The HTTP response body as a string
response.encoding The encoding used to decode the response body
response.content The response body as raw bytes
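
The relationship between these three attributes can be sketched with a throwaway local server, so the example does not depend on any real website. DemoHandler, BODY and fetch_demo are names invented for this sketch, and it assumes the machine allows a connection to 127.0.0.1.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# The bytes the demo server will send back.
BODY = "<p>你好, requests</p>".encode("utf-8")


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # The charset declared here is what response.encoding picks up.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_demo():
    # Port 0 lets the operating system pick any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return requests.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
    finally:
        server.shutdown()
        server.server_close()


response = fetch_demo()
print(response.encoding)       # charset taken from the Content-Type header
print(type(response.content))  # raw bytes, exactly as sent
print(response.text)           # content decoded to str using response.encoding
```

If response.text comes out garbled on a real page, the declared encoding is probably wrong; assigning the correct value to response.encoding before reading response.text is the usual fix.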

Pick any image on the web and save it to local disk with a scraper script.

import requests


class PhotoSpder():
    def __init__(self):
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
        }

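    def parse_url(self, url):
        # Fetch the URL and return the raw response body as bytes.
        response = requests.get(url, headers=self.headers)
        response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        return response.content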

    def save_photo(self, content):
        photo_path = "1.png"
        # "wb" writes raw bytes; the with block closes the file automatically.
        with open(photo_path, "wb") as f:
            f.write(content)
        print("Saved successfully")

    def run(self):
        url = "https://www.baidu.com/img/pc_1c6e30772d5e4103103bd460913332f9.png"
        content = self.parse_url(url)
        self.save_photo(content)


if __name__ == '__main__':
    photo = PhotoSpder()
    photo.run()
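
The script above always writes to 1.png. As a possible variation, the sketch below derives the file name from the URL and adds a timeout. filename_from_url and download are hypothetical helpers written for this note, and the 10-second timeout is an arbitrary choice.

```python
import os
import posixpath
from urllib.parse import urlsplit

import requests


def filename_from_url(url, default="download.bin"):
    # Use the last path segment of the URL; the query string is ignored.
    name = posixpath.basename(urlsplit(url).path)
    return name or default


def download(url, directory=".", timeout=10):
    # Hypothetical helper: fetch url and save it under its own file name.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    path = os.path.join(directory, filename_from_url(url))
    with open(path, "wb") as f:
        f.write(response.content)
    return path


# Only the name derivation runs here; download() needs network access.
name = filename_from_url("https://www.baidu.com/img/pc_1c6e30772d5e4103103bd460913332f9.png")
print(name)
```

Calling download(url) with the same URL would then save the image under that name in the current directory.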