Web Scraping with the requests Module

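Python 3 has two commonly used modules for sending HTTP requests: urllib (in the standard library) and requests (a third-party package). Because requests is simpler and more convenient to use than urllib, this article covers only the requests module.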

 

Installation:

pip install requests

 

GET requests:

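GET is one of the most common HTTP request methods. This section looks at how to build GET requests with the requests module.

Start with the simplest possible GET request. Here url is the request URL; when httpbin.org receives a GET request at this address, it returns the details of that request.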
import requests

url = 'http://httpbin.org/get'
response = requests.get(url=url)
print(response.text)

Output:
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6069d800-43b4f5da49eb42f770c9dc90"
  }, 
  "origin": "113.118.77.36", 
  "url": "http://httpbin.org/get"
}

 

To attach extra information to a GET request, pass it through the params argument:

import requests

url = 'http://httpbin.org/get'
params = {
    'name':'germey',
    'age':22
}

response = requests.get(url=url,params=params)
print(response.text)

Output:
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6069d97d-10bea5df63cec72311101582"
  }, 
  "origin": "113.118.77.36", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
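
To see how params ends up in the final URL without sending anything over the network, the request can be prepared locally. The following is a minimal sketch, assuming the same url and params as above; requests.Request and its prepare() method belong to the requests API, while the variable name prepared is only illustrative.

```python
import requests

url = 'http://httpbin.org/get'
params = {
    'name': 'germey',
    'age': 22
}

# Build the request object and prepare it locally; nothing is sent.
prepared = requests.Request('GET', url, params=params).prepare()

# The params dict has been URL-encoded and appended as the query string.
print(prepared.url)
```

The printed URL should match the "url" field that httpbin echoes back in the output above.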

 

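If the response body is JSON, call the response's json() method to parse it into Python objects. If it is binary data, read the content attribute, which holds the raw bytes of the body.

response.json()
response.content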
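
How the raw bytes, the decoded text, and the parsed JSON relate can be sketched without a network call. The example below is only an illustration using the standard library: raw_body is a hypothetical, hard-coded stand-in for response.content, not a real response. Decoding it gives roughly what response.text holds, and parsing that text gives roughly what response.json() returns.

```python
import json

# Hypothetical stand-in for response.content: the raw bytes of a JSON body.
raw_body = b'{"args": {"age": "22", "name": "germey"}, "url": "http://httpbin.org/get?name=germey&age=22"}'

# Decoding the bytes yields a str, comparable to response.text.
text = raw_body.decode('utf-8')

# Parsing the text yields Python objects, comparable to response.json().
data = json.loads(text)

print(type(raw_body).__name__, type(text).__name__, type(data).__name__)
print(data['args']['name'])
```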

 

POST requests:

The previous section covered basic GET requests. The other common request type is POST, which is just as simple to send with requests.

import requests

url = 'http://httpbin.org/post'
data = {
    'name':'germey',
    'age':22
}
page_text = requests.post(url=url,data=data)
print(page_text.text)

Output:
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6069dbb2-20c01c0a048c8d0c239cdf28"
  }, 
  "json": null, 
  "origin": "113.118.77.36", 
  "url": "http://httpbin.org/post"
}
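
The Content-Type and Content-Length values in the output above come from how requests encodes the data dict. Below is a minimal sketch that prepares the request locally instead of sending it. The json= variant is included only for contrast and is not used elsewhere in this article; the names form_request and json_request are illustrative.

```python
import json
import requests

url = 'http://httpbin.org/post'
data = {
    'name': 'germey',
    'age': 22
}

# data= is form-encoded, which is why httpbin reports it under "form".
form_request = requests.Request('POST', url, data=data).prepare()
print(form_request.headers['Content-Type'])
print(form_request.body)
print(form_request.headers['Content-Length'])

# json= serializes the same dict as a JSON document instead.
json_request = requests.Request('POST', url, json=data).prepare()
print(json_request.headers['Content-Type'])
print(json.loads(json_request.body))
```

With data=, the values arrive at the server as form fields (strings); with json=, the server receives a JSON body, so a site that expects one format will generally not accept the other.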

 

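In practice, a request usually needs a headers argument that sets a browser-like User-Agent (UA spoofing); otherwise some sites will reject the request.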

import requests

# Browser-like User-Agent sent in place of the default python-requests one
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'
}

url = 'https://www.baidu.com'
response = requests.get(url, headers=header)
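
The effect of the headers argument can be checked locally before any request is sent. This is a minimal sketch, assuming a requests.Session is used only to prepare the requests, not to send them; session.prepare_request merges the session's default headers with the ones supplied. The names default_request and custom_request are illustrative.

```python
import requests

url = 'https://www.baidu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'
}

session = requests.Session()

# Without custom headers, the session's default User-Agent is used.
default_request = session.prepare_request(requests.Request('GET', url))
print(default_request.headers['User-Agent'])

# With headers=, the supplied User-Agent replaces the default.
custom_request = session.prepare_request(requests.Request('GET', url, headers=headers))
print(custom_request.headers['User-Agent'])
```

The first value printed begins with python-requests/, the same kind of default that appears in the "User-Agent" field of the httpbin outputs earlier in this article.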

 

posted @ 2021-04-04 23:40  Ccdjun