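The requests library

1. requests is a Python library for sending HTTP requests and receiving HTTP responses.

requests is often used to scrape information from websites.

You use it to send an HTTP request to a site and then extract information from the response.

Example:

# Install first from a shell: pip install requests
import requests
response = requests.get('http://mirrors.sohu.com/')
print(response.text)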

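2. The Fiddler capture tool

Fiddler lets you inspect exactly which requests are sent and which responses are received.

Fiddler is a proxy-based capture tool.

When Fiddler starts, it launches a proxy server listening on port 8888 and registers itself as the system proxy.

For traffic that comes from a browser, the browser's own developer tools (F12) can be used instead.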

3. Filtering captured traffic

Filters -- Show only the following Hosts

For example: localhost;127.0.0.1;*.sohu.com

* acts as a wildcard.

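4. Viewing a session in the Inspectors tab

The upper pane shows the request that was sent.

The lower pane shows the response that was received.

Raw tab: shows the complete HTTP message as text.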

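5. Capturing traffic from a requests program

To make requests send its traffic through the proxy, pass the proxies parameter:

import requests
# Route both HTTP and HTTPS traffic through Fiddler's local proxy.
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888'}
response = requests.get('http://mirrors.sohu.com/', proxies=proxies)
print(response.text)

Note: to capture HTTPS traffic, Fiddler's certificate must be installed as a trusted certificate.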
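
When several requests should go through the capture proxy, the proxy settings can be attached once to a session instead of being repeated on every call. The following is a minimal sketch, assuming Fiddler is listening on its default 127.0.0.1:8888. The certificate path in the comment is a hypothetical placeholder, and nothing is actually sent.

```python
import requests

# Assumption: Fiddler is listening on its default address 127.0.0.1:8888.
FIDDLER_PROXY = 'http://127.0.0.1:8888'

session = requests.Session()
# Requests made through this session pick up these proxy settings
# (proxy environment variables, if set, can still take precedence).
session.proxies.update({'http': FIDDLER_PROXY, 'https': FIDDLER_PROXY})

# For HTTPS, requests verifies certificates against its own CA bundle by
# default, so the exported Fiddler root certificate may need to be passed
# explicitly, for example:
#   session.verify = 'path/to/fiddler-root.pem'   # hypothetical path
print(session.proxies)
```

A later session.get(...) call would then appear in Fiddler's session list in the same way as the single request above.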

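6. Capturing traffic from a phone

Prerequisite: the computer and the phone are on the same Wi-Fi network.

In Fiddler: Tools - Options - Connections - check "Allow remote computers to connect".

On the phone, open the Wi-Fi settings, modify the network, go to the advanced settings, set the proxy to manual, and enter the computer's IP address (shown by ipconfig) and port 8888.

The steps on an iPhone are similar.

For HTTPS, first import the certificate onto the phone.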

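7. Building the request URL ---- params

https://www.baidu.com/s?wd=iphone&rsv_spt=1

Everything after the question mark, wd=iphone&rsv_spt=1, is the URL parameters (the query string).

Put the parameters in a dictionary:

import requests
para = {
    'wd': 'iphone',
    'rsv_spt': '1'
}
# requests appends the dictionary to the URL as ?wd=iphone&rsv_spt=1
response = requests.get('https://www.baidu.com/s', params=para)
print(response.text)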
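
To see what the params dictionary turns into without touching the network, the request can be prepared but not sent. This is a small sketch of that idea. The second search term is only an illustrative value that is not in the original notes.

```python
import requests

params = {'wd': 'iphone', 'rsv_spt': '1'}

# Build the request without sending it, so the final URL can be inspected.
prepared = requests.Request('GET', 'https://www.baidu.com/s', params=params).prepare()
print(prepared.url)  # https://www.baidu.com/s?wd=iphone&rsv_spt=1

# Non-ASCII values are percent-encoded automatically.
# '手机' is only an illustrative search term.
encoded = requests.Request('GET', 'https://www.baidu.com/s', params={'wd': '手机'}).prepare()
print(encoded.url)
```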

8. Building the request headers ---- headers

import requests
head = {'user-agent': 'my-app/0.0.1',
        'auth-type': 'jwt-token'}
response = requests.post("http://httpbin.org/post", headers=head)
print(response.text)

Note: this example sends a POST request rather than a GET.
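
The headers that would go out can also be checked offline by preparing the request instead of sending it. This is a minimal sketch that reuses the header values from the example above.

```python
import requests

head = {'user-agent': 'my-app/0.0.1', 'auth-type': 'jwt-token'}

# Prepare the POST without sending it and look at the headers it would carry.
prepared = requests.Request('POST', 'http://httpbin.org/post', headers=head).prepare()

# Header names are matched case-insensitively.
print(prepared.headers['User-Agent'])  # my-app/0.0.1
print(prepared.headers['AUTH-TYPE'])   # jwt-token
```

Supplying user-agent this way replaces the default python-requests value that would otherwise be sent.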

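9. Building the request body ---- data

A request body is almost always text, mainly in one of three formats: urlencoded, JSON, or XML.

(1) XML body

import requests
# The XML declaration must come first, so the string starts right after the opening quotes.
payload = '''<?xml version="1.0" encoding="UTF-8"?>
<WorkReport>
<Overall>良好</Overall>
<Progress>30%</Progress>
<Problems>暂无</Problems>
</WorkReport>'''

response = requests.post('http://httpbin.org/post',
    data=payload.encode('utf8'))
print(response.text)

Note: encode('utf8') explicitly encodes the body as UTF-8. If a str is passed instead, the encoding is left to the underlying HTTP stack, which may default to Latin-1 and then cannot represent the Chinese characters.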
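
The encoding point can be checked without sending anything. The sketch below uses a one-element fragment of the payload and a Content-Type value chosen for illustration. It shows that Latin-1 cannot encode the text, and that the prepared body is exactly the UTF-8 bytes that were passed in.

```python
import requests

text = '<Overall>良好</Overall>'

# UTF-8 can represent the Chinese characters...
utf8_bytes = text.encode('utf-8')

# ...but Latin-1 cannot, so encoding fails.
try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False

# requests sets no Content-Type for a bytes body, so declare one explicitly.
prepared = requests.Request(
    'POST', 'http://httpbin.org/post',
    data=utf8_bytes,
    headers={'Content-Type': 'application/xml; charset=utf-8'},
).prepare()
print(prepared.body == utf8_bytes)  # True
print(prepared.headers['Content-Length'])
```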

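(2) urlencoded body

This format stores the data as key-value pairs, like this:

key1=value1&key2=value2

With requests you can of course pass a string in this format directly as the data parameter.

However, if a key or value itself contains a special character such as an equals sign, it will be mistaken for a separator, which causes trouble.

There is a more convenient way: put the key-value pairs in a dictionary.

Then pass that dictionary as the data parameter of the post method, as follows:

import requests
payload = {'key1': 'value1', 'key2': 'value2'}

r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)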
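
The escaping that the dictionary form provides can be seen by preparing a request whose values contain separator characters. In this sketch the value 'a=b&c' is a hypothetical example chosen to trigger the escaping.

```python
import requests

# Hypothetical values containing characters that are special in this format.
payload = {'key1': 'a=b&c', 'key2': 'value2'}

prepared = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()

print(prepared.body)                     # key1=a%3Db%26c&key2=value2
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

The = and & inside the value are percent-encoded, so the server can still split the body into the two original pairs.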

(3) JSON body ---- json

import requests, json
payload = {
    "Overall": "良好",
    "Progress": "30%",
    "Problems": [
        {"No": 1, "desc": "问题1...."},
        {"No": 2, "desc": "问题2...."},
    ]
}
# Option 1: serialize with json.dumps.
r = requests.post("http://httpbin.org/post", data=json.dumps(payload))
# In the capture, every Chinese character in the body appears as an ASCII \uXXXX escape.

# Option 2: keep the characters as-is and encode the body as UTF-8 bytes.
r = requests.post("http://httpbin.org/post", data=json.dumps(payload, ensure_ascii=False).encode())

# Option 3: pass the object directly through the json parameter of post.
r = requests.post("http://httpbin.org/post", json=payload)
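
The difference between these options can be inspected offline. The sketch below uses a shortened version of the payload. It compares the two json.dumps outputs and shows what the json parameter does to a prepared request.

```python
import json
import requests

payload = {"Overall": "良好", "Progress": "30%"}

escaped = json.dumps(payload)                       # Chinese becomes \uXXXX escapes
readable = json.dumps(payload, ensure_ascii=False)  # Chinese kept as-is

print(escaped)
print(readable)

# Both strings decode to the same object.
print(json.loads(escaped) == json.loads(readable))  # True

# The json parameter serializes the object and sets Content-Type for us.
prepared = requests.Request('POST', 'http://httpbin.org/post', json=payload).prepare()
print(prepared.headers['Content-Type'])  # application/json
```

When a JSON string is passed through data instead, requests does not set Content-Type, so a server that checks the header may need it to be added by hand.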

10. Checking the response status code ---- status_code

import requests
r = requests.get('http://mirrors.sohu.com/')
print(r.status_code)
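
Beyond printing the code, a script usually needs to act on it. The following sketch shows three common checks. The Response objects are built by hand only so that the example runs offline; in real code they come from requests.get or requests.post.

```python
import requests

# Hand-built Response objects, used only so the example runs offline.
ok_response = requests.Response()
ok_response.status_code = 200

missing = requests.Response()
missing.status_code = 404

print(ok_response.status_code == requests.codes.ok)  # True
print(missing.ok)                                    # False

# raise_for_status() turns a 4xx/5xx response into an exception.
try:
    missing.raise_for_status()
    raised = False
except requests.HTTPError:
    raised = True
print(raised)  # True
```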

11. Checking the response headers

import requests, pprint
r = requests.get('http://mirrors.sohu.com/')

print(type(r))
print(type(r.headers))
pprint.pprint(dict(r.headers))
print(dict(r.headers))
print(r.headers)
print(r.headers['Connection'])
print(dict(r.headers)['Connection'])

Sample output (the header values depend on the server that responds):

<class 'requests.models.Response'>
<class 'requests.structures.CaseInsensitiveDict'>
{'Access-Control-Allow-Credentials': 'true',
 'Access-Control-Allow-Origin': '*',
 'Connection': 'keep-alive',
 'Content-Length': '832',
 'Content-Type': 'application/json',
 'Date': 'Wed, 08 Mar 2023 02:12:37 GMT',
 'Server': 'gunicorn/19.9.0'}
{'Date': 'Wed, 08 Mar 2023 02:12:37 GMT', 'Content-Type': 'application/json', 'Content-Length': '832', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
{'Date': 'Wed, 08 Mar 2023 02:12:37 GMT', 'Content-Type': 'application/json', 'Content-Length': '832', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
keep-alive
keep-alive

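r.headers['Connection'] and dict(r.headers)['Connection'] both extract the value of a single header.
r.headers is a CaseInsensitiveDict, a dict-like class (it implements the mapping interface rather than subclassing dict).
It supports the usual dictionary operations, and header names are matched case-insensitively.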
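
The behaviour of the headers object can be tried directly on a CaseInsensitiveDict. This is a small sketch with sample header values chosen for illustration; a real r.headers is filled in by requests.

```python
from collections.abc import MutableMapping
from requests.structures import CaseInsensitiveDict

# Sample header values for illustration only.
headers = CaseInsensitiveDict({'Connection': 'keep-alive',
                               'Content-Type': 'application/json'})

print(headers['connection'])        # keep-alive
print(headers.get('CONTENT-TYPE'))  # application/json
print('content-length' in headers)  # False

print(isinstance(headers, dict))            # False
print(isinstance(headers, MutableMapping))  # True

# dict() makes a plain, case-sensitive copy that keeps the original key spelling.
plain = dict(headers)
print('connection' in plain)  # False
```

Looking headers up on r.headers directly is therefore more forgiving than converting to a plain dict first.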

12. Checking the response body

(1) The text content of the response body is available directly through the text attribute of the response object.

import requests

response = requests.get('http://mirrors.sohu.com/')

print(response.text)

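(2) So which encoding does requests use to decode the bytes of the HTTP response body into a string?

requests infers it from the response headers (for example, from a charset declared in Content-Type).

Sometimes the headers do not specify one, and then we need to set the encoding ourselves:

import requests

response = requests.get('http://mirrors.sohu.com/')
response.encoding = 'utf8'
print(response.text)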
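
The header-based guess can be reproduced offline with the helper that requests exposes in requests.utils. The sketch below uses made-up Content-Type values. It shows why a text response without a charset can come out garbled until the encoding is set by hand.

```python
from requests.utils import get_encoding_from_headers

# With an explicit charset, that charset is used.
declared = get_encoding_from_headers({'content-type': 'text/html; charset=utf-8'})
print(declared)  # utf-8

# A text type without a charset falls back to ISO-8859-1.
fallback = get_encoding_from_headers({'content-type': 'text/html'})
print(fallback)  # ISO-8859-1

# The same bytes decoded both ways: only UTF-8 recovers the original text.
body = '良好'.encode('utf-8')
correct = body.decode('utf-8')
garbled = body.decode('iso-8859-1')
print(correct == '良好')  # True
print(garbled == '良好')  # False
```

Setting response.encoding before reading response.text, as in the example above, overrides this guess.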

(3) To get the body as raw bytes instead, use the content attribute:

import requests

response = requests.get('http://mirrors.sohu.com/')
print(response.content)

The result is a bytes object such as b'<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html; charset=utf-8.......

Of course, you can also decode the bytes object yourself:

print(response.content.decode('utf8'))

13. API response bodies are mostly JSON

To make JSON data in a response easier to work with, we usually convert the JSON string into a Python data object.

Use the loads function from the json library:

import requests, json
response = requests.post("http://httpbin.org/post", data={1:1,2:2})
obj = json.loads(response.content.decode('utf8'))
print(obj)

requests provides a more convenient way: the json method of the Response object,

as follows:

response = requests.post("http://httpbin.org/post", data={1:1,2:2})
obj = response.json()
print(obj)
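
Once parsed, the result is an ordinary Python object that can be navigated with normal indexing. The following sketch works on a hypothetical, shortened body in the shape that httpbin.org/post echoes back, so it runs without a network connection.

```python
import json

# Hypothetical, shortened body in the shape httpbin.org/post echoes back.
body = b'{"form": {"1": "1", "2": "2"}, "url": "http://httpbin.org/post"}'

obj = json.loads(body.decode('utf8'))

print(type(obj).__name__)  # dict
print(obj['url'])          # http://httpbin.org/post

# Form values travel as text, so the integer 1 that was sent
# comes back as the string "1".
print(obj['form']['1'] == '1')  # True
```

response.json() performs the same decoding and parsing in one call, and raises an exception if the body is not valid JSON.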

Acknowledgement: based on "requests库和 session | 白月黑羽" (byhy.net); for self-study use only.

posted @ 2023-03-08 10:52 yj-newboy