Request libraries: the requests module

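I. Introduction

# Overview: requests lets you simulate the HTTP requests a browser makes. Compared with urllib, which we used earlier, its API is far more convenient (under the hood it is a wrapper around urllib3).

# Note: requests only downloads the page content; it does not execute JavaScript. For data that a page loads via JS, we have to analyze the target site ourselves and issue the extra requests.

# Install: pip3 install requests

# Request methods: the ones used most often are requests.get() and requests.post()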
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})
>>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')

II. GET requests

1 Basic request

The response is a Python object that holds the response headers, the response body, and so on.

import requests

header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'referer': 'https://www.mzitu.com/225078/2'
}

response = requests.get('https://www.mzitu.com/', headers=header)
print(response.text)  # response body as text --> parse the image URLs out of it

# Send a GET request to an image URL to fetch the image's binary content
result = requests.get('https://i3.mmzztt.com/2020/03/14a02.jpg', headers=header)
print(result.content)  # response body as bytes

# Download and save the image (iter_content() defaults to 1-byte chunks, so pass a chunk_size)
with open('a.jpg', 'wb') as f:
    for chunk in result.iter_content(chunk_size=8192):
        f.write(chunk)

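2 GET requests with parameters

header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}


Option 1: append the query string directly to the URL
res = requests.get('https://www.baidu.com/s?wd=美女', headers=header)
# If the keyword is Chinese or contains other special characters, it should be URL-encoded
# from urllib.parse import quote, urlencode, unquote
# Encode a single string: quote('美女', encoding='utf-8')
# Encode a dict of parameters: urlencode({'wd': '美女'}, encoding='utf-8')
# Decode: unquote('%2Fs%3Fwd%3D%25E7%')

Option 2: pass the parameters via params; they are URL-encoded automatically
res = requests.get('http://www.baidu.com/s', headers=header, params={'wd': '美女'})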
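
The encoding done by the two options above can be checked without sending anything. This is a minimal sketch, assuming only the standard urllib.parse helpers and requests.Request(...).prepare(), which builds the final URL locally; the variable names are illustrative.

```python
from urllib.parse import quote, unquote, urlencode

import requests

keyword = '美女'

# quote() percent-encodes a single string; urlencode() encodes a whole mapping.
encoded = quote(keyword)
query = urlencode({'wd': keyword})

# Preparing a request builds the final URL without any network traffic.
prepared = requests.Request(
    'GET', 'http://www.baidu.com/s', params={'wd': keyword}
).prepare()

print(encoded)
print(query)
print(prepared.url)
print(unquote(encoded))  # decoding gives the original keyword back
```

The prepared URL carries the same query string that urlencode() produces, which is why Option 2 needs no manual encoding.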

3 Sending cookies with a request

Option 1: put them in the headers
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'cookie': 'key=asdfasdfasdfsdfsaasdf; key2=asdfasdf; key3=asdfasdf'
}
res = requests.get(url, headers=header)


Option 2: pass them as an argument (recommended)
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}
res = requests.get(url, headers=header, cookies={'key': 'asdfasdf'})
print(res.text)

# cookies is a dict or a CookieJar object. On the first visit, take the CookieJar from response.cookies and store it in a variable; pass that CookieJar when visiting other pages.
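
The dict and CookieJar forms are interchangeable. Below is a small offline sketch, assuming the helpers cookiejar_from_dict and dict_from_cookiejar from requests.utils; the cookie values and the local URL are placeholders.

```python
import requests
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

# Build a CookieJar from a plain dict (placeholder values).
jar = cookiejar_from_dict({'key': 'asdfasdf', 'key2': 'qwer'})

# Convert it back to a dict, e.g. to store it between runs.
as_dict = dict_from_cookiejar(jar)

# Preparing a request shows the Cookie header that would be sent;
# nothing goes over the network.
prepared = requests.Request(
    'GET', 'http://127.0.0.1:8000/index/', cookies=jar
).prepare()
cookie_header = prepared.headers['Cookie']

print(as_dict)
print(cookie_header)
```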

III. POST requests

1 Basic usage

# requests.post() is called the same way as requests.get(); in addition it takes a data argument that holds the request body
# The body can be sent as urlencoded form data (data=) or as JSON (json=)

res = requests.post(url, data={'name': 'lqz'})   # the backend's request.body is b'name=lqz', i.e. urlencoded

res = requests.post(url, json={'age': 18})       # the backend's request.body is b'{"age": 18}', i.e. JSON
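
The difference between data= and json= shows up in both the body and the Content-Type header. The sketch below prepares the two requests locally instead of sending them; the URL is a placeholder for whatever backend you are testing.

```python
import requests

url = 'http://127.0.0.1:8000/index/'  # placeholder endpoint, never contacted

# data= with a dict produces a urlencoded form body.
form_req = requests.Request('POST', url, data={'name': 'lqz'}).prepare()

# json= serializes the object and sets a JSON content type.
json_req = requests.Request('POST', url, json={'age': 18}).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```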

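2 Sending a POST request to simulate a browser login

Logging in to GitHub automatically (handling cookies by hand)

2.1 Analyzing the target site
    Open https://github.com/login in the browser
    Enter a wrong username and password, and capture the traffic
    The login turns out to be a POST to https://github.com/session
    The request headers include a cookie
    The request body contains:
        commit:Sign in
        utf8:✓
        authenticity_token:lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
        login:egonlin
        password:123


2.2 Flow analysis
    First GET https://github.com/login to obtain the initial cookie and the authenticity_token
    Then POST to https://github.com/session, sending the initial cookie and the request body (authenticity_token, username, password, etc.)
    Finally, read the login cookie from the response

PS: if the password is submitted in encrypted form, enter a wrong username with the correct password and copy the encrypted password from the captured request in the browser. GitHub submits the password in plaintext.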
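
The token-extraction step of this flow can be tried offline first. This is a sketch under stated assumptions: sample_html is a made-up fragment shaped like a login form, and extract_token is a hypothetical helper, not part of requests.

```python
import re

# Hypothetical HTML fragment with a hidden CSRF field.
sample_html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123==" />
  <input type="text" name="login" />
</form>
'''


def extract_token(html):
    """Return the first authenticity_token value found in html, or None."""
    match = re.search(r'name="authenticity_token".*?value="(.*?)"', html, re.S)
    return match.group(1) if match else None


token = extract_token(sample_html)
missing = extract_token('<form></form>')

print(token)
print(missing)
```

Returning None instead of indexing [0] on an empty list makes it easier to notice when the page layout has changed and the token is no longer where the pattern expects it.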
-----------------------------------------------------------------------------------------------------------

Simulating the login and obtaining the cookie

import requests
import re

# Request 1
r1 = requests.get('https://github.com/login')
r1_cookie = r1.cookies.get_dict()  # initial cookie (not yet authorized)
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]  # extract the CSRF token from the page

# Request 2: POST to the login endpoint with the initial cookie, the token, and the username and password
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '317828332@qq.com',
    'password': 'alex3714'
}
r2 = requests.post('https://github.com/session', data=data, cookies=r1_cookie)

login_cookie = r2.cookies.get_dict()  # cookie after logging in

# Request 3: from now on login_cookie is enough, e.g. to open a personal settings page
r3 = requests.get('https://github.com/settings/emails', cookies=login_cookie)

print('317828332@qq.com' in r3.text)  # look for the email address; True means the cookie is logged in

requests.session (handling cookies automatically)

Basic usage of requests.session()

session = requests.session()                          # create a requests Session object
res1 = session.post('http://127.0.0.1:8000/index/')   # suppose this request logs in
res2 = session.get('http://127.0.0.1:8000/order/')    # no need to pass cookies by hand; the session handles them
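
What the session carries from one request to the next can be inspected without a server. In this sketch the user-agent string and the sessionid cookie are made up; Session.prepare_request() merges the session state into a request without sending it.

```python
import requests

session = requests.session()

# Headers and cookies set on the session apply to every request made through it.
session.headers.update({'user-agent': 'my-crawler/0.1'})  # hypothetical UA string
session.cookies.set('sessionid', 'abc123')  # stands in for a cookie set by a login response

# Merge the session state into a request; nothing is sent.
request = requests.Request('GET', 'http://127.0.0.1:8000/order/')
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])
```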
--------------------------------------------------------------------------

Carrying cookies automatically: the login example above, simplified
import requests
import re

session = requests.session()
# Request 1
r1 = session.get('https://github.com/login')
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]  # extract the CSRF token from the page

# Request 2
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '317828332@qq.com',
    'password': 'alex3714'
}
r2 = session.post('https://github.com/session', data=data)

# Request 3
r3 = session.get('https://github.com/settings/emails')


print('317828332@qq.com' in r3.text)  # True

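IV. The Response object

1 Response attributes

response = requests.post(url, data={'name': 'lqz'})
print(response.text)                # response body as text
print(response.content)             # response body as bytes
print(response.status_code)         # HTTP status code
print(response.headers)             # response headers
print(response.cookies)             # CookieJar object; if the site sets a cookie on the home page and other pages require it, read it from here first
print(response.cookies.get_dict())  # convert the CookieJar to a dict
print(response.cookies.items())     # the cookies as a list of (name, value) tuples
print(response.url)                 # URL of the response (the final URL after any redirects)
print(response.history)             # list of the redirect responses that led to this one
print(response.encoding)            # encoding used to decode response.text

# iter_content() yields the body in binary chunks: use it for images, videos, and large files
with open('a.jpg', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)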

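2 Encoding issues

res = requests.get('http://www.autohome.com/news')
# If the printed text comes out garbled:
# Option 1: set the encoding that the site declares
res.encoding = 'gb2312'

# Option 2: the general approach, let requests detect the encoding from the content
res.encoding = res.apparent_encoding
print(res.text)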
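
Why the wrong encoding garbles the text can be reproduced with plain bytes. When a text response does not declare a charset, requests falls back to ISO-8859-1, so the sketch below decodes GB2312 bytes both ways; the sample string is made up and is not real site content.

```python
# Bytes as a server might send them for a GB2312-encoded page.
raw = '汽车之家新闻'.encode('gb2312')

# Decoding with the wrong codec does not fail; it silently produces garbage.
garbled = raw.decode('iso-8859-1')

# Decoding with the page's real encoding recovers the text.
fixed = raw.decode('gb2312')

print(repr(garbled))
print(fixed)
```

Setting res.encoding before reading res.text has the same effect as choosing the right codec in the last step.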

3 Parsing JSON

import json
import requests

response = requests.get('http://httpbin.org/get')

res1 = json.loads(response.text)  # works, but verbose
res2 = response.json()            # parse the JSON body directly

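V. Advanced usage

1 SSL certificate verification (for reference)

# Certificate verification (most sites use HTTPS)
import requests
response = requests.get('https://www.12306.cn')  # for an HTTPS request the certificate is checked first; if it is invalid, an error is raised and the program stops


# Improvement 1: no error, but a warning is emitted
import requests
response = requests.get('https://www.12306.cn', verify=False)  # skip certificate verification; emits a warning, returns 200
print(response.status_code)

# Improvement 2: no error and no warning
import requests
import urllib3
urllib3.disable_warnings()  # silence the warning
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

# Improvement 3: supply a client certificate (paths to local certificate and key files)
# Many sites use HTTPS but can be accessed without a client certificate; in most cases it is optional
# Zhihu, Baidu, and similar sites work with or without one
# Where it is mandatory you must supply it, e.g. a site restricted to specific users who can access it only after obtaining a certificate
import requests
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)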

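2 Using proxies (important)

proxies = {
    # 'http': 'http://egon:123@localhost:9743',  # proxy with credentials: the username and password go before the @
    'http': 'http://localhost:9743',             # proxy IP + port
    'https': 'https://localhost:9743',
}
response = requests.get('https://www.12306.cn', proxies=proxies)

# A dict cannot hold the same key twice, so keep only one 'http' entry
# On the backend (e.g. Django), request.META['REMOTE_ADDR'] shows the client IP; through a proxy it shows the proxy's IP
# Proxy pool: keep a list of proxy addresses and pick one at random for each request, which makes an IP ban less likely
# High-anonymity vs. transparent proxies: a high-anonymity proxy does not forward your IP to the backend, while a transparent proxy does
# How does the backend find the IP behind a transparent proxy? From the X-Forwarded-For header (HTTP_X_FORWARDED_FOR in META), which records the IP before each proxy hop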
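
The proxy-pool idea and the X-Forwarded-For lookup described above can be sketched as follows. Everything here is illustrative: the proxy addresses are invented, and pick_proxies and client_ip are hypothetical helpers, not functions from requests or Django.

```python
import random

# Hypothetical proxy addresses; replace with proxies you actually control.
PROXY_POOL = [
    'http://10.0.0.1:8080',
    'http://10.0.0.2:8080',
    'http://10.0.0.3:8080',
]


def pick_proxies(pool):
    """Pick one proxy at random and use it for both http and https traffic."""
    address = random.choice(pool)
    return {'http': address, 'https': address}


def client_ip(meta):
    """Return the left-most X-Forwarded-For entry, falling back to REMOTE_ADDR."""
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    if forwarded:
        return forwarded.split(',')[0].strip()
    return meta.get('REMOTE_ADDR')


proxies = pick_proxies(PROXY_POOL)
print(proxies)
# requests.get(url, proxies=proxies) would route the request through the chosen proxy.

print(client_ip({'HTTP_X_FORWARDED_FOR': '1.2.3.4, 10.0.0.1', 'REMOTE_ADDR': '10.0.0.1'}))
```

Note that X-Forwarded-For is set by the client or by intermediate proxies, so a backend should only trust it when it comes from a proxy it controls.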

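3 Timeout settings

# Two forms: float or tuple
timeout = 0.1          # a single value applies to both the connect timeout and the read timeout
timeout = (0.1, 0.2)   # 0.1 is the connect timeout, 0.2 is the read timeout

import requests
response = requests.get('https://www.baidu.com', timeout=0.0001)  # a timeout this short will almost certainly raise a timeout exception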

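4 Authentication (for reference)

# Some older sites pop up a dialog asking for a username and password (much like alert); this is HTTP Basic authentication, and until you authenticate you cannot get the HTML
r = requests.get(url, auth=('user', 'password'))
print(r.status_code)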
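
Passing a (user, password) tuple as auth makes requests add a Basic Authorization header. The sketch below prepares such a request locally and compares the header with a hand-built value; the URL is a placeholder.

```python
import base64

import requests

prepared = requests.Request(
    'GET', 'http://127.0.0.1:8000/index/', auth=('user', 'password')
).prepare()
auth_header = prepared.headers['Authorization']

# Basic auth is just "user:password" base64-encoded, not encrypted,
# so it should only be used over HTTPS.
expected = 'Basic ' + base64.b64encode(b'user:password').decode('ascii')

print(auth_header)
print(auth_header == expected)
```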

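5 Exception handling

# See requests.exceptions for the available exception types
import requests
from requests.exceptions import RequestException

# Catching the base class RequestException is enough
try:
    res = requests.get('http://www.baidu.com', timeout=0.00001)
except RequestException as e:
    print(e)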
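
When different failures need different handling, the more specific exception classes can be caught before the base class. This is a minimal sketch: fetch is a hypothetical helper and the timeout values are arbitrary.

```python
import requests
from requests.exceptions import ConnectionError, HTTPError, RequestException, Timeout


def fetch(url, timeout=(3, 10)):
    """Return the response text, or None if the request failed."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
        return response.text
    except Timeout:
        print('timed out:', url)
    except HTTPError as exc:
        print('bad status:', exc)
    except ConnectionError as exc:
        print('connection failed:', exc)
    except RequestException as exc:
        print('request failed:', exc)
    return None


# A URL without a scheme is rejected before any network traffic happens.
result = fetch('not-a-url')
print(result)
```

Timeout is listed before ConnectionError because a connect timeout belongs to both families and would otherwise be reported as a plain connection failure.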

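6 Uploading files

with open('a.jpg', 'rb') as f:
    res = requests.post(url, files={'myfile': f})
print(res.text)


# On the backend (e.g. Django), request.FILES.get('myfile') returns the uploaded file object
# requests is also handy for talking to backend services: SDKs for SMS and payment APIs are often built on it, and when no third-party SDK exists you can call the API directly with requests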
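
What files= actually sends is a multipart/form-data body. The sketch below builds one locally; the in-memory file, its name, and the upload URL are all made up for illustration.

```python
import io

import requests

# An in-memory file stands in for open('a.jpg', 'rb').
fake_file = io.BytesIO(b'fake image bytes')

prepared = requests.Request(
    'POST',
    'http://127.0.0.1:8000/upload/',  # hypothetical upload endpoint, never contacted
    files={'myfile': ('a.jpg', fake_file, 'image/jpeg')},
).prepare()

content_type = prepared.headers['Content-Type']
body = prepared.body

print(content_type)
print(body[:120])
```

The field name 'myfile' in the body is the key the backend uses to look the file up.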

Posted 2022-09-28 17:15 by 不会钓鱼的猫
