Basic crawler libraries: the requests library

Using requests

When handling page authentication and Cookies with the basic urllib library, you have to write Openers and Handlers. To make these operations more convenient, there is the more powerful requests library, which can handle Cookies, login authentication, proxy settings, and more.

Simple usage of the requests library
import requests

r = requests.get('http://www.baidu.com/')
print(type(r), r.status_code, r.text, r.cookies, sep='\n\n')
GET requests

The get() method sends a GET request and returns the corresponding response:

requests.get(url, params)  # url is the link of the page to fetch; params carries extra URL parameters (a dict or bytes)

Example 1:
import requests

r = requests.get('http://httpbin.org/get')
print(r.text)
# Output
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "origin": "120.85.108.192, 120.85.108.192",
  "url": "https://httpbin.org/get"
}
Example 2:
import requests

data = {
    'name': 'LiYihua',
    'age': '21'
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
# Output:
{
  "args": {
    "age": "21",
    "name": "LiYihua"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "origin": "120.85.108.92, 120.85.108.92",
  "url": "https://httpbin.org/get?name=LiYihua&age=21"
}
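If you want to confirm the query string that params actually produced, the url attribute of the response exposes the final URL. A minimal sketch based on the example above:

import requests

data = {'name': 'LiYihua', 'age': '21'}
r = requests.get('http://httpbin.org/get', params=data)
# The params dict is serialized into the query string of the final URL
print(r.url)  # http://httpbin.org/get?name=LiYihua&age=21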
Example 3:
import requests

r = requests.get('http://httpbin.org/get')
print(type(r.text), r.json(), type(r.json()), sep='\n\n')
# Output:
<class 'str'>

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.21.0'}, 'origin': '120.85.108.92, 120.85.108.92', 'url': 'https://httpbin.org/get'}

<class 'dict'>
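Note that r.json() only works when the body really is JSON; for other content it raises a decoding error. A hedged sketch of guarding against that, using the httpbin.org/html endpoint (which returns HTML, not JSON, and is not used elsewhere in this post):

import requests

r = requests.get('http://httpbin.org/html')  # this endpoint returns HTML, not JSON
try:
    data = r.json()
except ValueError:
    # requests raises a JSON decoding error (a ValueError subclass) for non-JSON bodies
    data = None
print(data)  # None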
Example 4:
Fetching an image (binary content):
import requests

r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
# After the script finishes, an icon file named favicon.ico has been saved
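For larger binary files, content loads the whole body into memory at once. A minimal sketch of streaming the download to disk instead with stream=True and iter_content(), reusing the same GitHub favicon URL (the output file name is just illustrative):

import requests

# stream=True defers downloading the body until it is iterated over
r = requests.get('https://github.com/favicon.ico', stream=True)
with open('favicon_stream.ico', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):  # read the body in 8 KB chunks
        f.write(chunk)
r.close()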
POST requests

POST is another common way to send a request. For example:
import requests

data = {
    'name': 'LiYihua',
    'age': 21
}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)

# Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "21",
    "name": "LiYihua"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "19",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "120.85.108.90, 120.85.108.90",
  "url": "https://httpbin.org/post"
}

# The POST request succeeded; the form part contains the submitted data
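Besides form data, requests can also send a JSON body through the json parameter, which serializes the dict and sets the Content-Type header to application/json automatically. A small sketch against the same httpbin endpoint:

import requests

payload = {'name': 'LiYihua', 'age': 21}
# json= serializes payload and sends it as a JSON body instead of form data
r = requests.post('http://httpbin.org/post', json=payload)
print(r.json()['json'])  # httpbin echoes the parsed JSON body back in the "json" field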
Response
- text and content give the body of the response
- status_code gives the status code
- headers gives the response headers
- cookies gives the Cookies
- url gives the URL
- history gives the request history
Example:
import requests

r = requests.get('https://www.cnblogs.com/liyihua/')
print(type(r.status_code), r.status_code,
      type(r.headers), r.headers,
      type(r.cookies), r.cookies,
      type(r.url), r.url,
      type(r.history), r.history, sep='\n\n')

# Output:
<class 'int'>

200

<class 'requests.structures.CaseInsensitiveDict'>

{'Date': 'Thu, 20 Jun 2019 08:18:00 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Cache-Control': 'private, max-age=10', 'Expires': 'Thu, 20 Jun 2019 08:18:10 GMT', 'Last-Modified': 'Thu, 20 Jun 2019 08:18:00 GMT', 'X-UA-Compatible': 'IE=10', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Encoding': 'gzip'}

<class 'requests.cookies.RequestsCookieJar'>

<RequestsCookieJar[]>

<class 'str'>

https://www.cnblogs.com/liyihua/

<class 'list'>

[]
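Beyond comparing status_code by hand, requests ships a table of built-in status codes and a helper that turns error responses into exceptions. A brief sketch reusing the same blog URL:

import requests

r = requests.get('https://www.cnblogs.com/liyihua/')
if r.status_code == requests.codes.ok:  # requests.codes.ok is 200
    print('request succeeded')

# raise_for_status() raises requests.exceptions.HTTPError for 4xx/5xx responses
r.raise_for_status()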
Advanced usage of requests
File upload
import requests

files = {
    'file': open('favicon.ico', 'rb')
}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)

# Output:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octetstream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAAAAAAFAAA...
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=c1b665273fc73e67e57ac97e78f49110",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "120.85.108.71, 120.85.108.71",
  "url": "https://httpbin.org/post"
}
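The files dict can also carry an explicit filename and content type as a tuple, and opening the file in a with block makes sure the handle gets closed. A hedged variant of the upload above (the content type is assumed to be image/x-icon):

import requests

# (filename, file object, content type) controls how the uploaded part is described
with open('favicon.ico', 'rb') as f:
    files = {'file': ('favicon.ico', f, 'image/x-icon')}
    r = requests.post('http://httpbin.org/post', files=files)
print(r.status_code)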
Session maintenance
A Session object makes it easy to maintain a session:
import requests

# Two plain requests.get() calls do not share cookies:
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
# Output:
{
  "cookies": {}
}

import requests

# A Session keeps the cookies between requests:
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# Output:
{
  "cookies": {
    "number": "123456789"
  }
}
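A Session is also a convenient place for settings every request should share, such as default headers, and it can be used as a context manager so it gets closed automatically. A minimal sketch (the User-Agent value and the token cookie are only illustrative):

import requests

with requests.Session() as s:
    # headers set here are sent with every request made through this session
    s.headers.update({'User-Agent': 'my-crawler/0.1'})
    s.get('http://httpbin.org/cookies/set/token/abc123')  # the cookie is stored on the session
    r = s.get('http://httpbin.org/cookies')
    print(r.text)  # the cookie set above is still present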
SSL certificate verification
import requests

r = requests.get('https://www.12306.cn')
print(r.status_code)
# If nothing goes wrong, this prints: 200
# When requesting an HTTPS site whose certificate fails verification, an error is raised.
# To avoid the error, the example can be modified slightly:

import requests
from requests.packages import urllib3

urllib3.disable_warnings()
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)
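verify does not have to be a boolean: it can also point to a local CA bundle or certificate file, so the site can be checked without switching verification off. A sketch where the path is only a placeholder:

import requests

# Path to a trusted CA bundle (placeholder, adjust to your environment)
r = requests.get('https://www.12306.cn', verify='/path/to/ca_bundle.pem')
print(r.status_code)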
Proxy settings
import requests

proxies = {
    'http': 'socks5://user:password@10.10.1.10:3128',
    'https': 'socks5://user:password@10.10.1.10:1080'
}
requests.get('https://www.taobao.com', proxies=proxies)
# Proxy through the SOCKS protocol (requires the requests[socks] extra to be installed)
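Plain HTTP/HTTPS proxies work the same way without the SOCKS scheme. A sketch with placeholder proxy addresses and credentials:

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',                  # placeholder proxy address
    'https': 'http://user:password@10.10.1.10:1080',   # credentials can be embedded in the URL
}
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)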
Timeout settings
import requests

# timeout=(connect timeout, read timeout), in seconds
r = requests.get('https://taobao.com', timeout=(0.1, 1))
print(r.status_code)
# Output: 200
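If the timeout is exceeded, requests raises requests.exceptions.Timeout, which a crawler normally wants to catch. A small sketch with a deliberately tiny timeout:

import requests

try:
    # a single number applies to both the connect and the read phase
    r = requests.get('https://taobao.com', timeout=0.01)
    print(r.status_code)
except requests.exceptions.Timeout:
    print('the request timed out')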
Authentication
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://localhost', auth=HTTPBasicAuth('liyihua', 'woshiyihua134'))
print(r.status_code)
# Output: 200

# OAuth 1 authentication is also possible:
import requests
from requests_oauthlib import OAuth1

url = 'https://api.twitter.com/1.1/account/verify_credentials.json'
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
requests.get(url, auth=auth)
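For HTTP Basic auth, requests also accepts a plain (username, password) tuple and converts it to HTTPBasicAuth internally, which keeps the call shorter. A sketch reusing the same placeholder credentials:

import requests

# equivalent to auth=HTTPBasicAuth('liyihua', 'woshiyihua134')
r = requests.get('http://localhost', auth=('liyihua', 'woshiyihua134'))
print(r.status_code)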
Prepared Request
To obtain a Prepared Request that carries the session state, use Session.prepare_request():
from requests import Request, Session

url = 'http://httpbin.org/post'
data = {
    'name': 'LiYihua'
}  # parameters
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'
}  # pretend to be a browser

s = Session()  # maintain a session
req = Request('POST', url, data=data, headers=header)
prepped = s.prepare_request(req)  # prepare_request() turns req into a PreparedRequest object
r = s.send(prepped)  # send() dispatches the prepared request
print(r.text)

# Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "LiYihua"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "12",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
  },
  "json": null,
  "origin": "120.85.108.184, 120.85.108.184",
  "url": "https://httpbin.org/post"
}
This post is from 博客园 (cnblogs), author: LeeHua. When reposting, please cite the original link: https://www.cnblogs.com/liyihua/p/11050374.html