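The Python Requests Library: HTTP for Humans
In Python 2 the standard-library module for HTTP is urllib2 (urllib.request in Python 3), but its API is fragmented. Requests is a third-party library that is simpler and friendlier to use.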
Install it with pip:
    pip install requests
Or clone the source with git:
    git clone git://github.com/kennethreitz/requests.git
Sending requests:
GET:
    >>> import requests
    >>> r = requests.get('https://api.github.com/events')
POST:
    >>> r = requests.post("http://httpbin.org/post")
The other HTTP methods work the same way:
    >>> r = requests.put("http://httpbin.org/put")
    >>> r = requests.delete("http://httpbin.org/delete")
    >>> r = requests.head("http://httpbin.org/get")
    >>> r = requests.options("http://httpbin.org/get")
The method can also be passed as an argument:
    >>> import requests
    >>> req = requests.request('GET', 'http://httpbin.org/get')
Passing parameters and uploading files:
1. To pass parameters in the URL query string, use the params argument, which accepts a dictionary or a string:
    >>> payload = {'key1': 'value1', 'key2': 'value2'}
    >>> r = requests.get("http://httpbin.org/get", params=payload)
    >>> r.url
    'http://httpbin.org/get?key1=value1&key2=value2'
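How params is encoded can be checked without touching the network by preparing the request instead of sending it. A minimal sketch, assuming Python 3.7+ (where dictionaries keep insertion order); the list value and the None value are illustrative additions, not part of the example above:

```python
import requests

# Prepare the request instead of sending it, so no network access is needed.
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'skipped': None}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# A list value repeats the key; a key whose value is None is left out.
print(prepared.url)
```

Preparing rather than sending is also a convenient way to debug a query string before pointing the code at a real service.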
2. To pass parameters in the request body, use the data argument, which accepts a dictionary, a string, or a file-like object.
A dictionary is sent as form-encoded data:
    >>> payload = {'key1': 'value1', 'key2': 'value2'}
    >>> r = requests.post("http://httpbin.org/post", data=payload)
    >>> print(r.text)
    {
      ...
      "form": {
        "key2": "value2",
        "key1": "value1"
      },
      ...
    }
A string is sent as-is:
    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, data=json.dumps(payload))
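The difference between the two forms of data shows up in the prepared body and in the Content-Type header. The sketch below prepares the requests without sending them; the json= argument in the last step is not used elsewhere in this article and is assumed to be available in the installed Requests release:

```python
import json
import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}

# Dictionary: form-encoded body, Content-Type set automatically.
form = requests.Request('POST', url, data=payload).prepare()
print(form.body, form.headers['Content-Type'])

# String: sent unchanged, and no Content-Type is added for you.
raw = requests.Request('POST', url, data=json.dumps(payload)).prepare()
print(raw.body, raw.headers.get('Content-Type'))

# json= serializes the object and labels the body as JSON.
as_json = requests.Request('POST', url, json=payload).prepare()
print(as_json.headers['Content-Type'])
```

This is why a JSON string passed through data usually needs an explicit content-type header, while a dictionary does not.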
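Streaming upload (pass an open file object and the body is streamed rather than read into memory first):
    import requests
    with open('massive-body', 'rb') as f:
        requests.post('http://some.url/streamed', data=f)
Chunk-encoded upload (pass a generator, or any iterator without a length):
    import requests

    def gen():
        yield b'hi'
        yield b'there'

    requests.post('http://some.url/chunked', data=gen())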
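Which of the two upload modes is used depends on whether Requests can determine the length of the body. A small sketch that prepares both kinds of request without sending them; io.BytesIO stands in for an open file and the URL is a placeholder:

```python
import io
import requests


def gen():
    yield b'hi'
    yield b'there'


url = 'http://some.url/upload'

# A file-like object with a known size is streamed with a Content-Length header.
sized = requests.Request('POST', url, data=io.BytesIO(b'hello')).prepare()
print(sized.headers.get('Content-Length'))

# A generator has no length, so Requests falls back to chunked transfer encoding.
chunked = requests.Request('POST', url, data=gen()).prepare()
print(chunked.headers.get('Transfer-Encoding'))
```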
3. To upload files, use the files argument to send multipart-encoded data. It takes a dictionary of the form {'name': file-like-object} (or {'name': ('filename', fileobj)}):
    >>> url = 'http://httpbin.org/post'
    >>> files = {'file': open('report.xls', 'rb')}
    >>> r = requests.post(url, files=files)
    >>> r.text
    {
      ...
      "files": {
        "file": "<censored...binary...data>"
      },
      ...
    }
You can also set the filename, content type, and extra headers explicitly:
    >>> url = 'http://httpbin.org/post'
    >>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
    >>> r = requests.post(url, files=files)
    >>> print(r.text)
    {
      "args": {},
      "data": "",
      "files": {
        "file": "1\t2\r\n"
      },
      "form": {},
      "headers": {
        "Content-Type": "multipart/form-data; boundary=e0f9ff1303b841498ae53a903f27e565",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.2.1 CPython/2.7.3 Windows/7"
      },
      "url": "http://httpbin.org/post"
    }
Several files can be uploaded in one request, for example to a form field that accepts multiple values:
    <input type="file" name="images" multiple="true" required="true"/>
Put the files in a list of tuples of the form (form_field_name, file_info):
    >>> url = 'http://httpbin.org/post'
    >>> multiple_files = [
    ...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
    ...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
    >>> r = requests.post(url, files=multiple_files)
    >>> r.text
    {
      ...
      'files': {'images': ' ....'},
      'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
      ...
    }
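To see what the multipart encoding looks like on the wire, the request can be prepared locally. In this sketch an in-memory io.BytesIO stands in for the opened file; the file name, contents, and content type are made up for illustration:

```python
import io
import requests

# In-memory stand-in for open('report.csv', 'rb').
fileobj = io.BytesIO(b'a,b\n1,2\n')
files = {'file': ('report.csv', fileobj, 'text/csv', {'Expires': '0'})}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# The boundary in the header separates the parts in the body.
print(prepared.headers['Content-Type'])
# The body holds the part headers followed by the file bytes.
print(prepared.body.decode('utf-8'))
```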
Setting headers: pass a dictionary to the headers argument:
    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> headers = {'content-type': 'application/json'}
    >>> r = requests.post(url, data=json.dumps(payload), headers=headers)
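Response object:
r.text returns the body as a Unicode string. It is decoded automatically with the character encoding taken from the response headers (r.encoding); you can also set r.encoding yourself before reading r.text:
    >>> r = requests.get('https://github.com/timeline.json')
    >>> r.text
    '{"message":"Hello there, wayfaring stranger...
r.content returns the body as bytes; gzip and deflate transfer encodings are decoded automatically:
    >>> r.content
    b'{"message":"Hello there, wayfaring stranger...
To load an image fetched from the web, wrap the binary content in a file-like object (here r is assumed to be the response to an image URL):
    >>> from PIL import Image
    >>> from io import BytesIO
    >>> i = Image.open(BytesIO(r.content))
r.json() decodes a JSON body:
    >>> r.json()
    {'documentation_url': 'https://developer...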
To get the raw response from the server, set stream=True in the request:
    >>> r = requests.get('https://github.com/timeline.json', stream=True)
    >>> r.raw
    <requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
    >>> r.raw.read(10)
    b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
With stream=True the body is not downloaded until you access it, so you can decide from the headers whether to read it at all:
    tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
    r = requests.get(tarball_url, stream=True)
    if int(r.headers['content-length']) < TOO_LONG:  # TOO_LONG: a size limit you define
        content = r.content
        ...
You can also process the body iteratively, chunk by chunk:
    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size):
            fd.write(chunk)
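The two fragments above can be combined into one runnable download. This is only a sketch: it serves a made-up payload from a throwaway local HTTP server so that it runs without internet access, and TOO_LONG, the chunk size, and the file name are arbitrary choices:

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BODY = b'x' * 10000  # made-up payload served by the throwaway server


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

TOO_LONG = 1024 * 1024  # arbitrary 1 MiB limit for this sketch
filename = os.path.join(tempfile.mkdtemp(), 'download.bin')

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local server

# Check the size from the headers first, then write the body in 1 KiB chunks.
with session.get(url, stream=True) as r:
    if int(r.headers['content-length']) < TOO_LONG:
        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size=1024):
                fd.write(chunk)

server.shutdown()
print(os.path.getsize(filename))
```

Using the response as a context manager releases the connection even if the body is never read.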
Or line by line:
    import json
    import requests

    r = requests.get('http://httpbin.org/stream/20', stream=True)
    for line in r.iter_lines():
        # filter out keep-alive new lines
        if line:
            print(json.loads(line))
Getting the status code:
    >>> r = requests.get('http://httpbin.org/get')
    >>> r.status_code
    200
Getting the response headers:
    >>> r.headers
    {
        'content-encoding': 'gzip',
        'transfer-encoding': 'chunked',
        'connection': 'close',
        'server': 'nginx/1.0.4',
        'x-runtime': '148ms',
        'etag': '"e1ca502697e5c9317743dc078f67693f"',
        'content-type': 'application/json'
    }
Getting the headers that were sent with the request:
    >>> r.request.headers
    {'Accept-Encoding': 'identity, deflate, compress, gzip',
     'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0'}
Cookies
r.cookies returns the cookies set by the server as a CookieJar object:
    >>> url = 'http://www.baidu.com'
    >>> r = requests.get(url)
    >>> r.cookies
Converting a CookieJar to a dictionary:
    >>> requests.utils.dict_from_cookiejar(r.cookies)
    {'BAIDUID': '84722199DF8EDC372D549EC56CA1A0E2:FG=1', 'BD_HOME': '0', 'BDSVRTM': '0'}
Converting a dictionary to a CookieJar:
    requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
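The two conversion helpers can be tried without making a request. A short sketch with made-up cookie names and values, which also shows what the overwrite flag does:

```python
import requests

# Dictionary -> CookieJar (a RequestsCookieJar when no jar is passed in).
jar = requests.utils.cookiejar_from_dict({'cookies_are': 'working'})

# With overwrite=False, names already in the jar keep their current value.
jar = requests.utils.cookiejar_from_dict(
    {'cookies_are': 'changed', 'extra': '1'}, cookiejar=jar, overwrite=False)

# CookieJar -> dictionary.
cookie_dict = requests.utils.dict_from_cookiejar(jar)
print(cookie_dict)
```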
To send your own cookies, use the cookies argument, which accepts a dictionary or a CookieJar object:
    >>> url = 'http://httpbin.org/cookies'
    >>> cookies = dict(cookies_are='working')
    >>> r = requests.get(url, cookies=cookies)
    >>> r.text
    '{"cookies": {"cookies_are": "working"}}'
To keep cookies across requests, use a Session, described below.
Redirection and History
By default, Requests follows redirects automatically for every method except HEAD.
The history attribute of the response lets you trace them. Response.history is a list of the Response objects that were created to complete the request, sorted from the oldest to the most recent.
    >>> r = requests.get('http://github.com')
    >>> r.url
    'https://github.com/'
    >>> r.status_code
    200
    >>> r.history
    [<Response [301]>]
With GET, OPTIONS, POST, PUT, PATCH, or DELETE, you can disable redirect handling with the allow_redirects parameter:
    >>> r = requests.get('http://github.com', allow_redirects=False)
    >>> r.status_code
    301
    >>> r.history
    []
    >>> r.headers['Location']
    'https://github.com/'
With HEAD, you can enable redirect handling the same way:
    >>> r = requests.head('http://github.com', allow_redirects=True)
    >>> r.url
    'https://github.com/'
    >>> r.history
    [<Response [301]>]
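The same behaviour can be reproduced offline. The sketch below starts a throwaway local server whose /old path answers with a 301 to /new (both paths are invented for this example), then requests it with and without redirect handling:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            # Permanent redirect to /new, with an empty body.
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local server

# Default: the redirect is followed and recorded in history.
followed = session.get(base + '/old')
print(followed.status_code, followed.url, followed.history)

# allow_redirects=False: the 301 itself is returned.
not_followed = session.get(base + '/old', allow_redirects=False)
print(not_followed.status_code, not_followed.headers['Location'], not_followed.history)

server.shutdown()
```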
Session
To keep state across requests, use requests.Session().
A Session offers the same get, post, and other methods. It persists certain parameters across requests and handles cookies automatically:
    import requests
    s = requests.Session()
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')  # the cookie is stored in s
    r = s.get("http://httpbin.org/cookies")  # the cookie is sent again on this request
    print(r.text)
    # '{"cookies": {"sessioncookie": "123456789"}}'
You can also set default auth and headers on the session:
    s = requests.Session()
    s.auth = ('user', 'pass')
    s.headers.update({'x-test': 'true'})
    s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})  # both 'x-test' and 'x-test2' are sent
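How session defaults and per-request values are merged can be inspected with Session.prepare_request(), which builds the final request without sending it. A sketch; the x-drop header is a made-up name used to show that a per-request value of None removes a session default:

```python
import requests

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true', 'x-drop': 'true'})

# prepare_request() applies the session defaults without sending anything.
req = requests.Request('GET', 'http://httpbin.org/headers',
                       headers={'x-test2': 'true', 'x-drop': None})
prepared = s.prepare_request(req)

# Session headers and request headers are merged; the session auth becomes
# an Authorization header.
print(prepared.headers)
```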
Prepared requests
You can make extra adjustments to a request before sending it:
    from requests import Request, Session

    s = Session()
    # url, data, header, and the arguments passed to send() are placeholders you define
    req = Request('GET', url,
        data=data,
        headers=header
    )
    prepped = req.prepare()
    # do something with prepped.body
    # do something with prepped.headers
    resp = s.send(prepped,
        stream=stream,
        verify=verify,
        proxies=proxies,
        cert=cert,
        timeout=timeout
    )
    print(resp.status_code)
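One detail worth knowing: Request.prepare() does not apply session-level state, while Session.prepare_request() does. A minimal sketch that compares the two without sending anything; the form field and the X-Debug header are hypothetical:

```python
from requests import Request, Session

s = Session()
req = Request('POST', 'http://httpbin.org/post', data={'key': 'value'})

bare = req.prepare()               # ignores session-level settings
prepped = s.prepare_request(req)   # merges in the session's default headers, cookies, auth

# Only the session-prepared request carries the default User-Agent header.
print('User-Agent' in bare.headers, 'User-Agent' in prepped.headers)

# Adjust the prepared request before it would be handed to s.send(prepped).
prepped.headers['X-Debug'] = '1'   # hypothetical header, for illustration only
print(prepped.body)
```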
Authentication
Basic Authentication
    >>> from requests.auth import HTTPBasicAuth
    >>> requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
    <Response [200]>
Because HTTP Basic Auth is so common, a (user, password) tuple works as shorthand:
    >>> requests.get('https://api.github.com/user', auth=('user', 'pass'))
    <Response [200]>
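Both spellings produce the same Authorization header, which can be confirmed by preparing the requests locally (nothing is sent, and the credentials are the placeholder values from above):

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'https://api.github.com/user'

explicit = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'pass')).prepare()
shorthand = requests.Request('GET', url, auth=('user', 'pass')).prepare()

# Basic auth is just a header: "Basic " + base64("user:pass").
print(explicit.headers['Authorization'])
print(explicit.headers['Authorization'] == shorthand.headers['Authorization'])
```

Because the credentials are only base64-encoded, not encrypted, Basic Auth should be used over HTTPS.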
Digest Authentication
    >>> from requests.auth import HTTPDigestAuth
    >>> url = 'http://httpbin.org/digest-auth/auth/user/pass'
    >>> requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
    <Response [200]>
OAuth 1 Authentication (requires the requests_oauthlib package)
    >>> import requests
    >>> from requests_oauthlib import OAuth1
    >>> url = 'https://api.twitter.com/1.1/account/verify_credentials.json'
    >>> auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
    ...               'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
    >>> requests.get(url, auth=auth)
    <Response [200]>
You can also write your own authentication class. Suppose a web service only responds if the X-Pizza header is set to a password value; the class could look like this:
    from requests.auth import AuthBase

    class PizzaAuth(AuthBase):
        """Attaches HTTP Pizza Authentication to the given Request object."""
        def __init__(self, username):
            # setup any auth-related data here
            self.username = username

        def __call__(self, r):
            # modify and return the request
            r.headers['X-Pizza'] = self.username
            return r
Usage:
    >>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
    <Response [200]>
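Since pizzabin.org is only an illustrative host, the custom class is easiest to check by preparing a request locally. Preparing runs the auth callable, so the header can be inspected without sending anything:

```python
import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        self.username = username

    def __call__(self, r):
        # r is the PreparedRequest; add the header and hand it back.
        r.headers['X-Pizza'] = self.username
        return r


# Preparing the request runs the auth hook; nothing is sent.
prepared = requests.Request(
    'GET', 'http://pizzabin.org/admin', auth=PizzaAuth('kenneth')).prepare()
print(prepared.headers['X-Pizza'])
```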
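SSL certificate verification
Requests checks the host's SSL certificate (verify=True is the default). In the run below the request failed with a ConnectionError; errno 10061 is a refused connection on Windows, so it failed before any certificate was checked:
    >>> requests.get('https://kennethreitz.com', verify=True)
    raise ConnectionError(e)
    ConnectionError: HTTPSConnectionPool(host='kennethreitz.com', port=443): Max retries exceeded with url: / (Caused by <class 'socket.error'>: [Errno 10061])
GitHub's certificate verifies successfully:
    >>> requests.get('https://github.com', verify=True)
    <Response [200]>
Setting verify to False skips certificate verification:
    >>> requests.get('https://github.com', verify=False)
This produces a warning, which can be silenced:
    from requests.packages import urllib3
    urllib3.disable_warnings()
You can specify a local certificate to use as a client-side certificate, either as a single file (containing the private key and the certificate) or as a tuple of both file paths:
    >>> requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
Or persist it on a session:
    s = requests.Session()
    s.cert = '/path/client.cert'
For code built on the standard library's urllib or http.client, the default HTTPS context can be replaced so that all SSL certificates are trusted. This relies on private functions of the ssl module and disables verification for the whole process; it should not change the behaviour of Requests, which takes its setting from verify:
    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context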
Proxies
To use a proxy, pass a proxies dictionary:
    import requests

    proxies = {
        "http": "http://10.10.1.10:3128",
        "https": "http://10.10.1.10:1080",
    }
    requests.get("http://example.org", proxies=proxies)
Proxies can also be configured through environment variables:
    $ export HTTP_PROXY="http://10.10.1.10:3128"
    $ export HTTPS_PROXY="http://10.10.1.10:1080"
    $ python
    >>> import requests
    >>> requests.get("http://example.org")
If the proxy requires authentication, put the credentials in the proxy URL:
    proxies = {
        "http": "http://user:pass@10.10.1.10:3128/",
    }
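Which proxy entry applies to a given URL can be checked offline with two helpers from requests.utils. These are utility functions rather than part of the main API shown above, so treat this as a sketch; the addresses and credentials are the placeholder values from this section:

```python
import requests

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
    "https": "http://10.10.1.10:1080",
}

# select_proxy() picks the entry that matches the URL's scheme (or scheme://host).
http_proxy = requests.utils.select_proxy("http://example.org/", proxies)
https_proxy = requests.utils.select_proxy("https://example.org/", proxies)
print(http_proxy, https_proxy)

# Credentials embedded in a proxy URL can be read back out.
print(requests.utils.get_auth_from_url(http_proxy))
```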