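Learning Python from the requests source code (Part 1)
Also posted on v2ex: https://www.v2ex.com/t/500081
WeChat official account: python学习开发
Reading source code written by experts is a good way to learn, and it has taught me a lot that I did not know before. This time I am starting with requests, the library by Kenneth Reitz that everyone knows, to explore some of the techniques it uses.
Starting with the get method
We know that passing a URL to requests' get method fetches that site, but how does that actually work? Let's dig in with that question in mind.
Open PyCharm and create demo.py.
Enter the following code.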
import requests
url="https://www.baidu.com"
req=requests.get(url)
In PyCharm, Ctrl (Command on macOS) + left-click on a name jumps to its definition.
This first takes us to api.py, where the get method looks like this:
def get(url, params=None, **kwargs):
    r"""Sends a GET request.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
The function body is just two statements.
Let's look at the first one, kwargs.setdefault('allow_redirects', True), and at what kwargs is doing here.
kwargs
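kwargs is a dict. setdefault sets the key allow_redirects only if it is not already present, in which case the key gets the default given as the second argument, True. A value supplied by the caller is left untouched.
With **kwargs, many arguments can be passed along between functions without building a dict by hand every time.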
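The behaviour of setdefault described above can be checked with a plain function. This is a minimal sketch: the name fake_get is hypothetical and only mimics how get fills in a default before passing its keyword arguments on.

```python
def fake_get(url, **kwargs):
    # Hypothetical stand-in for requests.get: fill in the default only
    # when the caller did not pass allow_redirects explicitly.
    kwargs.setdefault('allow_redirects', True)
    return kwargs


# No value supplied: the default is inserted.
print(fake_get('https://example.com'))
# Value supplied by the caller: setdefault leaves it alone.
print(fake_get('https://example.com', allow_redirects=False, timeout=3))
```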
Below is a simple example that uses requests.
# -*- coding: utf-8 -*-
# @Time : 2018/10/16 10:07 PM
# @Author : cxa
# @File : kwargsDemo.py
# @Software: PyCharm
import requests


def print_text(r, *args, **kwargs):
    # Response hook: receives the Response object
    print(r.text)


# **kwargs saves us from declaring a long list of parameters
def foo(url, **kwargs):
    data = kwargs.pop('data', dict()) or kwargs.pop('params', dict())
    headers = kwargs.pop('headers', {})
    print("data", data)
    print('headers', headers)
    req = requests.get(url, headers=headers, data=data, hooks=dict(response=print_text))


if __name__ == '__main__':
    url = "https://www.baidu.com"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
    kwargs = {}
    kwargs.setdefault('headers', headers)
    foo(url, **kwargs)
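foo takes two parameters: a fixed url, and kwargs, which collects keyword (key-value) arguments.
kwargs.pop(key[, default])
pop removes the given key and returns its value; if the key is missing, it returns the default instead.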
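To make the pop behaviour concrete, here is a small sketch on an ordinary dict (the keys and values are made up for illustration). Note that pop also removes the key, so whatever is left in the dict can still be forwarded elsewhere.

```python
options = {'headers': {'User-Agent': 'demo'}, 'timeout': 5}

# Key present: pop returns its value and removes it from the dict.
headers = options.pop('headers', {})
# Key missing: pop returns the default instead of raising KeyError.
data = options.pop('data', {})

print(headers)   # the popped value
print(data)      # the default
print(options)   # only the keys that were not popped remain
```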
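Now the second statement: it calls request and returns the result, a Response object. Following request, we land on the request function in api.py. Below I pick out the part I consider important.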
    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
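The with statement here relies on the context manager protocol. The sketch below uses a hypothetical FakeSession class (not part of requests) to show that __exit__ runs, and the resource gets closed, even though the function returns from inside the with block.

```python
class FakeSession:
    """Hypothetical stand-in for a session that owns a resource."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit, on return, and on exceptions alike.
        self.close()

    def close(self):
        self.closed = True

    def request(self, method, url):
        return '%s %s' % (method.upper(), url)


def fake_request(method, url):
    session = FakeSession()
    with session:
        result = session.request(method, url)
        # __exit__ still runs before the caller gets control back.
        return result, session


result, session = fake_request('get', 'https://example.com')
print(result)
print(session.closed)
```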
Following Session() takes us to sessions.py. Here is the first part of __init__, from which we follow default_headers():
def __init__(self):
    #: A case-insensitive dictionary of headers to be sent on each
    #: :class:`Request <Request>` sent from this
    #: :class:`Session <Session>`.
    self.headers = default_headers()
Following default_headers() takes us to utils.py:
def default_headers():
    """
    :rtype: requests.structures.CaseInsensitiveDict
    """
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })
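This function returns an instance of a class called CaseInsensitiveDict. Following it, we arrive at structures.py.
Here comes the interesting part. Let's start with the first line of that file:
from .compat import OrderedDict, Mapping, MutableMapping
As we all know, this line imports the names OrderedDict, Mapping and MutableMapping from the compat module. Following it further shows that Mapping and MutableMapping come from Python's collections library, while OrderedDict is imported from urllib3.packages.ordered_dict: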
from collections import Callable, Mapping, MutableMapping
from urllib3.packages.ordered_dict import OrderedDict
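The point of a compat module is to hide where a name lives on different Python versions. A minimal sketch of that pattern, not the actual contents of requests' compat.py, could look like this. On current Python 3 the abstract base classes live in collections.abc, so that location is tried first.

```python
# Compatibility shim sketch: try the modern location first, then fall back.
try:
    from collections.abc import Callable, Mapping, MutableMapping
except ImportError:  # very old interpreters kept these in collections
    from collections import Callable, Mapping, MutableMapping

from collections import OrderedDict

# The rest of the code imports from this one place and does not care
# which branch was taken.
print(issubclass(OrderedDict, MutableMapping))
print(isinstance(len, Callable))
```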
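OK, let's analyze.
OrderedDict
Many people think of Python dicts as unordered because they are stored by hash, whereas OrderedDict remembers the order in which keys were inserted. But while reading the official documentation I came across this sentence: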
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was implementation detail of CPython from 3.6.
Well... OK, so I feel OrderedDict is rarely needed any more. Let's just look at one method that a plain dict does not have.
from collections import OrderedDict
from collections.abc import MutableMapping
# move_to_end(key): move the given key-value pair to the end
dic = OrderedDict()
dic['k1'] = 'v1'
dic['k2'] = 'v2'
dic['k3'] = 'v3'
dic.move_to_end('k1')
print(dic)
print(isinstance(dic, MutableMapping))  # True: OrderedDict is a mutable mapping type
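Besides move_to_end, one other difference still matters now that plain dicts keep insertion order: comparing two OrderedDict objects takes order into account, while comparing plain dicts does not. A short sketch, assuming Python 3.7 or later:

```python
from collections import OrderedDict

a = OrderedDict([('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')])
b = OrderedDict([('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')])

a.move_to_end('k1')              # k1 goes to the end
b.move_to_end('k3', last=False)  # last=False moves k3 to the front instead

print(list(a))  # ['k2', 'k3', 'k1']
print(list(b))  # ['k3', 'k1', 'k2']

# Same items, different order: the OrderedDicts compare unequal,
# but the equivalent plain dicts compare equal.
print(a == b)
print(dict(a) == dict(b))
```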
Now let's analyze the structure.
# -*- coding: utf-8 -*-
# @Time : 2018/10/16 10:34
# @Author : cxa
# @File : dictMethod.py
# @Software: PyCharm
"""
requests.structures
~~~~~~~~~~~~~~~~~~~
Data structures that power Requests.
"""
from collections import OrderedDict
from collections.abc import Iterable, Mapping, MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """A case-insensitive ``dict``-like object.
    Implements all methods and operations of
    ``MutableMapping`` as well as dict's ``copy``. Also
    provides ``lower_items``.
    All keys are expected to be strings. The structure remembers the
    case of the last key to be set, and ``iter(instance)``,
    ``keys()``, ``items()``, ``iterkeys()``, and ``iteritems()``
    will contain case-sensitive keys. However, querying and contains
    testing is case insensitive::
        cid = CaseInsensitiveDict()
        cid['Accept'] = 'application/json'
        cid['aCCEPT'] == 'application/json'  # True
        list(cid) == ['Accept']  # True
    For example, ``headers['content-encoding']`` will return the
    value of a ``'Content-Encoding'`` response header, regardless
    of how the header name was originally stored.
    If the constructor, ``.update``, or equality comparison
    operations are given keys that have equal ``.lower()``s, the
    behavior is undefined.
    """

    def __init__(self, data=None, **kwargs):
        # Runs on construction: create an OrderedDict as the backing store
        self._store = OrderedDict()
        if data is None:
            data = {}
        # update() is inherited from MutableMapping and goes through
        # __setitem__ for every key-value pair
        self.update(data, **kwargs)

    def __setitem__(self, key, value):
        # key.lower() converts the string to lowercase.
        # Called on item assignment (d[key] = value); this is what makes
        # assignment ignore letter case.
        self._store[key.lower()] = (key, value)
        # setattr(self,key.lower(),(key, value))

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Yields the keys in their original case
        return (casedkey for casedkey, mappedvalue in self._store.values())

    def __len__(self):
        return len(self._store)

    def lower_items(self):
        """Like iteritems(), but with all lowercase keys."""
        return (
            (lowerkey, keyval[1])
            for (lowerkey, keyval)
            in self._store.items()
        )

    def __eq__(self, other):
        if isinstance(other, Mapping):
            other = CaseInsensitiveDict(other)
        else:
            return NotImplemented
        # Compare insensitively
        return dict(self.lower_items()) == dict(other.lower_items())

    # Copy is required
    def copy(self):
        return CaseInsensitiveDict(self._store.values())

    def __repr__(self):
        # Called when the object is printed
        print(isinstance(self.items(), Iterable))
        # items() is iterable, so dict() can consume it.
        # What dict(iterable) does internally:
        # d = {}
        # for k, v in iterable:  # this calls __iter__
        #     d[k] = v
        return str(dict(self.items()))


if __name__ == '__main__':
    dic = CaseInsensitiveDict()
    dic["name"] = "lisa"
    print(dic)
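For how these magic methods are used, see the article "python进阶之魔法函数" (Advanced Python: magic methods). For everything else, see the comments.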
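As a quick way to see which operation triggers which magic method, the sketch below uses a hypothetical Traced class (not part of requests) that records every special method call it receives.

```python
from collections.abc import MutableMapping


class Traced(MutableMapping):
    """Hypothetical mapping that logs which magic method each operation hits."""

    def __init__(self):
        self._store = {}
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append('__setitem__')
        self._store[key] = value

    def __getitem__(self, key):
        self.calls.append('__getitem__')
        return self._store[key]

    def __delitem__(self, key):
        self.calls.append('__delitem__')
        del self._store[key]

    def __iter__(self):
        self.calls.append('__iter__')
        return iter(self._store)

    def __len__(self):
        self.calls.append('__len__')
        return len(self._store)


t = Traced()
t['name'] = 'lisa'   # d[key] = value  -> __setitem__
t['name']            # d[key]          -> __getitem__
len(t)               # len(d)          -> __len__
for _ in t:          # iteration       -> __iter__
    pass
del t['name']        # del d[key]      -> __delitem__
print(t.calls)
```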
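Back in sessions.py, we go to line 361, the line that sets self.hooks.
hooks
requests has a hook mechanism, hooks, that works like a callback: the hook function is executed once the request has returned a response. We already used it in the kwargs section above. Now let's follow the code to see how hooks implements this callback.
First comes self.hooks = default_hooks(). Following it, we find that there is a default set of hooks.
This takes us to line 17 of hooks.py: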
HOOKS = ['response']
def default_hooks():
    return dict((event, []) for event in HOOKS)
So self.hooks starts out as {"response": []}.
Next, at line 233 of models.py, we find:
self.hooks = default_hooks()
for (k, v) in list(hooks.items()):
    self.register_hook(event=k, hook=v)
The first two lines can be summed up in a small example:
HOOKS = ["res"]
def default_hooks():
    return dict((event, []) for event in HOOKS)
hooks = default_hooks()
for (k, v) in list(hooks.items()):
    print(k, v)
Output:
res []
Now let's look at self.register_hook(event=k, hook=v) inside the for loop above.
Here is the register_hook method:
def register_hook(self, event, hook):
    """Properly register a hook."""
    if event not in self.hooks:  # the key is not in the dict
        raise ValueError('Unsupported event specified, with event name "%s"' % (event))  # raise an exception
    if isinstance(hook, Callable):  # callable? this branch is taken when hook is a function
        self.hooks[event].append(hook)
    elif hasattr(hook, '__iter__'):  # an iterable of hooks: keep only the callables
        self.hooks[event].extend(h for h in hook if isinstance(h, Callable))
At this point we end up with self.hooks["response"] = [<function print_text>].
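The registration logic can be tried out without sending a request. The sketch below is a standalone approximation: HookOwner is a hypothetical class, and the built-in callable() stands in for the isinstance(hook, Callable) check used in the source.

```python
HOOKS = ['response']


def default_hooks():
    return {event: [] for event in HOOKS}


class HookOwner:
    """Hypothetical object that stores hooks the way the text describes."""

    def __init__(self, hooks=None):
        self.hooks = default_hooks()
        for event, hook in (hooks or {}).items():
            self.register_hook(event, hook)

    def register_hook(self, event, hook):
        if event not in self.hooks:
            raise ValueError('Unsupported event specified, with event name "%s"' % event)
        if callable(hook):
            # A single function: append it.
            self.hooks[event].append(hook)
        elif hasattr(hook, '__iter__'):
            # A list or tuple of functions: keep only the callables.
            self.hooks[event].extend(h for h in hook if callable(h))


def print_text(r, *args, **kwargs):
    print(r)


single = HookOwner(hooks={'response': print_text})
several = HookOwner(hooks={'response': [print_text, 'not callable', len]})
print(single.hooks)
print(several.hooks)
```

Passing an event name that is not in HOOKS, for example 'request', raises ValueError in this sketch.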
Next:
def merge_hooks(request_hooks, session_hooks, dict_class=OrderedDict):
    """request_hooks = the self.hooks built above
    session_hooks = {'response': []}
    """
    if session_hooks is None or session_hooks.get('response') == []:
        return request_hooks  # the session has no hooks: use the request's
    if request_hooks is None or request_hooks.get('response') == []:
        return session_hooks
    return merge_setting(request_hooks, session_hooks, dict_class)
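To see the two early returns in action, here is a sketch with a simplified stand-in for merge_setting, whose body is not shown here. The stand-in, simple_merge, is an assumption made for illustration: it just lets request-level values override session-level ones.

```python
from collections import OrderedDict


def simple_merge(request_setting, session_setting, dict_class=OrderedDict):
    # Assumed behaviour for illustration only: session values first,
    # then request values override them.
    merged = dict_class(session_setting)
    merged.update(request_setting)
    return merged


def merge_hooks(request_hooks, session_hooks, dict_class=OrderedDict):
    if session_hooks is None or session_hooks.get('response') == []:
        return request_hooks
    if request_hooks is None or request_hooks.get('response') == []:
        return session_hooks
    return simple_merge(request_hooks, session_hooks, dict_class)


def hook_a(r, *args, **kwargs):
    return r


def hook_b(r, *args, **kwargs):
    return r


# Session has no hooks: the request's hooks are returned unchanged.
print(merge_hooks({'response': [hook_a]}, {'response': []}))
# Request has no hooks: the session's hooks are used.
print(merge_hooks({'response': []}, {'response': [hook_b]}))
# Both have hooks: the (assumed) merge lets the request win.
print(merge_hooks({'response': [hook_a]}, {'response': [hook_b]}))
```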