Dictionaries and Sets

1. Handling missing keys with setdefault

import sys
import re

WORD_RE = re.compile(r'\w+')

index = {}

print(sys.argv)

# Example 3-2
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            # finditer yields match objects such as <_sre.SRE_Match object; span=(0, 4), match='User'>;
            # each one holds the matched text and its position: match.start() and match.end() give the start and end offsets
            word = match.group()
            # match.group() returns the matched text, e.g. User
            column_no = match.start() + 1
            location = (line_no, column_no)

            # The conventional approach:
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

for word in sorted(index, key=str.upper):   # iterate over the keys in case-insensitive alphabetical order
    print(word, index[word])

print("-----------------------")

# Example 3-4: handling missing keys with setdefault
index2 = {}

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            occurrences = (line_no, column_no)
            # Missing keys with setdefault
            index2.setdefault(word, []).append(occurrences)
            # setdefault: use the existing value if the key is present; otherwise set the default
            # Get the list of occurrences for word, or set it to [] if not found;
            # setdefault returns the value, so it can be updated without requiring a second search.

for word in sorted(index2, key=str.upper):
    print(word, index2[word])

# Sample output:
# flasgger [(3, 6), (4, 6)]
# flask [(2, 6)]
# Flask [(2, 19)]
# from [(2, 1), (3, 1), (4, 1)]
# import [(1, 1), (2, 12), (3, 15), (4, 21)]
# jsonify [(2, 26)]
# random [(1, 8)]
# request [(2, 35)]
# Swagger [(3, 22)]
# swag_from [(4, 28)]
# utils [(4, 15)]

"""
The result of this line ...
    my_dict.setdefault(key, []).append(new_value)
... is the same as running ...
    if key not in my_dict:
        my_dict[key] = []
    my_dict[key].append(new_value)
... except that the latter code performs at least two searches for key --- three if not found --- while setdefault
does it all with a single lookup.
"""
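
The equivalence described above can be checked with a small sketch. The sample_pairs data is made up for illustration; it is not taken from the indexed file.

```python
# Build the same index twice: once with setdefault, once with the longer form.
# sample_pairs is hypothetical data standing in for (word, location) pairs.
sample_pairs = [("flask", (2, 6)), ("import", (1, 1)), ("flask", (3, 9))]

index_a = {}
for key, new_value in sample_pairs:
    # One call: fetch the list for key, creating it first if it is missing.
    index_a.setdefault(key, []).append(new_value)

index_b = {}
for key, new_value in sample_pairs:
    # Longer form: membership test, optional assignment, then another lookup.
    if key not in index_b:
        index_b[key] = []
    index_b[key].append(new_value)

print(index_a == index_b)   # True
print(index_a["flask"])     # [(2, 6), (3, 9)]

# setdefault returns the stored value itself, not a copy.
same_list = index_a.setdefault("flask", [])
print(same_list is index_a["flask"])   # True
```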

2. Mapping with Flexible Key Lookup

2.1 defaultdict: Another Take on Missing Keys

Example code:

import re
import sys
import collections

WORD_RE = re.compile(r'\w+')

index = collections.defaultdict(list)

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            occurrences = (line_no, column_no)

            # defaultdict example:
            index[word].append(occurrences)

for word in sorted(index, key=str.upper):
    print(word, index[word])

# Output:
# flasgger [(3, 6), (4, 6)]
# flask [(2, 6)]
# Flask [(2, 19)]
# from [(2, 1), (3, 1), (4, 1)]
# import [(1, 1), (2, 12), (3, 15), (4, 21)]
# jsonify [(2, 26)]
# random [(1, 8)]
# request [(2, 35)]
# Swagger [(3, 22)]
# swag_from [(4, 28)]
# utils [(4, 15)]

"""
defaultdict:
How defaultdict works:
    When instantiating a defaultdict, you provide a callable that is used to produce a default value whenever __getitem__
    is passed a nonexistent key argument.
    For example, given an empty defaultdict created as dd = defaultdict(list), if 'new_key' is not in dd, the
    expression dd['new_key'] does the following steps:
        1. Calls list() to create a new list.
        2. Inserts the list into dd using 'new_key' as key.
        3. Returns a reference to that list.
        
The callable that produces the default values is held in an instance attribute called default_factory.
If no default_factory is provided, the usual KeyError is raised for missing keys.

The default_factory of a defaultdict is only invoked to provide default values for __getitem__ calls, and not for the
other methods. For example, if dd is a defaultdict, and k is a missing key, dd[k] will call the default_factory to 
create a default value, but dd.get(k) still returns None.

The mechanism that makes defaultdict work by calling default_factory is actually the __missing__ special method, a
feature supported by all standard mapping types.
"""
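
A short sketch of the behavior described above. The key names are arbitrary and chosen only for illustration.

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory)        # <class 'list'>

# dd[k] on a missing key calls list(), inserts the result, and returns it.
value = dd['new_key']
value.append(1)
print(dd['new_key'])             # [1]

# get() and the in operator do not trigger default_factory.
print(dd.get('other_key'))       # None
print('other_key' in dd)         # False

# Without a default_factory, a missing key raises the usual KeyError.
no_factory = defaultdict()
try:
    no_factory['x']
    raised = False
except KeyError:
    raised = True
print(raised)                    # True
```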

2.2 The __missing__ Method

Example code:

""" StrKeyDict0 converts nonstring keys to str on lookup """


class StrKeyDict0(dict):

    def __missing__(self, key):
        # Without this check, a missing str key would make self[str(key)] call __missing__ again, recursing forever.
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        """
        The get method delegates to __getitem__ by using the self[key] notation; that gives the opportunity for
        our __missing__ to act.
        :param key:
        :param default:
        :return:
        """
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # key in self cannot be used here (self is the StrKeyDict0 instance, which is itself a dict),
        # because k in dict also calls __contains__, so __contains__ would recurse infinitely.
        return key in self.keys() or str(key) in self.keys()

# A better way to create a user-defined mapping type is to subclass collections.UserDict instead of dict.


"""
Underlying the way mappings deal with missing keys is the aptly named __missing__ method. This method is not defined in
the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard 
dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.

The __missing__ method is called only by __getitem__ (i.e., for the d[k] operator). The presence of a __missing__ method
has no effect on the behavior of other methods that look up keys, such as get or __contains__ .
"""
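
The point that only d[k] consults __missing__ can be seen in a minimal sketch. The Fallback class below is hypothetical and exists only for this illustration; it is not part of the StrKeyDict0 example.

```python
class Fallback(dict):
    """Hypothetical dict subclass that returns a placeholder for missing keys."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is not found.
        return f"<no {key!r}>"


d = Fallback(a=1)

print(d['a'])        # 1
print(d['b'])        # <no 'b'>   (__missing__ was called)
print(d.get('b'))    # None       (get does not call __missing__)
print('b' in d)      # False      (__contains__ does not call it either)
print(len(d))        # 1          (this __missing__ does not insert the key)
```

This is why StrKeyDict0 has to override get and __contains__ itself to make them consistent with d[k].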

Summary: there are three ways to handle keys that are missing from a dict: 1. setdefault  2. collections.defaultdict  3. the __missing__ method

3. Variations of dict: UserDict

UserDict is designed to be subclassed.

Example code:

""" convert non-string keys to str -- on insertion, update and lookup """
import collections


class StrKeyDict(collections.UserDict):

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        # self.data: UserDict does not inherit from dict, but it holds an internal dict instance called data,
        # which stores the actual items of the UserDict instance
        return str(key) in self.data

    def __setitem__(self, key, value):
        # The items of a UserDict instance are stored in the data attribute.
        # This method is easier to override when we can delegate to the self.data attribute.
        self.data[str(key)] = value


"""
It's almost always easier to create a new mapping type by extending UserDict rather than dict. The main reason is that
the built-in has some implementation shortcuts that end up forcing us to override methods that we can just inherit
from UserDict with no problem.

UserDict does not inherit from dict, but has an internal dict instance, called data, which holds the actual items. This
avoids undesired recursion when coding special methods like __setitem__ , and simplifies the coding of __contains__ .
"""
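
One of the implementation shortcuts mentioned above can be sketched as follows. UpperDict and UpperUserDict are hypothetical classes written only for this comparison, and the dict behavior shown is what CPython does; other implementations may differ.

```python
import collections


class UpperDict(dict):
    """Hypothetical dict subclass that tries to upper-case keys on insertion."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


class UpperUserDict(collections.UserDict):
    """The same idea built on UserDict, delegating to self.data."""

    def __setitem__(self, key, value):
        self.data[key.upper()] = value


d = UpperDict(a=1)     # the built-in constructor bypasses our __setitem__
d.update(b=2)          # so does the built-in update
d['c'] = 3             # only the [] operator uses our override
print(sorted(d))       # ['C', 'a', 'b']

u = UpperUserDict(a=1) # UserDict routes the constructor through update ...
u.update(b=2)          # ... and update through __setitem__
u['c'] = 3
print(sorted(u))       # ['A', 'B', 'C']
```

With dict, __init__ and update would also have to be overridden to get consistent behavior; with UserDict, overriding __setitem__ alone is enough here.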

4. Immutable Mappings

Example code:

>>> from types import MappingProxyType
>>> 
>>> d = {1: 'A'}
>>> d_proxy = MappingProxyType(d)
>>> d_proxy
mappingproxy({1: 'A'})
>>> d_proxy[1]              # Items in d can be seen through d_proxy
'A'
>>> d_proxy[2] = 'x'        # Changes cannot be made through d_proxy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
>>> d[2] = 'B'
>>> d_proxy                 # d_proxy is dynamic: any change in d is reflected.
mappingproxy({1: 'A', 2: 'B'})
>>> 


"""
The mapping types provided by the standard library are all mutable, but you may need to guarantee that a user cannot 
change a mapping by mistake.

Since Python 3.3, the types module provides a wrapper class called MappingProxyType, which, given a mapping, returns
a mappingproxy instance that is a read-only but dynamic view of the original mapping. So updates to the original
mapping can be seen in the mappingproxy, but changes cannot be made through it.
"""
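
One possible use of this, sketched under assumptions: a class keeps its own dict private and hands out only a read-only view. The Settings class and its method names are hypothetical, invented for this illustration.

```python
from types import MappingProxyType


class Settings:
    """Hypothetical class that exposes its options as a read-only view."""

    def __init__(self, **options):
        self._options = dict(options)

    @property
    def options(self):
        # Callers get a dynamic, read-only view, not the dict itself.
        return MappingProxyType(self._options)

    def set(self, key, value):
        # Changes go through the owner of the underlying dict.
        self._options[key] = value


settings = Settings(debug=False)
view = settings.options

try:
    view['debug'] = True
    error = None
except TypeError as exc:
    error = str(exc)
print(error)            # 'mappingproxy' object does not support item assignment

settings.set('debug', True)
print(view['debug'])    # True: the existing view reflects the change
```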

posted @ 2020-01-12 00:52  neozheng