Python - Dictionaries

Dict comprehensions
Unpacking mappings
Merging mappings
Dictionary views
Automatic handling of missing keys

Dict comprehensions

dial_codes = [
    (880,'Bangladesh'),
    (55,'Brazil'),
    (86,'China'),
    (91,'India'),
    (62,'Indonesia'),
    (81,'Japan'),
    (234,'Nigeria'),
    (92,'Pakistan'),
    (7,'Russia'),
    (1,'United States'),
]

country_dial = {country: code for code, country in dial_codes}
print(country_dial) #out:{'Bangladesh': 880, 'Brazil': 55, 'China': 86, 'India': 91, 'Indonesia': 62, 'Japan': 81, 'Nigeria': 234, 'Pakistan': 92, 'Russia': 7, 'United States': 1}

print({code: country.upper()
       for country,code in sorted(country_dial.items())
       if code < 70})
#out: {55: 'BRAZIL', 62: 'INDONESIA', 7: 'RUSSIA', 1: 'UNITED STATES'}

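An iterable of key-value pairs such as dial_codes can be passed directly to the dict constructor. Here, however, we swap the positions of key and value: country is the key and code is the value.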
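As a quick check of the point above, here is a minimal sketch that passes a list of pairs straight to dict and compares the result with the swapped comprehension. It uses a shortened copy of dial_codes so the block runs on its own; the names code_to_country and country_to_code are chosen only for illustration.

```python
# A shortened copy of dial_codes, so this block runs on its own.
dial_codes = [(86, 'China'), (91, 'India'), (81, 'Japan')]

# dict() accepts any iterable of (key, value) pairs as-is: code is the key.
code_to_country = dict(dial_codes)

# The comprehension swaps each pair, so country becomes the key.
country_to_code = {country: code for code, country in dial_codes}

print(code_to_country)
print(country_to_code)
```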

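Unpacking mappings

First, ** can be applied to more than one argument in a function call. All keys must be strings, though, and each key must be unique across all arguments (because duplicate keyword arguments are not allowed).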

>>> def dump(**kwargs):
...     return kwargs
...
>>> dump(**{'x':1},y=2,**{'z':3})
{'x': 1, 'y': 2, 'z': 3}
>>> d1 = {'name':'zhangsan','age':23}
>>> dump(**d1,y=2,**{'z':3})
{'name': 'zhangsan', 'age': 23, 'y': 2, 'z': 3}
>>> dump(**{1:2},y=2,**{'z':3})  # keys must be strings
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keywords must be strings
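The session above shows the string-key rule. The uniqueness rule can be sketched the same way: the snippet below redefines dump so it is self-contained and passes the same key through two unpacked dicts. The flag duplicate_rejected is a name introduced here for illustration.

```python
def dump(**kwargs):
    return kwargs

duplicate_rejected = False
try:
    # 'x' arrives twice, once from each unpacked mapping.
    dump(**{'x': 1}, **{'x': 2})
except TypeError as exc:
    duplicate_rejected = True
    print(exc)  # the message reports multiple values for keyword argument 'x'

print(duplicate_rejected)
```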

Second, ** can be used inside a dict literal, also more than once.

>>> {'a':0,**{'x':1},'y':2,**{'z':3,'x':4}}
{'a': 0, 'x': 4, 'y': 2, 'z': 3}

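In this case duplicate keys are allowed. Each later occurrence overwrites the earlier one, as with the value mapped to x in this example.
This syntax can also be used to merge mappings, but there are other ways to merge them.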
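A minimal sketch of merging with ** in a dict literal follows. The dicts defaults and overrides are hypothetical; the point is only that the right-hand mapping wins on duplicate keys and that neither input is modified.

```python
defaults = {'host': 'localhost', 'port': 8080}
overrides = {'port': 9090, 'debug': True}

# Later unpacked mappings overwrite earlier ones on duplicate keys.
merged = {**defaults, **overrides}

print(merged)
print(defaults)  # the inputs are left untouched
```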

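Merging mappings

Python 3.9 supports using | and |= to merge mappings. Both are union operators, like the ones sets provide. The | operator creates a new mapping: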

>>> d1 = {'a':1,'b':3}
>>> d2 = {'a':2,'b':4,'c':6}
>>> d1 | d2
{'a': 2, 'b': 4, 'c': 6}
>>>

To update an existing mapping in place, use |=. Continuing from the previous example, d1 was not changed back then, but now it is:

>>> d1
{'a': 1, 'b': 3}
>>> d1 |= d2
>>> d1
{'a': 2, 'b': 4, 'c': 6}
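The difference between the two operators is easiest to see through object identity. The sketch below assumes Python 3.9 or later; alias is a name introduced here to hold a second reference to d1.

```python
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

merged = d1 | d2   # a new dict; d1 is untouched at this point
alias = d1         # second reference to the same dict object
d1 |= d2           # updates that object in place

print(merged is d1)  # False: | built a separate object
print(alias is d1)   # True: |= did not rebind d1 to a new object
print(alias)
```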

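Dictionary views

The dict instance methods .keys(), .values(), and .items() return instances of the classes dict_keys, dict_values, and dict_items, respectively. These dictionary views are read-only projections of the internal data structures used in the dict implementation. The equivalent Python 2 methods returned lists that duplicated data already in the dict, at some cost in memory. Views also replace the old methods that returned iterators.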

>>> d = dict(a=10,b=20,c=30)
>>> values = d.values()
>>> values
dict_values([10, 20, 30])  # 1 
>>> len(values) # 2
3
>>> list(values) # 3
[10, 20, 30]
>>> reversed(values) # 4
<dict_reversevalueiterator object at 0x000001F87E844540>
>>> values[0] # 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_values' object is not subscriptable
>>>

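1. The repr of a view object shows its content
2. We can query the length of a view
3. Views are iterable, so it is easy to build a list from one
4. Views implement __reversed__, which returns a custom iterator
5. We cannot use [] to get individual items from a view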

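dict_keys, dict_values, and dict_items are internal classes. They are not available through __builtins__ or any standard library module, and although you can obtain instances of them, you cannot create a view by hand in Python code.

A view object is a dynamic proxy: if the source dict is updated, the change is immediately visible through an existing view. Continuing from the previous example: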

>>> d['z'] = 40
>>> d
{'a': 10, 'b': 20, 'c': 30, 'z': 40}
>>> values
dict_values([10, 20, 30, 40])

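The dict_values class is the simplest dictionary view class. It implements only the three special methods __len__, __iter__, and __reversed__. The dict_keys and dict_items classes additionally implement several set methods.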
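The keys and items views go further than the values view: they also support set operators such as & and |. A minimal sketch, using two small example dicts made up for this illustration:

```python
d1 = dict(a=1, b=2, c=3)
d2 = dict(b=20, d=40)

# Set operators work directly on key views and return plain sets.
common = d1.keys() & d2.keys()
all_keys = d1.keys() | d2.keys()

print(common)
print(sorted(all_keys))

# The values view offers no set operators.
print(hasattr(d1.values(), '__and__'))
```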

Automatic handling of missing keys

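Using setdefault

"""
Using the dict.setdefault method:
d.setdefault(k, [default]):
If k is in d, return d[k]. Otherwise set d[k] = default and return default.

Example: index the positions where each word occurs in a text
Format: {word: [(line_no, column_no), (line_no, column_no)]}
"""

import re

# re.compile returns a Pattern object
WORD_RE = re.compile(r'\w+')  # word characters: letters, digits, and underscore

index = {}
with open('./zen.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):  # line number (from 1) and line text
        for match in WORD_RE.finditer(line):  # every word in the line
            word = match.group()
            column_no = match.start() + 1   # start() is the 0-based offset of the word in the line
            location = (line_no, column_no)

            # The three commented lines below do the same job with get():
            # occurrences = index.get(word, [])  # fetch the list, or [] if the word is new
            # occurrences.append(location)       # append the new location
            # index[word] = occurrences          # store the list back (a second key lookup)

            # setdefault does it in one lookup: fetch the list (creating it if needed), then append
            index.setdefault(word, []).append(location)

print(index)

for word in sorted(index, key=str.upper):  # sort ignoring case
    print(word, index[word])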

zen.txt:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.

Output:

{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)], 'Python': [(1, 12)], 'by': [(1, 20)], 'Tim': [(1, 23)], 'Peters': [(1, 27)], 'Beautiful': [(3, 1)], 'is': [(3, 11), (4, 10)], 'better': [(3, 14), (4, 13)], 'than': [(3, 21), (4, 20)], 'ugly': [(3, 26)], 'Explicit': [(4, 1)], 'implicit': [(4, 25)]}
Beautiful [(3, 1)]
better [(3, 14), (4, 13)]
by [(1, 20)]
Explicit [(4, 1)]
implicit [(4, 25)]
is [(3, 11), (4, 10)]
of [(1, 9)]
Peters [(1, 27)]
Python [(1, 12)]
than [(3, 21), (4, 20)]
The [(1, 1)]
Tim [(1, 23)]
ugly [(3, 26)]
Zen [(1, 5)]


defaultdict

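dd = defaultdict(list): if 'new-key' is not in dd, the expression dd['new-key'] performs the following steps:

1) Calls list() to create a new list
2) Inserts the new list into dd with 'new-key' as its key
3) Returns a reference to that list
These steps do not happen for dd.get('new-key').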

>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> dd['name']
[]
>>> dd
defaultdict(<class 'list'>, {'name': []})
>>> dd2 = defaultdict(list)
>>> dd2.get('name')
>>> dd2
defaultdict(<class 'list'>, {})
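As a sketch of how this applies to the word index built earlier, the version below swaps setdefault for defaultdict(list). To stay self-contained it indexes two lines held in a string (the variable text is introduced here) instead of reading zen.txt, so the line numbers start at 1 rather than 3.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r'\w+')

# Two lines inlined so the block does not depend on zen.txt.
text = "Beautiful is better than ugly.\nExplicit is better than implicit."

index = defaultdict(list)
for line_no, line in enumerate(text.splitlines(), 1):
    for match in WORD_RE.finditer(line):
        location = (line_no, match.start() + 1)
        # A missing word triggers list(), so append always has a list to work on.
        index[match.group()].append(location)

print(index['is'])
print(index.get('missing'))  # get() does not create an entry
```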

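The __missing__ special method

__missing__ is called only by __getitem__ when a key is not found (for example, by d[k]). It has no effect on get or __contains__ (which the in operator uses).
This is the mechanism defaultdict relies on: when a key is not found, its __missing__ calls default_factory (the callable passed in when the defaultdict was created).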
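The scope of __missing__ can be checked with a very small subclass. ZeroDict is a hypothetical class written for this sketch; it returns 0 for absent keys without storing anything.

```python
class ZeroDict(dict):
    """Hypothetical dict subclass: d[k] yields 0 for absent keys."""

    def __missing__(self, key):
        return 0  # returned by d[key]; nothing is inserted


d = ZeroDict(a=1)

print(d['b'])      # __getitem__ falls back to __missing__
print(d.get('b'))  # get() ignores __missing__ and returns its default
print('b' in d)    # __contains__ ignores __missing__ as well
```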

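Use case: convert keys to str on lookup:

"""
d[key] calls __getitem__(). If the key is not found and the class defines
__missing__(), __getitem__() calls it.
The built-in dict.get() does not go through __getitem__(), so it never
triggers __missing__(). That is why this class overrides get().
"""
class MyDict(dict):

    def __missing__(self, key):
        # print('__missing__ is run......')
        if isinstance(key, str):  # the key is already a str and was still not found
            raise KeyError(key)  # give up; without this check the method would recurse forever

        return self[str(key)]  # retry the lookup with the key converted to str

    def get(self, key, default=None):
        try:
            return self[key]  # delegates to __getitem__, and so to __missing__; the default get does not
        except KeyError:
            return default  # lookup failed both ways, return the default

    def __contains__(self, key):
        # search self.keys() explicitly: `key in self` would call __contains__ recursively
        return key in self.keys() or str(key) in self.keys()


if __name__ == '__main__':
    d = MyDict({'1': 1, '2': 2})

    print(d[1])  # lookup with a non-string key, prints 1
    print(d.get(1))  # lookup with a non-string key, prints 1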

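Recommended implementation:


"""
Custom mapping class: subclassing UserDict is the recommended approach
"""
from collections import UserDict

class MyDict(UserDict):

    def __missing__(self, key):
        if isinstance(key, str):  # the key is already a str and was still not found
            raise KeyError(key)

        return self[str(key)]  # retry the lookup with the key converted to str

    def __contains__(self, key):
        return str(key) in self.data  # all stored keys are str, so check the inner dict directly

    def __setitem__(self, key, value):
        self.data[str(key)] = value  # convert every key to str on the way in


if __name__ == '__main__':
    d = MyDict({'1': 1, '2': 2})

    print(d[1])  # lookup with a non-string key, prints 1
    print(d.get(1))  # lookup with a non-string key, prints 1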
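A short usage sketch for the UserDict-based class follows. It repeats the class definition so the block runs on its own, then exercises the behaviors the overrides are meant to provide: keys are stored as str, and lookups, membership tests, and get accept non-string keys. The sample keys and values are made up for the illustration.

```python
from collections import UserDict


class MyDict(UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, value):
        self.data[str(key)] = value


d = MyDict({'1': 1})
d[2] = 'two'             # stored under the str key '2'

print(list(d))           # every stored key is a str
print(2 in d)            # membership test converts the key
print(d[2])              # lookup falls back to __missing__
print(d.get(3, 'N/A'))   # absent either way, so the default comes back
```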

Posted 2023-05-03 20:54 by chuangzhou
