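RNN-Based NLP Study (Part 2)

As laid out in the previous post, I now have a rough plan for learning RNN-based NLP. I'll start with the first item on that plan:

1. Review Python basics:

Make sure you have a solid grasp of Python's basic syntax, data structures (such as lists, dictionaries, and sets), control flow (such as loops and conditionals), functions, and classes.
Learn to use Python's standard library, especially the modules related to data processing, such as re (regular expressions), collections, and itertools.

I'll pretend I already have a solid grasp of Python's basic syntax, data structures, control flow, functions, and classes.
I use the re library regularly in my day-to-day production work, so I won't cover it here. If you want to learn re, see the 菜鸟教程 (Runoob) tutorial.

So this post focuses on the two remaining libraries, collections and itertools: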

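1. The collections library

This module implements specialized container datatypes that supplement Python's general-purpose built-in containers dict, list, set, and tuple.

namedtuple(): a factory function for creating tuple subclasses with named fields.

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)
print(p)  # Point(x=11, y=22)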

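deque(): a list-like container with fast appends and pops on both ends.
deque (double-ended queue) is a container type in the collections module that supports fast insertion and removal at both ends, so you can append and pop at either the head or the tail.

Here are some basic deque operations:

append(x): add element x to the right end of the deque.

from collections import deque
d = deque([1, 2, 3])
d.append(4)  # deque([1, 2, 3, 4])

appendleft(x): add element x to the left end of the deque.

d.appendleft(0)  # deque([0, 1, 2, 3, 4])

pop(): remove and return an element from the right end of the deque. Raises IndexError if the deque is empty.

d.pop()  # returns 4; d is now deque([0, 1, 2, 3])

popleft(): remove and return an element from the left end of the deque. Raises IndexError if the deque is empty.

d.popleft()  # returns 0; d is now deque([1, 2, 3])

deque is particularly well suited to code that frequently adds or removes elements at both ends: append and pop run in O(1) time at either end. A list is only fast (amortized O(1)) at its right end, while insert(0, x) and pop(0) at its left end cost O(n). This makes deque a good choice for implementing queues, and it works equally well as a stack.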
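
To make the queue and stack claim concrete, here is a minimal sketch. The variable names (queue, stack, window) are made up for illustration, and the maxlen example is an extra that the text above does not cover.

```python
from collections import deque

# FIFO queue: enqueue on the right, dequeue from the left.
queue = deque()
for token in ["I", "like", "NLP"]:
    queue.append(token)
first = queue.popleft()      # "I"

# LIFO stack: push and pop on the same (right) end.
stack = deque()
stack.append("a")
stack.append("b")
top = stack.pop()            # "b"

# Bounded window: with maxlen set, appending to a full deque
# discards the element at the opposite end.
window = deque(maxlen=3)
for i in range(5):
    window.append(i)

print(first, top, list(window))  # I b [2, 3, 4]
```

A bounded deque like window is one simple way to keep a sliding context of the last few tokens while scanning a text.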

ChainMap: a dict-like class for creating a single view of multiple mappings.
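
ChainMap has no example here, so the following is a small sketch of one common use, layering overrides on top of defaults. The dictionaries and their keys (lr, epochs, batch_size) are hypothetical and not from the original post.

```python
from collections import ChainMap

defaults = {"lr": 0.01, "epochs": 10}   # hypothetical default settings
overrides = {"epochs": 20}              # hypothetical user settings

# Lookups search the maps from left to right, so overrides wins.
config = ChainMap(overrides, defaults)
epochs = config["epochs"]   # 20, found in overrides
lr = config["lr"]           # 0.01, falls through to defaults

# Writes only ever touch the first mapping.
config["batch_size"] = 32

print(epochs, lr, overrides, defaults)
```

Because ChainMap is a view rather than a copy, later changes to defaults or overrides are visible through config as well.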

Counter: a dict subclass for counting hashable objects.

from collections import Counter
c = Counter('abracadabra')
print(c)  # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
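
Since this series is about NLP, a word-frequency count is probably the most relevant use of Counter. The sketch below assumes a naive whitespace tokenizer (str.split), and the sample sentence is made up.

```python
from collections import Counter

text = "the cat and the dog and the bird"   # made-up sample sentence
word_counts = Counter(text.split())         # naive whitespace tokenization

# most_common(n) returns the n most frequent items as (item, count) pairs.
top_two = word_counts.most_common(2)

# A missing key returns 0 instead of raising KeyError.
missing = word_counts["fish"]

print(top_two, missing)  # [('the', 3), ('and', 2)] 0
```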

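OrderedDict: a dict subclass that remembers the order in which entries were added.

OrderedDict is a class in the collections module. Since Python 3.7, regular dicts also preserve insertion order, but OrderedDict remains because it offers extra features such as reordering.

Here are some basic uses of OrderedDict:

Initialization: create an empty OrderedDict.

from collections import OrderedDict
ordered_dict = OrderedDict()

Adding elements: as with a regular dict, add elements as key-value pairs.

ordered_dict['apple'] = 1
ordered_dict['banana'] = 2

Order preservation: OrderedDict keeps the keys in the order they were added.

# Output: OrderedDict([('apple', 1), ('banana', 2)])
# (Python 3.12+ prints OrderedDict({'apple': 1, 'banana': 2}) instead)
print(ordered_dict)

Reordering: move_to_end() moves an existing key to the end of the dictionary, or to the beginning when called with last=False.

# Move 'apple' to the end of the dictionary
ordered_dict.move_to_end('apple')
# Output: OrderedDict([('banana', 2), ('apple', 1)])
print(ordered_dict)

Sorting: an OrderedDict can be rebuilt from its sorted items, and the new dictionary keeps that order.

# Sort by key
ordered_dict = OrderedDict(sorted(ordered_dict.items(), key=lambda t: t[0]))

Equality: comparisons between two OrderedDicts also take the order of the elements into account.

another_ordered_dict = OrderedDict([('apple', 1), ('banana', 2)])
# Two OrderedDicts are equal only if they hold the same items in the same order
if ordered_dict == another_ordered_dict:
    print("The dictionaries are equal.")

Before Python 3.7, OrderedDict was very useful whenever you needed to preserve the insertion order of dictionary elements. From Python 3.7 on, the regular dict type guarantees insertion order too, so OrderedDict is used less often. It is still a useful tool when you need its specific features, such as reordering.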
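
As one illustration of where the reordering features help, here is a minimal sketch of a least-recently-used (LRU) cache built on move_to_end() and popitem(last=False). The LRUCache class and its methods are hypothetical names invented for this example, not part of the collections module.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical helper: keeps at most `capacity` recently used items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)          # newest entry goes to the end
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the oldest entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used key
cache.put("c", 3)     # over capacity, so "b" is evicted
remaining = list(cache.items)
print(remaining)      # ['a', 'c']
```

For real code, functools.lru_cache is usually the simpler option; the point here is only to show the two reordering methods working together.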

defaultdict: a dict subclass that calls a user-supplied factory function to provide default values for missing keys.

from collections import defaultdict
d = defaultdict(int)
print(d)  # defaultdict(<class 'int'>, {})
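
The empty defaultdict above does not show the default value in action, so here is a short sketch of the two most common factories, list for grouping and int for counting. The word list is made up.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample data

# list factory: a missing key starts as an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# int factory: a missing key starts at int() == 0.
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

print(dict(by_letter))
print(dict(counts))  # {'a': 2, 'b': 2, 'c': 1}
```

Note that merely reading a missing key, for example by_letter['z'], inserts that key with its default value.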

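UserDict: a wrapper around dictionary objects that makes dict subclassing easier.

UserDict is a class in the collections module that wraps a dictionary object. It is a convenient base class for writing your own dict-like classes: you don't have to implement every method a dictionary needs, because UserDict already handles most of them.

Here are some basic uses of UserDict:

Initialization: create an empty UserDict.

from collections import UserDict
user_dict = UserDict()

Initialization from a dict: a UserDict can be initialized from a dictionary.

user_dict = UserDict({'a': 1, 'b': 2})

Accessing elements: as with a regular dict, access elements by key.

print(user_dict['a'])  # Output: 1

Modifying elements: elements in a UserDict can be modified.

user_dict['a'] = 3

Adding elements: new key-value pairs can be added to a UserDict.

user_dict['c'] = 4

Deleting elements: use the del keyword to delete an element.

del user_dict['a']

The data attribute: UserDict stores its contents in an internal dictionary named data.

print(user_dict.data)  # Output: {'b': 2, 'c': 4}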
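
The examples above use UserDict directly, but its real purpose is subclassing. Below is a minimal sketch of a subclass that lower-cases its keys. LowerCaseDict is a hypothetical name, and the sketch assumes every key is a string.

```python
from collections import UserDict

class LowerCaseDict(UserDict):
    """Hypothetical example: stores every (string) key in lower case."""

    def __setitem__(self, key, value):
        # Assumes key is a str; normalize it before storing.
        super().__setitem__(key.lower(), value)

vocab = LowerCaseDict({"Hello": 1})   # the constructor goes through __setitem__
vocab["WORLD"] = 2                    # plain assignment does too
vocab.update({"NLP": 3})              # and so does update()

print(vocab.data)  # {'hello': 1, 'world': 2, 'nlp': 3}
```

Overriding a single method is enough here because UserDict routes its constructor and update() through __setitem__. The built-in dict does not do that when subclassed, which is the main reason to prefer UserDict for this kind of class.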

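UserList: a wrapper around list objects that makes list subclassing easier.

UserList is a class in the collections module that wraps a list object. Like UserDict, it is a convenient base class for writing your own list-like classes: you don't have to implement every method a list needs, because UserList already handles most of them.

Here are some basic uses of UserList:

Initialization: create an empty UserList.

from collections import UserList
user_list = UserList()

Initialization from a list: a UserList can be initialized from a list.

user_list = UserList([1, 2, 3])

Accessing elements: as with a regular list, access elements by index.

print(user_list[0])  # Output: 1

Modifying elements: elements in a UserList can be modified.

user_list[0] = 4

Adding elements: use the append() method to add a new element to a UserList.

user_list.append(5)

Deleting elements: use the remove() method to delete an element.

user_list.remove(4)

The data attribute: UserList stores its contents in an internal list named data.

print(user_list.data)  # Output: [2, 3, 5]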

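UserString: a wrapper around string objects that makes string subclassing easier.

UserString is a class in the collections module that wraps a string object. It is a convenient base class for writing your own string-like classes: you don't have to implement every method a string needs, because UserString already handles most of them.

Here are some basic uses of UserString:

Initialization: create an empty UserString. Unlike UserDict and UserList, the constructor requires an argument, so pass an empty string.

from collections import UserString
user_string = UserString("")

Initialization from a string: a UserString can be initialized from a string.

user_string = UserString("hello")

Accessing characters: as with a regular string, access characters by index.

print(user_string[0])  # Output: h

Modifying characters: like str, UserString is immutable, so user_string[0] = 'H' raises TypeError. Build a new UserString instead.

user_string = UserString('H' + user_string.data[1:])

Slicing: use slices to get a substring.

print(user_string[1:4])  # Output: ell

The data attribute: UserString stores its contents in an internal string named data.

print(user_string.data)  # Output: Hello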

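2. The itertools library

Python's itertools library is a very useful module that provides a set of functions for creating and manipulating iterators. These functions can be combined to build complex iteration patterns and help you process data efficiently.

Here are some commonly used functions from itertools:

count(start=0, step=1): create an infinite iterator that starts at start and increases by step each time.

from itertools import count
for i in count(10, 3):
    if i > 20:
        break
    print(i, end=' ')  # Output: 10 13 16 19

cycle(iterable): repeat the elements of iterable indefinitely.

from itertools import cycle
n = 0  # named n so it does not shadow itertools.count
for item in cycle(['a', 'b', 'c']):
    if n > 7:
        break
    print(item, end=' ')  # Output: a b c a b c a b
    n += 1

repeat(object, times): repeat object times times. If times is omitted, the iterator repeats forever.

from itertools import repeat
for item in repeat('hello', 3):
    print(item)  # Prints hello three times, one per line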

chain(*iterables): join several iterables end to end.

from itertools import chain
for item in chain([1, 2, 3], ['a', 'b', 'c']):
    print(item, end=' ')  # Output: 1 2 3 a b c

islice(iterable, start, stop[, step]): return a slice of iterable, similar to list slicing, except that negative values are not supported.

from itertools import islice
for item in islice(range(10), 2, 8, 2):
    print(item, end=' ')  # Output: 2 4 6

takewhile(predicate, iterable): yield elements from iterable until predicate becomes false.

from itertools import takewhile
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for number in takewhile(lambda x: x < 5, numbers):
    print(number, end=' ')  # Output: 1 2 3 4

dropwhile(predicate, iterable): skip elements of iterable while predicate is true, then yield every remaining element.

from itertools import dropwhile
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for number in dropwhile(lambda x: x < 5, numbers):
    print(number, end=' ')  # Output: 5 6 7 8 9 10

filterfalse(predicate, iterable): yield the elements of iterable for which predicate is false.

from itertools import filterfalse
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for number in filterfalse(lambda x: x % 2 == 0, numbers):
    print(number, end=' ')  # Output: 1 3 5 7 9

groupby(iterable, key=None): group consecutive elements of iterable that share the same key. Only adjacent elements are grouped, so the input is normally sorted by the same key first.

from itertools import groupby
people = [{'name': 'John', 'age': 25},
          {'name': 'Jane', 'age': 25},
          {'name': 'Doe', 'age': 30}]
people.sort(key=lambda x: x['age'])
for age, group in groupby(people, key=lambda x: x['age']):
    print(f"Age {age}: {list(group)}")
# Age 25: [{'name': 'John', 'age': 25}, {'name': 'Jane', 'age': 25}]
# Age 30: [{'name': 'Doe', 'age': 30}]
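
The sort before groupby matters because groupby only merges adjacent elements. The sketch below contrasts unsorted and sorted input, using a made-up list of part-of-speech tags.

```python
from itertools import groupby

tags = ["N", "N", "V", "N", "N", "N"]   # made-up tag sequence

# Without sorting: each run of equal neighbours becomes its own group.
runs = [(tag, len(list(group))) for tag, group in groupby(tags)]

# With sorting: all equal elements end up in a single group.
totals = [(tag, len(list(group))) for tag, group in groupby(sorted(tags))]

print(runs)    # [('N', 2), ('V', 1), ('N', 3)]
print(totals)  # [('N', 5), ('V', 1)]
```

The unsorted behaviour is useful in its own right, for example for run-length encoding a sequence.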

tee(iterable, n=2): create n independent iterators from a single iterable.

from itertools import tee
numbers = [1, 2, 3, 4, 5]
iter1, iter2 = tee(numbers)
for item in iter1:
    print(item, end=' ')  # Output: 1 2 3 4 5
print()
for item in iter2:
    print(item, end=' ')  # Output: 1 2 3 4 5
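
One place tee shows up in text processing is building adjacent pairs (bigrams) from a token stream. The sketch below is one way to do it; the bigrams function is a hypothetical helper written for this example.

```python
from itertools import tee

def bigrams(tokens):
    """Hypothetical helper: yield adjacent (token, next_token) pairs."""
    first, second = tee(tokens)
    next(second, None)           # advance the second iterator by one step
    return zip(first, second)    # zip stops when the shorter one runs out

pairs = list(bigrams(["I", "like", "deep", "learning"]))
print(pairs)
# [('I', 'like'), ('like', 'deep'), ('deep', 'learning')]
```

On Python 3.10 and later, itertools.pairwise does the same thing directly.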

zip_longest(*iterables, fillvalue=None): combine elements from each iterable into tuples, continuing until the longest iterable is exhausted and filling missing values with fillvalue.

from itertools import zip_longest
for item in zip_longest([1, 2, 3], ['a', 'b'], fillvalue='N/A'):
    print(item, end=' ')  # Output: (1, 'a') (2, 'b') (3, 'N/A')

product(*iterables, repeat=1): compute the Cartesian product of several iterables.

from itertools import product
for item in product([1, 2], ['a', 'b']):
    print(item, end=' ')  # Output: (1, 'a') (1, 'b') (2, 'a') (2, 'b')

permutations(iterable, r=None): generate all permutations of length r from the elements of iterable. If r is omitted, it defaults to the length of iterable.

from itertools import permutations
for item in permutations([1, 2, 3], 2):
    print(item, end=' ')  # Output: (1, 2) (1, 3) (2, 1) (2, 3) (3, 1) (3, 2)

combinations(iterable, r): generate all combinations of length r from the elements of iterable, ignoring order.

from itertools import combinations
for item in combinations([1, 2, 3], 2):
    print(item, end=' ')  # Output: (1, 2) (1, 3) (2, 3)

combinations_with_replacement(iterable, r): generate all combinations of length r from the elements of iterable, allowing the same element to be picked more than once.

from itertools import combinations_with_replacement
for item in combinations_with_replacement([1, 2, 3], 2):
    print(item, end=' ')  # Output: (1, 1) (1, 2) (1, 3) (2, 2) (2, 3) (3, 3)

posted @ 2024-04-18 10:32 哑巴老六
