Using Python's heapq Module
What the heapq module does
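A heap is a tree-like data structure in which each child node has a sort-order relationship with its parent. A binary heap can be represented with a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (with indexes starting at 0). This layout makes it possible to reorganize the heap in place, so adding or removing elements does not require reallocating large amounts of memory.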
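The index arithmetic described above can be sketched with a few small helpers. The names `children`, `parent`, and `is_min_heap` are hypothetical and introduced only for illustration; they are not part of heapq.

```python
import heapq


def children(n):
    """Return the indexes of the two children of the item at index n."""
    return 2 * n + 1, 2 * n + 2


def parent(n):
    """Return the index of the parent of the item at index n (n > 0)."""
    return (n - 1) // 2


def is_min_heap(items):
    """Check that no item is smaller than its parent, using only index math."""
    return all(items[parent(i)] <= items[i] for i in range(1, len(items)))


heap = [19, 9, 4, 10, 11]
heapq.heapify(heap)  # reorganizes the same list object in place

print(children(0))        # (1, 2)
print(parent(4))          # 1
print(is_min_heap(heap))  # True
```

Because the tree is implied by the indexes, no node objects or pointers are needed; the heap is just an ordinary list.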
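Max-heap: the parent is greater than or equal to both of its children.
Min-heap: the parent is less than or equal to its children.
Python's heapq module implements a min-heap.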
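heapq always keeps the smallest item at index 0. When max-heap behaviour is needed, one common workaround is to store negated values. The following is a minimal sketch, assuming the items are plain numbers; it is not a feature of the module itself.

```python
import heapq

data = [19, 9, 4, 10, 11]

# Negate every value so the largest original value becomes the smallest.
max_heap = [-n for n in data]
heapq.heapify(max_heap)

# Negate again on the way out to recover the original values.
largest = -heapq.heappop(max_heap)
second = -heapq.heappop(max_heap)
print(largest, second)  # 19 11
```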
1. Prepare the demo data (saved as heapq_heapdata.py, which the later examples import)
data = [19, 9, 4, 10, 11]
2. Prepare the helper that displays the data (saved as heapq_showtree.py, which the later examples import)
import math
from io import StringIO


def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print('-' * total_width)
    print()
3. Create a heap (first way: heappush). The heap order is maintained as each new element is added from the data source.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print('random :', data)
print()

for n in data:
    print('add {:>3}:'.format(n))
    heapq.heappush(heap, n)
    show_tree(heap)
Output
random : [19, 9, 4, 10, 11]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
    19
------------------------------------

add  11:

                 4
        10                9
    19       11
------------------------------------
4. Create a heap (second way: heapify). If the data is already in memory, it is more efficient to use heapify() to reorganize the list's elements in place.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
Output
random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------
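The two ways of building a heap can be compared side by side. This is a small sketch using the same sample values; it shows that both results are valid heaps even though the internal order of the lists differs. Only `heap[0]` being the smallest item is guaranteed.

```python
import heapq

data = [19, 9, 4, 10, 11]

# Way 1: push the items one at a time onto an empty list.
pushed = []
for n in data:
    heapq.heappush(pushed, n)

# Way 2: copy the list, then reorganize the copy in place.
heapified = list(data)
heapq.heapify(heapified)

print(pushed)     # [4, 10, 9, 19, 11]
print(heapified)  # [4, 9, 19, 10, 11]
print(pushed[0] == heapified[0] == min(data))  # True
```

heapify() runs in linear time, while pushing n items one by one costs O(n log n) in the worst case, which is why heapify() is the better choice when all the data is available up front.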
5. Remove the smallest element with heappop()
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print()

for i in range(2):
    smallest = heapq.heappop(data)
    print('pop {:>3}:'.format(smallest))
    show_tree(data)
Output
random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------


pop   4:

                 9
        10                19
    11
------------------------------------

pop   9:

                 10
        11                19
------------------------------------
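Since heappop() always returns the smallest remaining item, popping until the heap is empty yields the values in ascending order. The sketch below illustrates this; `heap_sorted` is a hypothetical helper name, not a heapq function.

```python
import heapq


def heap_sorted(iterable):
    """Return a new ascending list by heapifying a copy and popping everything."""
    heap = list(iterable)  # copy so the caller's data is left untouched
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sorted([19, 9, 4, 10, 11]))  # [4, 9, 10, 11, 19]
```

In practice sorted() is the usual way to sort a whole list; the point here is only to show the order in which heappop() hands items back.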
6. Remove the smallest existing value and replace it with a new element in a single operation, using heapreplace()
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print('start:')
show_tree(data)

for n in [0, 13]:
    smallest = heapq.heapreplace(data, n)
    print('replace {:>2} with {:>2}:'.format(smallest, n))
    show_tree(data)
Output
start:

                 4
        9                 19
    10       11
------------------------------------

replace  4 with  0:

                 0
        9                 19
    10       11
------------------------------------

replace  0 with 13:

                 9
        10                19
    13       11
------------------------------------
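heapreplace() pops before it pushes, so the value it returns can be larger than the new item, as in the "replace 4 with 0" step above. The related function heappushpop() pushes first and then pops. A short sketch of the difference, starting from the same heap contents:

```python
import heapq

a = [4, 9, 19, 10, 11]  # already a valid min-heap
b = list(a)

# heapreplace pops first, then pushes: the old minimum (4) is returned
# even though the new item (0) is smaller.
old_min = heapq.heapreplace(a, 0)
print(old_min, a[0])  # 4 0

# heappushpop pushes first, then pops: the new item (0) is itself the
# smallest, so it comes straight back and the heap is left unchanged.
returned = heapq.heappushpop(b, 0)
print(returned, b[0])  # 0 4
```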
7. Find the largest and smallest values with nlargest() and nsmallest()
import heapq
from heapq_heapdata import data

print('all :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[-3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])
Output
all : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
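nlargest() and nsmallest() also accept an optional `key` function, which helps when the items are records rather than bare numbers. The records below are hypothetical sample data made up for illustration.

```python
import heapq

# Hypothetical sample records; the names and scores are invented.
records = [
    {'name': 'ann', 'score': 19},
    {'name': 'bob', 'score': 9},
    {'name': 'cid', 'score': 4},
    {'name': 'dee', 'score': 10},
    {'name': 'eve', 'score': 11},
]

# Rank by the 'score' field instead of comparing the dicts themselves.
top2 = heapq.nlargest(2, records, key=lambda r: r['score'])
bottom2 = heapq.nsmallest(2, records, key=lambda r: r['score'])

print([r['name'] for r in top2])     # ['ann', 'eve']
print([r['name'] for r in bottom2])  # ['cid', 'bob']
```

These functions work best when n is small relative to the size of the input; for larger n, sorting the whole sequence and slicing is usually the better choice.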
8. Efficiently merge sorted sequences
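The traditional way to merge (here data is a list of sorted lists, and itertools must be imported):
list(sorted(itertools.chain(*data))) # This technique can use a lot of memory.
heapq.merge() instead uses a heap to produce the merged sequence one item at a time, without building the whole result in memory first: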
import heapq
import random

random.seed(2016)

data = []
for i in range(4):
    new_data = list(random.sample(range(1, 101), 5))
    new_data.sort()
    data.append(new_data)

for i, d in enumerate(data):
    print('{}: {}'.format(i, d))

print('\nMerged:')
for i in heapq.merge(*data):
    print(i, end=' ')
print()
Output
0: [33, 58, 71, 88, 95]
1: [10, 11, 17, 38, 91]
2: [13, 18, 39, 61, 63]
3: [20, 27, 31, 42, 45]

Merged:
10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
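Because heapq.merge() returns an iterator and pulls items from its inputs only as needed, it can even merge inputs that never end. The sketch below uses itertools.count() for two endless sorted streams, and also shows the `reverse` parameter (available since Python 3.5) for inputs that are sorted in descending order.

```python
import heapq
import itertools

evens = itertools.count(0, 2)  # 0, 2, 4, ... (never ends)
odds = itertools.count(1, 2)   # 1, 3, 5, ... (never ends)

# Only the first six merged items are ever computed.
merged = heapq.merge(evens, odds)
first_six = list(itertools.islice(merged, 6))
print(first_six)  # [0, 1, 2, 3, 4, 5]

# With reverse=True, each input must already be sorted largest to smallest.
descending = list(heapq.merge([9, 5, 1], [8, 4], reverse=True))
print(descending)  # [9, 8, 5, 4, 1]
```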