Data Structures: Heaps and heapq

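Why does quicksort usually outperform heapsort in practice?

Heapsort's memory access pattern is less cache-friendly than quicksort's: it jumps between index i and index 2i+1 instead of scanning the array sequentially. Heapsort also performs more element swaps than quicksort on the same input.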

 

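Heap (priority queue) --> supports removing the maximum or the minimum value

Applications of heaps: event-driven simulation, graph search, operating systems, computer networks, and anything else that needs the maximum, the minimum, or the highest-priority item.

Properties of a heap: a heap is a complete binary tree;

the value of every node must be greater than or equal to (or less than or equal to) the value of every node in its subtree.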

 

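A complete binary tree is well suited to array storage, and storing it that way is very space-efficient: no left or right child pointers are needed, because the children and the parent of a node can be found from array indices alone.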

 

heapify: comes in two directions, sift-up and sift-down

Building a heap:

 

A heap is a special kind of complete binary tree.

If a parent node is at index i --> left child: 2i+1, right child: 2i+2

If a node is at index i (i > 0) --> parent: (i-1)//2
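
The index formulas above can be tried out on a small list. This is only an illustrative sketch: the helper names parent, left_child and right_child are made up for this example and do not come from any library.

```python
def parent(i):
    """Index of the parent of node i (only meaningful for i > 0)."""
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

# A max-heap stored level by level in a list:
#         9
#       /   \
#      7     8
#     / \   /
#    3   5 6
heap = [9, 7, 8, 3, 5, 6]

i = 1  # the node holding 7
# A child exists only if its index is inside the list.
children = [heap[c] for c in (left_child(i), right_child(i)) if c < len(heap)]
print(children)         # [3, 5]
print(heap[parent(5)])  # 8, the parent of the node holding 6
```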

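Max-heap: a complete binary tree in which every node is greater than or equal to its children.

Min-heap: a complete binary tree in which every node is less than or equal to its children.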
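
One way to make the two definitions concrete is a checker that walks the array and compares every node with its parent. The function names is_max_heap and is_min_heap are hypothetical, chosen for this sketch only.

```python
def is_max_heap(a):
    """True if every node is <= its parent, i.e. every parent >= its children."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))

def is_min_heap(a):
    """True if every node is >= its parent, i.e. every parent <= its children."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

print(is_max_heap([9, 7, 8, 3, 5, 6]))  # True
print(is_min_heap([1, 3, 2, 7, 4]))     # True
print(is_max_heap([9, 7, 8, 3, 10]))    # False: 10 sits below its parent 7
```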

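Sift-down (heapify) and sift-up share one precondition: exactly one position in the tree violates the heap property, and every other position satisfies it.

Sift-down compares the node being adjusted with its children.

Sift-up compares the node being adjusted with its parent.

Sift-down in detail: suppose the left and right subtrees of the root are both heaps, but the tree as a whole is not. A single sift-down pass from the root then turns the whole tree into a heap.

Building a heap: work bottom-up, starting from the last node that has a child.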

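# Max-heap: building the heap is O(n); popping every element in turn is O(nlogn)
def sift(li, low, high):  # sift-down
    # li: the list holding the tree; low: index of the subtree root; high: index of the last node
    tmp = li[low]
    i = low            # i marks the current hole
    j = 2 * low + 1    # j starts at the left child of i
    while j <= high:   # exit case 2: j > high means the hole i is a leaf
        if j + 1 <= high and li[j] < li[j + 1]:
            j += 1     # point j at the larger of the two children
        if li[j] > tmp:
            li[i] = li[j]
            i = j
            j = 2 * i + 1
        else:          # exit case 1: the larger child is not greater than tmp
            break
    li[i] = tmp        # write tmp into the hole i, wherever it ended up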

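# Heapsort
def heap_sort(li):
    n = len(li)
    # Build the heap, starting from the last node that has a child: index n // 2 - 1.
    # high is always n - 1 (the last index of the whole tree); it only guards
    # against running past the end and does not affect the result.
    for low in range(n // 2 - 1, -1, -1):
        sift(li, low, n - 1)
    # Pop elements one by one: move the maximum to position high,
    # then restore the heap on li[0..high-1].
    for high in range(n - 1, 0, -1):
        li[0], li[high] = li[high], li[0]
        sift(li, 0, high - 1)
    return li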

Heapsort --> average time complexity: O(nlogn), best case: O(nlogn), worst case: O(nlogn)

Space complexity: O(1). Heapsort is not a stable sort.

Inserting a single node takes O(logn) time. When a new element arrives, it is always appended at the end of the array and then sifted up, swapping with its parent at most logn times.
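
The insertion step could be sketched as follows for a max-heap stored in a list. The names sift_up and heap_insert are assumptions made for this example; they mirror the sift function above but move in the opposite direction.

```python
def sift_up(li, i):
    """Move li[i] up until its parent is not smaller (max-heap)."""
    tmp = li[i]
    while i > 0:
        p = (i - 1) // 2
        if li[p] >= tmp:   # parent is large enough, so the hole stays here
            break
        li[i] = li[p]      # pull the smaller parent down into the hole
        i = p
    li[i] = tmp

def heap_insert(li, value):
    """Append at the end, then restore the heap property by sifting up."""
    li.append(value)
    sift_up(li, len(li) - 1)

heap = [9, 7, 8, 3, 5]
heap_insert(heap, 10)
print(heap)  # [10, 7, 9, 3, 5, 8]
```

The new value passes at most one node per level, which is where the logn bound comes from.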

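A heap contains many smaller heaps: every subtree is itself a heap, so the structure is naturally recursive.

How do we build a heap? --> Time complexity O(n)!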

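For a heap with n nodes, built bottom-up with sift-down:

About n/2 nodes are leaves (height 0) --> no swaps needed

About n/4 nodes have height 1 --> at most one swap each

About n/8 nodes have height 2 --> at most two swaps each

......-->

n/4 * 1 + n/8 * 2 + n/16 * 3 + ...... + n/2^(h+1) * h + ...... <= n, so building the heap is O(n)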
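
The bound can also be checked empirically. The sketch below counts how many levels each element moves down during a bottom-up build. The functions sift_count and build_heap_moves are hypothetical helpers written for this check, and ascending input is used simply because it forces a lot of movement in a max-heap.

```python
def sift_count(li, low, high):
    """Sift li[low] down in a max-heap and return how many levels it moved."""
    tmp = li[low]
    i, j = low, 2 * low + 1
    moves = 0
    while j <= high:
        if j + 1 <= high and li[j] < li[j + 1]:
            j += 1
        if li[j] > tmp:
            li[i] = li[j]
            i, j = j, 2 * j + 1
            moves += 1
        else:
            break
    li[i] = tmp
    return moves

def build_heap_moves(li):
    """Build a max-heap in place and return the total number of moves."""
    n = len(li)
    return sum(sift_count(li, low, n - 1) for low in range(n // 2 - 1, -1, -1))

for n in (1_000, 10_000, 100_000):
    data = list(range(n))  # ascending input
    moves = build_heap_moves(data)
    print(n, moves, moves / n)  # the ratio moves / n stays below 1
```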

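Max-heap --> looking up the maximum is O(1)

Deletion --> O(logn). To delete the maximum, move the last element to the root, then restore the heap property by sifting it down.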
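
A minimal sketch of that deletion, assuming a max-heap stored in a plain list. The names sift_down and pop_max are made up for this example.

```python
def sift_down(li, low, high):
    """Restore the max-heap property below index low, up to index high."""
    tmp = li[low]
    i, j = low, 2 * low + 1
    while j <= high:
        if j + 1 <= high and li[j] < li[j + 1]:
            j += 1
        if li[j] > tmp:
            li[i] = li[j]
            i, j = j, 2 * j + 1
        else:
            break
    li[i] = tmp

def pop_max(li):
    """Remove and return the largest element of a non-empty max-heap."""
    top = li[0]
    last = li.pop()          # take the last element off the end
    if li:                   # if anything is left, put it at the root
        li[0] = last
        sift_down(li, 0, len(li) - 1)
    return top

heap = [10, 7, 9, 3, 5, 8]
print(pop_max(heap))  # 10
print(heap)           # [9, 7, 8, 3, 5]
```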

# Our own heap: a min-heap whose entries are (key, value) items
class PriorityQueueBase:
    """Abstract base class for a priority queue."""

    class Item:
        """Lightweight composite to store priority queue items."""
        __slots__ = '_key', '_value'

        def __init__(self, k, v):
            self._key = k
            self._value = v

        def __lt__(self, other):  # lets items be compared by key
            return self._key < other._key

        def __str__(self):
            return str(self._key)

    def is_empty(self):  # relies on the subclass defining __len__
        return len(self) == 0


class HeapPriorityQueue(PriorityQueueBase):

    def __init__(self):
        self._data = []

    def __len__(self):
        return len(self._data)

    def add(self, key, value):
        self._data.append(self.Item(key, value))
        self._upheap(len(self._data) - 1)

    def min(self):
        if self.is_empty():
            raise ValueError("Priority queue is empty")
        item = self._data[0]
        return (item._key, item._value)

    def remove_min(self):
        if self.is_empty():
            raise ValueError("Priority queue is empty")
        self._swap(0, len(self._data) - 1)  # move the minimum to the end
        item = self._data.pop()
        self._downheap(0)
        return (item._key, item._value)

    def _parent(self, j):
        return (j - 1) // 2

    def _left(self, j):
        return 2 * j + 1

    def _right(self, j):
        return 2 * j + 2

    def _has_left(self, j):
        return self._left(j) < len(self._data)

    def _has_right(self, j):
        return self._right(j) < len(self._data)

    def _swap(self, i, j):
        self._data[i], self._data[j] = self._data[j], self._data[i]

    def _upheap(self, j):
        parent = self._parent(j)
        if j > 0 and self._data[j] < self._data[parent]:
            self._swap(j, parent)
            self._upheap(parent)

    def _downheap(self, j):
        if self._has_left(j):
            left = self._left(j)  # index of the left child, not its value
            small_child = left
            if self._has_right(j):
                right = self._right(j)
                if self._data[right] < self._data[left]:
                    small_child = right
            if self._data[small_child] < self._data[j]:
                self._swap(j, small_child)
                self._downheap(small_child)

Python's built-in heap

To create a heap, use a list initialized to [], or transform a populated list into a heap with the function heapify(). In other words, there are two ways to create a heap: heappush() --> push items one at a time, or heapify() --> convert an existing list.
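
The two construction routes can be compared side by side. This is a small sketch using only heapq.heappush, heapq.heapify and heapq.heappop from the standard library; the sample data is arbitrary.

```python
import heapq

data = [5, 1, 8, 3, 2]

# Way 1: start from [] and push one item at a time, O(log n) per push.
h1 = []
for x in data:
    heapq.heappush(h1, x)

# Way 2: copy the list and heapify it in place, O(n) overall.
h2 = list(data)
heapq.heapify(h2)

# The internal layouts may differ, but both are valid min-heaps.
print(h1[0], h2[0])  # 1 1

# Popping repeatedly yields the items in ascending order.
drained = [heapq.heappop(h1) for _ in range(len(h1))]
print(drained)  # [1, 2, 3, 5, 8]
```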

  • heapq.heappush(heap, item)

    Push the value item onto the heap, maintaining the heap invariant.

  • heapq.heappop(heap)

    Pop and return the smallest item from the heap, maintaining the heap invariant. If the heap is empty, IndexError is raised.

  • heapq.heappushpop(heap, item)

    Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().

  • heapq.heapify(x)

    Transform list x into a heap, in-place, in linear time.

  • heapq.heapreplace(heap, item)

    Pop and return the smallest item from the heap, and also push the new item. The heap size doesn’t change. If the heap is empty, IndexError is raised. This is more efficient than heappop() followed by heappush(), and can be more appropriate when using a fixed-size heap. Note that the value returned may be larger than item! That constrains reasonable uses of this routine unless written as part of a conditional replacement:

    if item > heap[0]: item = heapreplace(heap, item)
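
The difference between heappushpop() and heapreplace(), and the fixed-size use case mentioned above, might look like this in practice. The function top_k is a hypothetical helper written for this sketch, and it assumes k >= 1.

```python
import heapq

heap = [2, 4, 6]
heapq.heapify(heap)

# heappushpop pushes first, so it can hand the new item straight back.
print(heapq.heappushpop(heap, 1))  # 1
# heapreplace pops first, so it returns the old minimum even though 1 is smaller.
print(heapq.heapreplace(heap, 1))  # 2

def top_k(stream, k):
    """Keep the k largest values seen so far in a size-k min-heap (k >= 1)."""
    best = []
    for x in stream:
        if len(best) < k:
            heapq.heappush(best, x)
        elif x > best[0]:              # conditional replacement, as advised above
            heapq.heapreplace(best, x)
    return sorted(best, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]
```

The root of the size-k min-heap is always the smallest of the current top k, so a single comparison decides whether a new value belongs in the result.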

Tuples

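The priority queue can also store objects such as tuples. Tuples are compared element by element, so a (priority, item) tuple is ordered by its priority first.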
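
A short sketch of a tuple-based priority queue follows. The task names are invented for illustration.

```python
import heapq

tasks = []
heapq.heappush(tasks, (3, "write tests"))
heapq.heappush(tasks, (1, "fix crash"))
heapq.heappush(tasks, (2, "review PR"))

order = []
while tasks:
    priority, name = heapq.heappop(tasks)  # smallest priority comes out first
    order.append((priority, name))

print(order)  # [(1, 'fix crash'), (2, 'review PR'), (3, 'write tests')]
```

If two tuples share the same priority, Python falls back to comparing the second elements, so those need to be comparable as well.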

The module also offers three general purpose functions based on heaps.

  • heapq.merge(*iterables)

    Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an iterator over the sorted values.

    Similar to sorted(itertools.chain(*iterables)) but returns an iterable, does not pull the data into memory all at once, and assumes that each of the input streams is already sorted (smallest to largest).

  • heapq.nlargest(n, iterable[, key])

    Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable (for example, key=str.lower). Equivalent to: sorted(iterable, key=key, reverse=True)[:n]

  • heapq.nsmallest(n, iterable[, key])

    Return a list with the n smallest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable (for example, key=str.lower). Equivalent to: sorted(iterable, key=key)[:n]

The latter two functions perform best for smaller values of n. For larger values, it is more efficient to use the sorted() function. Also, when n==1, it is more efficient to use the builtin min() and max() functions.
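
The three functions can be exercised on small inputs. The lists below are made-up sample data standing in for log entries and words.

```python
import heapq

# merge: both inputs must already be sorted; the result is a lazy iterator.
log_a = [1, 4, 9]
log_b = [2, 3, 10]
merged = list(heapq.merge(log_a, log_b))
print(merged)  # [1, 2, 3, 4, 9, 10]

words = ["pear", "Apple", "fig", "Banana"]

# nsmallest with key=str.lower ignores case when ordering.
smallest = heapq.nsmallest(2, words, key=str.lower)
print(smallest)  # ['Apple', 'Banana']

# nlargest with key=len picks the longest words.
longest = heapq.nlargest(2, words, key=len)
print(longest)  # ['Banana', 'Apple']
```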

Class Objects

Python is dynamically typed, so a heap can hold any objects that can be compared with one another, just as we stored a (priority, thing) tuple in the previous section. We can also store instances of our own classes if we define the __lt__() method (Python 2 used __cmp__() instead):

# Python 3 uses __lt__ for ordering; __cmp__ and cmp() exist only in Python 2

class Skill(object):
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print('New Level:', description)

    def __lt__(self, other):
        return self.priority < other.priority

    def __repr__(self):
        return str(self.priority) + ": " + self.description
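
To show how such a class behaves on a heap, here is a self-contained sketch. Job is a hypothetical stand-in class defined only for this example; the one thing heapq needs from it is __lt__.

```python
import heapq

class Job:
    """Hypothetical example class; heapq only relies on __lt__."""
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description

    def __lt__(self, other):
        return self.priority < other.priority

    def __repr__(self):
        return f"{self.priority}: {self.description}"

jobs = []
for p, d in [(5, "deploy"), (1, "triage"), (3, "refactor")]:
    heapq.heappush(jobs, Job(p, d))

first = heapq.heappop(jobs)
print(first)  # 1: triage
```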

Applications of heaps:

Priority queues (for example, merging small sorted files)
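
The merging idea could be sketched by hand as a k-way merge. In-memory lists stand in for the sorted files here, which is an assumption of this example, and k_way_merge is a made-up name. The heap holds one entry per input, so it never grows beyond k items.

```python
import heapq

def k_way_merge(sorted_lists):
    """Merge already-sorted lists, keeping one entry per list on the heap."""
    # Each entry is (value, index of the source list, position in that list).
    heap = [(lst[0], idx, 0) for idx, lst in enumerate(sorted_lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(sorted_lists[idx]):
            # Refill from the same source the popped value came from.
            heapq.heappush(heap, (sorted_lists[idx][pos + 1], idx, pos + 1))
    return out

print(k_way_merge([[1, 4, 9], [2, 3, 10], [], [0, 8]]))  # [0, 1, 2, 3, 4, 8, 9, 10]
```

For real files, the standard library's heapq.merge does the same job lazily, without building the whole output list in memory.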

 

posted @ 2020-02-23 21:16  LinBupt