Binary Heap
References
Wikipedia: https://en.wikipedia.org/wiki/Binary_heap
Basic Concepts
• Binary Heap
▶ CLRS
The (binary) heap data structure is an array object that we can view as a nearly complete binary tree (see Section B.5.3), as shown in Figure 6.1. Each node of the tree corresponds to an element of the array. The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point.
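In short: a (binary) heap is an array object that can be viewed as a nearly complete binary tree (in the terminology of many textbooks, a complete binary tree). Each node of the tree corresponds to one element of the array. Every level is full except possibly the lowest, whose nodes are filled in from left to right.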
▶ Wikipedia
A binary heap is a heap data structure created using a binary tree. A binary heap is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
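In short: a binary heap is a heap data structure built on a binary tree, and that tree is complete. Every level except possibly the deepest is full, and if the deepest level is not full, its nodes are filled in from left to right.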
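The level-by-level shape described above can be made concrete with a small sketch. The helper name `heap_levels` and the sample array are illustrative assumptions, not part of CLRS or Wikipedia; the function only slices an array into the tree levels it would occupy when the root is at index 0.

```python
def heap_levels(a):
    """Split an array into the tree levels it occupies (root at index 0).

    Level k holds up to 2**k elements, so every level is full except
    possibly the last, which is filled from the left.
    """
    levels = []
    start, width = 0, 1
    while start < len(a):
        levels.append(a[start:start + width])
        start += width
        width *= 2
    return levels


if __name__ == "__main__":
    # Hypothetical sample data: 10 elements occupy 4 levels.
    for level in heap_levels([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]):
        print(level)
```

Only the last slice can be shorter than its level width, which is exactly the "filled from left to right" condition.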
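▶ Notation
By default we treat a binary heap as an array, denoted A. For a node with index i, the indices of its parent, left child, and right child are written PARENT(i), LEFT(i), and RIGHT(i). The value of node i is A[i], and the values of its parent, left child, and right child are A[PARENT(i)], A[LEFT(i)], and A[RIGHT(i)]. The array A has two attributes: A.length is the number of elements the array can hold, and A.heap-size is the number of heap elements currently stored in it, with 0 ≤ A.heap-size ≤ A.length.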
• Binary Heap Implementation
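A binary heap is usually implemented with an array. Any binary tree can be stored in an array, but because a binary heap is always a complete binary tree, the array representation is especially compact. No extra space is needed for pointers; instead, the array indices of a node's parent and children are computed arithmetically. The index formulas depend on where the root is placed, which in turn depends on the constraints of the implementation language or the programmer's preference. Let n be the number of elements in the heap and i an index into the array that stores it; a child exists only if its index falls within the n stored elements. There are two cases:
(1) If the root is stored at array index 0, the element at index i has the following properties:
• Its children are at indices 2 * i + 1 and 2 * i + 2, that is, LEFT(i) = 2 * i + 1 and RIGHT(i) = 2 * i + 2
• Its parent is at index floor((i - 1) / 2), that is, PARENT(i) = floor((i - 1) / 2) (for i > 0; the root has no parent)
(2) If the root is stored at array index 1, the element at index i has the following properties:
• Its children are at indices 2 * i and 2 * i + 1, that is, LEFT(i) = 2 * i and RIGHT(i) = 2 * i + 1
• Its parent is at index floor(i / 2), that is, PARENT(i) = floor(i / 2) (for i > 1; the root has no parent)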
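The two indexing conventions above can be sketched as plain functions. The function names (`left0`, `parent1`, and so on) are illustrative assumptions chosen here to keep the 0-based and 1-based variants apart; Python's `//` performs the floor division used in the formulas.

```python
# Root stored at index 0.
def left0(i):
    return 2 * i + 1

def right0(i):
    return 2 * i + 2

def parent0(i):
    # Only meaningful for i > 0; the root has no parent.
    return (i - 1) // 2


# Root stored at index 1.
def left1(i):
    return 2 * i

def right1(i):
    return 2 * i + 1

def parent1(i):
    # Only meaningful for i > 1; the root has no parent.
    return i // 2


if __name__ == "__main__":
    # In both conventions, the parent of either child of i is i again.
    for i in range(0, 5):
        assert parent0(left0(i)) == i and parent0(right0(i)) == i
    for i in range(1, 6):
        assert parent1(left1(i)) == i and parent1(right1(i)) == i
    print("index formulas are consistent")
```

The 1-based formulas need one less addition or subtraction per step, which is one reason some implementations leave index 0 unused.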
• Max Heap & Min Heap
▶ CLRS
There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in the nodes satisfy a heap property, the specifics of which depend on the kind of heap. In a max-heap, the max-heap property is that for every node i other than the root, A[PARENT(i)] ≥ A[i], that is, the value of a node is at most the value of its parent. Thus, the largest element in a max-heap is stored at the root, and the subtree rooted at a node contains values no larger than that contained at the node itself. A min-heap is organized in the opposite way; the min-heap property is that for every node i other than the root, A[PARENT(i)] ≤ A[i]. The smallest element in a min-heap is at the root.
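In short: binary heaps come in two kinds, max-heaps and min-heaps, and in both the node values must satisfy a heap property whose details depend on the kind. In a max-heap, every node i other than the root satisfies A[PARENT(i)] ≥ A[i], so no node is larger than its parent. The largest element is therefore stored at the root, and within any subtree no value exceeds the value at that subtree's root. A min-heap is organized the opposite way: every node i other than the root satisfies A[PARENT(i)] ≤ A[i], and the smallest element is stored at the root.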
▶ Wikipedia
All nodes are either greater than or equal to or less than or equal to each of its children, according to a comparison predicate defined for the heap. Heaps with a mathematical "greater than or equal to" (≥) comparison predicate are called max-heaps; those with a mathematical "less than or equal to" (≤) comparison predicate are called min-heaps. Min-heaps are often used to implement priority queues.
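In short: according to a comparison predicate defined for the heap, every node is either greater than or equal to each of its children, or less than or equal to each of them. A heap that uses the "greater than or equal to" (≥) predicate is a max-heap; one that uses the "less than or equal to" (≤) predicate is a min-heap. Min-heaps are often used to implement priority queues.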
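The two heap properties can be checked directly from their definitions. The following is a minimal sketch, assuming the root is stored at index 0 and that the whole list is the heap; the names `is_max_heap` and `is_min_heap` and the sample arrays are illustrative assumptions.

```python
def is_max_heap(a):
    """True if A[PARENT(i)] >= A[i] for every non-root index i (root at 0)."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def is_min_heap(a):
    """True if A[PARENT(i)] <= A[i] for every non-root index i (root at 0)."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


if __name__ == "__main__":
    print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
    print(is_min_heap([1, 3, 2, 7, 4]))                    # True
    print(is_max_heap([1, 3, 2]))                          # False
```

Checking each node against its parent is equivalent to checking each node against its children, so the CLRS and Wikipedia formulations describe the same property.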
Binary Heap Maintenance Operations
• MAX-HEAPIFY Procedure
▶ CLRS
In order to maintain the max-heap property, we call the procedure MAX-HEAPIFY. Its inputs are an array A and an index i into the array. When it is called, MAX-HEAPIFY assumes that the binary trees rooted at LEFT(i) and RIGHT(i) are max-heaps, but that A[i] might be smaller than its children, thus violating the max-heap property. MAX-HEAPIFY lets the value at A[i] "float down" in the max-heap so that the subtree rooted at index i obeys the max-heap property.
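In short: the procedure that maintains the max-heap property is called MAX-HEAPIFY. It takes an array A and an index i into that array. It assumes that the binary trees rooted at LEFT(i) and RIGHT(i) are already max-heaps, but that A[i] may be smaller than one of its children, which violates the max-heap property. MAX-HEAPIFY lets the value at A[i] sink level by level until the subtree rooted at index i satisfies the max-heap property again.
The pseudocode for the MAX-HEAPIFY procedure is as follows: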
// Assumes the root is stored at array index 1
MAX-HEAPIFY(A, i):
    left = LEFT(i)
    right = RIGHT(i)
    largest = i
    if left ≤ A.heap-size and A[left] > A[largest]
        largest = left
    if right ≤ A.heap-size and A[right] > A[largest]
        largest = right
    if largest ≠ i
        SWAP A[i] and A[largest]
        MAX-HEAPIFY(A, largest)
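A runnable sketch of the same procedure follows. It is an adaptation rather than a transcription: it assumes the root is at index 0 (so the bounds checks become `< heap_size` instead of `≤ A.heap-size`), and it passes `heap_size` as an explicit parameter because a Python list has no such attribute. The function name `max_heapify` and the sample array are illustrative assumptions.

```python
def max_heapify(a, i, heap_size=None):
    """Let a[i] float down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps
    and that the root of the whole heap is stored at index 0.
    """
    if heap_size is None:
        heap_size = len(a)
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        # The swapped-down value may still violate the property one level lower.
        max_heapify(a, largest, heap_size)


if __name__ == "__main__":
    a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]  # only index 1 violates the property
    max_heapify(a, 1)
    print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

Each recursive call moves one level down the tree, so the running time is proportional to the height of the subtree, which is O(log n) for a heap of n elements.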
• INSERT Procedure
▶ Wikipedia
To add an element to a heap we must perform an up-heap operation (also known as bubble-up, percolate-up, sift-up, trickle-up, heapify-up, or cascade-up), by following this algorithm:
1. Add the element to the bottom level of the heap.
2. Compare the added element with its parent; if they are in the correct order, stop.
3. If not, swap the element with its parent and return to the previous step.
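In other words, to add an element X to the heap we perform a bubble-up operation:
1. Add element X to the bottom level of the heap;
2. Compare the node holding X with its parent; if the two are in the correct order, stop;
3. Otherwise, swap the node holding X with its parent and return to step 2.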
Taking a max-heap as an example, suppose the inserted element X has the value 15. [Figure: insertion of X = 15 into a max-heap]
The pseudocode for the INSERT procedure is as follows:
// The root is stored at array index 1
INSERT(A, x)
    if A.heap-size ≥ A.length
        return FALSE
    A[++A.heap-size] = x
    i = A.heap-size
    while i ≠ 1
        j = PARENT(i)
        if A[i] ≤ A[j]
            break
        SWAP A[i] and A[j]
        i = j
    return TRUE
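The insertion procedure can be sketched in runnable form as well. This version is an adaptation under stated assumptions: the root is at index 0 rather than 1, a fixed-size list stands in for A (its length plays the role of A.length), and the class name `MaxHeap`, its fields, and the starting values in the usage example are all hypothetical choices made for illustration.

```python
class MaxHeap:
    """Fixed-capacity max-heap with the root stored at index 0."""

    def __init__(self, capacity):
        self.data = [None] * capacity  # plays the role of A; len(data) is A.length
        self.heap_size = 0             # plays the role of A.heap-size

    def insert(self, x):
        """Insert x and bubble it up; return False if the array is full."""
        if self.heap_size >= len(self.data):
            return False
        i = self.heap_size
        self.data[i] = x               # step 1: place x at the bottom level
        self.heap_size += 1
        while i != 0:
            j = (i - 1) // 2           # PARENT(i) for a 0-based root
            if self.data[i] <= self.data[j]:
                break                  # step 2: correct order, stop
            self.data[i], self.data[j] = self.data[j], self.data[i]
            i = j                      # step 3: continue from the parent
        return True


if __name__ == "__main__":
    heap = MaxHeap(6)
    for value in [11, 5, 8, 3, 4]:     # assumed starting heap
        heap.insert(value)
    heap.insert(15)                    # X = 15 bubbles up to the root
    print(heap.data)                   # [15, 5, 11, 3, 4, 8]
    print(heap.insert(1))              # False: the array is full
```

In the usage example, 15 is first swapped with 8 and then with 11, so it climbs two levels, one per loop iteration; in general an insertion performs at most as many swaps as the tree has levels below the root.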