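An Introduction to Heaps

A Special Kind of Binary Tree: the Binary Heap

A heap is a binary tree that satisfies the following two properties (constraints).

1. It is a complete binary tree.

2. The value of every node is greater than or equal to (or, alternatively, less than or equal to) the values of its children.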

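First, the definition of a complete binary tree: every level except possibly the last is completely filled, and the nodes on the last level are filled from the left. Put simply, nodes must be laid out contiguously from top to bottom and from left to right, with no gaps in between.

For example, none of the following three trees is a complete binary tree. For ease of illustration, the nodes are numbered in ascending order to show their positions, and a dashed node marks a position that should be occupied but is not.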
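The "no gaps" rule can be illustrated with a small sketch. It assumes the tree is written out level by level into an `Integer[]`, with `null` marking an empty position; that layout and the class name `CompleteTreeCheck` are choices made for this illustration only.

```java
public class CompleteTreeCheck {
    // Assumption of this sketch: slots holds the tree in level order,
    // and null marks a position where no node is present.
    public static boolean isComplete(Integer[] slots) {
        boolean seenGap = false;
        for (Integer slot : slots) {
            if (slot == null) {
                seenGap = true;          // remember the first empty position
            } else if (seenGap) {
                return false;            // a node appears after a gap
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Last level filled from the left: complete
        System.out.println(isComplete(new Integer[]{1, 2, 3, 4, 5, null, null})); // true
        // Position 4 is missing but position 5 is occupied: not complete
        System.out.println(isComplete(new Integer[]{1, 2, 3, null, 5}));          // false
    }
}
```

A tree is complete exactly when its occupied positions form one unbroken run from the start of this layout.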

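A structure laid out in this order can be stored in an array, and the positions of a node's parent and children can then be computed from the node's own array index.

Suppose node D is stored at array index 3. Then for D:

parent B = (3-1)/2 = 1 (integer division)

left child H = 3*2+1 = 7

right child I = 3*2+2 = 8

With these three formulas it is easy to locate a node's parent and children and compare their values, as illustrated in the figure.

*To simplify the arithmetic, the array can also be indexed from 1, in which case the formulas become: parent = i/2, left child = i*2, right child = i*2+1 (where i is the array index of the current node).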
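As a quick check of both sets of formulas, here is a minimal sketch. The class and method names (`HeapIndex`, `parent0`, `parent1`, and so on) are made up for this example; the arithmetic is the one given above and relies on Java's integer division.

```java
public class HeapIndex {
    // 0-based layout: the root is stored at index 0
    public static int parent0(int i) { return (i - 1) / 2; }
    public static int left0(int i)   { return 2 * i + 1; }
    public static int right0(int i)  { return 2 * i + 2; }

    // 1-based layout: the root is stored at index 1, index 0 is unused
    public static int parent1(int i) { return i / 2; }
    public static int left1(int i)   { return 2 * i; }
    public static int right1(int i)  { return 2 * i + 1; }

    public static void main(String[] args) {
        // Node D at index 3 in the 0-based layout, as in the text
        System.out.println(parent0(3) + " " + left0(3) + " " + right0(3)); // 1 7 8
        // The same node sits at index 4 in the 1-based layout
        System.out.println(parent1(4) + " " + left1(4) + " " + right1(4)); // 2 8 9
    }
}
```

The root has no parent in either layout, so callers should stop before asking for the parent of index 0 (0-based) or index 1 (1-based).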

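Now for property two.

When every node is less than or equal to its children, the structure is called a min-heap. Conversely, in a max-heap every node is greater than or equal to its children.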
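To make property two concrete, the sketch below checks whether a 0-based array satisfies the min-heap ordering by comparing every node with its parent. The class name `MinHeapCheck` and the sample arrays are illustrative only.

```java
public class MinHeapCheck {
    // Returns true if, in the 0-based layout, no node is smaller than its parent.
    public static boolean isMinHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int parent = (i - 1) / 2;
            if (a[parent] > a[i]) {
                return false;            // parent larger than child: ordering violated
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMinHeap(new int[]{11, 17, 12, 33, 26, 13, 27})); // true
        System.out.println(isMinHeap(new int[]{56, 26, 72}));                 // false
    }
}
```

Flipping the comparison to `a[parent] < a[i]` would give the corresponding max-heap check.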

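When a node is added or removed, nodes have to be compared and swapped to restore the heap property; this is called heapify. The following code shows how it works:

*In this example the array is filled starting from index 1, and the heap being built is a min-heap.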

import java.util.Arrays;

public class Heap {
    public static void main(String[] args) {
        int[] numbers = {56, 26, 72, 42, 11, 12, 27, 99, 33, 66, 17, 30, 13, 90, 28};
        System.out.println("array " + Arrays.toString(numbers) + " total " + numbers.length);

        BinaryHeap binaryHeap = new BinaryHeap(numbers.length);
        for (int num : numbers) {
            System.out.printf("add %s->", num);
            binaryHeap.insert(num);
            System.out.println(binaryHeap + " count " + binaryHeap.count);
        }
        // Test inserting into a full heap
        binaryHeap.insert(999);
        System.out.println("heap " + binaryHeap + " count " + binaryHeap.count);
        for (int i = 0; i < numbers.length; i++) {
            System.out.printf("remove %s<-", binaryHeap.data[1]);
            binaryHeap.removeMin();
            System.out.println(binaryHeap + " count " + binaryHeap.count);
        }
        // Test removing from an empty heap
        binaryHeap.removeMin();
        System.out.println("heap " + binaryHeap + " count " + binaryHeap.count);
    }

    private static class BinaryHeap {
        // Number of nodes currently in the heap
        int count = 0;
        // Heap capacity, i.e. the maximum number of nodes it can hold
        int size;
        // The heap is stored in an array
        int[] data;

        public BinaryHeap(int size) {
            this.size = size;
            // Note: nodes are stored starting at index 1, so one extra slot is needed
            data = new int[size + 1];
        }

        public void removeMin() {
            if (count == 0) {
                System.out.println("heap is empty!");
                return;
            }
            // Overwrite the first node with the last node
            data[1] = data[count];
            // Decrement the count, which effectively deletes the last node
            count--;

            // Index of the left child
            int leftChildIndex;
            // Index of the right child
            int rightChildIndex;
            // The last node now sits at the root, so sift down starting from the first node
            int currentIndex = 1;
            // Index of the smallest of the three nodes compared in each round
            int smallestIndex;
            while (true) {
                // Left child formula: i*2
                leftChildIndex = currentIndex * 2;
                // Right child formula: i*2+1
                rightChildIndex = leftChildIndex + 1;
                // Assume the current node is the smallest to begin with
                smallestIndex = currentIndex;

                // If the left child exists and is smaller than the current smallest, it becomes the smallest
                if (leftChildIndex <= count && data[leftChildIndex] < data[smallestIndex]) {
                    smallestIndex = leftChildIndex;
                }
                // If the right child exists and is smaller than the current smallest, it becomes the smallest
                if (rightChildIndex <= count && data[rightChildIndex] < data[smallestIndex]) {
                    smallestIndex = rightChildIndex;
                }

                // If the current node is not the smallest, swap the two
                if (currentIndex != smallestIndex) {
                    swap(currentIndex, smallestIndex);
                    // Continue from the position of the smallest node in the next round
                    currentIndex = smallestIndex;
                } else {
                    break;
                }
            }
        }

        public void insert(int num) {
            // Make sure the node count does not exceed the heap capacity
            if (count >= size) {
                System.out.println("heap is full!");
                return;
            }
            // One more node, so increment the count
            count++;
            // Store the new node right after the current last node
            data[count] = num;

            // Starting from the new node, walk upward; while it is smaller than its parent,
            // swap the two. This is what makes it a min-heap.
            int currentIndex = count;
            // Index of the parent
            int parentIndex;
            while (true) {
                // Parent formula: i/2
                parentIndex = currentIndex / 2;
                // Nodes start at index 1, so index 0 is not a node and we can stop
                if (parentIndex == 0) {
                    break;
                }
                // The parent is larger than the inserted value; in a min-heap the two must be swapped
                if (data[parentIndex] > num) {
                    // A plain swap is used here; there is room for optimization.
                    // How could the number of swaps be reduced?
                    swap(currentIndex, parentIndex);
                    // Move to the parent's position so the next round looks at the parent's parent
                    currentIndex = parentIndex;
                } else {
                    // The parent is less than or equal to the inserted value, so we can stop
                    // (every ancestor further up is no larger than this parent)
                    break;
                }
            }
        }

        // Override toString to print the nodes of the heap
        @Override
        public String toString() {
            StringBuilder str = new StringBuilder("[");
            for (int i = 1; i <= count; i++) {
                str.append(data[i]);
                if (i != count) {
                    str.append(", ");
                }
            }
            return str.append("]").toString();
        }

        // Swap the values at two positions
        private void swap(int pos1, int pos2) {
            int tmp = data[pos1];
            data[pos1] = data[pos2];
            data[pos2] = tmp;
        }
    }
}

Output:

array [56, 26, 72, 42, 11, 12, 27, 99, 33, 66, 17, 30, 13, 90, 28] total 15
add 56->[56] count 1
add 26->[26, 56] count 2
add 72->[26, 56, 72] count 3
add 42->[26, 42, 72, 56] count 4
add 11->[11, 26, 72, 56, 42] count 5
add 12->[11, 26, 12, 56, 42, 72] count 6
add 27->[11, 26, 12, 56, 42, 72, 27] count 7
add 99->[11, 26, 12, 56, 42, 72, 27, 99] count 8
add 33->[11, 26, 12, 33, 42, 72, 27, 99, 56] count 9
add 66->[11, 26, 12, 33, 42, 72, 27, 99, 56, 66] count 10
add 17->[11, 17, 12, 33, 26, 72, 27, 99, 56, 66, 42] count 11
add 30->[11, 17, 12, 33, 26, 30, 27, 99, 56, 66, 42, 72] count 12
add 13->[11, 17, 12, 33, 26, 13, 27, 99, 56, 66, 42, 72, 30] count 13
add 90->[11, 17, 12, 33, 26, 13, 27, 99, 56, 66, 42, 72, 30, 90] count 14
add 28->[11, 17, 12, 33, 26, 13, 27, 99, 56, 66, 42, 72, 30, 90, 28] count 15
heap is full!
heap [11, 17, 12, 33, 26, 13, 27, 99, 56, 66, 42, 72, 30, 90, 28] count 15
remove 11<-[12, 17, 13, 33, 26, 28, 27, 99, 56, 66, 42, 72, 30, 90] count 14
remove 12<-[13, 17, 27, 33, 26, 28, 90, 99, 56, 66, 42, 72, 30] count 13
remove 13<-[17, 26, 27, 33, 30, 28, 90, 99, 56, 66, 42, 72] count 12
remove 17<-[26, 30, 27, 33, 42, 28, 90, 99, 56, 66, 72] count 11
remove 26<-[27, 30, 28, 33, 42, 72, 90, 99, 56, 66] count 10
remove 27<-[28, 30, 66, 33, 42, 72, 90, 99, 56] count 9
remove 28<-[30, 33, 66, 56, 42, 72, 90, 99] count 8
remove 30<-[33, 42, 66, 56, 99, 72, 90] count 7
remove 33<-[42, 56, 66, 90, 99, 72] count 6
remove 42<-[56, 72, 66, 90, 99] count 5
remove 56<-[66, 72, 99, 90] count 4
remove 66<-[72, 90, 99] count 3
remove 72<-[90, 99] count 2
remove 90<-[99] count 1
remove 99<-[] count 0
heap is empty!
heap [] count 0

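As the output shows, a new element is first written to the last position of the heap and then compared against its parent; if it is smaller than the parent, the two are swapped. This repeats until the parent is no larger than the new element or the root is reached. The accompanying flow diagram shows a new node 2 being added. The whole process works from the bottom up.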
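The insert method in the listing above swaps at every level, and its comment asks how the number of swaps could be reduced. One common approach, sketched here rather than taken from the original post, is to keep the new value aside, shift larger parents down into the "hole", and write the new value once at the end. The class `SiftUpDemo` and the parameter layout are assumptions of this sketch, and it omits the capacity check for brevity.

```java
import java.util.Arrays;

public class SiftUpDemo {
    // data is 1-based (data[0] unused); count is the number of nodes before the insert.
    // Assumption of this sketch: the caller guarantees data has room for one more node.
    // Returns the new node count.
    public static int insert(int[] data, int count, int num) {
        int hole = ++count;                       // the new node starts in the last position
        while (hole > 1 && data[hole / 2] > num) {
            data[hole] = data[hole / 2];          // move the larger parent down
            hole /= 2;                            // the hole moves up one level
        }
        data[hole] = num;                         // single write of the new value
        return count;
    }

    public static void main(String[] args) {
        int[] data = new int[8];
        int count = 0;
        for (int num : new int[]{56, 26, 72, 42, 11}) {
            count = insert(data, count, num);
        }
        // Same heap as the listing produces after its first five inserts
        System.out.println(Arrays.toString(Arrays.copyOfRange(data, 1, count + 1)));
        // prints [11, 26, 72, 56, 42]
    }
}
```

Each level now costs one array write instead of the three a swap needs, while the resulting heap is identical.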

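Removing the top node works similarly: the node in the last position is moved to the top, and the top node is then heapified. The top node is compared with its left and right children and swapped with the smaller child if that child is smaller than it. The same comparison is then repeated from the swapped position, and the process ends when the node is no larger than both of its children or has become a leaf. The accompanying flow diagram shows the top (root) node 1 being removed. The whole process works from the top down.

The remove lines in the output show that nodes are removed in ascending order of value. This property can be used to implement heap sort.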
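To see the ascending order directly, the sketch below uses a variant of the removal step that returns the removed minimum, then drains a small heap. The class `RemoveMinDemo`, the method signature, and the starting array (the heap state after the first seven inserts in the output above) are choices made for this illustration; unlike the listing, the caller tracks the node count.

```java
import java.util.Arrays;

public class RemoveMinDemo {
    // data is a 1-based min-heap (data[0] unused) holding count nodes, count >= 1.
    // Removes the root, restores the heap over the remaining count - 1 nodes,
    // and returns the removed minimum.
    public static int removeMin(int[] data, int count) {
        int min = data[1];
        int moved = data[count];                  // last node, to be re-seated from the root down
        int last = count - 1;                     // heap size after the removal
        int hole = 1;
        while (hole * 2 <= last) {
            int child = hole * 2;                 // left child
            if (child + 1 <= last && data[child + 1] < data[child]) {
                child++;                          // right child is the smaller one
            }
            if (data[child] >= moved) {
                break;                            // moved value fits here
            }
            data[hole] = data[child];             // pull the smaller child up
            hole = child;
        }
        data[hole] = moved;
        return min;
    }

    public static void main(String[] args) {
        int[] data = {0, 11, 26, 12, 56, 42, 72, 27};
        int count = 7;
        int[] removed = new int[count];
        for (int i = 0; count > 0; i++) {
            removed[i] = removeMin(data, count);
            count--;
        }
        System.out.println(Arrays.toString(removed)); // [11, 12, 26, 27, 42, 56, 72]
    }
}
```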

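Once the heap is fully built, the top node is necessarily the largest (or smallest) value. Take the top node and place it at the rightmost position of the array, re-heapify the remaining nodes, take the new top node and place it at the second position from the right, and repeat until every node has been taken. The array ends up sorted in ascending order (or descending order, respectively).

*Taking the top node is exactly the heap-top removal described above; see the "removing the top node" diagram.

The code is as follows: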

import java.util.Arrays;

public class HeapSort {
    public static void main(String[] args) {
        int[] numbers = {7, 9, 8, 4, 6, 1, 3, 2, 5};
        sort(numbers);
        System.out.println("sort " + Arrays.toString(numbers));
    }

    private static void sort(int[] numbers) {
        // The heap must be built first
        buildMinHeap(numbers);
        System.out.println("heap " + Arrays.toString(numbers));
        int lastNodeIndex = numbers.length - 1;
        while (lastNodeIndex > 0) {
            // Swap the top node (this round's minimum) with the current last node, so the last
            // position always receives this round's minimum; the final array is in descending order
            swap(numbers, 0, lastNodeIndex);
            // The rightmost position now holds the minimum, so move the last-node index one step left
            lastNodeIndex--;
            // Re-heapify
            minHeapify(numbers, 0, lastNodeIndex);
        }
    }

    // Build the heap
    private static void buildMinHeap(int[] numbers) {
        // Index of the last node
        int lastNodeIndex = numbers.length - 1;
        // Index of the last internal node, i.e. the parent of the last node
        int lastInnerNodeIndex = (lastNodeIndex - 1) / 2;
        // Start from the last internal node rather than the last leaf. Why?
        // (A leaf has no children, so there is nothing to sift down.)
        for (int i = lastInnerNodeIndex; i >= 0; i--) {
            minHeapify(numbers, i, lastNodeIndex);
        }
    }

    // Sift the node at currentIndex down to restore the min-heap property within [0, lastNodeIndex]
    private static void minHeapify(int[] numbers, int currentIndex, int lastNodeIndex) {
        int leftChildIndex, rightChildIndex, smallestIndex;
        while (true) {
            leftChildIndex = currentIndex * 2 + 1;
            rightChildIndex = leftChildIndex + 1;
            smallestIndex = currentIndex;

            // The left child may not exist; if it does, compare the two
            if (leftChildIndex <= lastNodeIndex && numbers[smallestIndex] > numbers[leftChildIndex]) {
                smallestIndex = leftChildIndex;
            }
            // The right child may not exist; if it does, compare the two
            if (rightChildIndex <= lastNodeIndex && numbers[smallestIndex] > numbers[rightChildIndex]) {
                smallestIndex = rightChildIndex;
            }

            if (smallestIndex != currentIndex) {
                // Swap the two positions
                swap(numbers, smallestIndex, currentIndex);
                // The value just moved down came from the level above and is not necessarily
                // smaller than its new children, so it must be compared again. Continue from
                // the position of this round's smallest node in the next round.
                currentIndex = smallestIndex;
            } else {
                break;
            }
        }
    }

    // Swap the values at two positions
    private static void swap(int[] numbers, int pos1, int pos2) {
        int tmp = numbers[pos1];
        numbers[pos1] = numbers[pos2];
        numbers[pos2] = tmp;
    }
}

Output:

heap [1, 2, 3, 4, 6, 8, 7, 9, 5]
sort [9, 8, 7, 6, 5, 4, 3, 2, 1]
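The run above ends in descending order because the listing uses a min-heap. For ascending order, the same procedure can be applied with a max-heap instead. The following is a minimal sketch of that variant; the class `HeapSortAscending` and its method names are made up for this example and are not part of the original listing.

```java
import java.util.Arrays;

public class HeapSortAscending {
    public static void sort(int[] a) {
        int last = a.length - 1;
        // Build a max-heap, starting from the last internal node
        for (int i = (last - 1) / 2; i >= 0; i--) {
            siftDown(a, i, last);
        }
        // Repeatedly move the current maximum to the end of the unsorted range
        while (last > 0) {
            swap(a, 0, last);
            last--;
            siftDown(a, 0, last);
        }
    }

    // Restore the max-heap property for the subtree rooted at i within [0, last]
    private static void siftDown(int[] a, int i, int last) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left <= last && a[left] > a[largest]) {
                largest = left;
            }
            if (right <= last && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;
            }
            swap(a, i, largest);
            i = largest;
        }
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        int[] numbers = {7, 9, 8, 4, 6, 1, 3, 2, 5};
        sort(numbers);
        System.out.println("sort " + Arrays.toString(numbers));
        // prints sort [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The only change from the min-heap version is the direction of the two comparisons in the sift-down step.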

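Internal nodes

An internal node is a node with at least one child. In a complete binary tree with n nodes, the number of internal nodes is n/2 rounded down. The last internal node is the parent of the last leaf node.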
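These two facts can be checked with a short sketch for the 0-based layout. The names `InternalNodes`, `internalCount`, `lastInternalIndex`, and `countByScan` are hypothetical helpers for this example; the scan simply counts every index whose left child still lies inside the array.

```java
public class InternalNodes {
    // Number of internal nodes in a complete binary tree with n nodes
    public static int internalCount(int n) {
        return n / 2;
    }

    // 0-based index of the last internal node: the parent of the last node (index n - 1).
    // Assumes n >= 2, since a single node has no internal nodes.
    public static int lastInternalIndex(int n) {
        return (n - 1 - 1) / 2;
    }

    // Cross-check: count indices whose left child 2*i+1 exists
    public static int countByScan(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (2 * i + 1 < n) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The 9-element array from the heap sort example
        System.out.println(internalCount(9) + " " + countByScan(9) + " " + lastInternalIndex(9));
        // prints 4 4 3
    }
}
```

The value 3 matches `lastInnerNodeIndex` computed in `buildMinHeap` for the 9-element input.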

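Further thinking

How can a heap be used to find the top N of a set of numbers?

Can a priority queue also be implemented with a heap?

What are the advantages and disadvantages of heap sort compared with quicksort, and is it suitable for real-world development?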
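As one possible starting point for the first two questions, here is a sketch that finds the N largest values with a min-heap of at most N elements. It uses `java.util.PriorityQueue`, which the JDK documents as a priority queue based on a priority heap; the class `TopN` and the method `topN` are names invented for this example.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopN {
    // Returns the n largest values of numbers in descending order
    // (fewer if numbers has fewer than n elements).
    public static int[] topN(int[] numbers, int n) {
        // Natural ordering makes this a min-heap: peek() is the smallest kept value
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int num : numbers) {
            if (heap.size() < n) {
                heap.offer(num);
            } else if (n > 0 && num > heap.peek()) {
                heap.poll();                      // drop the smallest of the current top n
                heap.offer(num);
            }
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();              // smallest comes out first, fill from the back
        }
        return result;
    }

    public static void main(String[] args) {
        int[] numbers = {56, 26, 72, 42, 11, 12, 27, 99, 33, 66, 17, 30, 13, 90, 28};
        System.out.println(Arrays.toString(topN(numbers, 3))); // [99, 90, 72]
    }
}
```

Keeping only N elements in the heap means each input value costs at most O(log N) work, which is the usual reason to prefer this over sorting the whole input.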

For a visualization of heap construction, node insertion, and top-node removal, see

https://www.cs.usfca.edu/~galles/visualization/Heap.html

posted @ 2022-07-06 14:07 binary220615
