The two most common operations performed on data stored in a computer
are sorting and searching. This has been true since the beginning of the computing
industry, which means that sorting and searching are also two of the
most studied operations in computer science. Many of the data structures discussed
in this book are designed primarily to make sorting and/or searching
easier and more efficient on the data stored in the structure.
This chapter introduces you to the fundamental algorithms for sorting
and searching data. These algorithms depend on only the array as a data
structure, and the only “advanced” programming technique used is recursion.
This chapter also introduces you to the techniques we’ll use throughout the
book to informally analyze different algorithms for speed and efficiency.
SORTING ALGORITHMS
Most of the data we work with in our day-to-day lives is sorted. We look up
definitions in a dictionary by searching alphabetically. We look up a phone
number by moving through the last names in the book alphabetically. The
post office sorts mail in several ways—by zip code, then by street address,
and then by name. Sorting is a fundamental process in working with data and
deserves close study.
As was mentioned earlier, there has been quite a bit of research performed
on different sorting techniques. Although some very sophisticated sorting
algorithms have been developed, there are also several simple sorting algorithms
you should study first. These sorting algorithms are the insertion sort,
the bubble sort, and the selection sort. Each of these algorithms is easy to
understand and easy to implement. They are not the best overall algorithms
for sorting by any means, but for small data sets and in other special circumstances,
they are the best algorithms to use.
An Array Class Test Bed
To examine these algorithms, we will first need a test bed in which to implement
and test them. We’ll build a class that encapsulates the normal operations
performed with an array: element insertion, element access, and displaying
the contents of the array. Here’s the code:
using System;

class CArray {
    private int[] arr;
    private int upper;        // index of the last slot in the array
    private int numElements;  // number of items inserted so far

    public CArray(int size) {
        arr = new int[size];
        upper = size - 1;
        numElements = 0;
    }

    // Store an item in the next free slot.
    public void Insert(int item) {
        arr[numElements] = item;
        numElements++;
    }

    // Print every slot of the array on one line.
    public void DisplayElements() {
        for (int i = 0; i <= upper; i++)
            Console.Write(arr[i] + " ");
        Console.WriteLine();
    }

    // Reset every slot to 0 so the array can be refilled.
    public void Clear() {
        for (int i = 0; i <= upper; i++)
            arr[i] = 0;
        numElements = 0;
    }
}

class Program {
    static void Main() {
        CArray nums = new CArray(50);   // the constructor requires a size
        for (int i = 0; i <= 49; i++)
            nums.Insert(i);
        nums.DisplayElements();
    }
}
Before leaving the CArray class to begin the examination of sorting and
searching algorithms, let’s discuss how we’re going to actually store data in a
CArray class object. To demonstrate most effectively how the different
sorting algorithms work, the data in the array needs to be in random order.
The easiest way to achieve this is to use a random number generator to produce
each value stored in the array.
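Random numbers can be created in C# using the Random class. An object of
this type generates a sequence of pseudo-random numbers. You can pass an
integer seed to the Random constructor (the parameterless constructor chooses
a seed for you). The seed is not an upper bound on the values generated; it
determines the starting point of the sequence, so two Random objects created
with the same seed produce the same numbers, which makes test runs repeatable.
The NextDouble method returns a value from 0.0 up to, but not including, 1.0,
which is why the program below multiplies it by 100 to get integers from 0 to 99.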
Here’s another look at a program that uses the CArray class to store numbers,
using the random number generator to select the data to store in the
array:
static void Main() {
    CArray nums = new CArray(10);
    Random rnd = new Random(100);   // fixed seed: same numbers on every run
    for (int i = 0; i < 10; i++)
        nums.Insert((int)(rnd.NextDouble() * 100));
    nums.DisplayElements();
}
The output from this program is:
72 54 59 30 31 78 2 77 82 72
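Because the same seed always produces the same sequence, the sorting runs in this chapter can be repeated exactly. The following minimal sketch shows this with two generators built from the same seed. SeededData and RandomValues are hypothetical names, and the sketch uses Random.Next(0, 100), whose upper bound is exclusive, in place of scaling NextDouble(). The exact numbers produced for a given seed are not guaranteed to stay the same across .NET versions, so none are shown here.

```csharp
using System;

static class SeededData {
    // Hypothetical helper: fill an array of the given length with
    // pseudo-random integers in the range [0, 100).
    public static int[] RandomValues(int count, int seed) {
        Random rnd = new Random(seed);      // the seed fixes the whole sequence
        int[] values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = rnd.Next(0, 100);   // 100 itself is never returned
        return values;
    }
}
```

Calling SeededData.RandomValues(10, 100) twice in the same program returns two identical arrays.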
Bubble Sort
The first sorting algorithm to examine is the bubble sort. The bubble sort is
one of the slowest sorting algorithms available, but it is also one of the simplest
sorts to understand and implement, which makes it an excellent candidate
for our first sorting algorithm.
The sort gets its name because values “float like a bubble” from one end of
the list to the other. Assuming you are sorting a list of numbers in ascending
order, higher values float to the right whereas lower values float to the left.
This behavior is caused by moving through the list many times, comparing
adjacent values and swapping them if the value to the left is greater than the
value to the right.
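A single pass of this compare-and-swap process can be sketched on a plain int array. This is a minimal sketch independent of CArray; BubbleDemo and BubblePass are hypothetical names.

```csharp
using System;

static class BubbleDemo {
    // One pass of the bubble sort over arr[0..last]: compare each pair of
    // neighbors and swap them when the left value is larger. When the pass
    // ends, the largest value in that range sits at index last.
    public static void BubblePass(int[] arr, int last) {
        for (int i = 0; i < last; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
            }
        }
    }
}
```

Applied to the ten numbers from the earlier example, one pass yields 54 59 30 31 72 2 77 78 72 82. The 82 reaches the end, 72 moves from the front to the middle, and 2 moves only one position to the left, because a small value can move left by at most one position per pass.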
Figure 3.1 illustrates how the bubble sort works. Two of the numbers inserted
into the array in the previous example (2 and 72) are highlighted with circles.
You can watch 72 move from the beginning of the array to the middle, and
2 move from just past the middle of the array to the beginning.
The code for the BubbleSort method is shown here:
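public void BubbleSort() {
    int temp;
    // After each pass, the largest remaining value has moved to index outer.
    for (int outer = upper; outer >= 1; outer--) {
        // Compare neighbors in the unsorted part arr[0..outer].
        for (int inner = 0; inner <= outer - 1; inner++) {
            if (arr[inner] > arr[inner + 1]) {
                // Swap the two out-of-order neighbors in line.
                temp = arr[inner];
                arr[inner] = arr[inner + 1];
                arr[inner + 1] = temp;
            }
        }
    }
}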
There are several things to notice about this code. First, the code to swap
two array elements is written in line rather than as a subroutine. A swap
subroutine might slow down the sorting, since it would be called many times.
And because the swap code is only three lines long, writing it in line costs
nothing in clarity.
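For comparison, here is a sketch of the same sort with the exchange moved into a small helper. SwapDemo, Swap, and BubbleSortWithSwap are hypothetical names. Whether the helper actually costs anything is worth measuring rather than assuming, because the .NET JIT compiler can inline small methods like this one.

```csharp
using System;

static class SwapDemo {
    // The same three-statement exchange the text writes in line.
    static void Swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    // The BubbleSort loops from the text, calling Swap instead of inlining it.
    public static void BubbleSortWithSwap(int[] arr) {
        for (int outer = arr.Length - 1; outer >= 1; outer--)
            for (int inner = 0; inner <= outer - 1; inner++)
                if (arr[inner] > arr[inner + 1])
                    Swap(arr, inner, inner + 1);
    }
}
```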
More importantly, notice that the outer loop starts at the end of the array
and moves toward the beginning. As Figure 3.1 shows, after the first pass the
highest value in the array is in its proper place at the end of the array.
In general, once the pass for a given value of outer finishes, the elements at
index outer and above are in their final positions, so the algorithm doesn’t
need to access them again.
The inner loop starts at the first element of the array and stops at position
outer - 1, the next-to-last position of the unsorted part of the array. The
inner loop compares the two adjacent positions indicated by inner and
inner + 1, swapping them if necessary.
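Because the inner loop runs outer times on each pass, the number of comparisons does not depend on the data: for n elements it is (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2. The sketch below counts comparisons to check this. BubbleCount and CountComparisons are hypothetical names, and the method sorts a copy so the caller’s array is left unchanged.

```csharp
using System;

static class BubbleCount {
    // Runs the BubbleSort loops from the text on a copy of the data and
    // returns how many times two elements are compared.
    public static long CountComparisons(int[] data) {
        int[] arr = (int[])data.Clone();
        long comparisons = 0;
        for (int outer = arr.Length - 1; outer >= 1; outer--) {
            for (int inner = 0; inner <= outer - 1; inner++) {
                comparisons++;
                if (arr[inner] > arr[inner + 1]) {
                    int temp = arr[inner];
                    arr[inner] = arr[inner + 1];
                    arr[inner + 1] = temp;
                }
            }
        }
        return comparisons;
    }
}
```

For 10 elements it returns 45, and for 100 elements 4,950, whatever the values are.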
Examining the Sorting Process
One of the things you will probably want to do while developing an algorithm
is view the intermediate results of the code while the program is running.
When you’re using Visual Studio .NET, you can do this with the debugging
tools available in the IDE. Sometimes, however, all you really want to see
is a display of the array (or whatever data structure you are building, sorting,
or searching). An easy way to do this is to insert a displaying method in the
appropriate place in the code.
For the BubbleSort method shown earlier, the best place to examine how
the array changes during the sort is right after the inner loop finishes, inside
the outer loop. If we display the array at that point on each pass of the outer
loop, we get a record of how the values move through the array while they
are being sorted.
For example, here is the BubbleSort method modified to display intermediate
results:
public void BubbleSort() {
    int temp;
    for (int outer = upper; outer >= 1; outer--) {
        for (int inner = 0; inner <= outer - 1; inner++) {
            if (arr[inner] > arr[inner + 1]) {
                temp = arr[inner];
                arr[inner] = arr[inner + 1];
                arr[inner + 1] = temp;
            }
        }
        // Show the array once per pass of the outer loop.
        this.DisplayElements();
    }
}
The call to DisplayElements() is placed after the inner For loop, inside the
outer one, so the array is printed once per pass. To see the results, modify
the Main method as follows:
static void Main() {
    CArray nums = new CArray(10);
    Random rnd = new Random(100);
    for (int i = 0; i < 10; i++)
        nums.Insert((int)(rnd.NextDouble() * 100));
    Console.WriteLine("Before sorting: ");
    nums.DisplayElements();
    Console.WriteLine("During sorting: ");
    nums.BubbleSort();
    Console.WriteLine("After sorting: ");
    nums.DisplayElements();
}
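Calling DisplayElements() from inside the sort ties the algorithm to the console. One alternative, sketched here on a plain int array rather than CArray, is to pass in an optional callback that runs at the same point, after each pass of the outer loop. TracedSort and the afterPass parameter are hypothetical names.

```csharp
using System;

static class TracedSort {
    // Bubble sort on a plain int array. If afterPass is supplied, it is
    // called with the array after every pass of the outer loop, the same
    // point where the text calls DisplayElements().
    public static void BubbleSort(int[] arr, Action<int[]> afterPass = null) {
        for (int outer = arr.Length - 1; outer >= 1; outer--) {
            for (int inner = 0; inner <= outer - 1; inner++) {
                if (arr[inner] > arr[inner + 1]) {
                    int temp = arr[inner];
                    arr[inner] = arr[inner + 1];
                    arr[inner + 1] = temp;
                }
            }
            afterPass?.Invoke(arr);
        }
    }
}
```

For example, TracedSort.BubbleSort(data, a => Console.WriteLine(string.Join(" ", a))) prints one line per pass, while a test can collect the same snapshots into a list instead.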
Selection Sort
The next sort to examine is the Selection sort. This sort works by starting at
the beginning of the array, comparing the first element with the other elements
in the array. The smallest element is placed in position 0, and the sort then
begins again at position 1. This continues until each position except the last
position has been the starting point for a new loop.
Two loops are used in the SelectionSort algorithm. The outer loop moves
from the first element in the array to the next-to-last element, whereas the
inner loop moves from the element just after the outer loop’s position to the
last element, looking for values that are smaller than the element currently
being pointed at by the outer loop. After each complete run of the inner loop,
the smallest value in the unsorted part of the array is swapped into its proper
place. Figure 3.2 illustrates how this works with the CArray data used before.
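Each run of the inner loop simply locates the smallest value in the unsorted part of the array. That step on its own can be sketched like this (SelectionDemo and IndexOfMin are hypothetical names):

```csharp
using System;

static class SelectionDemo {
    // The work done by one run of the inner loop: return the index of the
    // smallest value in arr[start..arr.Length - 1]. The strict < keeps the
    // first occurrence when there are ties.
    public static int IndexOfMin(int[] arr, int start) {
        int min = start;
        for (int i = start + 1; i < arr.Length; i++)
            if (arr[i] < arr[min])
                min = i;
        return min;
    }
}
```

For the example data 72 54 59 30 31 78 2 77 82 72, IndexOfMin(data, 0) returns 6, the position of 2, which the first pass then swaps into position 0.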
The code to implement the SelectionSort algorithm is shown as follows:
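public void SelectionSort() {
    int min, temp;
    // The last position needs no pass of its own: once every earlier
    // position holds its final value, the largest value is already last.
    for (int outer = 0; outer < upper; outer++) {
        min = outer;
        // Find the index of the smallest value in arr[outer..upper].
        for (int inner = outer + 1; inner <= upper; inner++)
            if (arr[inner] < arr[min])
                min = inner;
        // Move that value into position outer.
        temp = arr[outer];
        arr[outer] = arr[min];
        arr[min] = temp;
    }
}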
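Displaying the array before sorting and after each pass of the outer loop gives
the following trace for the ten numbers from the earlier example (the first line
is the unsorted array):
72 54 59 30 31 78 2 77 82 72
2 54 59 30 31 78 72 77 82 72
2 30 59 54 31 78 72 77 82 72
2 30 31 54 59 78 72 77 82 72
2 30 31 54 59 78 72 77 82 72
2 30 31 54 59 78 72 77 82 72
2 30 31 54 59 72 78 77 82 72
2 30 31 54 59 72 72 77 82 78
2 30 31 54 59 72 72 77 82 78
2 30 31 54 59 72 72 77 78 82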
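The trace can be reproduced with a version of SelectionSort that records the array after each pass instead of printing it. This is a sketch on a plain int array; SelectionTrace and SortWithTrace are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class SelectionTrace {
    // SelectionSort on a plain int array, recording the array as a
    // space-separated string after each pass of the outer loop.
    public static List<string> SortWithTrace(int[] arr) {
        var rows = new List<string>();
        for (int outer = 0; outer < arr.Length - 1; outer++) {
            int min = outer;
            for (int inner = outer + 1; inner < arr.Length; inner++)
                if (arr[inner] < arr[min])
                    min = inner;
            int temp = arr[outer];
            arr[outer] = arr[min];
            arr[min] = temp;
            rows.Add(string.Join(" ", arr));
        }
        return rows;
    }
}
```

Run on the ten example numbers, it returns nine rows that match the lines after the first one in the trace.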
Insertion Sort
The Insertion sort is analogous to the way we normally sort things numerically
or alphabetically. Let’s say that I have asked a class of students to turn in index
cards with their names, ID numbers, and a short biographical sketch. The
students return the cards in random order, but I want them to be alphabetized
so I can build a seating chart.
I take the cards back to my office, clear off my desk, and take the first card.
The name on the card is Smith. I place it at the top left position of the desk
and take the second card. It is Brown. I move Smith over to the right and
put Brown in Smith’s place. The next card is Williams. It can be inserted at
the right without having to shift any other cards. The next card is Acklin.
It has to go at the beginning of the list, so each of the other cards must be
shifted one position to the right to make room. That is how the Insertion sort
works.
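The desk procedure translates directly into code: to insert a new card, shift every card that should come after it one slot to the right, then drop the new card into the gap. A minimal sketch, assuming the cards are stored in a string array with room to spare (CardDemo and InsertCard are hypothetical names):

```csharp
using System;

static class CardDemo {
    // Insert name into the sorted prefix cards[0..count-1], shifting larger
    // names one slot to the right. cards must have room for count + 1
    // entries. Returns the new number of cards.
    public static int InsertCard(string[] cards, int count, string name) {
        int pos = count;
        // CompareOrdinal compares by character code, which is enough for
        // these capitalized names.
        while (pos > 0 && string.CompareOrdinal(cards[pos - 1], name) > 0) {
            cards[pos] = cards[pos - 1];   // shift one slot right
            pos--;
        }
        cards[pos] = name;
        return count + 1;
    }
}
```

Inserting Smith, Brown, Williams, and Acklin in that order leaves the array as Acklin, Brown, Smith, Williams. Inserting Acklin shifts all three earlier cards.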
The code for the Insertion sort is shown here, followed by an explanation
of how it works:
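public void InsertionSort() {
    int inner, temp;
    for (int outer = 1; outer <= upper; outer++) {
        temp = arr[outer];       // the "card" being inserted
        inner = outer;
        // Shift larger values one position right to open a gap.
        // Using > (not >=) leaves equal values in their original order.
        while (inner > 0 && arr[inner - 1] > temp) {
            arr[inner] = arr[inner - 1];
            inner -= 1;
        }
        arr[inner] = temp;       // drop the value into the gap
    }
}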
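Here is how it works. The outer loop picks up each element in turn, starting
with the second, and saves it in temp. The inner while loop then moves left
through the already sorted part of the array, shifting each element that belongs
after temp one position to the right. When it reaches the start of the array or a
value that can stay where it is, temp is written into the gap. So the Insertion sort
works not by making exchanges, but by moving larger array elements to the
right to make room for smaller elements on the left side of the array.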
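One consequence of this design is that the amount of work depends on how disordered the input is. Each shift moves one larger element past one smaller element, so the number of shifts equals the number of out-of-order pairs in the array: zero for an array that is already sorted, and n(n - 1)/2 for one in reverse order. The sketch below counts shifts (InsertionCount and SortAndCountShifts are hypothetical names; it uses > in the while condition).

```csharp
using System;

static class InsertionCount {
    // Insertion sort on a plain int array, returning how many elements were
    // shifted one place to the right. Each shift fixes one out-of-order pair.
    public static int SortAndCountShifts(int[] arr) {
        int shifts = 0;
        for (int outer = 1; outer < arr.Length; outer++) {
            int temp = arr[outer];
            int inner = outer;
            while (inner > 0 && arr[inner - 1] > temp) {
                arr[inner] = arr[inner - 1];
                inner--;
                shifts++;
            }
            arr[inner] = temp;
        }
        return shifts;
    }
}
```

For the ten example numbers it reports 18 shifts; for 1 2 3 4 5 it reports 0, and for 5 4 3 2 1 it reports 10.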
TIMING COMPARISONS OF THE BASIC SORTING ALGORITHMS
These three sorting algorithms are very similar in complexity (for an array of
n elements, each makes on the order of n² comparisons in the average case)
and, theoretically at least, should perform similarly when compared with each
other. We can use the Timing class to compare the three algorithms to see if any
of them stands out from the others in terms of the time it takes to sort a large
set of numbers.
To perform the test, we used the same basic code we used earlier to
demonstrate how each algorithm works. In the following tests, however,
the array sizes are varied to demonstrate how the three algorithms perform
with both smaller data sets and larger data sets. The timing tests are run for
array sizes of 100 elements, 1,000 elements, and 10,000 elements. Here’s the
code:
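The Timing class used in the following program is not shown in this section. If you want to reproduce the tests without it, System.Diagnostics.Stopwatch is one standard alternative. The sketch below is an assumption, not the author’s harness: it times a sort on a copy of the data, so every algorithm can be given exactly the same input (the program that follows instead generates new random numbers for each sort). SortTimer and TimeSort are hypothetical names.

```csharp
using System;
using System.Diagnostics;

static class SortTimer {
    // Times one call of sort on a fresh copy of data and returns the
    // elapsed milliseconds. The caller's array is never modified.
    public static double TimeSort(int[] data, Action<int[]> sort) {
        int[] copy = (int[])data.Clone();
        Stopwatch watch = Stopwatch.StartNew();
        sort(copy);
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

For example, SortTimer.TimeSort(data, a => Array.Sort(a)) times the library sort on a copy of data; any method taking an int[] can be passed the same way.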
static void Main() {
    // Timing is the timing class referred to in the text; its definition
    // is not shown here.
    Timing sortTime = new Timing();
    Random rnd = new Random(100);
    int numItems = 1000;   // set to 100, 1000, or 10000 for each test
    CArray theArray = new CArray(numItems);

    for (int i = 0; i < numItems; i++)
        theArray.Insert((int)(rnd.NextDouble() * 100));
    sortTime.startTime();
    theArray.SelectionSort();
    sortTime.stopTime();
    Console.WriteLine("Time for Selection sort: " +
        sortTime.getResult().TotalMilliseconds);

    theArray.Clear();
    for (int i = 0; i < numItems; i++)
        theArray.Insert((int)(rnd.NextDouble() * 100));
    sortTime.startTime();
    theArray.BubbleSort();
    sortTime.stopTime();
    Console.WriteLine("Time for Bubble sort: " +
        sortTime.getResult().TotalMilliseconds);

    theArray.Clear();
    for (int i = 0; i < numItems; i++)
        theArray.Insert((int)(rnd.NextDouble() * 100));
    sortTime.startTime();
    theArray.InsertionSort();
    sortTime.stopTime();
    Console.WriteLine("Time for Insertion sort: " +
        sortTime.getResult().TotalMilliseconds);
}
With numItems set to 100, the results (in milliseconds) are:
Selection Sort:10.0144
Bubble Sort:10.0144
Insertion Sort:20.0288
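showing that the Selection and Bubble sorts perform at the same speed and
the Insertion sort is about half as fast (or twice as slow). At this size, though,
all three times are multiples of the same value (10.0144 ms), so they probably
reflect the resolution of the timer more than real differences between the
algorithms.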
Now let’s compare the algorithms when the array size is 1,000 elements:
Selection Sort:40.0576
Bubble Sort:500.72
Insertion Sort:871.2528
Here we see that the size of the array makes a big difference in the performance
of the algorithm. The Selection sort is roughly 12 times faster than the Bubble
sort and roughly 22 times faster than the Insertion sort.
When we increase the array size to 10,000 elements, we can really see the
effect of size on the three algorithms:
Selection Sort:2864
Bubble Sort:53607
Insertion Sort:84751
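The performance of all three algorithms degrades considerably, though the
Selection sort is still much faster than the other two (roughly 19 times faster
than the Bubble sort and 30 times faster than the Insertion sort). Going from
1,000 to 10,000 elements multiplies the Bubble and Insertion sort times by
roughly 100, which is what we expect from algorithms whose work grows with
the square of the array size. Clearly, none of these algorithms is ideal for
sorting large data sets. There are sorting algorithms, though, that can handle
large data sets more efficiently. We’ll examine their design and use in Chapter 16.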
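One factor that may help explain the Selection sort’s lead is how many exchanges each algorithm makes. The Bubble sort performs one swap for every out-of-order pair, while the Selection sort makes at most n - 1 swaps, one per pass. This is an illustration, not a measured explanation of the timings above. The sketch below simply counts exchanges; ExchangeCount and its methods are hypothetical names, and SelectionSwaps counts only passes where the minimum actually moves.

```csharp
using System;

static class ExchangeCount {
    // Number of adjacent swaps the bubble sort performs on a copy of data.
    public static int BubbleSwaps(int[] data) {
        int[] arr = (int[])data.Clone();
        int swaps = 0;
        for (int outer = arr.Length - 1; outer >= 1; outer--)
            for (int inner = 0; inner < outer; inner++)
                if (arr[inner] > arr[inner + 1]) {
                    (arr[inner], arr[inner + 1]) = (arr[inner + 1], arr[inner]);
                    swaps++;
                }
        return swaps;
    }

    // Number of swaps the selection sort performs on a copy of data,
    // skipping passes where the minimum is already in place.
    public static int SelectionSwaps(int[] data) {
        int[] arr = (int[])data.Clone();
        int swaps = 0;
        for (int outer = 0; outer < arr.Length - 1; outer++) {
            int min = outer;
            for (int inner = outer + 1; inner < arr.Length; inner++)
                if (arr[inner] < arr[min])
                    min = inner;
            if (min != outer) {
                (arr[outer], arr[min]) = (arr[min], arr[outer]);
                swaps++;
            }
        }
        return swaps;
    }
}
```

On the ten example numbers, the Bubble sort makes 18 swaps and the Selection sort 6; on the values 10 down to 1, the counts are 45 and 5.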
SUMMARY
In this chapter, we discussed three algorithms for sorting data: the Selection
sort, the Bubble sort, and the Insertion sort. All of these algorithms are fairly
easy to implement, and they all work well with small data sets. In our timing
tests, the Selection sort was the fastest of the three, followed by the Bubble sort
and then the Insertion sort. As we saw at the end of the chapter, none of these
algorithms is well suited for larger data sets (i.e., more than a few thousand
elements).