Searching for data is a fundamental computer programming task, and one
that has been studied for many years. This chapter looks at just one aspect of
the search problem: searching for a given value in a list (array).
There are two fundamental ways to search for data in a list: the sequential
search and the binary search. Sequential search is used when the items in the
list are in random order; binary search is used when the items in the list are
sorted.
SEQUENTIAL SEARCHING
The most obvious type of search is to begin at the beginning of a set of
records and move through each record until you find the record you are
looking for or you come to the end of the records. This is called a sequential
search.
A sequential search (also called a linear search) is very easy to implement.
Start at the beginning of the array and compare each accessed array element
to the value you’re searching for. If you find a match, the search is over. If you
get to the end of the array without generating a match, then the value is not
in the array.
Here is a function that performs a sequential search:
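static bool SeqSearch(int[] arr, int sValue) {
    // Visit every element, including the last one.
    for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue)
            return true;   // match found; stop searching
    return false;          // reached the end without a match
}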
If a match is found, the function immediately returns true and exits.
If the end of the array is reached without the function returning true,
then the value being searched for is not in the array and the function returns
false.
Here is a program to test our implementation of a sequential search:
using System;
using System.IO;
public class Chapter4 {
    static void Main() {
        int[] numbers = new int[100];
        // Read one integer per line; the using block closes the file.
        using (StreamReader numFile = File.OpenText("c:\\numbers.txt")) {
            for (int i = 0; i < numbers.Length; i++)
                numbers[i] = Convert.ToInt32(numFile.ReadLine(), 10);
        }
        Console.Write("Enter a number to search for: ");
        int searchNumber = Convert.ToInt32(Console.ReadLine(), 10);
        bool found = SeqSearch(numbers, searchNumber);
        if (found)
            Console.WriteLine(searchNumber + " is in the array.");
        else
            Console.WriteLine(searchNumber + " is not in the array.");
    }
    static bool SeqSearch(int[] arr, int sValue) {
        for (int index = 0; index < arr.Length; index++)
            if (arr[index] == sValue)
                return true;
        return false;
    }
}
The program works by first reading in a set of data from a text file. The data
consists of the first 100 integers, stored in the file in a partially random order.
The program then prompts the user to enter a number to search for and calls
the SeqSearch function to perform the search.
You can also write the sequential search function so that it returns
the position in the array where the searched-for value is found, or -1 if the
value cannot be found. First, let’s look at the new function:
static int SeqSearch(int[] arr, int sValue) {
    // Visit every element, including the last one.
    for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue)
            return index;   // position of the first match
    return -1;              // value not present
}
The following program uses this function:
using System;
using System.IO;
public class Chapter4 {
    static void Main() {
        int[] numbers = new int[100];
        // Read one integer per line; the using block closes the file.
        using (StreamReader numFile = File.OpenText("c:\\numbers.txt")) {
            for (int i = 0; i < numbers.Length; i++)
                numbers[i] = Convert.ToInt32(numFile.ReadLine(), 10);
        }
        Console.Write("Enter a number to search for: ");
        int searchNumber = Convert.ToInt32(Console.ReadLine(), 10);
        int foundAt = SeqSearch(numbers, searchNumber);
        if (foundAt >= 0)
            Console.WriteLine(searchNumber + " is in the array at position " + foundAt);
        else
            Console.WriteLine(searchNumber + " is not in the array.");
    }
    static int SeqSearch(int[] arr, int sValue) {
        for (int index = 0; index < arr.Length; index++)
            if (arr[index] == sValue)
                return index;
        return -1;
    }
}
Searching for Minimum and Maximum Values
Computer programs are often asked to search an array (or other data structure)
for minimum and maximum values. In an ordered array, searching for these
values is a trivial task. Searching an unordered array, however, is a little more
challenging.
Let’s start by looking at how to find the minimum value in an array. The
algorithm is:
1. Assign the first element of the array to a variable as the minimum value.
2. Begin looping through the array, comparing each successive array element
with the minimum value variable.
3. If the currently accessed array element is less than the minimum value,
assign this element to the minimum value variable.
4. Continue until the last array element is accessed.
5. The minimum value is stored in the variable.
Let’s look at a function, FindMin, which implements this algorithm:
static int FindMin(int[] arr) {
    int min = arr[0];   // assumes the array is not empty
    // Position 0 is already the current minimum, so start at 1.
    for (int i = 1; i < arr.Length; i++)
        if (arr[i] < min)
            min = arr[i];
    return min;
}
Notice that the array search starts at position 1 and not at position 0. The
element at position 0 is assigned as the minimum value before the loop starts, so we
can start making comparisons at position 1.
The algorithm for finding the maximum value in an array works in the same
way. We assign the first array element to a variable that holds the maximum
value. Next we loop through the array, comparing each array element with
the value stored in the variable, replacing the current value if the accessed
value is greater. Here’s the code:
static int FindMax(int[] arr) {
    int max = arr[0];   // assumes the array is not empty
    // Position 0 is already the current maximum, so start at 1.
    for (int i = 1; i < arr.Length; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}
An alternative version of these two functions could return the position of
the maximum or minimum value in the array rather than the actual value.
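The position-returning variants could be sketched as follows. This is only one possible shape: the names FindMinIndex and FindMaxIndex, and the wrapper class MinMaxIndexSketch, are hypothetical and do not appear in the chapter, and like FindMin the methods assume a non-empty array.

```csharp
using System;

public static class MinMaxIndexSketch {
    // Returns the index of the smallest element (the first one on ties).
    // Assumption: arr is not empty, as in FindMin.
    public static int FindMinIndex(int[] arr) {
        int minIndex = 0;
        for (int i = 1; i < arr.Length; i++)
            if (arr[i] < arr[minIndex])
                minIndex = i;
        return minIndex;
    }

    // Returns the index of the largest element (the first one on ties).
    public static int FindMaxIndex(int[] arr) {
        int maxIndex = 0;
        for (int i = 1; i < arr.Length; i++)
            if (arr[i] > arr[maxIndex])
                maxIndex = i;
        return maxIndex;
    }

    public static void Main() {
        int[] numbers = { 42, 7, 19, 88, 3, 61 };
        Console.WriteLine("Minimum at position " + FindMinIndex(numbers)); // 4
        Console.WriteLine("Maximum at position " + FindMaxIndex(numbers)); // 3
    }
}
```

Tracking the index rather than the value loses nothing, since the value can always be recovered as arr[index].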
Making Sequential Search Faster: Self-Organizing Data
The fastest successful sequential searches occur when the data element being
searched for is at the beginning of the data set. You can ensure that a successfully
located data item is at the beginning of the data set by moving it there
after it has been found.
The concept behind this strategy is that we can minimize search times
by putting frequently searched-for items at the beginning of the data set.
Eventually, all the most frequently searched-for data items will be located at
the beginning of the data set. This is an example of self-organization, in that
the data set is organized not by the programmer before the program runs, but
by the program while the program is running.
It makes sense to allow your data to organize in this way since the data being
searched probably follows the “80–20” rule, meaning that 80% of the searches
conducted on your data set are searching for 20% of the data in the data set.
Self-organization will eventually put that 20% at the beginning of the data set,
where a sequential search will find them quickly.
Probability distributions such as this are called Pareto distributions, named
for Vilfredo Pareto, who discovered these distributions studying the spread of
income and wealth in the late nineteenth century. See Knuth (1998, pp. 399–
401) for more on probability distributions in data sets.
We can modify our SeqSearch method quite easily to include self-organization.
A first attempt simply swaps each successfully located item with the element at
the front of the array.
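A minimal sketch of a first self-organizing SeqSearch, assuming that a found item is swapped with the element at the front of the array. Unlike the chapter's versions, which work on a class-level array, this sketch takes the array as a parameter so that it is self-contained; the class name MoveToFrontSketch is hypothetical.

```csharp
using System;

public static class MoveToFrontSketch {
    // Searches arr for sValue; on success the found item is
    // exchanged with the element at position 0.
    public static bool SeqSearch(int[] arr, int sValue) {
        for (int index = 0; index < arr.Length; index++)
            if (arr[index] == sValue) {
                int temp = arr[0];
                arr[0] = arr[index];
                arr[index] = temp;
                return true;
            }
        return false;
    }

    public static void Main() {
        int[] numbers = { 5, 1, 9, 3, 7 };
        Console.WriteLine(SeqSearch(numbers, 3));      // True
        Console.WriteLine(string.Join(" ", numbers));  // 3 1 9 5 7
    }
}
```

After the search, 3 sits at the front and 5 has been pushed back to position 3, which illustrates how an item at the front can be displaced by a later search.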
The problem with the SeqSearch method as we’ve modified it is that frequently
accessed items might be moved around quite a bit during the course
of many searches. We want items that have been moved to the front of the
data set to stay there, and not be moved farther back when a subsequent item farther
down in the set is successfully located.
There are two ways we can achieve this goal. First, we can swap found
items only if they are located far enough from the beginning of the data set. We only
have to determine what is considered to be far enough back in the data set to
warrant swapping. Following the “80–20” rule again, we can make a rule that
a data item is relocated to the beginning of the data set only if its location is
outside the first 20% of the items in the data set. Here’s the code for this first
rewrite:
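// arr is the class-level array being searched.
static int SeqSearch(int sValue) {
    for (int index = 0; index < arr.Length; index++)
        // The position test runs only when the values match.
        if (arr[index] == sValue && index > (arr.Length * 0.2)) {
            swap(index, 0);   // relocate to the beginning of the data set
            return 0;         // the item now sits at position 0
        } else
            if (arr[index] == sValue)
                return index; // already within the first 20%; leave it
    return -1;
}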
The condition in the if statement is short-circuited: if the current element does not
match the searched-for value, there’s no reason to test where the index is in the data set.
The other way we can rewrite the SeqSearch method is to swap a found item
with the element that precedes it in the data set. Using this method, which
is similar to how data is sorted using the Bubble sort, the most frequently
accessed items will eventually work their way up to the front of the data set.
This technique also ensures that an item already at the beginning of the data
set is not displaced by an item found farther down; it can lose at most one
position, and only to its immediate neighbor.
The code for this new version of SeqSearch is shown as follows:
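// arr is the class-level array being searched.
static int SeqSearch(int sValue) {
    for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue) {
            if (index == 0)
                return 0;             // already at the front; nothing to swap
            swap(index, index - 1);   // move one place toward the front
            return index - 1;         // position after the swap
        }
    return -1;
}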
Either of these solutions will help your searches when, for whatever reason,
you must keep your data set in an unordered sequence. In the next section, we
will discuss a search algorithm that is more efficient than any of the sequential
algorithms mentioned, but that only works on ordered data: the binary
search.
When a search is successful, these methods exchange the found item with another
element of the array using a swap function, shown as follows:
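// Exchanges the elements of arr at positions item1 and item2.
// The parameters are array indices, so they are passed by value.
static void swap(int item1, int item2) {
    int temp = arr[item1];
    arr[item1] = arr[item2];
    arr[item2] = temp;
}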
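The snippets above rely on a class-level array named arr that this section never declares. The following sketch puts the pieces together for the swap-with-predecessor approach, under the assumption that the array is a private field of a small wrapper class. The class name SelfOrganizingList and its constructor are hypothetical, introduced here only so that the example can run on its own.

```csharp
using System;

public class SelfOrganizingList {
    // Plays the role of the chapter's class-level arr (an assumption).
    private readonly int[] arr;

    public SelfOrganizingList(int[] items) {
        arr = (int[])items.Clone();
    }

    // Exchanges the elements at the two given positions.
    private void Swap(int item1, int item2) {
        int temp = arr[item1];
        arr[item1] = arr[item2];
        arr[item2] = temp;
    }

    // A found item moves one place toward the front.
    // Returns its position after the move, or -1 if it is absent.
    public int SeqSearch(int sValue) {
        for (int index = 0; index < arr.Length; index++)
            if (arr[index] == sValue) {
                if (index == 0)
                    return 0;
                Swap(index, index - 1);
                return index - 1;
            }
        return -1;
    }

    public override string ToString() {
        return string.Join(" ", arr);
    }

    public static void Main() {
        var list = new SelfOrganizingList(new[] { 10, 20, 30, 40, 50 });
        for (int i = 0; i < 3; i++)
            list.SeqSearch(40);
        Console.WriteLine(list);                // 40 10 20 30 50
        Console.WriteLine(list.SeqSearch(99));  // -1
    }
}
```

Three searches for 40 move it from position 3 to position 0, one step per search, while the relative order of the other items is preserved.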
BINARY SEARCH
When the records you are searching through are sorted into order, you can
perform a more efficient search than the sequential search to find a value. This
search is called a binary search.
To understand how a binary search works, imagine you are trying to guess
a number between 1 and 100 chosen by a friend. For every guess you make,
the friend tells you if you guessed the correct number, or if your guess is too
high, or if your guess is too low. The best strategy then is to choose 50 as
the first guess. If that guess is too high, you should then guess 25. If 50 is too
low, you should guess 75. Each time you guess, you select a new midpoint
by adjusting the lower range or the upper range of the numbers (depending
on whether your guess is too high or too low), which becomes your next guess.
As long as you follow that strategy, you will eventually guess the correct
number. Figure 4.1 demonstrates how this works if the chosen number
is 82.
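The guessing strategy can be simulated in a few lines. This is a sketch only: the class GuessingGameSketch and the method Guesses are hypothetical names, and the midpoint is computed with integer division, so the exact sequence of guesses may differ slightly from the one drawn in Figure 4.1.

```csharp
using System;
using System.Collections.Generic;

public static class GuessingGameSketch {
    // Returns the guesses made while homing in on target within
    // [low, high], always guessing the midpoint of the current range.
    public static List<int> Guesses(int target, int low, int high) {
        var guesses = new List<int>();
        while (low <= high) {
            int guess = (low + high) / 2;
            guesses.Add(guess);
            if (guess == target)
                break;
            if (guess > target)
                high = guess - 1;   // too high: discard the upper part
            else
                low = guess + 1;    // too low: discard the lower part
        }
        return guesses;
    }

    public static void Main() {
        Console.WriteLine(string.Join(", ", Guesses(82, 1, 100)));
        // prints: 50, 75, 88, 81, 84, 82
    }
}
```

Six guesses are enough to find 82 among 100 candidates, because every wrong guess discards roughly half of the remaining range.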
We can implement this strategy as an algorithm, the binary search algorithm.
To use this algorithm, we first need our data stored in order (ascending,
preferably) in an array (though other data structures will work as well). The
first steps in the algorithm are to set the lower and upper bounds of the search.
At the beginning of the search, this means the lower and upper bounds of the
array. Then, we calculate the midpoint of the array by adding the lower and
upper bounds together and dividing by 2. The array element stored at this
position is compared to the searched-for value. If they are the same, the value
has been found and the algorithm stops. If the searched-for value is less than
the midpoint value, a new upper bound is calculated by subtracting 1 from the
midpoint. Otherwise, if the searched-for value is greater than the midpoint
value, a new lower bound is calculated by adding 1 to the midpoint. The
algorithm iterates until the lower bound exceeds the upper bound, which indicates
the array has been completely searched. If this occurs, -1 is returned,
indicating that no element in the array holds the value being searched
for.
static int binSearch(int value) {
int upperBound, lowerBound, mid;
upperBound = arr.Length-1;
lowerBound = 0;
while(lowerBound <= upperBound) {
mid = (upperBound + lowerBound) / 2;
if (arr[mid] == value)
return mid;
else
if (value < arr[mid])
upperBound = mid - 1;
else
lowerBound = mid + 1;
}
return -1;
}
Here’s a program that uses the binary search method to search an array:
static void Main(string[] args)
{
    Random random = new Random();
    // CArray, Insert, SortArr, and showArray are not defined in this section;
    // binSearch is assumed to be a member of the same class.
    CArray mynums = new CArray(9);
    for(int i = 0; i <= 9; i++)
        mynums.Insert(random.Next(100));
    mynums.SortArr();
    mynums.showArray();
    int position = mynums.binSearch(77);
    if (position >= 0)   // -1 means the value was not found
    {
        Console.WriteLine("found item");
        mynums.showArray();
    } else
        Console.WriteLine("Not in the array");
    Console.Read();
}
A Recursive Binary Search Algorithm
Although the version of the binary search algorithm developed in the previous
section is correct, it’s not really a natural solution to the problem. The
binary search algorithm is really a recursive algorithm because, by constantly
subdividing the array until we find the item we’re looking for (or run out of
room in the array), each subdivision expresses the problem as a smaller
version of the original problem. Viewing the problem this way leads us to
discover a recursive algorithm for performing a binary search.
In order for a recursive binary search algorithm to work, we have to make
some changes to the code. Let’s take a look at the code first and then we’ll
discuss the changes we’ve made:
public int RbinSearch(int value, int lower, int upper) {
    if (lower > upper)
        return -1;   // empty range: the value is not present
    else {
        int mid = (upper + lower) / 2;
        if (value < arr[mid])
            return RbinSearch(value, lower, mid - 1);   // search the lower half
        else if (value == arr[mid])
            return mid;
        else
            return RbinSearch(value, mid + 1, upper);   // search the upper half
    }
}
The main problem with the recursive binary search algorithm, as compared
to the iterative algorithm, is its efficiency. When a 1,000-element array is searched
using both algorithms, the recursive algorithm is consistently 10 times slower
than the iterative algorithm in our timing tests.
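One way to run such a comparison yourself is sketched below. This is not the timing code behind the figure quoted above; it is a hypothetical harness built on the System.Diagnostics.Stopwatch class, with the array passed as a parameter and an arbitrary repetition count. Results will vary with the machine, the runtime, and the compiler settings, so no expected timings are shown.

```csharp
using System;
using System.Diagnostics;

public static class SearchTimingSketch {
    public static int IterativeSearch(int[] arr, int value) {
        int lower = 0, upper = arr.Length - 1;
        while (lower <= upper) {
            int mid = (lower + upper) / 2;
            if (arr[mid] == value) return mid;
            if (value < arr[mid]) upper = mid - 1;
            else lower = mid + 1;
        }
        return -1;
    }

    public static int RecursiveSearch(int[] arr, int value, int lower, int upper) {
        if (lower > upper) return -1;
        int mid = (lower + upper) / 2;
        if (value < arr[mid]) return RecursiveSearch(arr, value, lower, mid - 1);
        if (value > arr[mid]) return RecursiveSearch(arr, value, mid + 1, upper);
        return mid;
    }

    public static void Main() {
        int[] data = new int[1000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i * 2;                 // sorted even numbers

        const int repetitions = 100000;      // arbitrary; large enough to measure
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
            IterativeSearch(data, 1234);
        watch.Stop();
        Console.WriteLine("Iterative: " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        for (int r = 0; r < repetitions; r++)
            RecursiveSearch(data, 1234, 0, data.Length - 1);
        watch.Stop();
        Console.WriteLine("Recursive: " + watch.ElapsedMilliseconds + " ms");
    }
}
```

Repeating each search many times is a design choice: a single binary search on 1,000 elements finishes far too quickly to time reliably.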
Of course, recursive algorithms are often chosen for reasons other than efficiency,
but you should keep in mind that anytime you implement a recursive
algorithm, you should also look for an iterative solution so that you can
compare the efficiency of the two algorithms.
Finally, before we leave the subject of binary search, we should mention that
the Array class has a built-in binary search method. It takes two arguments,
an array name and an item to search for, and it returns the position of the item
in the array, or a negative number if the item can’t be found.
To demonstrate how the method works, we’ve written yet another binary
search method for our demonstration class. Here’s the code:
public int Bsearch(int value) {
    // Delegates to the built-in method; arr must already be sorted.
    return Array.BinarySearch(arr, value);
}
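A short, self-contained sketch of calling Array.BinarySearch directly. When the value is missing, the method returns a negative number rather than a fixed -1, so the safe test for "not found" is a result below zero. The class BuiltInSearchSketch and the helper Contains are hypothetical names used only for this illustration.

```csharp
using System;

public static class BuiltInSearchSketch {
    // True when value occurs in the already-sorted array.
    public static bool Contains(int[] sorted, int value) {
        return Array.BinarySearch(sorted, value) >= 0;
    }

    public static void Main() {
        int[] numbers = { 3, 8, 15, 23, 42, 77 };   // must already be sorted
        Console.WriteLine(Array.BinarySearch(numbers, 23));  // 3

        int missing = Array.BinarySearch(numbers, 20);
        Console.WriteLine(missing < 0);   // True
        // The bitwise complement is the index of the next larger element,
        // that is, where 20 would have to be inserted to keep the order.
        Console.WriteLine(~missing);      // 3
    }
}
```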
When the built-in binary search method is compared with our custom-built
method, it consistently performs 10 times faster than the custom-built
method, which should not be surprising. A built-in data structure or algorithm
should always be chosen over one that is custom-built, if the two can be used
in exactly the same ways.
SUMMARY
Searching a data set for a value is a ubiquitous computational operation. The
simplest method of searching a data set is to start at the beginning and search
for the item until either the item is found or the end of the data set is reached.
This searching method works best when the data set is relatively small and
unordered.
If the data set is ordered, the binary search algorithm is a better choice.
Binary search works by continually subdividing the data set until the item
being searched for is found. You can write the binary search algorithm using
either iterative or recursive code. The Array class in C# includes a built-in
binary search method.
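// First self-organizing version of SeqSearch: a found item is
// swapped with the element at the front of the class-level arr.
static bool SeqSearch(int sValue) {
    for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue) {
            swap(index, 0);
            return true;
        }
    return false;
}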