C# Data Structures and Algorithms [An Introduction to Collections, Generics, and the Timing Class]
Posted on 2010-06-27 23:41 淡如水wp
This book discusses the development and implementation of data structures
and algorithms using C#. The data structures we use in this book are found
in the .NET Framework class library System.Collections. In this chapter, we
develop the concept of a collection by first discussing the implementation of
our own Collection class (using the array as the basis of our implementation)
and then by covering the Collection classes in the .NET Framework.
An important addition to C# 2.0 is generics. Generics allow the C# programmer
to write one version of a function, either independently or within a
class, without having to overload the function many times to allow for different
data types. C# 2.0 provides a special library, System.Collections.Generic,
that implements generics for several of the System.Collections data structures.
This chapter will introduce the reader to generic programming.
Finally, this chapter introduces a custom-built class, the Timing class, which
we will use in several chapters to measure the performance of a data structure
and/or algorithm. This class will take the place of Big O analysis, not because
Big O analysis isn’t important, but because this book takes a more practical
approach to the study of data structures and algorithms.
COLLECTIONS DEFINED
A collection is a structured data type that stores data and provides operations
for adding data to the collection, removing data from the collection, updating
data in the collection, as well as operations for setting and returning the values
of different attributes of the collection.
Collections can be broken down into two types: linear and nonlinear. A
linear collection is a list of elements where one element follows the previous
element. Elements in a linear collection are normally ordered by position
(first, second, third, etc.). In the real world, a grocery list is a good example
of a linear collection; in the computer world (which is also real), an array is
designed as a linear collection.
Nonlinear collections hold elements that do not have positional order
within the collection. An organizational chart is an example of a nonlinear
collection, as is a rack of billiard balls. In the computer world, trees, heaps,
graphs, and sets are nonlinear collections.
Collections, be they linear or nonlinear, have a defined set of properties that
describe them and operations that can be performed on them. An example
of a collection property is the collection’s Count, which holds the number of
items in the collection. Collection operations, called methods, include Add
(for adding a new element to a collection), Insert (for adding a new element
to a collection at a specified index), Remove (for removing a specified element
from a collection), Clear (for removing all the elements from a collection),
Contains (for determining if a specified element is a member of a collection),
and IndexOf (for determining the index of a specified element in a
collection).
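As a rough illustration of these members, the sketch below uses the standard .NET List<string> class rather than a collection of our own. It exposes the Count property and the Add, Insert, Remove, Clear, Contains, and IndexOf methods named above. The wrapper class CollectionOpsDemo and the sample items are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; List<T> is the real .NET collection being exercised.
public static class CollectionOpsDemo
{
    public static List<string> BuildSample()
    {
        var items = new List<string>();
        items.Add("milk");         // Add: append at the rear   -> milk
        items.Add("bread");        //                           -> milk, bread
        items.Insert(1, "eggs");   // Insert at index 1         -> milk, eggs, bread
        items.Remove("milk");      // Remove a given element    -> eggs, bread
        return items;
    }

    public static void Report(List<string> items)
    {
        Console.WriteLine("Count: " + items.Count);
        Console.WriteLine("Contains bread: " + items.Contains("bread"));
        Console.WriteLine("IndexOf bread: " + items.IndexOf("bread"));
        items.Clear();             // Clear: remove every element
        Console.WriteLine("Count after Clear: " + items.Count);
    }
}
```

After BuildSample runs, the list holds eggs and bread, so Count is 2 and IndexOf("bread") is 1.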
COLLECTIONS DESCRIBED
Within the two major categories of collections are several subcategories.
Linear collections can be either direct access collections or sequential access
collections, whereas nonlinear collections can be either hierarchical or
grouped. This section describes each of these collection types.
Direct Access Collections
The most common example of a direct access collection is the array. We define
an array as a collection of elements with the same data type that are directly
accessed via an integer index, as illustrated in Figure 1.1.
FIGURE 1.1. Array: Item 0, Item 1, Item 2, Item 3, ..., Item j, ..., Item n−1.
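Arrays can be static so that the number of elements specified when the array
is declared is fixed for the length of the program, or they can be dynamic, where
the number of elements can be increased while the program runs. Visual Basic does
this with the ReDim and ReDim Preserve statements; C# uses the Array.Resize method,
which copies the elements into a new array of the requested size.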
In C#, arrays are not only a built-in data type, they are also a class. Later
in this chapter, when we examine the use of arrays in more detail, we will
discuss how arrays are used as class objects.
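For the dynamic case, here is a minimal C# sketch built on the standard Array.Resize<T> method. Array.Resize allocates a new array of the requested length, copies the old elements over, and updates the variable passed by ref. The helper name Grow is hypothetical.

```csharp
using System;

public static class ResizeDemo
{
    // Hypothetical helper: returns a copy of `original` with `newLength` slots.
    public static int[] Grow(int[] original, int newLength)
    {
        int[] arr = original;
        // Array.Resize points `arr` at a new array; existing values are copied
        // and any new slots hold default(int), which is 0.
        Array.Resize(ref arr, newLength);
        return arr;
    }
}
```

Because Array.Resize replaces the reference rather than stretching the old array, the caller's original array keeps its length.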
We can use an array to store a linear collection. Adding new elements to an
array is easy since we simply place the new element in the first free position
at the rear of the array. Inserting an element into an array is not as easy (or
efficient), since we will have to move elements of the array down in order
to make room for the inserted element. Deleting an element from the end of
an array is also efficient, since we can simply remove the value from the last
element. Deleting an element in any other position is less efficient because,
just as with inserting, we will probably have to adjust many array elements
up one position to keep the elements in the array contiguous. We will discuss
these issues later in the chapter. The .NET Framework provides a specialized
array class, ArrayList, for making linear collection programming easier. We
will examine this class in Chapter 3.
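The cost difference between appending and inserting or deleting in the middle shows up clearly in a small sketch. InsertAt and RemoveAt are hypothetical helpers, not .NET APIs. They assume the caller tracks the number of slots in use in `count`. Every element after the target index has to move, so the work grows with the number of elements that follow it.

```csharp
using System;

public static class ArrayShift
{
    // Inserts value at index, moving later elements one place toward the rear.
    // Assumes count < arr.Length, i.e. there is a free slot at the rear.
    public static void InsertAt(int[] arr, ref int count, int index, int value)
    {
        for (int i = count; i > index; i--)
            arr[i] = arr[i - 1];   // move each element down one position
        arr[index] = value;
        count++;
    }

    // Removes the element at index, moving later elements one place toward the front.
    public static void RemoveAt(int[] arr, ref int count, int index)
    {
        for (int i = index; i < count - 1; i++)
            arr[i] = arr[i + 1];   // move each element up one position
        count--;
        arr[count] = 0;            // clear the slot that is no longer in use
    }
}
```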
Another type of direct access collection is the string. A string is a collection
of characters that can be accessed based on their index, in the same manner we
access the elements of an array. Strings are also implemented as class objects
in C#. The class includes a large set of methods for performing standard
operations on strings, such as concatenation, returning substrings, inserting
characters, removing characters, and so forth. We examine the String class in
Chapter 8.
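A short sketch of a string used as an indexed collection follows, using only standard String members: the indexer, string.Concat, Substring, Insert, and Remove. The class and method names are hypothetical.

```csharp
using System;

public static class StringOps
{
    public static string Demo()
    {
        string s = "data";
        char first = s[0];                                // 'd', indexed like an array element
        string joined = string.Concat(s, " structures");  // "data structures"
        string sub = joined.Substring(5, 6);              // 6 chars from index 5: "struct"
        string inserted = sub.Insert(0, "a ");            // "a struct"
        string removed = inserted.Remove(0, 2);           // "struct"
        return first + ":" + removed;                     // "d:struct"
    }
}
```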
C# strings are immutable, meaning once a string is initialized it cannot be
changed. When you modify a string, a copy of the string is created instead of
changing the original string. This behavior can lead to performance degradation
in some cases, so the .NET Framework provides a StringBuilder class that
enables you to work with mutable strings. We’ll examine the StringBuilder in
Chapter 8 as well.
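The contrast can be sketched in two loops that build the same text. This is illustrative only, and the class and method names are hypothetical. With string, each += produces a new string object and leaves the old one for the garbage collector. System.Text.StringBuilder instead appends into a single mutable buffer.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Each += allocates a brand-new string; the previous one becomes garbage.
    public static string ConcatWithString(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++)
            s += i.ToString();
        return s;
    }

    // StringBuilder grows one internal buffer instead of copying on every step.
    public static string ConcatWithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.Append(i);
        return sb.ToString();
    }
}
```

Both methods return the same text. The difference lies in how many intermediate objects are created along the way, which matters when n is large.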
The final direct access collection type is the struct (also called structures
and records in other languages). A struct is a composite data type that holds
data that may consist of many different data types. For example, an employee
record consists of an employee’s name (a string), salary (an integer), identification
number (a string, or an integer), as well as other attributes. Since storing each
of these data values in separate variables could become confusing very easily,
the language provides the struct for storing data of this type.
A powerful addition to the C# struct is the ability to define methods for
performing operations on the data stored in a struct. This makes a struct
somewhat like a class, though you can’t inherit or derive a new type from
a structure. The following code demonstrates a simple use of a structure
in C#:
using System;

public struct Name
{
    private string fname, mname, lname;

    public Name(string first, string middle, string last)
    {
        fname = first;
        mname = middle;
        lname = last;
    }

    // Setters assign the incoming value; assigning the property to itself
    // would leave the field unchanged.
    public string firstName
    {
        get { return fname; }
        set { fname = value; }
    }

    public string middleName
    {
        get { return mname; }
        set { mname = value; }
    }

    public string lastName
    {
        get { return lname; }
        set { lname = value; }
    }

    public override string ToString()
    {
        return String.Format("{0} {1} {2}", fname, mname, lname);
    }

    // First letter of each part of the name.
    public string Initials()
    {
        return String.Format("{0}{1}{2}", fname.Substring(0, 1),
            mname.Substring(0, 1), lname.Substring(0, 1));
    }
}

class chapter1
{
    static void Main()
    {
        // Sample values for illustration.
        Name myName = new Name("David", "Raymond", "Terrill");
        string inits = myName.Initials();
        Console.WriteLine("My initials are {0}.", inits);
    }
}
Sequential Access Collections
A sequential access collection is a list that stores its elements in sequential
order. We call this type of collection a linear list. Linear lists are not limited
by size when they are created, meaning they are able to expand and contract
dynamically. Items in a linear list are not accessed directly; they are referenced
by their position, as shown in Figure 1.2. The first element of a linear list is
at the front of the list and the last element is at the rear of the list.
Because there is no direct access to the elements of a linear list, to access an
element you have to traverse through the list until you arrive at the position
of the element you are looking for. Linear list implementations usually allow
two ways of traversing a list: in one direction only, from front to rear, or
in both directions, from front to rear and from rear to front.
FIGURE 1.2. Linear List: elements 1st, 2nd, 3rd, ..., Nth, running from the front to the rear.
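To make traversal concrete, here is a minimal sketch using .NET's doubly linked LinkedList<T> as the linear list. It is one possible implementation, not the one this book builds. Elements are reached by walking from node to node, starting at First to go front to rear or at Last to go rear to front. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TraversalDemo
{
    // Walk forward: front to rear, following each node's Next link.
    public static string FrontToRear(LinkedList<string> list)
    {
        var parts = new List<string>();
        for (LinkedListNode<string> node = list.First; node != null; node = node.Next)
            parts.Add(node.Value);
        return string.Join(" ", parts);
    }

    // Walk backward: rear to front, following each node's Previous link.
    public static string RearToFront(LinkedList<string> list)
    {
        var parts = new List<string>();
        for (LinkedListNode<string> node = list.Last; node != null; node = node.Previous)
            parts.Add(node.Value);
        return string.Join(" ", parts);
    }
}
```

A singly linked list would support only the first method, since its nodes carry no link back to the previous node.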
A simple example of a linear list is a grocery list. The list is created by
writing down one item after another until the list is complete. The items are
removed from the list while shopping as each item is found.
Linear lists can be either ordered or unordered. An ordered list has values
in order with respect to each other, as in:
Beata Bernica David Frank Jennifer Mike Raymond Terrill
An unordered list consists of elements in any order. The order of a list makes
a big difference when performing searches on the data on the list, as you’ll see
in Chapter 2 when we explore the binary search algorithm versus a simple
linear search.
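The following sketch previews why order matters for searching; Chapter 2 treats both searches properly. An unordered list must be scanned element by element. An ordered list can be searched with the standard Array.BinarySearch method, which halves the remaining range at each comparison. The class and method names are hypothetical, and the ordinal comparer is an assumption chosen so the sort order is culture-independent.

```csharp
using System;

public static class SearchDemo
{
    // Linear search: number of comparisons made before a match (or n if absent).
    public static int LinearComparisons(string[] items, string target)
    {
        for (int i = 0; i < items.Length; i++)
            if (items[i] == target) return i + 1;
        return items.Length;
    }

    // Binary search: index of target, or a negative number if it is absent.
    // sortedItems must already be in ascending order under the same comparer.
    public static int BinaryIndex(string[] sortedItems, string target)
    {
        return Array.BinarySearch(sortedItems, target, StringComparer.Ordinal);
    }
}
```

On the eight ordered names above, finding "Terrill" takes eight comparisons by linear search. Binary search needs at most four comparisons for any name in that list.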
Some types of linear lists restrict access to their data elements. Examples
of these types of lists are stacks and queues. A stack is a list where access is
restricted to the beginning (or top) of the list. Items are placed on the list
at the top and can only be removed from the top. For this reason, stacks are
known as Last-in, First-out structures. When we add an item to a stack, we
call the operation a push. When we remove an item from a stack, we call that
operation a pop. These two stack operations are shown in Figure 1.3.
FIGURE 1.3. Stack Operations (Push: Bernica is placed on top of David, Raymond, and Mike; Pop: Bernica is removed from the top again).
The stack is a very common data structure, especially in computer systems
programming. Stacks are used for arithmetic expression evaluation and for
balancing symbols, among their many applications.
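A minimal sketch of push and pop with .NET's Stack<T> appears below. Build mirrors Figure 1.3, and IsBalanced applies a stack to the symbol-balancing task mentioned above. The class and method names are hypothetical, and the checker handles only the three bracket pairs shown.

```csharp
using System;
using System.Collections.Generic;

public static class StackDemo
{
    public static Stack<string> Build()
    {
        var stack = new Stack<string>();
        stack.Push("Mike");      // bottom of the stack
        stack.Push("Raymond");
        stack.Push("David");
        stack.Push("Bernica");   // top: the last item in is the first one out
        return stack;
    }

    // Push each opening symbol; on each closing symbol, pop and check the pair.
    public static bool IsBalanced(string text)
    {
        var stack = new Stack<char>();
        foreach (char c in text)
        {
            if (c == '(' || c == '[' || c == '{')
            {
                stack.Push(c);
            }
            else if (c == ')' || c == ']' || c == '}')
            {
                if (stack.Count == 0) return false;   // closer with no opener
                char open = stack.Pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{'))
                    return false;                     // mismatched pair
            }
        }
        return stack.Count == 0;                      // leftover openers are unbalanced
    }
}
```

Popping the stack returned by Build yields Bernica, and David is then on top.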
A queue is a list where items are added at the rear of the list and removed
from the front of the list. This type of list is known as a First-in, First-out structure.
Adding an item to a queue is called an Enqueue, and removing an item
from a queue is called a Dequeue. Queue operations are shown in Figure 1.4.
Queues are used both in systems programming, for scheduling operating
system tasks, and in simulation studies. Queues make excellent structures
for simulating waiting lines in every conceivable retail situation. A special
type of queue, called a priority queue, allows the item in a queue with the
highest priority to be removed from the queue first. Priority queues can be
used to study the operations of a hospital emergency room, where patients
with heart trouble need to be attended to before a patient with a broken arm, for example.
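The sketch below contrasts the two queue kinds. Queue<T> is first-in, first-out. PriorityQueue<TElement, TPriority> exists only in .NET 6 and later, which is an assumption about the runtime; it dequeues the element with the smallest priority value first, so here 1 means most urgent. Class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class QueueDemo
{
    public static string FifoOrder()
    {
        var line = new Queue<string>();
        line.Enqueue("first customer");
        line.Enqueue("second customer");
        return line.Dequeue();               // the earliest arrival leaves first
    }

    public static string TriageOrder()
    {
        var er = new PriorityQueue<string, int>();
        er.Enqueue("broken arm", 2);         // arrives first, less urgent
        er.Enqueue("heart trouble", 1);      // arrives later, more urgent
        return er.Dequeue();                 // smallest priority value leaves first
    }
}
```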
The last category of linear collections we’ll examine is known as generalized
indexed collections. The first of these, called a hash table, stores a set of data
values associated with a key. In a hash table, a special function, called a hash
function, takes one data value and transforms the value (called the key) into
an integer index that is used to retrieve the data. The index is then used to
access the data record associated with the key. For example, an employee
record may consist of a person’s name, his or her salary, the number of years
the employee has been with the company, and the department he or she works
in. This structure is shown in Figure 1.5. The key to this data record is the
employee’s name. The .NET Framework has a class, called Hashtable, for storing data in a hash
table. We explore this structure in Chapter 10.
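This sketch follows the employee example above using the nongeneric System.Collections.Hashtable. The employee's name is the key, and Hashtable's own hash function maps that key to a storage slot. The Employee class shape, the sample salary, and the other values are hypothetical.

```csharp
using System;
using System.Collections;

// Hypothetical record type following the text's example fields.
public class Employee
{
    public string Name;
    public decimal Salary;
    public int Years;
    public string Department;
}

public static class HashDemo
{
    public static Hashtable Build()
    {
        var table = new Hashtable();
        var e = new Employee { Name = "Raymond", Salary = 52000m, Years = 4, Department = "Sales" };
        table.Add(e.Name, e);   // key: the name; value: the whole record
        return table;
    }

    public static string DepartmentOf(Hashtable table, string name)
    {
        // The indexer returns object (null when the key is absent), so cast it.
        var e = table[name] as Employee;
        return e == null ? null : e.Department;
    }
}
```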
Another generalized indexed collection is the dictionary. A dictionary is
made up of a series of key–value pairs, called associations. This structure
is analogous to a word dictionary, where a word is the key and the word’s
definition is the value associated with the key. The key is an index into the
value associated with the key. Dictionaries are often called associative arrays
because of this indexing scheme, though the index does not have to be an
integer. We will examine several Dictionary classes that are part of the .NET
Framework in Chapter 11.
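A minimal sketch of the word-dictionary analogy using the generic Dictionary<TKey, TValue> follows. The keys are strings, which shows that a dictionary's index need not be an integer. The class name and the entries are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    public static Dictionary<string, string> Build()
    {
        var glossary = new Dictionary<string, string>();
        glossary["stack"] = "a last-in, first-out list";   // key -> associated value
        glossary["queue"] = "a first-in, first-out list";
        return glossary;
    }

    public static string Define(Dictionary<string, string> glossary, string word)
    {
        // TryGetValue avoids the KeyNotFoundException the indexer throws for a missing key.
        string definition;
        return glossary.TryGetValue(word, out definition) ? definition : "(not found)";
    }
}
```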
Hierarchical Collections
Nonlinear collections are broken down into two major groups: hierarchical
collections and group collections. A hierarchical collection is a group of items
divided into levels. An item at one level can have successor items located at
the next lower level.
One common hierarchical collection is the tree. A tree collection looks like
an upside-down tree, with one data element as the root and the other data
values hanging below the root as leaves. The elements of a tree are called
nodes, and the elements that are below a particular node are called the node’s
children. A sample tree is shown in Figure 1.6.
Trees have applications in several different areas. The file systems of most
modern operating systems are designed as a tree collection, with one directory
as the root and other subdirectories as children of the root.
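A general tree can be sketched as nodes that each hold a value and a list of children, much like directories holding subdirectories. This is a minimal illustration. The TreeNode class and the directory names used with it are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class TreeNode
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();

    public TreeNode(string name) { Name = name; }

    public TreeNode AddChild(string name)
    {
        var child = new TreeNode(name);
        Children.Add(child);
        return child;
    }

    // Preorder traversal: record this node, then each subtree in turn,
    // indenting two spaces per level so the hierarchy is visible.
    public void Collect(List<string> lines, int depth)
    {
        lines.Add(new string(' ', depth * 2) + Name);
        foreach (TreeNode child in Children)
            child.Collect(lines, depth + 1);
    }
}
```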
A binary tree is a special type of tree collection where each node has no
more than two children. A binary tree can become a binary search tree, making
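searches for large amounts of data much more efficient. This is accomplished
by placing nodes so that every value in a node’s left subtree is smaller than the
node’s value and every value in its right subtree is larger. A search then follows a
single path down from the root, and as long as the tree stays balanced that path is
short, about log2 n levels for n nodes.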
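A minimal binary search tree sketch follows. It uses int keys, performs no balancing, and ignores duplicate keys. Smaller keys go left and larger keys go right, so a search follows one path from the root. PathLength is a hypothetical helper that reports how many nodes that path touches.

```csharp
using System;

public class BinarySearchTree
{
    private class Node
    {
        public int Key;
        public Node Left, Right;
        public Node(int key) { Key = key; }
    }

    private Node root;

    public void Insert(int key)
    {
        root = Insert(root, key);
    }

    private static Node Insert(Node node, int key)
    {
        if (node == null) return new Node(key);
        if (key < node.Key) node.Left = Insert(node.Left, key);
        else if (key > node.Key) node.Right = Insert(node.Right, key);
        return node;   // an equal key is not inserted a second time
    }

    // Number of nodes examined on the way to key, or -1 if key is absent.
    public int PathLength(int key)
    {
        int visited = 0;
        Node current = root;
        while (current != null)
        {
            visited++;
            if (key == current.Key) return visited;
            current = key < current.Key ? current.Left : current.Right;
        }
        return -1;
    }
}
```

With the keys 50, 30, 70, 20, 40, 60, 80 inserted in that order, the tree is perfectly balanced, and any key is found after examining at most three nodes.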
Yet another tree type, the heap, is organized so that the smallest data value
is always placed in the root node. The root node is removed during a deletion,
and insertions into and deletions from a heap always cause the heap to reorganize
so that the smallest value is placed in the root. Heaps are often used
for sorting, in an algorithm called heap sort: data elements stored in a heap can be
retrieved in sorted order by repeatedly deleting the root node and reorganizing the heap.
Several different varieties of trees are discussed in Chapter 12.
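The following minimal array-backed min-heap sketches the behavior just described. The smallest value sits at index 0, which is the root, and the children of index i are at 2i + 1 and 2i + 2. HeapSort here copies values into a separate heap instead of sorting in place, which keeps it close to the text's "repeatedly delete the root" description. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class MinHeap
{
    private readonly List<int> items = new List<int>();

    public int Count { get { return items.Count; } }

    public void Insert(int value)
    {
        items.Add(value);
        int i = items.Count - 1;
        while (i > 0)                      // sift up while smaller than the parent
        {
            int parent = (i - 1) / 2;
            if (items[i] >= items[parent]) break;
            Swap(i, parent);
            i = parent;
        }
    }

    public int RemoveRoot()
    {
        if (items.Count == 0) throw new InvalidOperationException("heap is empty");
        int root = items[0];
        int last = items.Count - 1;
        items[0] = items[last];            // move the last leaf to the root
        items.RemoveAt(last);
        int i = 0;
        while (true)                       // sift down toward the smaller child
        {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < items.Count && items[left] < items[smallest]) smallest = left;
            if (right < items.Count && items[right] < items[smallest]) smallest = right;
            if (smallest == i) break;
            Swap(i, smallest);
            i = smallest;
        }
        return root;
    }

    private void Swap(int a, int b)
    {
        int tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }

    // Heap sort: insert everything, then repeatedly delete the (smallest) root.
    public static int[] HeapSort(int[] input)
    {
        var heap = new MinHeap();
        foreach (int x in input) heap.Insert(x);
        var result = new int[input.Length];
        for (int k = 0; k < result.Length; k++)
            result[k] = heap.RemoveRoot();
        return result;
    }
}
```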
A nonlinear collection of items that are unordered is called a group. The three
major categories of group collections are sets, graphs, and networks.
A set is a collection of unordered data values where each value is unique.
The list of students in a class is an example of a set, as are, of course, the integers.
Operations that can be performed on sets include union and intersection.
An example of set operations is shown in Figure 1.7.
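Union and intersection can be sketched with .NET's HashSet<T>, which also discards duplicate values automatically. UnionWith and IntersectWith modify the set they are called on, so this sketch copies the first operand before calling them. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SetDemo
{
    public static HashSet<int> Union(IEnumerable<int> a, IEnumerable<int> b)
    {
        var result = new HashSet<int>(a);   // copy, so the input is untouched
        result.UnionWith(b);                // every value in a or b
        return result;
    }

    public static HashSet<int> Intersection(IEnumerable<int> a, IEnumerable<int> b)
    {
        var result = new HashSet<int>(a);
        result.IntersectWith(b);            // only values in both a and b
        return result;
    }
}
```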
A graph is a set of nodes and a set of edges that connect the nodes. Graphs
are used to model situations where each of the nodes in a graph must be visited,
sometimes in a particular order, and the goal is to find the most efficient way
to “traverse” the graph. Graphs are used in logistics and job scheduling and
are well studied by computer scientists and mathematicians. You may have
heard of the “Traveling Salesman” problem. This is a particular type of graph
problem that involves determining the order in which a salesman should visit the
cities on a route so as to complete the route most efficiently, within the budget
allowed for travel.
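A graph can be sketched with adjacency lists, where each node maps to the nodes it shares an edge with. The sketch then visits every reachable node once in breadth-first order. This is only one way to traverse a graph and does not solve the Traveling Salesman problem. The Graph class and node names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class Graph
{
    private readonly Dictionary<string, List<string>> adjacent =
        new Dictionary<string, List<string>>();

    public void AddEdge(string a, string b)
    {
        Neighbors(a).Add(b);   // undirected: record the edge in both directions
        Neighbors(b).Add(a);
    }

    private List<string> Neighbors(string node)
    {
        List<string> list;
        if (!adjacent.TryGetValue(node, out list))
        {
            list = new List<string>();
            adjacent[node] = list;
        }
        return list;
    }

    // Breadth-first traversal: visit nodes in order of distance (in edges) from start.
    public List<string> BreadthFirst(string start)
    {
        var order = new List<string>();
        var seen = new HashSet<string> { start };
        var queue = new Queue<string>();
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            string node = queue.Dequeue();
            order.Add(node);
            foreach (string next in Neighbors(node))
                if (seen.Add(next))   // Add returns false for a node already seen
                    queue.Enqueue(next);
        }
        return order;
    }
}
```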
A sample graph of the Traveling Salesman problem is shown in Figure 1.8.
This problem is part of a family of problems known as NP-complete problems.
This means that no efficient method is known for finding an exact solution to
large problems of this type. For example, to find the solution to the problem in
Figure 1.8 by checking every route, we must examine 10 factorial tours, which equals
3,628,800 tours. If we expand the problem to 100 cities, we have to examine
100 factorial tours, which is far beyond what current methods can do. An
approximate solution must be found instead.
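The factorial figures above can be checked with a short worked example. System.Numerics.BigInteger keeps the arithmetic exact, because 100 factorial overflows every built-in integer type. On .NET Framework this type requires a reference to System.Numerics.dll. The class name is hypothetical.

```csharp
using System;
using System.Numerics;

public static class TourCount
{
    // n! computed exactly: 1 * 2 * ... * n.
    public static BigInteger Factorial(int n)
    {
        BigInteger result = BigInteger.One;
        for (int i = 2; i <= n; i++)
            result *= i;
        return result;
    }
}
```

Factorial(10) is 3,628,800, matching the text. Factorial(100) is a 158-digit number, roughly 9.3 × 10^157 tours.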
A network is a special type of graph where each of the edges is assigned a
weight. The weight is associated with a cost for using that edge to move from
one node to another. Figure 1.9 depicts a network of cities where the weights
are the miles between the cities (nodes).
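A network differs from a plain graph only in the weight stored on each edge. The sketch below keeps the weights in a dictionary keyed by city pairs and adds them up along a route. It uses C# 7 value tuples as keys. The Network class, the city names, and the mileages are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Network
{
    // Edge weight (miles) keyed by (from, to).
    private readonly Dictionary<(string, string), int> weights =
        new Dictionary<(string, string), int>();

    public void AddEdge(string a, string b, int miles)
    {
        weights[(a, b)] = miles;   // undirected: store both directions
        weights[(b, a)] = miles;
    }

    // Total cost of following the route in order; throws
    // KeyNotFoundException if two consecutive cities share no edge.
    public int RouteCost(params string[] route)
    {
        int total = 0;
        for (int i = 0; i + 1 < route.Length; i++)
            total += weights[(route[i], route[i + 1])];
        return total;
    }
}
```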
We’ve now finished our tour of the different types of collections we are going
to discuss in this book. Now we’re ready to actually look at how collections
are implemented in C#. We start by looking at how to build a Collection class
using an abstract class from the .NET Framework, the CollectionBase class.
THE COLLECTIONBASE CLASS
The System.Collections namespace does not include a general-purpose Collection class
for storing data, but there is an abstract class you can use to build your
own Collection class: CollectionBase. The CollectionBase class provides the
programmer with the ability to implement a custom Collection class. The
class already implements the interfaces needed for building a Collection
class (IList, ICollection, and IEnumerable), leaving the programmer to
implement just those methods that are typically part of a Collection class.
A Collection Class Implementation Using ArrayLists
In this section, we’ll demonstrate how to use C# to implement our own Collection
class. This will serve several purposes. First, if you’re not quite up
to speed on object-oriented programming (OOP), this implementation will
show you some simple OOP techniques in C#. We can also use this section to
discuss some performance issues that are going to come up as we discuss the
different C# data structures. Finally, we think you’ll enjoy this section, as well
as the other implementation sections in this book, because it’s really a lot of
fun to reimplement the existing data structures using just the native elements
of the language. As Don Knuth (one of the pioneers of computer science)
says, to paraphrase, you haven’t really learned something well until you’ve
taught it to a computer. So, by teaching C# how to implement the different
data structures, we’ll learn much more about those structures than if we just
choose to use the classes from the library in our day-to-day programming.
Defining a Collection Class
The easiest way to define a Collection class in C# is to base the class on an
abstract class already found in the System.Collections library: the CollectionBase
class. This class provides a set of protected virtual methods (such as OnInsert,
OnRemove, and OnValidate) that you can override to customize your own collection.
The CollectionBase class provides an underlying data structure, InnerList (an
ArrayList), which you can use as a base for your class. In this section, we look
at how to use CollectionBase to build a Collection class.
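As one illustration of those hooks, the sketch below overrides OnValidate to build a collection that accepts only strings. CollectionBase calls OnValidate before items are added through its IList implementation and its protected List property. InnerList, by contrast, bypasses the hooks, which is why Add goes through List here. The class name StringOnlyCollection is hypothetical.

```csharp
using System;
using System.Collections;

public class StringOnlyCollection : CollectionBase
{
    public void Add(string item)
    {
        List.Add(item);   // List (unlike InnerList) runs the On... hooks
    }

    protected override void OnValidate(object value)
    {
        base.OnValidate(value);   // the base version rejects null
        if (!(value is string))
            throw new ArgumentException("Only strings may be stored.", "value");
    }
}
```

Adding a non-string through the IList interface, for example ((IList)c).Add(42), throws an ArgumentException instead of storing the value.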
Implementing the Collection Class
The methods that will make up the Collection class all involve some type of
interaction with the underlying data structure of the class—InnerList. The
methods we will implement in this first section are the Add, Remove, Count,
and Clear methods. These methods are absolutely essential to the class, though
other methods definitely make the class more useful.
Let’s start with the Add method. This method has one parameter – an
Object variable that holds the item to be added to the collection. Here is the
code:
public void Add(Object item){
InnerList.Add(item);
}
ArrayLists store data as objects (the Object data type), which is why we
have declared item as Object. You will learn much more about ArrayLists
in Chapter 2.
The Remove method works similarly:
public void Remove(Object item) {
InnerList.Remove(item);
}
The next method is Count. Count is most often implemented as a property,
but we prefer to make it a method. Also, Count is already implemented in the
base class, so we have to use the new keyword to hide the definition
of Count found in CollectionBase:
public new int Count() {
return InnerList.Count;
}
The Clear method removes all the items from InnerList. We also have to use
the new keyword in the definition of the method:
public new void Clear() {
InnerList.Clear();
}
This is enough to get us started. Let’s look at a program that uses the
Collection class, along with the complete class definition:
using System;
using System.Collections;

// CollectionBase is a nongeneric abstract class, so it takes no type parameter.
public class Collection : CollectionBase {
    public void Add(Object item) {
        InnerList.Add(item);
    }
    public void Remove(Object item) {
        InnerList.Remove(item);
    }
    // new hides the Clear method and Count property inherited from CollectionBase.
    public new void Clear() {
        InnerList.Clear();
    }
    public new int Count() {
        return InnerList.Count;
    }
}
class chapter1 {
    static void Main() {
        Collection names = new Collection();
        names.Add("David");
        names.Add("Bernica");
        names.Add("Raymond");
        names.Add("Clayton");
        // CollectionBase implements IEnumerable, so foreach works directly.
        foreach (Object name in names)
            Console.WriteLine(name);
        Console.WriteLine("Number of names: " + names.Count());
        names.Remove("Raymond");
        Console.WriteLine("Number of names: " + names.Count());
        names.Clear();
        Console.WriteLine("Number of names: " + names.Count());
    }
}
There are several other methods you can implement in order to create a
more useful Collection class. You will get a chance to implement some of
these methods in the exercises.
Generic Programming
One of the problems with OOP is a phenomenon called “code bloat.” One type of
code bloat occurs when you have to overload a method, or a set of methods,
to take into account all of the possible data types of the method’s parameters.
One solution to code bloat is the ability of one value to take on multiple data
types, while only providing one definition of that value. This technique is
called generic programming.
A generic program provides a data type “placeholder” that is filled in by a
specific data type at compile time. This placeholder is represented by a pair
of angle brackets (< >), with an identifier placed between the brackets. Let’s
look at an example.
A canonical first example for generic programming is the Swap function.
Here is the definition of a generic Swap function in C#:
static void Swap<T>(ref T val1, ref T val2) {
T temp;
temp = val1;
val1 = val2;
val2 = temp;
}
The placeholder for the data type is placed immediately after the function
name. The identifier placed inside the angle brackets is now used whenever a
generic data type is needed. Each of the parameters is assigned a generic data
type, as is the temp variable used to make the swap. Here’s a program that
tests this code:
using System;
class chapter1 {
static void Main() {
int num1 = 100;
int num2 = 200;
Console.WriteLine("num1: " + num1);
Console.WriteLine("num2: " + num2);
Swap<int>(ref num1, ref num2);
Console.WriteLine("num1: " + num1);
Console.WriteLine("num2: " + num2);
string str1 = "Sam";
string str2 = "Tom";
Console.WriteLine("String 1: " + str1);
Console.WriteLine("String 2: " + str2);
Swap<string>(ref str1, ref str2);
Console.WriteLine("String 1: " + str1);
Console.WriteLine("String 2: " + str2);
}
static void Swap<T>(ref T val1, ref T val2) {
T temp;
temp = val1;
val1 = val2;
val2 = temp;
}
}
The output from this program is:
num1: 100
num2: 200
num1: 200
num2: 100
String 1: Sam
String 2: Tom
String 1: Tom
String 2: Sam
Generics are not limited to function definitions; you can also create generic
classes. A generic class definition will contain a generic type placeholder after
the class name. Anytime the class name is referenced in the definition, the type
placeholder must be provided. The following class definition demonstrates
how to create a generic class:
public class Node<T> {
T data;
Node<T> link;
public Node(T data, Node<T> link) {
this.data = data;
this.link = link;
}
}
This class can be used as follows:
Node<string> node1 = new Node<string>("Mike", null);
Node<string> node2 = new Node<string>("Raymond", node1);
We will be using the Node class in several of the data structures we examine
in this book.
While this use of generic programming can be quite useful, the .NET Framework provides a
library of generic data structures that are ready to use. These data structures
are found in the System.Collections.Generic namespace, and when we discuss
a data structure that is part of this namespace, we will examine its use. Generally,
though, these classes have the same functionality as the nongeneric data
structure classes, so we will usually limit the discussion of the generic class
to how to instantiate an object of that class, since the other methods and their
use are no different.
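The sketch below shows the point about instantiation side by side. The nongeneric ArrayList and the generic List<T> share core methods such as Add and Count. The visible differences are the type argument supplied when the object is created and the casts that ArrayList requires. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericVsNongeneric
{
    public static int SumArrayList()
    {
        ArrayList list = new ArrayList();   // stores everything as object
        list.Add(1);
        list.Add(2);
        int sum = 0;
        foreach (object o in list)
            sum += (int)o;                  // cast (unboxing) required
        return sum;
    }

    public static int SumList()
    {
        List<int> list = new List<int>();   // element type fixed at instantiation
        list.Add(1);
        list.Add(2);
        int sum = 0;
        foreach (int n in list)             // no cast needed
            sum += n;
        return sum;
    }
}
```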
Timing Tests
Because this book takes a practical approach to the analysis of the data structures
and algorithms examined, we eschew the use of Big O analysis, preferring
instead to run simple benchmark tests that will tell us how long in seconds
(or whatever time unit) it takes for a code segment to run.
Our benchmarks will be timing tests that measure the amount of time it
takes an algorithm to run to completion. Benchmarking is as much of an art
as a science and you have to be careful how you time a code segment in order
to get an accurate analysis. Let’s examine this in more detail.
An Oversimplified Timing Test
First, we need some code to time. For simplicity’s sake, we will time a subroutine
that writes the contents of an array to the console. Here’s the code:
static void DisplayNums(int[] arr) {
for(int i = 0; i <= arr.GetUpperBound(0); i++)
Console.Write(arr[i] + " ");
}
The array is initialized in another part of the program, which we’ll examine
later.
To time this subroutine, we need to create a variable that is assigned the
system time just as the subroutine is called, and we need a variable to store
the time when the subroutine returns. Here’s how we wrote this code:
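DateTime startTime;
TimeSpan endTime;
startTime = DateTime.Now;                     // system time just before the call
DisplayNums(nums);                            // the code being timed; nums is an illustrative name for the array built elsewhere
endTime = DateTime.Now.Subtract(startTime);   // elapsed time as a TimeSpan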
Running this code on my laptop (running at 1.4 GHz on Windows XP
Professional), the subroutine ran in about 5 seconds (4.9917). Although this
code segment seems reasonable for performing a timing test, it is completely
inadequate for timing code running in the .NET environment. Why?
First, the code measures the elapsed time from when the subroutine was
called until the subroutine returns to the main program. The time used by
other processes running at the same time as the C# program adds to the time
being measured by the test.
Second, the timing code doesn’t take into account garbage collection performed
in the .NET environment. In a runtime environment such as .NET,
the system can pause at any time to perform garbage collection. The sample
timing code does nothing to acknowledge garbage collection and the resulting
time can be affected quite easily by garbage collection. So what do we do
about this?
Timing Tests for the .NET Environment
In the .NET environment, we need to take into account the thread our program
is running in and the fact that garbage collection can occur at any time. We
need to design our timing code to take these facts into consideration.
Let’s start by looking at how to handle garbage collection. First, let’s discuss
what garbage collection is used for. In C#, reference types (such as strings,
arrays, and class instance objects) are allocated memory on something called
the heap. The heap is an area of memory reserved for data items (the types
mentioned previously). Value types declared as local variables (an int or a double, for example) are stored on
the stack. References to reference data are also stored on the stack, but the
actual data stored in a reference type is stored on the heap.
Variables that are stored on the stack are freed when the subprogram in
which the variables are declared completes its execution. Variables stored on
the heap, on the other hand, are held on the heap until the garbage collection
process is called. Heap data is only removed via garbage collection when there
is not an active reference to that data.
Garbage collection can, and will, occur at arbitrary times during the execution
of a program. However, we want to be as sure as we can that the
garbage collector is not run while the code we are timing is executing. We can
head off arbitrary garbage collection by calling the garbage collector explicitly
just before the timed code starts. The .NET environment provides a special class for making garbage
collection calls, GC. To tell the system to perform garbage collection, we
simply write:
GC.Collect();
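Variables stored on the stack are freed when the subprogram that declares them finishes executing. Data stored on the heap stays there until garbage collection runs, and it is removed only when there are no active references to it. Because garbage collection can happen at any point during execution, we need to make sure it does not run while we are timing. We can head off such unscheduled collections by calling the garbage collector explicitly. .NET provides a special class, GC, for this, and to tell the system to collect garbage we write: GC.Collect();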
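To see an explicit collection in action, the sketch below uses a hypothetical GcDemo helper, which is not part of the book's code. It allocates an array inside a separate method, so that no active reference survives, and then calls GC.Collect(). GC.CollectionCount(0) confirms that a collection actually ran, and a WeakReference reports whether the array survived. Exactly when an unreachable object is reclaimed is up to the runtime, so the IsAlive result should be treated as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

static class GcDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the array reachable after this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference AllocateGarbage()
    {
        byte[] data = new byte[1024];       // a small array on the heap
        return new WeakReference(data);     // a weak reference does not keep data alive
    }

    // Returns how many generation-0 collections happened around GC.Collect(),
    // and reports whether the array was still alive afterward.
    public static int CollectAndCount(out bool stillAlive)
    {
        WeakReference weak = AllocateGarbage();
        int before = GC.CollectionCount(0);
        GC.Collect();                       // force a collection of all generations
        int after = GC.CollectionCount(0);
        stillAlive = weak.IsAlive;          // typically false: no active reference remained
        return after - before;
    }
}
```

The count is used only to confirm that the call triggered a collection. Other threads could trigger additional collections in between, so the value may be greater than 1.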
That’s not all we have to do, though. Every object stored on the heap has
a special method called a finalizer. The finalizer method is executed as the
last step before deleting the object. The problem with finalizer methods is
that they are not run in a systematic way. In fact, you can’t even be sure an
object’s finalizer method will run at all, but we know that before we can be
sure an object is deleted, its finalizer method must execute. To ensure this,
we add a line of code that tells the program to wait until all pending finalizer
methods of the objects on the heap have run before continuing. The line of
code is:
GC.WaitForPendingFinalizers();
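That’s not all we have to do, though. Every object stored on the heap has a special method called a finalizer, which is executed as the last step before the object is deleted. The problem with finalizers is that they are not run in a systematic way. In fact, you cannot even be sure that a given object’s finalizer will run at all. What we do know is that before we can be sure an object has been deleted, its finalizer must execute. To ensure this, we add a line of code that tells the program to wait until all pending finalizers on the heap have run before continuing. The code is: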
GC.WaitForPendingFinalizers();
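The sketch below illustrates why GC.WaitForPendingFinalizers() matters. It uses a hypothetical Finalizable class, invented for this illustration, whose finalizer increments a counter. GC.Collect() finds the unreachable objects, but their finalizers run afterward on the runtime's finalizer thread. GC.WaitForPendingFinalizers() blocks until that queue has been drained, so the counter can be read meaningfully once it returns.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical class used only to observe when finalizers run.
class Finalizable
{
    public static int FinalizedCount;

    ~Finalizable()
    {
        // Runs on the runtime's finalizer thread, not on the thread that called GC.Collect.
        Interlocked.Increment(ref FinalizedCount);
    }
}

static class FinalizerDemo
{
    // Non-inlined so the created objects are unreachable once it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CreateGarbage(int count)
    {
        for (int i = 0; i < count; i++)
            new Finalizable();              // immediately unreachable
    }

    // Returns how many finalizers had run by the time WaitForPendingFinalizers returned.
    public static int Run(int count)
    {
        CreateGarbage(count);
        GC.Collect();                       // finds the unreachable objects and queues their finalizers
        GC.WaitForPendingFinalizers();      // blocks until the finalizer queue is empty
        return Volatile.Read(ref Finalizable.FinalizedCount);
    }
}
```

In a typical run, FinalizerDemo.Run(10) is expected to return 10. Without the WaitForPendingFinalizers call, the count read immediately after GC.Collect() could still be lower, because the finalizer thread may not have caught up yet.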
We have one hurdle cleared and just one left to go: using the proper
thread. In the .NET environment, a program runs inside a process and, within
that process, inside an application domain. This allows the operating system to
keep apart the different programs running on it at the same time. Within a process, a
program or a part of a program runs inside a thread, and the operating system
allocates execution time to a program through its threads. When we are timing
the code for a program, we want to make sure that we’re timing just the
code running in our program’s thread and not other tasks being
performed by the operating system.
We can do this by using the Process class in the .NET Framework (in the
System.Diagnostics namespace). The Process class lets us pick the current process
(the process our program is running in) and the threads running in it. Each thread
reports its UserProcessorTime: the amount of processor time the thread has spent
running the application’s code. These calls can be chained into one expression
whose value, a TimeSpan object, we assign to a variable that stores the starting time.
Here’s the code (okay, two lines of code):
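One hurdle is cleared and one is left: using the proper thread. In .NET, a program runs inside a process and, within it, an application domain. This lets the operating system keep the different programs running at the same time apart. Within a process, a program or part of a program runs in a thread, and the operating system allocates execution time through threads. When we measure running time, we need to make sure we measure only our own program’s code and not other work the operating system is doing.
The Process class handles this. It has methods for picking the current process and the thread the program is running in, and for reading how much processor time that thread has used. These calls can be combined into one expression whose value is stored as the starting time. The code is as follows: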
TimeSpan startingTime;
startingTime = Process.GetCurrentProcess().Threads[0].
    UserProcessorTime;
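using System;
using System.Diagnostics;

class chapter1 {
    static void Main() {
        int[] nums = new int[100000];
        BuildArray(nums);
        TimeSpan startTime;
        TimeSpan duration;
        // Processor time the thread has used so far
        // (Threads[0] is assumed to be the thread running Main).
        startTime =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime;
        DisplayNums(nums);
        // Elapsed processor time = current reading minus the starting reading.
        duration =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime.
            Subtract(startTime);
        Console.WriteLine("Time: " + duration.TotalSeconds);
    }
    // Fills the array with the values 0 through 99999.
    static void BuildArray(int[] arr) {
        for(int i = 0; i <= 99999; i++)
            arr[i] = i;
    }
    // Writes every element to the console; this is the code being timed.
    static void DisplayNums(int[] arr) {
        for(int i = 0; i <= arr.GetUpperBound(0); i++)
            Console.Write(arr[i] + " ");
    }
}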
Using the new and improved timing code, the program reports 0.2526 seconds.
This compares with the approximately 5 seconds returned using the first
timing code. Clearly, there is a major discrepancy between these two timing
techniques and you should use the .NET techniques when timing code in the
.NET environment.
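With this improved timing code, the time drops to 0.2526 seconds, a large gap compared with the earlier 5 seconds.
So when measuring running time under .NET, use the .NET techniques.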
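One way to see why the two measurements differ so much: UserProcessorTime counts only the processor time spent running application code. A wall-clock timer such as System.Diagnostics.Stopwatch also counts time spent waiting, whether for console output, for other processes, or, in this sketch, in Thread.Sleep. The hypothetical CpuVsWallClock helper below is an illustration under that assumption. Unlike the book's code, it reads the whole process's TotalProcessorTime instead of Threads[0], so that it does not depend on which thread happens to be listed first.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class CpuVsWallClock
{
    // Measures one sleeping interval two ways and returns both, in milliseconds.
    public static void Measure(int sleepMs, out double wallMs, out double cpuMs)
    {
        Process proc = Process.GetCurrentProcess();
        TimeSpan cpuBefore = proc.TotalProcessorTime;   // user + kernel time, all threads
        Stopwatch watch = Stopwatch.StartNew();         // wall-clock timer

        Thread.Sleep(sleepMs);          // the thread waits; it uses almost no processor time

        watch.Stop();
        proc.Refresh();                 // discard cached process information before re-reading
        TimeSpan cpuAfter = proc.TotalProcessorTime;

        wallMs = watch.Elapsed.TotalMilliseconds;
        cpuMs = (cpuAfter - cpuBefore).TotalMilliseconds;
    }
}
```

Calling Measure(300, out wall, out cpu) typically reports a wall-clock time of about 300 ms and a processor time close to zero. A milder form of the same gap plausibly accounts for much of the difference between the 5-second and 0.2526-second figures above, since DisplayNums spends much of its time waiting on console output rather than running its own code.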
A Timing Test Class
Although we don’t need a class to run our timing code, it makes sense to
rewrite the code as a class, primarily because we’ll keep our code clear if we
can reduce the number of lines in the code we test.
A Timing class needs the following data members:
startingTime: stores the starting time of the code we are testing
duration: stores the elapsed time of the code we are testing
The starting time and the duration members both store times, so we chose
the TimeSpan data type for them. We’ll use just one constructor
method, a default constructor that sets both data members to 0.
We’ll need methods for telling a Timing object when to start timing code
and when to stop timing. We also need a method for returning the data stored
in the duration data member.
As you can see, the Timing class is quite small, needing just a few methods.
Here’s the definition:
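A timing test class
Although we don’t need a class to run our timing code, writing it as a class keeps our code clean.
A Timing class needs two members: the starting time and the duration (the elapsed time).
Both are stored as TimeSpan values. We need only one constructor, a default constructor that sets both values to 0.
We need methods to tell a Timing object when to start and when to stop timing, and a method to return the duration.
Here is the class: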
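public class Timing {
    TimeSpan startingTime;   // processor-time reading taken when timing starts
    TimeSpan duration;       // processor time elapsed between startTime() and StopTime()

    // Default constructor: both members start at zero.
    public Timing() {
        startingTime = new TimeSpan(0);
        duration = new TimeSpan(0);
    }

    // Records the processor time used since startTime() was called.
    public void StopTime() {
        duration =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime.Subtract(startingTime);
    }

    // Collects garbage and waits for pending finalizers to reduce the chance
    // that a collection interrupts the measurement, then records the start time.
    public void startTime() {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        startingTime =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime;
    }

    // Returns the measured duration.
    public TimeSpan Result() {
        return duration;
    }
}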
Here’s the program to test the DisplayNums subroutine, rewritten with the
Timing class:
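using System;
using System.Diagnostics;

public class Timing {
    TimeSpan startingTime;   // processor-time reading taken when timing starts
    TimeSpan duration;       // processor time elapsed between startTime() and StopTime()

    public Timing() {
        startingTime = new TimeSpan(0);
        duration = new TimeSpan(0);
    }
    // Records the processor time used since startTime() was called.
    public void StopTime() {
        duration =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime.
            Subtract(startingTime);
    }
    // Collects garbage and drains pending finalizers, then records the start time.
    public void startTime() {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        startingTime =
            Process.GetCurrentProcess().Threads[0].
            UserProcessorTime;
    }
    public TimeSpan Result() {
        return duration;
    }
}

class chapter1 {
    static void Main() {
        int[] nums = new int[100000];
        BuildArray(nums);
        Timing tObj = new Timing();
        tObj.startTime();
        DisplayNums(nums);
        tObj.StopTime();
        Console.WriteLine("time (.NET): " + tObj.Result().
            TotalSeconds);
    }
    // Fills the array with the values 0 through 99999.
    static void BuildArray(int[] arr) {
        for(int i = 0; i < 100000; i++)
            arr[i] = i;
    }
    // Writes every element to the console; this is the code being timed.
    static void DisplayNums(int[] arr) {
        for(int i = 0; i <= arr.GetUpperBound(0); i++)
            Console.Write(arr[i] + " ");
    }
}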
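If many different routines will be timed, one possible variation (a hypothetical design, not the book's) is to pass the code being measured as a delegate, so the start and stop calls cannot be forgotten or mismatched. The sketch keeps the book's steps: force a collection, wait for finalizers, then read UserProcessorTime before and after. It declares a hypothetical TimedAction delegate type so that it does not depend on delegate types from later framework versions. It also inherits the Timing class's assumption that Threads[0] is the thread running the code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical delegate type for "the code to be timed".
public delegate void TimedAction();

public static class TimingHelper
{
    // Runs the given code once and returns the user-mode processor time it used,
    // following the same steps as the Timing class.
    public static TimeSpan Measure(TimedAction code)
    {
        GC.Collect();                       // collect before timing starts
        GC.WaitForPendingFinalizers();      // let pending finalizers finish first

        // Assumes Threads[0] is the thread running this code, as the Timing class does.
        TimeSpan start = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
        code();
        TimeSpan end = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
        return end - start;
    }
}
```

Inside the chapter1 program above, the same measurement could then be written with a C# 2.0 anonymous method, for example TimeSpan t = TimingHelper.Measure(delegate { DisplayNums(nums); });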
By moving the timing code into a class, we’ve cut down the number of lines
in the main program from 13 to 8. Admittedly, that’s not a lot of code to cut
out of a program, but more important than the number of lines we cut is the
clutter in the main program. Without the class, assigning the starting time to
a variable looks like this:
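By moving the timing code into a class, we cut the main program from 13 lines to 8. Admittedly, that is not much code to cut,
but more important than the number of lines is the clutter in the main program. Without the class, the starting time is assigned like this: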
startTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
With the Timing class, assigning the starting time to the class data member
looks like this:
tObj.startTime();
Using the Timing class to record the starting time: tObj.startTime();
SUMMARY
This chapter reviews three important techniques we will use often in this book.
Many, though not all, of the programs we will write, as well as the libraries we
will discuss, are written in an object-oriented manner. The Collection class
we developed illustrates many of the basic OOP concepts seen throughout
these chapters. Generic programming allows the programmer to simplify the
definition of several data structures by limiting the number of methods that
have to be written or overloaded. The Timing class provides a simple, yet
effective way to measure the performance of the data structures and algorithms
we will study.
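Summary
This chapter reviewed three important techniques. Many, though not all, of the programs we will write, as well as the libraries we will discuss, are written in an object-oriented manner.
The Collection class we developed demonstrates basic OOP concepts. Generic programming lets the programmer simplify the definition of several data structures by limiting the number of methods that have to be written or overloaded.
The Timing class provides a simple, effective way to measure the performance of the data structures and algorithms we will study.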