An Introduction to Collections, Generics, and the Timing Class
This book discusses the development and implementation of data structures
and algorithms using C#. The data structures we use in this book are found
in the .NET Framework class library System.Collections. In this chapter, we
develop the concept of a collection by first discussing the implementation of
our own Collection class (using the array as the basis of our implementation)
and then by covering the Collection classes in the .NET Framework.

An important addition to C# 2.0 is generics. Generics allow the C# programmer
to write one version of a function, either independently or within a
class, without having to overload the function many times to allow for different
data types. C# 2.0 provides a special library, System.Collections.Generic,
that implements generics for several of the System.Collections data structures.
This chapter will introduce the reader to generic programming.
Finally, this chapter introduces a custom-built class, the Timing class, which
we will use in several chapters to measure the performance of a data structure
and/or algorithm. This class will take the place of Big O analysis, not because
Big O analysis isn’t important, but because this book takes a more practical
approach to the study of data structures and algorithms.

A collection is a structured data type that stores data and provides operations
for adding data to the collection, removing data from the collection, updating
data in the collection, as well as operations for setting and returning the values
of different attributes of the collection.
Collections can be broken down into two types: linear and nonlinear. A
linear collection is a list of elements where one element follows the previous
element. Elements in a linear collection are normally ordered by position
(first, second, third, etc.). In the real world, a grocery list is a good example
of a linear collection; in the computer world (which is also real), an array is
designed as a linear collection.
Nonlinear collections hold elements that do not have positional order
within the collection. An organizational chart is an example of a nonlinear
collection, as is a rack of billiard balls. In the computer world, trees, heaps,
graphs, and sets are nonlinear collections.
Collections, be they linear or nonlinear, have a defined set of properties that
describe them and operations that can be performed on them. An example
of a collection property is the collection’s Count, which holds the number of
items in the collection. Collection operations, called methods, include Add
(for adding a new element to a collection), Insert (for adding a new element
to a collection at a specified index), Remove (for removing a specified element
from a collection), Clear (for removing all the elements from a collection),
Contains (for determining if a specified element is a member of a collection),
and IndexOf (for determining the index of a specified element in a
collection).

Within the two major categories of collections are several subcategories.
Linear collections can be either direct access collections or sequential access
collections, whereas nonlinear collections can be either hierarchical or
grouped. This section describes each of these collection types.

Direct Access Collections
The most common example of a direct access collection is the array. We define
an array as a collection of elements with the same data type that are directly
accessed via an integer index, as illustrated in Figure 1.1.
[Item 0] [Item 1] [Item 2] [Item 3] . . . [Item j] . . . [Item n−1]
FIGURE 1.1. Array.

Arrays can be static so that the number of elements specified when the array
is declared is fixed for the length of the program, or they can be dynamic, where
the number of elements can be increased while the program runs (in Visual
Basic via the ReDim or ReDim Preserve statements; in C#, by allocating a larger
array and copying the elements across, which is what Array.Resize does).
In C#, arrays are not only a built-in data type, they are also a class. Later
in this chapter, when we examine the use of arrays in more detail, we will
discuss how arrays are used as class objects.

We can use an array to store a linear collection. Adding new elements to an
array is easy since we simply place the new element in the first free position
at the rear of the array. Inserting an element into an array is not as easy (or
efficient), since we will have to move elements of the array down in order
to make room for the inserted element. Deleting an element from the end of
an array is also efficient, since we can simply remove the value from the last
element. Deleting an element in any other position is less efficient because,
just as with inserting, we will probably have to adjust many array elements
up one position to keep the elements in the array contiguous. We will discuss
these issues later in the chapter. The .NET Framework provides a specialized
array class, ArrayList, for making linear collection programming easier. We
will examine this class in Chapter 3.
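
The cost of inserting into and deleting from the middle of an array can be seen in a short sketch. The class name ArrayInsertSketch and the helpers InsertAt and RemoveAt are hypothetical names introduced here for illustration; they are not part of the .NET Framework, and the sketch assumes the array has spare capacity at the rear.

```csharp
using System;

class ArrayInsertSketch {
  // Inserts value at index by shifting every later element one slot toward
  // the rear. count is the number of slots in use; returns the new count.
  public static int InsertAt(int[] arr, int count, int index, int value) {
    if (count >= arr.Length || index < 0 || index > count)
      throw new ArgumentException("no room, or index out of range");
    for (int i = count; i > index; i--)
      arr[i] = arr[i - 1];
    arr[index] = value;
    return count + 1;
  }

  // Removes the element at index by shifting every later element one slot
  // toward the front, which keeps the used part of the array contiguous.
  public static int RemoveAt(int[] arr, int count, int index) {
    if (index < 0 || index >= count)
      throw new ArgumentOutOfRangeException("index");
    for (int i = index; i < count - 1; i++)
      arr[i] = arr[i + 1];
    arr[count - 1] = 0;  // clear the slot that is no longer in use
    return count - 1;
  }

  static void Main() {
    int[] nums = new int[8];
    int count = 0;
    for (int i = 0; i < 5; i++)
      nums[count++] = (i + 1) * 10;        // 10 20 30 40 50
    count = InsertAt(nums, count, 1, 15);  // 10 15 20 30 40 50
    count = RemoveAt(nums, count, 3);      // 10 15 20 40 50
    for (int i = 0; i < count; i++)
      Console.Write(nums[i] + " ");
    Console.WriteLine();
  }
}
```

Adding at the rear touches one slot, while InsertAt and RemoveAt may touch every element after the chosen index, which is the inefficiency described above.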

Another type of direct access collection is the string. A string is a collection
of characters that can be accessed based on their index, in the same manner we
access the elements of an array. Strings are also implemented as class objects
in C#. The class includes a large set of methods for performing standard
operations on strings, such as concatenation, returning substrings, inserting
characters, removing characters, and so forth. We examine the String class in
Chapter 8.

C# strings are immutable, meaning once a string is initialized it cannot be
changed. When you modify a string, a copy of the string is created instead of
changing the original string. This behavior can lead to performance degradation
in some cases, so the .NET Framework provides a StringBuilder class that
enables you to work with mutable strings. We’ll examine the StringBuilder in
Chapter 8 as well.
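
A small sketch may help make the difference concrete. The class name StringBuilderSketch and the helper JoinWithBuilder are hypothetical names used only for this illustration; String and StringBuilder are the real Framework classes.

```csharp
using System;
using System.Text;

class StringBuilderSketch {
  // Joins words with single spaces using one mutable buffer instead of
  // creating a new string on every concatenation.
  public static string JoinWithBuilder(string[] words) {
    StringBuilder sb = new StringBuilder();
    foreach (string w in words) {
      if (sb.Length > 0)
        sb.Append(' ');
      sb.Append(w);
    }
    return sb.ToString();
  }

  static void Main() {
    string s = "data";
    string t = s;           // t and s refer to the same string object
    s += " structures";     // builds a new string; the original is untouched
    Console.WriteLine(t);   // prints "data"
    Console.WriteLine(s);   // prints "data structures"
    Console.WriteLine(JoinWithBuilder(new string[] { "one", "two", "three" }));
  }
}
```

Because t still prints "data" after s is modified, the concatenation evidently produced a copy rather than changing the original string.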
The final direct access collection type is the struct (also called structures
and records in other languages). A struct is a composite data type that holds
data that may consist of many different data types. For example, an employee
record consists of the employee’s name (a string), salary (an integer), identification
number (a string, or an integer), as well as other attributes. Since storing each
of these data values in separate variables could become confusing very easily,
the language provides the struct for storing data of this type.
A powerful addition to the C# struct is the ability to define methods for
performing operations on the data stored in a struct. This makes a struct
somewhat like a class, though you can’t inherit or derive a new type from
a structure. The following code demonstrates a simple use of a structure
in C#:


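using System;

public struct Name {
  private string fname, mname, lname;

  public Name(string first, string middle, string last) {
    fname = first;
    mname = middle;
    lname = last;
  }

  public string firstName {
    get { return fname; }
    set { fname = value; }  // value is the implicit setter argument
  }
  public string middleName {
    get { return mname; }
    set { mname = value; }
  }
  public string lastName {
    get { return lname; }
    set { lname = value; }
  }

  public override string ToString() {
    return String.Format("{0} {1} {2}", fname, mname, lname);
  }

  // Returns the first letter of each part of the name.
  public string Initials() {
    return String.Format("{0}{1}{2}", fname.Substring(0, 1),
                         mname.Substring(0, 1), lname.Substring(0, 1));
  }
}

public class NameTest {
  static void Main() {
    Name myName = new Name("David", "Frank", "Terrill");  // sample name for illustration
    string fullName = myName.ToString();
    string inits = myName.Initials();
    Console.WriteLine("My name is {0}.", fullName);
    Console.WriteLine("My initials are {0}.", inits);
  }
}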

Sequential Access Collections
A sequential access collection is a list that stores its elements in sequential
order. We call this type of collection a linear list. Linear lists are not limited
by size when they are created, meaning they are able to expand and contract
dynamically. Items in a linear list are not accessed directly; they are referenced
by their position, as shown in Figure 1.2. The first element of a linear list is
at the front of the list and the last element is at the rear of the list.
Because there is no direct access to the elements of a linear list, to access an
element you have to traverse through the list until you arrive at the position
of the element you are looking for. Linear list implementations usually allow
two methods for traversing a list—in one direction from front to rear, and
from both front to rear and rear to front.
[Figure 1.2: a linear list of elements 1st, 2nd, 3rd, . . . , Nth, with the front at the 1st element and the rear at the Nth]
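
Traversal in both directions can be sketched with the Framework's LinkedList<T> class, which is a doubly linked list. The class name LinearListSketch, the helper PositionOf, and the grocery items are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

class LinearListSketch {
  // Walks the list from the front, one node at a time, and returns the
  // position of target (0 for the front element), or -1 if it is absent.
  public static int PositionOf(LinkedList<string> list, string target) {
    int position = 0;
    for (LinkedListNode<string> n = list.First; n != null; n = n.Next) {
      if (n.Value == target)
        return position;
      position++;
    }
    return -1;
  }

  static void Main() {
    LinkedList<string> groceries = new LinkedList<string>();
    groceries.AddLast("milk");
    groceries.AddLast("eggs");
    groceries.AddLast("bread");

    Console.WriteLine(PositionOf(groceries, "bread"));  // prints 2

    // Rear-to-front traversal follows the Previous links instead.
    for (LinkedListNode<string> n = groceries.Last; n != null; n = n.Previous)
      Console.Write(n.Value + " ");
    Console.WriteLine();
  }
}
```

There is no index operator on LinkedList<T>; reaching the third item means stepping through the first two, which is the sense in which access is sequential rather than direct.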

A simple example of a linear list is a grocery list. The list is created by
writing down one item after another until the list is complete. The items are
removed from the list while shopping as each item is found.
Linear lists can be either ordered or unordered. An ordered list has values
in order with respect to each other, as in:
Beata Bernica David Frank Jennifer Mike Raymond Terrill
An unordered list consists of elements in any order. The order of a list makes
a big difference when performing searches on the data on the list, as you’ll see
in Chapter 2 when we explore the binary search algorithm versus a simple
linear search.

FIGURE 1.3. Stack Operations.
Some types of linear lists restrict access to their data elements. Examples
of these types of lists are stacks and queues. A stack is a list where access is
restricted to the beginning (or top) of the list. Items are placed on the list
at the top and can only be removed from the top. For this reason, stacks are
known as Last-in, First-out structures. When we add an item to a stack, we
call the operation a push. When we remove an item from a stack, we call that
operation a pop. These two stack operations are shown in Figure 1.3.
The stack is a very common data structure, especially in computer systems
programming. Stacks are used for arithmetic expression evaluation and for
balancing symbols, among their many applications.
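
Push, pop, and symbol balancing can be sketched with the generic Stack<T> class from System.Collections.Generic. The class name StackSketch and the helper IsBalanced are hypothetical names for this illustration, and the balancing routine is a minimal version that checks only the three bracket pairs shown.

```csharp
using System;
using System.Collections.Generic;

class StackSketch {
  // Returns true when every (, [ and { in text is closed by the matching
  // symbol in the correct order.
  public static bool IsBalanced(string text) {
    Stack<char> open = new Stack<char>();
    foreach (char c in text) {
      if (c == '(' || c == '[' || c == '{') {
        open.Push(c);
      } else if (c == ')' || c == ']' || c == '}') {
        if (open.Count == 0)
          return false;  // a closer with nothing left to close
        char top = open.Pop();
        if ((c == ')' && top != '(') || (c == ']' && top != '[') ||
            (c == '}' && top != '{'))
          return false;  // the most recent opener is of the wrong kind
      }
    }
    return open.Count == 0;  // leftover openers mean the text is unbalanced
  }

  static void Main() {
    Stack<string> names = new Stack<string>();
    names.Push("Mike");
    names.Push("Raymond");
    names.Push("David");
    names.Push("Bernica");
    Console.WriteLine(names.Pop());               // prints "Bernica"
    Console.WriteLine(IsBalanced("a[(b+c)*d]"));  // prints "True"
    Console.WriteLine(IsBalanced("(]"));          // prints "False"
  }
}
```

The last item pushed is the first one popped, and the same property is what lets the most recent opening symbol be matched against the next closing symbol.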
[Figure 1.3: Bernica is pushed onto the top of a stack holding David, Raymond, and Mike, and is then popped off again]
A queue is a list where items are added at the rear of the list and removed
from the front of the list. This type of list is known as a First-in, First-out structure.
Adding an item to a queue is called an Enqueue, and removing an item
from a queue is called a Dequeue. Queue operations are shown in Figure 1.4.
Queues are used both in systems programming, for scheduling operating
system tasks, and in simulation studies. Queues make excellent structures
for simulating waiting lines in every conceivable retail situation. A special
type of queue, called a priority queue, allows the item in a queue with the
highest priority to be removed from the queue first. Priority queues can be
used to study the operations of a hospital emergency room, where patients
with heart trouble need to be attended to before a patient with a broken arm,
for example.
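
The two behaviors can be sketched as follows. Queue<T> is the Framework's generic queue; the class name QueueSketch, the helper DequeueHighest, and the priority numbers are assumptions made for this illustration. The priority removal is a deliberately naive linear scan rather than a real priority queue implementation.

```csharp
using System;
using System.Collections.Generic;

class QueueSketch {
  // Naive priority dequeue: scans the waiting list for the entry with the
  // highest priority value (earliest arrival wins ties), removes it, and
  // returns its name.
  public static string DequeueHighest(List<KeyValuePair<string, int>> waiting) {
    if (waiting.Count == 0)
      throw new InvalidOperationException("nobody is waiting");
    int best = 0;
    for (int i = 1; i < waiting.Count; i++)
      if (waiting[i].Value > waiting[best].Value)
        best = i;
    string name = waiting[best].Key;
    waiting.RemoveAt(best);
    return name;
  }

  static void Main() {
    // An ordinary queue serves items strictly in arrival order.
    Queue<string> line = new Queue<string>();
    line.Enqueue("first customer");
    line.Enqueue("second customer");
    Console.WriteLine(line.Dequeue());  // prints "first customer"

    // A priority queue serves the most urgent item regardless of arrival.
    List<KeyValuePair<string, int>> er = new List<KeyValuePair<string, int>>();
    er.Add(new KeyValuePair<string, int>("broken arm", 1));
    er.Add(new KeyValuePair<string, int>("heart trouble", 9));
    Console.WriteLine(DequeueHighest(er));  // prints "heart trouble"
    Console.WriteLine(DequeueHighest(er));  // prints "broken arm"
  }
}
```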

The last category of linear collections we’ll examine is called generalized
indexed collections. The first of these, called a hash table, stores a set of data
values associated with a key. In a hash table, a special function, called a hash
function, takes one data value (called the key) and transforms it into
an integer index. The index is then used to
access the data record associated with the key. For example, an employee
record may consist of a person’s name, his or her salary, the number of years
the employee has been with the company, and the department he or she works
in. This structure is shown in Figure 1.5. The key to this data record is the
employee’s name. The .NET Framework has a class, called Hashtable, for storing
data in a hash table. We explore this structure in Chapter 10.
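
As a rough sketch of both ideas, the following code shows a toy hash function next to the Framework's Hashtable class. The class name HashSketch, the function SimpleHash, and the salary figures are assumptions made for illustration; SimpleHash is not the function Hashtable uses internally.

```csharp
using System;
using System.Collections;

class HashSketch {
  // A toy hash function: adds up the character codes of the key and
  // reduces the total to an index in the range 0..tableSize-1.
  public static int SimpleHash(string key, int tableSize) {
    int total = 0;
    foreach (char c in key)
      total += c;
    return total % tableSize;
  }

  static void Main() {
    // 'a' is 97 and 'b' is 98, so the total is 195 and the index is 5.
    Console.WriteLine(SimpleHash("ab", 10));  // prints 5

    // Hashtable does the hashing itself; we only supply key and value.
    Hashtable salaries = new Hashtable();
    salaries.Add("Mike", 45000);       // illustrative data
    salaries["Raymond"] = 52000;       // the indexer also adds entries
    Console.WriteLine(salaries["Mike"]);                // prints 45000
    Console.WriteLine(salaries.ContainsKey("Bernica")); // prints "False"
  }
}
```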

Another generalized indexed collection is the dictionary. A dictionary is
made up of a series of key–value pairs, called associations. This structure
is analogous to a word dictionary, where a word is the key and the word’s
definition is the value associated with the key. The key is an index into the
value associated with the key. Dictionaries are often called associative arrays
because of this indexing scheme, though the index does not have to be an
integer. We will examine several Dictionary classes that are part of the .NET
Framework in Chapter 11.
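
A short sketch of key–value lookup follows, using the generic Dictionary<TKey, TValue> class. The class name DictionarySketch, the helper Define, and the glossary entries are hypothetical and serve only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

class DictionarySketch {
  // Returns the definition stored under word, or a fallback string when
  // the dictionary has no entry for that key.
  public static string Define(Dictionary<string, string> entries, string word) {
    string definition;
    if (entries.TryGetValue(word, out definition))
      return definition;
    return "(no entry)";
  }

  static void Main() {
    Dictionary<string, string> glossary = new Dictionary<string, string>();
    glossary.Add("stack", "a last-in, first-out list");
    glossary["queue"] = "a first-in, first-out list";

    Console.WriteLine(Define(glossary, "stack"));  // prints the stack definition
    Console.WriteLine(Define(glossary, "heap"));   // prints "(no entry)"

    // Each association is exposed as a KeyValuePair when enumerating.
    foreach (KeyValuePair<string, string> pair in glossary)
      Console.WriteLine(pair.Key + ": " + pair.Value);
  }
}
```

Here the index into the collection is a string rather than an integer, which is what distinguishes an associative array from an ordinary one.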
Hierarchical Collections
Nonlinear collections are broken down into two major groups: hierarchical
collections and group collections. A hierarchical collection is a group of items
divided into levels. An item at one level can have successor items located at
the next lower level.
One common hierarchical collection is the tree. A tree collection looks like
an upside-down tree, with one data element as the root and the other data
values hanging below the root as leaves. The elements of a tree are called
nodes, and the elements that are below a particular node are called the node’s
children. A sample tree is shown in Figure 1.6.
Trees have applications in several different areas. The file systems of most
modern operating systems are designed as a tree collection, with one directory
as the root and other subdirectories as children of the root.

A binary tree is a special type of tree collection where each node has no
more than two children. A binary tree can become a binary search tree, making
searches for large amounts of data much more efficient. This is accomplished
by placing nodes in such a way that the path from the root to a node where
the data is stored is along the shortest path possible.
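
The ordering rule behind a binary search tree can be sketched in a few lines. The names BstNode and BstSketch are hypothetical, and this minimal version makes no attempt to keep the tree balanced, so it illustrates the idea rather than a production structure.

```csharp
using System;

class BstNode {
  public int Value;
  public BstNode Left, Right;
  public BstNode(int value) { Value = value; }
}

class BstSketch {
  // Inserts value and returns the root of the updated tree. Smaller values
  // go into the left subtree, larger or equal values into the right.
  public static BstNode Insert(BstNode root, int value) {
    if (root == null)
      return new BstNode(value);
    if (value < root.Value)
      root.Left = Insert(root.Left, value);
    else
      root.Right = Insert(root.Right, value);
    return root;
  }

  // Follows a single path down from the root, discarding one subtree at
  // every step, until the value is found or the path runs out.
  public static bool Contains(BstNode root, int value) {
    BstNode current = root;
    while (current != null) {
      if (value == current.Value)
        return true;
      current = value < current.Value ? current.Left : current.Right;
    }
    return false;
  }

  static void Main() {
    BstNode root = null;
    foreach (int v in new int[] { 50, 30, 70, 20, 40, 60, 80 })
      root = Insert(root, v);
    Console.WriteLine(Contains(root, 60));  // prints "True"
    Console.WriteLine(Contains(root, 65));  // prints "False"
  }
}
```

With the seven values above, any search inspects at most three nodes, compared with up to seven in an unordered list.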
Yet another tree type, the heap, is organized so that the smallest data value
is always placed in the root node. The root node is removed during a deletion,
and insertions into and deletions from a heap always cause the heap to reorganize
so that the smallest value is placed in the root. Heaps are often used
for sorting, in an algorithm called heap sort. Data elements stored in a heap can be kept sorted
by repeatedly deleting the root node and reorganizing the heap.
Several different varieties of trees are discussed in Chapter 12.
Group Collections
A nonlinear collection of items that are unordered is called a group. The three
major categories of group collections are sets, graphs, and networks.
A set is a collection of unordered data values where each value is unique.
The list of students in a class is an example of a set, as are, of course, the integers.
Operations that can be performed on sets include union and intersection.

An example of set operations is shown in Figure 1.7.
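
Union and intersection can be tried out with the HashSet<T> class, which was added to System.Collections.Generic in a later version of the Framework than the one this chapter targets. The class name SetSketch and the helpers Union and Intersection are hypothetical wrappers introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

class SetSketch {
  // Returns a new set holding every value found in a or in b.
  public static HashSet<int> Union(IEnumerable<int> a, IEnumerable<int> b) {
    HashSet<int> result = new HashSet<int>(a);
    result.UnionWith(b);
    return result;
  }

  // Returns a new set holding only the values found in both a and b.
  public static HashSet<int> Intersection(IEnumerable<int> a, IEnumerable<int> b) {
    HashSet<int> result = new HashSet<int>(a);
    result.IntersectWith(b);
    return result;
  }

  static void Main() {
    int[] a = { 1, 2, 3, 4 };
    int[] b = { 3, 4, 5 };
    Console.WriteLine(Union(a, b).Count);         // prints 5
    Console.WriteLine(Intersection(a, b).Count);  // prints 2

    // Adding a value that is already present leaves the set unchanged.
    HashSet<int> s = new HashSet<int>(a);
    Console.WriteLine(s.Add(1));                  // prints "False"
  }
}
```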
A graph is a set of nodes and a set of edges that connect the nodes. Graphs
are used to model situations where each of the nodes in a graph must be visited,
sometimes in a particular order, and the goal is to find the most efficient way
to “traverse” the graph. Graphs are used in logistics and job scheduling and
are well studied by computer scientists and mathematicians. You may have
heard of the “Traveling Salesman” problem. This is a particular type of graph
problem that involves determining which cities on a salesman’s route should
be traveled in order to most efficiently complete the route within the budget
allowed for travel.

A sample graph of this problem is shown in Figure 1.8.
This problem is part of a family of problems known as NP-complete problems.
This means that for large problems of this type, no efficient method of finding
an exact solution is known. For example, to find the solution to the problem in
Figure 1.8 by brute force, we have to examine 10 factorial tours, which equals
3,628,800 tours. If we expand the problem to 100 cities, we have to examine 100
factorial tours, which is far more than current methods can enumerate. An
approximate solution must be found instead.
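
The growth of the tour count is easy to check with a few lines of code. This sketch assumes the BigInteger type from System.Numerics is available (it was added to the Framework after C# 2.0); the class name TourCountSketch and the helper Factorial are hypothetical.

```csharp
using System;
using System.Numerics;

class TourCountSketch {
  // Computes n! exactly. BigInteger is needed because the result outgrows
  // a 64-bit integer once n exceeds 20.
  public static BigInteger Factorial(int n) {
    BigInteger result = BigInteger.One;
    for (int i = 2; i <= n; i++)
      result *= i;
    return result;
  }

  static void Main() {
    Console.WriteLine(Factorial(10));                     // prints 3628800
    Console.WriteLine(Factorial(100).ToString().Length);  // prints 158
  }
}
```

Ten cities give a seven-digit number of tours, while one hundred cities give a number with 158 digits, which is why exhaustive enumeration stops being an option.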

A network is a special type of graph where each of the edges is assigned a
weight. The weight is associated with a cost for using that edge to move from
one node to another. Figure 1.9 depicts a network of cities where the weights
are the miles between the cities (nodes).
We’ve now finished our tour of the different types of collections we are going
to discuss in this book. Now we’re ready to actually look at how collections
are implemented in C#. We start by looking at how to build a Collection class
using an abstract class from the .NET Framework, the CollectionBase class.
The .NET Framework library does not include a generic Collection class
for storing data, but there is an abstract class you can use to build your
own Collection class—CollectionBase. The CollectionBase class provides the
programmer with the ability to implement a custom Collection class. The
class implicitly implements two interfaces necessary for building a Collection
class, ICollection and IEnumerable, leaving the programmer with having to
implement just those methods that are typically part of a Collection class.

A Collection Class Implementation Using ArrayLists
In this section, we’ll demonstrate how to use C# to implement our own Collection
class. This will serve several purposes. First, if you’re not quite up
to speed on object-oriented programming (OOP), this implementation will
show you some simple OOP techniques in C#. We can also use this section to
discuss some performance issues that are going to come up as we discuss the
different C# data structures. Finally, we think you’ll enjoy this section, as well
as the other implementation sections in this book, because it’s really a lot of
fun to reimplement the existing data structures using just the native elements
of the language. As Don Knuth (one of the pioneers of computer science)
says, to paraphrase, you haven’t really learned something well until you’ve
taught it to a computer. So, by teaching C# how to implement the different
data structures, we’ll learn much more about those structures than if we just
choose to use the classes from the library in our day-to-day programming.

Defining a Collection Class
The easiest way to define a Collection class in C# is to base the class on an
abstract class already found in the System.Collections library, the CollectionBase
class. This class provides a set of abstract methods you can implement
to build your own collection. The CollectionBase class provides an underlying
data structure, InnerList (an ArrayList), which you can use as a base for
your class. In this section, we look at how to use CollectionBase to build a
Collection class.
Implementing the Collection Class
The methods that will make up the Collection class all involve some type of
interaction with the underlying data structure of the class—InnerList. The
methods we will implement in this first section are the Add, Remove, Count,
and Clear methods. These methods are absolutely essential to the class, though
other methods definitely make the class more useful.
Let’s start with the Add method. This method has one parameter, an
Object variable that holds the item to be added to the collection. Here is the
code:

public void Add(Object item) {
  InnerList.Add(item);
}

ArrayLists store data as objects (the Object data type), which is why we
have declared item as Object. You will learn much more about ArrayLists
in Chapter 2.
The Remove method works similarly:

public void Remove(Object item) {
  InnerList.Remove(item);
}

The next method is Count. Count is most often implemented as a property,
but we prefer to make it a method. Also, Count is already implemented in the
underlying class, CollectionBase, so we have to use the new keyword to hide
the definition of Count found in CollectionBase:

public new int Count() {
  return InnerList.Count;
}

The Clear method removes all the items from InnerList. We also have to use
the new keyword in the definition of the method:

public new void Clear() {
  InnerList.Clear();
}

This is enough to get us started. Let’s look at a program that uses the
Collection class, along with the complete class definition:

using System;
using System.Collections;

public class Collection : CollectionBase {
  public void Add(Object item) {
    InnerList.Add(item);
  }
  public void Remove(Object item) {
    InnerList.Remove(item);
  }
  public new void Clear() {
    InnerList.Clear();
  }
  public new int Count() {
    return InnerList.Count;
  }
}

class chapter1 {
  static void Main() {
    Collection names = new Collection();
    names.Add("David");    // sample data, chosen for illustration
    names.Add("Bernica");
    names.Add("Raymond");
    foreach (Object name in names)
      Console.WriteLine(name);
    Console.WriteLine("Number of names: " + names.Count());
    names.Remove("Raymond");
    Console.WriteLine("Number of names: " + names.Count());
    names.Clear();
    Console.WriteLine("Number of names: " + names.Count());
  }
}

There are several other methods you can implement in order to create a
more useful Collection class. You will get a chance to implement some of
these methods in the exercises.

Generic Programming
One of the problems with OOP is a feature called “code bloat.” One type of
code bloat occurs when you have to overload a method, or a set of methods,
to take into account all of the possible data types of the method’s parameters.
One solution to code bloat is the ability of one value to take on multiple data
types, while only providing one definition of that value. This technique is
called generic programming.
A generic program provides a data type “placeholder” that is filled in by a
specific data type at compile-time. This placeholder is represented by a pair
of angle brackets (< >), with an identifier placed between the brackets. Let’s
look at an example.
A canonical first example for generic programming is the Swap function.
Here is the definition of a generic Swap function in C#:

static void Swap<T>(ref T val1, ref T val2) {
  T temp;
  temp = val1;
  val1 = val2;
  val2 = temp;
}

The placeholder for the data type is placed immediately after the function
name. The identifier placed inside the angle brackets is now used whenever a
generic data type is needed. Each of the parameters is assigned a generic data
type, as is the temp variable used to make the swap. Here’s a program that
tests this code:

using System;

class chapter1 {
  static void Main() {
    int num1 = 100;
    int num2 = 200;
    Console.WriteLine("num1: " + num1);
    Console.WriteLine("num2: " + num2);
    Swap<int>(ref num1, ref num2);
    Console.WriteLine("num1: " + num1);
    Console.WriteLine("num2: " + num2);
    string str1 = "Sam";
    string str2 = "Tom";
    Console.WriteLine("String 1: " + str1);
    Console.WriteLine("String 2: " + str2);
    Swap<string>(ref str1, ref str2);
    Console.WriteLine("String 1: " + str1);
    Console.WriteLine("String 2: " + str2);
  }

  static void Swap<T>(ref T val1, ref T val2) {
    T temp;
    temp = val1;
    val1 = val2;
    val2 = temp;
  }
}



The output from this program is:

num1: 100
num2: 200
num1: 200
num2: 100
String 1: Sam
String 2: Tom
String 1: Tom
String 2: Sam

Generics are not limited to function definitions; you can also create generic
classes. A generic class definition will contain a generic type placeholder after
the class name. Anytime the class name is referenced in the definition, the type
placeholder must be provided. The following class definition demonstrates
how to create a generic class:

public class Node<T> {
  T data;
  Node<T> link;

  public Node(T data, Node<T> link) {
    this.data = data;
    this.link = link;
  }
}



This class can be used as follows:

Node<string> node1 = new Node<string>("Mike", null);
Node<string> node2 = new Node<string>("Raymond", node1);

We will be using the Node class in several of the data structures we examine
in this book.
While this use of generic programming can be quite useful, C# provides a
library of generic data structures already ready to use. These data structures
are found in the System.Collections.Generic namespace and when we discuss
a data structure that is part of this namespace, we will examine its use. Generally,
though, these classes have the same functionality as the nongeneric data
structure classes, so we will usually limit the discussion of the generic class
to how to instantiate an object of that class, since the other methods and their
use are no different.
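
As a quick sketch of what instantiating a generic class looks like next to its nongeneric counterpart, the following code uses ArrayList and List<T>. The class name GenericVersusNongeneric and the helper SumOf are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class GenericVersusNongeneric {
  // Adds up the elements; no casts are needed because the element type
  // is fixed when the list is declared.
  public static int SumOf(List<int> values) {
    int total = 0;
    foreach (int v in values)
      total += v;
    return total;
  }

  static void Main() {
    // Nongeneric: every element is stored and returned as Object.
    ArrayList loose = new ArrayList();
    loose.Add(1);
    loose.Add("two");            // accepted, since anything is an Object
    int first = (int)loose[0];   // a cast is required on the way out

    // Generic: the type argument goes in angle brackets at instantiation.
    List<int> strict = new List<int>();
    strict.Add(1);
    strict.Add(2);               // strict.Add("two") would not compile

    Console.WriteLine(first + SumOf(strict));  // prints 4
  }
}
```

Apart from the type argument in the declaration, the calls to Add and the indexer look the same in both versions.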
Timing Tests
Because this book takes a practical approach to the analysis of the data structures
and algorithms examined, we eschew the use of Big O analysis, preferring
instead to run simple benchmark tests that will tell us how long in seconds
(or whatever time unit) it takes for a code segment to run.
Our benchmarks will be timing tests that measure the amount of time it
takes an algorithm to run to completion. Benchmarking is as much of an art
as a science and you have to be careful how you time a code segment in order
to get an accurate analysis. Let’s examine this in more detail.

An Oversimplified Timing Test
First, we need some code to time. For simplicity’s sake, we will time a subroutine
that writes the contents of an array to the console. Here’s the code:


static void DisplayNums(int[] arr) {
  for (int i = 0; i <= arr.GetUpperBound(0); i++)
    Console.Write(arr[i] + " ");
}



The array is initialized in another part of the program, which we’ll examine
later. To time this subroutine, we need to create a variable that is assigned the
system time just as the subroutine is called, and we need a variable to store
the time when the subroutine returns. Here’s how we wrote this code:

DateTime startTime;
TimeSpan endTime;
startTime = DateTime.Now;
DisplayNums(nums);  // the subroutine being timed
endTime = DateTime.Now.Subtract(startTime);

Running this code on my laptop (running at 1.4 GHz on Windows XP
Professional), the subroutine ran in about 5 seconds (4.9917). Although this
code segment seems reasonable for performing a timing test, it is completely
inadequate for timing code running in the .NET environment. Why?
First, the code measures the elapsed time from when the subroutine was
called until the subroutine returns to the main program. The time used by
other processes running at the same time as the C# program adds to the time
being measured by the test.
Second, the timing code doesn’t take into account garbage collection performed
in the .NET environment. In a runtime environment such as .NET,
the system can pause at any time to perform garbage collection. The sample
timing code does nothing to acknowledge garbage collection and the resulting
time can be affected quite easily by garbage collection. So what do we do
about this?

Timing Tests for the .NET Environment
In the .NET environment, we need to take into account the thread our program
is running in and the fact that garbage collection can occur at any time. We
need to design our timing code to take these facts into consideration.
Let’s start by looking at how to handle garbage collection. First, let’s discuss
what garbage collection is used for. In C#, reference types (such as strings,
arrays, and class instance objects) are allocated memory on something called
the heap. The heap is an area of memory reserved for data items (the types
mentioned previously). Value types, such as normal variables, are stored on
the stack. References to reference data are also stored on the stack, but the
actual data stored in a reference type is stored on the heap.

Variables that are stored on the stack are freed when the subprogram in
which the variables are declared completes its execution. Variables stored on
the heap, on the other hand, are held on the heap until the garbage collection
process is called. Heap data is only removed via garbage collection when there
is not an active reference to that data.
Garbage collection can, and will, occur at arbitrary times during the execution
of a program. However, we want to be as sure as we can that the
garbage collector is not run while the code we are timing is executing. We can
head off arbitrary garbage collection by calling the garbage collector explicitly.
The .NET environment provides a special object for making garbage
collection calls, GC. To tell the system to perform garbage collection, we
simply write:

GC.Collect();

That’s not all we have to do, though. Every object stored on the heap has
a special method called a finalizer. The finalizer method is executed as the
last step before deleting the object. The problem with finalizer methods is
that they are not run in a systematic way. In fact, you can’t even be sure an
object’s finalizer method will run at all, but we know that before we can be
sure an object is deleted, its finalizer method must execute. To ensure this,
we add a line of code that tells the program to wait until all the finalizer
methods of the objects on the heap have run before continuing. The line of
code is:

GC.WaitForPendingFinalizers();

We have one hurdle cleared and just one left to go – using the proper
thread. In the .NET environment, a program is run inside a process (more
precisely, inside an application domain hosted by a process). This allows the
operating system to separate
each different program running on it at the same time. Within a process, a
program or a part of a program is run inside a thread. Execution time for a
program is allocated by the operating system via threads. When we are timing
the code for a program, we want to make sure that we’re timing just the
code inside the process allocated for our program and not other tasks being
performed by the operating system.
We can do this by using the Process class in the .NET Framework. The
Process class has methods for allowing us to pick the current process (the
process our program is running in), the thread the program is running in, and
a timer to store the time the thread starts executing. Each of these methods
can be combined into one call, which assigns its return value to a variable to
store the starting time (a TimeSpan object). Here’s the line of code (okay, two
lines of code):

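TimeSpan startingTime;
startingTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;

When the code being timed finishes, we subtract the starting time from the
thread’s current processor time to get the duration. Putting the garbage
collection calls and the thread timing together, the complete test program
looks like this:

using System;
using System.Diagnostics;

class chapter1 {
  static void Main() {
    int[] nums = new int[100000];
    BuildArray(nums);
    TimeSpan startTime;
    TimeSpan duration;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    startTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
    DisplayNums(nums);
    duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(startTime);
    Console.WriteLine("Time: " + duration.TotalSeconds);
  }
  static void BuildArray(int[] arr) {
    for (int i = 0; i <= 99999; i++)
      arr[i] = i;
  }
  static void DisplayNums(int[] arr) {
    for (int i = 0; i <= arr.GetUpperBound(0); i++)
      Console.Write(arr[i] + " ");
  }
}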



Using the new and improved timing code, the program returns 0.2526 seconds.
This compares with the approximately 5 seconds returned using the first
timing code. Clearly, there is a major discrepancy between these two timing
techniques and you should use the .NET techniques when timing code in the
.NET environment.


A Timing Test Class
Although we don’t need a class to run our timing code, it makes sense to
rewrite the code as a class, primarily because we’ll keep our code clear if we
can reduce the number of lines in the code we test.
A Timing class needs the following data members:
• startingTime: to store the starting time of the code we are testing
• duration: to store the elapsed running time of the code we are testing
The starting time and the duration members store times and we chose to use
the TimeSpan data type for these data members. We’ll use just one constructor
method, a default constructor that sets both the data members to 0.
We’ll need methods for telling a Timing object when to start timing code
and when to stop timing. We also need a method for returning the data stored
in the duration data member.
As you can see, the Timing class is quite small, needing just a few methods.
Here’s the definition:

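public class Timing {
  TimeSpan startingTime;
  TimeSpan duration;
  public Timing() {
    startingTime = new TimeSpan(0);
    duration = new TimeSpan(0);
  }
  public void StopTime() {
    duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(startingTime);
  }
  public void startTime() {
    GC.Collect();
    GC.WaitForPendingFinalizers();
    startingTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
  }
  public TimeSpan Result() {
    return duration;
  }
}
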
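Here’s the program to test the DisplayNums subroutine, rewritten with the
Timing class:

using System;
using System.Diagnostics;

public class Timing {
  TimeSpan startingTime;
  TimeSpan duration;
  public Timing() {
    startingTime = new TimeSpan(0);
    duration = new TimeSpan(0);
  }
  public void StopTime() {
    duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(startingTime);
  }
  public void startTime() {
    GC.Collect();
    GC.WaitForPendingFinalizers();
    startingTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
  }
  public TimeSpan Result() {
    return duration;
  }
}

class chapter1 {
  static void Main() {
    int[] nums = new int[100000];
    BuildArray(nums);
    Timing tObj = new Timing();
    tObj.startTime();
    DisplayNums(nums);
    tObj.StopTime();
    Console.WriteLine("time (.NET): " + tObj.Result().TotalSeconds);
  }
  static void BuildArray(int[] arr) {
    for (int i = 0; i < 100000; i++)
      arr[i] = i;
  }
  static void DisplayNums(int[] arr) {
    for (int i = 0; i <= arr.GetUpperBound(0); i++)
      Console.Write(arr[i] + " ");
  }
}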



By moving the timing code into a class, we’ve cut down the number of lines
in the main program from 13 to 8. Admittedly, that’s not a lot of code to cut
out of a program, but more important than the number of lines we cut is the
clutter in the main program. Without the class, assigning the starting time to
a variable looks like this:

startTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;

With the Timing class, assigning the starting time to the class data member
looks like this:

tObj.startTime();

Summary
This chapter reviews three important techniques we will use often in this book.
Many, though not all, of the programs we will write, as well as the libraries we
will discuss, are written in an object-oriented manner. The Collection class
we developed illustrates many of the basic OOP concepts seen throughout
these chapters. Generic programming allows the programmer to simplify the
definition of several data structures by limiting the number of methods that
have to be written or overloaded. The Timing class provides a simple, yet
effective way to measure the performance of the data structures and algorithms
we will study.