Wu.Country@侠缘

勤学似春起之苗,不见其增,日有所长; 辍学如磨刀之石,不见其损,日所有亏!

导航

Effective C# 原则11:选择foreach循环

Effective C# 原则11:选择foreach循环
Item 11: Prefer foreach Loops

C#的foreach语句是从do,while,或者for循环语句变化而来的,它相对要好一些,它可以为你的任何集合产生最好的迭代代码。它的定义依懒于.Net框架里的集合接口,并且编译器会为实际的集合生成最好的代码。当你在集合上做迭代时,可用使用foreach来取代其它的循环结构。检查下面的三个循环:

int [] foo = new int[100];

// Loop 1:
foreach ( int i in foo)
  Console.WriteLine( i.ToString( ));

// Loop 2:
for ( int index = 0;  index < foo.Length;  index++ )
  Console.WriteLine( foo[index].ToString( ));

// Loop 3:
int len = foo.Length;
for ( int index = 0;  index < len;  index++ )
  Console.WriteLine( foo[index].ToString( ));

对于当前的C#编译器(版本1.1或者更高)而言,循环1是最好的。起码它的输入要少些,这会使你的个人开发效率提提升。(1.0的C#编译器对循环1而言要慢很多,所以对于那个版本循环2是最好的。) 循环3,大多数C或者C++程序员会认为它是最有效的,但它是最糟糕的。因为在循环外部取出了变量Length的值,从而阻碍了JIT编译器将边界检测从循环中移出。

C#代码是安全的托管代码里运行的。环境里的每一块内存,包括数据的索引,都是被监视的。稍微展开一下,循环3的代码实际很像这样的:

// Loop 3, as generated by compiler:
int len = foo.Length;
for ( int index = 0;  index < len;  index++ )
{
  if ( index < foo.Length )
    Console.WriteLine( foo[index].ToString( ));
  else
    throw new IndexOutOfRangeException( );
}

C#的JIT编译器跟你不一样,它试图帮你这样做了。你本想把Length属性提出到循环外面,却使得编译做了更多的事情,从而也降低了速度。CLR要保证的内容之一就是:你不能写出让变量访问不属于它自己内存的代码。在访问每一个实际的集合时,运行时确保对每个集合的边界(不是len变量)做了检测。你把一个边界检测分成了两个。

你还是要为循环的每一次迭代做数组做索引检测,而且是两次。循环1和循环2要快一些的原因是因为,C#的JIT编译器可以验证数组的边界来确保安全。任何循环变量不是数据的长度时,边界检测就会在每一次迭代中发生。(译注:这里几次说到JIT编译器,它是指将IL代码编译成本地代码时的编译器,而不是指将C#代码或者其它代码编译成IL代码时的编译器。其实我们可以用不安全选项来迫使JIT不做这样的检测,从而使运行速度提高。)

原始的C#编译器之所以对foreach以及数组产生很慢的代码,是因为涉及到了装箱。装箱会在原则17中展开讨论。数组是安全的类型,现在的foreach可以为数组生成与其它集合不同的IL代码。对于数组的这个版本,它不再使用IEnumerator接口,就是这个接口须要装箱与拆箱。

IEnumerator it = foo.GetEnumerator( );
while( it.MoveNext( ))
{
  int i = (int) it.Current; // box and unbox here.
  Console.WriteLine( i.ToString( ) );
}

取而代之的是,foreach语句为数组生成了这样的结构:

for ( int index = 0;  index < foo.Length;  index++ )
  Console.WriteLine( foo[index].ToString( ));

(译注:注意数组与集合的区别。数组是一次性分配的连续内存,集合是可以动态添加与修改的,一般用链表来实现。而对于C#里所支持的锯齿数组,则是一种折衷的处理。)

foreach总能保证最好的代码。你不用操心哪种结构的循环有更高的效率:foreach和编译器为你代劳了。

如果你并不满足于高效,例如还要有语言的交互。这个世界上有些人(是的,正是他们在使用其它的编程语言)坚定不移的认为数组的索引是从1开始的,而不是0。不管我们如何努力,我们也无法破除他们的这种习惯。.Net开发组已经尝试过。为此你不得不在C#这样写初始化代码,那就是数组从某个非0数值开始的。

// Create a single dimension array.
// Its range is [ 1 .. 5 ]
Array test = Array.CreateInstance( typeof( int ),
new int[ ]{ 5 }, new int[ ]{ 1 });

这段代码应该足够让所有人感到畏惧了(译注:对我而言,确实有一点)。但有些人就是很顽固,无认你如何努力,他们会从1开始计数。很幸运,这是那些问题当中的一个,而你可以让编译器来“欺骗”。用foreach来对test数组进行迭代:
foreach( int j in test )
  Console.WriteLine ( j );

foreach语句知道如何检测数组的上下限,所以你应该这样做,而且这和for循环的速度是一样的,也不用管某人是采用那个做为下界。

对于多维数组,foreach给了你同样的好处。假设你正在创建一个棋盘。你将会这样写两段代码:

private Square[,] _theBoard = new Square[ 8, 8 ];

// elsewhere in code:
for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
  for( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
    _theBoard[ i, j ].PaintSquare( );

取而代之的是,你可以这样简单的画这个棋盘:
foreach( Square sq in _theBoard )
  sq.PaintSquare( );
(译注:本人不赞成这样的方法。它隐藏了数组的行与列的逻辑关系。循环是以行优先的,如果你要的不是这个顺序,那么这种循环并不好。)

foreach语句生成恰当的代码来迭代数组里所有维数的数据。如果将来你要创建一个3D的棋盘,foreach循环还是一样的工作,而另一个循环则要做这样的修改:
for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
  for( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
    for( int k = 0; k < _theBoard.GetLength( 2 ); k++ )
      _theBoard[ i, j, k ].PaintSquare( );
(译注:这样看上去虽然代码很多,但我觉得,只要是程序员都可以一眼看出这是个三维数组的循环,但是对于foreach,我看没人一眼可以看出来它在做什么! 个人理解。当然,这要看你怎样认识,这当然可以说是foreach的一个优点。)

事实上,foreach循环还可以在每个维的下限不同的多维数组上工作(译注:也就是锯齿数组)。 我不想写这样的代码,即使是为了做例示。但当某人在某时写了这样的集合时,foreach可以胜任。

foreach也给了你很大的伸缩性,当某时你发现须要修改数组里底层的数据结构时,它可以尽可能多的保证代码不做修改。我们从一个简单的数组来讨论这个问题:

int [] foo = new int[100];

假设后来某些时候,你发现它不具备数组类(array class)的一些功能,而你又正好要这些功能。你可能简单把一个数组修改为ArrayList:

// Set the initial size:
ArrayList foo = new ArrayList( 100 );

任何用for循环的代码被破坏:
int sum = 0;
for ( int index = 0;
  // won't compile: ArrayList uses Count, not Length
  index < foo.Length;
  index++ )
  // won't compile: foo[ index ] is object, not int.
  sum += foo[ index ];

然而,foreach循环可以根据所操作的对象不同,而自动编译成不同的代码来转化恰当的类型。什么也不用改。还不只是对标准的数组可以这样,对于其它任何的集合类型也同样可以用foreach.

如果你的集合支持.Net环境下的规则,你的用户就可以用foreach来迭代你的数据类型。为了让foreach语句认为它是一个集合类型,一个类应该有多数属性中的一个:公开方法GetEnumerator()的实现可以构成一个集合类。明确的实现IEnumerable接口可以产生一个集合类。实现IEnumerator接口也可以实现一个集合类。foreach可以在任何一个上工作。

foreach有一个好处就是关于资源管理。IEnumerable接口包含一个方法:GetEnumerator()。foreach语句是一个在可枚举的类型上生成下面的代码,优化过的:
IEnumerator it = foo.GetEnumerator( ) as IEnumerator;
using ( IDisposable disp = it as IDisposable )
{
  while ( it.MoveNext( ))
  {
    int elem = ( int ) it.Current;
    sum += elem;
  }
}

如果断定枚举器实现了IDisposable接口,编译器可以自动优化代码为finally块。但对你而言,明白这一点很重要,无论如何,foreach生成了正确的代码。

foreach是一个应用广泛的语句。它为数组的上下限自成正确的代码,迭代多维数组,强制转化为恰当的类型(使用最有效的结构),还有,这是最重要的,生成最有效的循环结构。这是迭代集合最有效的方法。这样,你写出的代码更持久(译注:就是不会因为错误而改动太多的代码),第一次写代码的时候更简洁。这对生产力是一个小的进步,随着时间的推移会累加起来。

=========================

Item 11: Prefer foreach Loops
The C# foreach statement is more than just a variation of the do, while, or for loops. It generates the best iteration code for any collection you have. Its definition is tied to the collection interfaces in the .NET Framework, and the C# compiler generates the best code for the particular type of collection. When you iterate collections, use foreach instead of other looping constructs. Examine these three loops:

int [] foo = new int[100];

// Loop 1:
foreach ( int i in foo)
  Console.WriteLine( i.ToString( ));

// Loop 2:
for ( int index = 0;
  index < foo.Length;
  index++ )
  Console.WriteLine( foo[index].ToString( ));

// Loop 3:
int len = foo.Length;
for ( int index = 0;
  index < len;
  index++ )
  Console.WriteLine( foo[index].ToString( ));

 

For the current and future C# compilers (version 1.1 and up), loop 1 is best. It's even less typing, so your personal productivity goes up. (The C# 1.0 compiler produced much slower code for loop 1, so loop 2 is best in that version.) Loop 3, the construct most C and C++ programmers would view as most efficient, is the worst option. By hoisting the Length variable out of the loop, you make a change that hinders the JIT compiler's chance to remove range checking inside the loop.

C# code runs in a safe, managed environment. Every memory location is checked, including array indexes. Taking a few liberties, the actual code for loop 3 is something like this:

// Loop 3, as generated by compiler:
int len = foo.Length;
for ( int index = 0;
  index < len;
  index++ )
{
  if ( index < foo.Length )
    Console.WriteLine( foo[index].ToString( ));
  else
    throw new IndexOutOfRangeException( );
}

 

The JIT C# compiler just doesn't like you trying to help it this way. Your attempt to hoist the Length property access out of the loop just made the JIT compiler do more work to generate even slower code. One of the CLR guarantees is that you cannot write code that overruns the memory that your variables own. The runtime generates a test of the actual array bounds (not your len variable) before accessing each particular array element. You get one bounds check for the price of two.

You still pay to check the array index on every iteration of the loop, and you do so twice. The reason loops 1 and 2 are faster is that the C# compiler and the JIT compiler can verify that the bounds of the loop are guaranteed to be safe. Anytime the loop variable is not the length of the array, the bounds check is performed on each iteration.

The reason that foreach and arrays generated very slow code in the original C# compiler concerns boxing, which is covered extensively in Item 17. Arrays are type safe. foreach now generates different IL for arrays than other collections. The array version does not use the IEnumerator interface, which would require boxing and unboxing:

IEnumerator it = foo.GetEnumerator( );
while( it.MoveNext( ))
{
  int i = (int) it.Current; // box and unbox here.
  Console.WriteLine( i.ToString( ) );
}

 

Instead, the foreach statement generates this construct for arrays:

for ( int index = 0;
  index < foo.Length;
  index++ )
  Console.WriteLine( foo[index].ToString( ));

 

foreach always generates the best code. You don't need to remember which construct generates the most efficient looping construct: foreach and the compiler will do it for you.

If efficiency isn't enough for you, consider language interop. Some folks in the world (yes, most of them use other programming languages) strongly believe that index variables start at 1, not 0. No matter how much we try, we won't break them of this habit. The .NET team tried. You have to write this kind of initialization in C# to get an array that starts at something other than 0:

// Create a single dimension array.
// Its range is [ 1 .. 5 ]
Array test = Array.CreateInstance( typeof( int ),
new int[ ]{ 5 }, new int[ ]{ 1 });

 

This code should be enough to make anybody cringe and just write arrays that start at 0. But some people are stubborn. Try as you might, they will start counting at 1. Luckily, this is one of those problems that you can foist off on the compiler. Iterate the test array using foreach:

foreach( int j in test )
  Console.WriteLine ( j );

 

The foreach statement knows how to check the upper and lower bounds on the array, so you don't have toand it's just as fast as a hand-coded for loop, no matter what different lower bound someone decides to use.

foreach adds other language benefits for you. The loop variable is read-only: You can't replace the objects in a collection using foreach. Also, there is explicit casting to the correct type. If the collection contains the wrong type of objects, the iteration throws an exception.

foreach gives you similar benefits for multidimensional arrays. Suppose that you are creating a chess board. You would write these two fragments:

private Square[,] _theBoard = new Square[ 8, 8 ];

// elsewhere in code:
for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
  for( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
    _theBoard[ i, j ].PaintSquare( );

 

Instead, you can simplify painting the board this way:

foreach( Square sq in _theBoard )
  sq.PaintSquare( );

 

The foreach statement generates the proper code to iterate across all dimensions in the array. If you make a 3D chessboard in the future, the foreach loop just works. The other loop needs modification:

for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
  for( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
    for( int k = 0; k < _theBoard.GetLength( 2 ); k++ )
      _theBoard[ i, j, k ].PaintSquare( );

 

In fact, the foreach loop would work on a multidimensional array that had different lower bounds in each direction. I don't want to write that kind of code, even as an example. But when someone else codes that kind of collection, foreach can handle it.

foreach also gives you the flexibility to keep much of the code intact if you find later that you need to change the underlying data structure from an array. We started this discussion with a simple array:

int [] foo = new int[100];

 

Suppose that, at some later point, you realize that you need capabilities that are not easily handled by the array class. You can simply change the array to an ArrayList:

// Set the initial size:
ArrayList foo = new ArrayList( 100 );

 

Any hand-coded for loops are broken:

int sum = 0;
for ( int index = 0;
  // won't compile: ArrayList uses Count, not Length
  index < foo.Length;
  index++ )
  // won't compile: foo[ index ] is object, not int.
  sum += foo[ index ];

 

However, the foreach loop compiles to different code that automatically casts each operand to the proper type. No changes are needed. It's not just changing to standard collections classes, eitherany collection type can be used with foreach.

Users of your types can use foreach to iterate across members if you support the .NET environment's rules for a collection. For the foreach statement to consider it a collection type, a class must have one of a number of properties. The presence of a public GetEnumerator() method makes a collection class. Explicitly implementing the IEnumerable interface creates a collection type. Implementing the IEnumerator interface creates a collection type. foreach works with any of them.

foreach has one added benefit regarding resource management. The IEnumerable interface contains one method: GetEnumerator(). The foreach statement on an enumerable type generates the following, with some optimizations:

IEnumerator it = foo.GetEnumerator( ) as IEnumerator;
using ( IDisposable disp = it as IDisposable )
{
  while ( it.MoveNext( ))
  {
    int elem = ( int ) it.Current;
    sum += elem;
  }
}

 

The compiler automatically optimizes the code in the finally clause if it can determine for certain whether the enumerator implements IDisposable. But for you, it's more important to see that, no matter what, foreach generates correct code.

foreach is a very versatile statement. It generates the right code for upper and lower bounds in arrays, iterates multidimensional arrays, coerces the operands into the proper type (using the most efficient construct), and, on top of that, generates the most efficient looping constructs. It's the best way to iterate collections. With it, you'll create code that is more likely to last, and it's simpler for you to write in the first place. It's a small productivity improvement, but it adds up over time.
 

posted on 2007-02-27 21:06  Wu.Country@侠缘  阅读(5263)  评论(8编辑  收藏  举报