Chapter five -- sorting

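I have already read the preface to Volume 3, so I won't cover it here. This chapter is about sorting, and I suspect most people would choose to read it first; at the very least you can expect it to be fun.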

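Unless I have to, I'd rather not talk about the epigraph-like quotations the author likes to put at the start of every chapter, but the second passage here is worth thinking about: "We merely arrange a list and look for duplications". That one sentence captures the relationship between sorting and searching and how closely the two are tied together. More importantly, the solutions to a great many problems seem to have a lot to do with this sentence. See, it has already caught my interest!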

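The author also spends half a page discussing which of "sorting", "order", and "sequencing" is the best name for the subject. It looks a bit pedantic, but on a first read it is actually rather entertaining.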

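Applications of sorting:

a) Solving the "togetherness" problem. This means bringing items with the same value into consecutive positions. It reminds me of how, as a kid, I always checked before a card game whether the deck really had all 54 cards. I call this contiguity.

b) Matching items in two or more files. This is what we mentioned above: once the files are sorted into the same order, the items you are looking for can be found in a single sequential pass, which has a great deal to do with searching. I call this matching.

c) Searching for information by key values. This means laying out a table in sorted order, the way a dictionary does, so for now I call this the dictionary property.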

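The three labels above are my own additions to make things easier to remember. When thinking about a sorting problem, these three aspects are about as basic and essential as it gets.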
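To make the three labels concrete, here is a small Python sketch of my own (it is not from the book, and the function names `group_equal`, `match_sorted`, and `contains` are just names I made up). It assumes the items are plain comparable values such as integers.

```python
from bisect import bisect_left
from itertools import groupby


def group_equal(items):
    """Contiguity: after sorting, equal values sit next to each other."""
    return [(value, len(list(run))) for value, run in groupby(sorted(items))]


def match_sorted(a, b):
    """Matching: one sequential pass over two sorted lists finds common items."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:
            common.append(a[i])
            i += 1
            j += 1
    return common


def contains(sorted_items, key):
    """Dictionary property: binary search relies on the table being sorted."""
    pos = bisect_left(sorted_items, key)
    return pos < len(sorted_items) and sorted_items[pos] == key
```

For example, `group_equal` could be used to count how many cards of each rank are in a deck, and `match_sorted` expects both inputs to be sorted already.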

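The author then recounts some interesting history and explains why research on sorting matters; I recommend reading it if you are interested. After that he explains what sorting algorithms mean for the analysis of algorithms: sorting is a fairly typical kind of algorithm, so it is a very worthwhile vehicle for studying and analyzing performance characteristics.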

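Terminology. We are given N items R_1, R_2, ..., R_N. Each item is called a record, and the N items together are called a file. Each record R_j has a key K_j, and the key is what the sort is based on. Besides the key, a record usually carries some additional information. That additional information could in principle influence the sort, but when writing programs we normally try not to let it, and keep a clear line between what affects the ordering and what does not.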
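A minimal sketch of this terminology in Python, assuming integer keys and a string as the additional information. The `Record` class and the `sort_file` function are hypothetical names of mine, not anything defined in the book.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int    # K_j: the only field the sort looks at
    info: str   # additional information that travels with the key


def sort_file(records):
    """Return the records of a 'file' in nondecreasing key order."""
    return sorted(records, key=lambda r: r.key)


records = [Record(3, "c"), Record(1, "a"), Record(2, "b")]
print([r.info for r in sort_file(records)])  # prints ['a', 'b', 'c']
```

Passing `key=lambda r: r.key` is what keeps the extra information out of the comparison.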

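Sorting therefore needs an ordering relation. Readers familiar with the STL know that when you call a sort function you supply a comparison function (or function object) that defines the ordering, because this is the most basic ingredient. The ordering relation must satisfy two conditions: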

i) Exactly one of the possibilities a < b, a = b, b < a is true. (This is called the law of trichotomy.)
ii) If a < b and b < c, then a < c. (This is the familiar law of transitivity.)

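A relation with these two properties is called a total order (also a linear order), and this chapter only sorts things that have such a < relation. Anyone who has studied discrete mathematics will know the notions of partial order and total order, which are closely related to concepts such as equivalence. Consider this a small plug for discrete mathematics.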
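The two laws can be checked by brute force on a small finite set. The sketch below is my own illustration; `is_total_order` is a made-up helper and `less` stands for whatever "<" relation you want to test.

```python
from itertools import product


def is_total_order(elements, less):
    """Check trichotomy and transitivity of `less` on a finite collection."""
    elements = list(elements)
    for a, b in product(elements, repeat=2):
        # Trichotomy: exactly one of a < b, a = b, b < a holds.
        if [less(a, b), a == b, less(b, a)].count(True) != 1:
            return False
    for a, b, c in product(elements, repeat=3):
        # Transitivity: a < b and b < c imply a < c.
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True
```

The usual < on integers passes. Proper-subset on sets does not, because {1} and {2} are neither equal nor comparable, which is exactly the partial-order case.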

The goal of sorting is to determine a permutation p(1), ..., p(N) of the indices 1 to N such that the keys satisfy K_{p(1)} <= K_{p(2)} <= ... <= K_{p(N)}.

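Adding one more constraint gives us stable sorting. A stable sort must additionally satisfy p(i) < p(j) whenever K_{p(i)} = K_{p(j)} and i < j; in other words, records with equal keys keep their original relative order.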
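Here is a rough Python sketch of the permutation view, written by me for illustration. Note that it uses 0-based indices rather than the 1 to N of the text, and it leans on the fact that Python's built-in `sorted` is stable. The helper names are hypothetical.

```python
def sorting_permutation(keys):
    """Return p (0-based) with keys[p[0]] <= keys[p[1]] <= ...; ties keep input order."""
    return sorted(range(len(keys)), key=lambda j: keys[j])


def is_sorted_by(keys, p):
    """Check K_{p(1)} <= K_{p(2)} <= ... <= K_{p(N)}."""
    return all(keys[p[i]] <= keys[p[i + 1]] for i in range(len(p) - 1))


def is_stable(keys, p):
    """Check p(i) < p(i+1) for neighbours with equal keys.

    Assumes p already sorts the keys, so equal keys are adjacent.
    """
    return all(
        p[i] < p[i + 1]
        for i in range(len(p) - 1)
        if keys[p[i]] == keys[p[i + 1]]
    )
```

With `keys = [2, 1, 2, 1]`, the permutation `[1, 3, 0, 2]` is sorted and stable, while `[3, 1, 0, 2]` is sorted but not stable.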

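We also assume extreme values exist that are greater than or less than every key: -∞ < K_j < ∞, for 1 <= j <= N. These extreme values are sometimes used as sentinels. The infinity here does not mean you must represent a truly infinite value; any extreme value that cannot occur as a real key will do.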
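As one possible use of such a sentinel, here is a straight insertion sort sketch in Python. This is my own example rather than the book's program, and it assumes numeric keys so that `-math.inf` can play the role of -∞.

```python
import math


def insertion_sort_with_sentinel(keys):
    """Straight insertion sort that uses a -infinity sentinel in position 0."""
    a = [-math.inf] + list(keys)  # a[0] is the sentinel, below every real key
    for j in range(2, len(a)):
        k = a[j]
        i = j - 1
        # No "i >= 1" bounds test is needed: the sentinel always stops the scan.
        while a[i] > k:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = k
    return a[1:]
```

The sentinel removes one comparison from the inner loop; any value guaranteed to be smaller than all real keys would serve the same purpose.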

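Sorting is further divided into internal sorting and external sorting. Internal sorting is the case where all the records fit in memory at once; external sorting covers the case where they do not. External sorting has to live with comparatively stricter constraints on how the data can be accessed.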

The time required to sort N records, using a decent general-purpose sorting algorithm, is roughly proportional to N log N; we make about log N "passes" over the data. This is the minimum possible time, as we shall see in Section 5.3.1, if the records are in random order and if sorting is done by pairwise comparisons of keys. Thus if we double the number of records, it will take a little more than twice as long to sort them, all other things being equal. (Actually, as N approaches infinity, a better indication of the time needed to sort is N(log N)^2, if the keys are distinct, since the size of the keys must grow at least as fast as log N; but for practical purposes, N never really approaches infinity.)

On the other hand, if the keys are known to be randomly distributed with respect to some continuous numerical distribution, we will see that sorting can be accomplished in O(N) steps on the average.

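These two paragraphs are the hardest to understand in this subsection, which is only natural, because they touch on the best achievable running time of sorting as well as some research results that come later in the book. So I won't dig into them for now. Still, they do whet the appetite, don't they?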
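One small piece that is easy to check is the "a little more than twice as long" remark. If the cost is modelled as N log N (ignoring constant factors, which is my simplifying assumption), doubling N multiplies the cost by 2N log(2N) / (N log N) = 2(1 + 1/log2 N). A quick Python sketch, with `doubling_ratio` being a name I made up:

```python
import math


def nlogn(n):
    """Idealised sorting cost N * log2(N), ignoring constant factors."""
    return n * math.log2(n)


def doubling_ratio(n):
    """How much the idealised cost grows when n records become 2n records."""
    return nlogn(2 * n) / nlogn(n)  # algebraically 2 * (1 + 1 / log2(n))


for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, round(doubling_ratio(n), 3))
```

The ratio stays slightly above 2 and creeps closer to 2 as N grows, which matches the wording in the quoted passage.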

posted on 2011-09-07 11:46 Em_Num_Cool