Problem Description
Problem link: POJ 1007 DNA Sorting
DNA Sorting
Time Limit: 1000MS Memory Limit: 10000K
Total Submissions: 46191 Accepted: 18037
Description
One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted). You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.
Input
The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.
Output
Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the original order.
Sample Input
10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
Sample Output
CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA
Source
East Central North America 1998
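Analysis
The straightforward solution compares every pair of letters in each input line to obtain that line's "unsortedness", uses it as the line's key, and then stable-sorts all the lines by key. A line of length n has n(n-1)/2 pairs, so the overall time complexity is T(n, m) = O(n^2) * O(m) + O(m log m) = O(n^2 * m + m log m).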
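As a baseline, the pairwise comparison can be sketched in a few lines of Python. The function name `count_inversions_naive` is made up for this illustration, and the checks reuse the examples from the problem statement.

```python
def count_inversions_naive(s):
    """Count pairs (i, j) with i < j and s[i] > s[j] by checking every pair: O(n^2)."""
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])


# Examples taken from the problem statement.
print(count_inversions_naive("DAABEC"))   # 5
print(count_inversions_naive("AACEDGG"))  # 1
print(count_inversions_naive("ZWQM"))     # 6
```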
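The complexity of the straightforward algorithm is unsatisfying. Note that the input contains only the four letters A, C, G, and T. Since n is also small and an array gives random access to per-letter counters, we can build a single-pass O(n) algorithm that computes the "unsortedness" of one line. The idea is that every larger letter that appeared earlier forms exactly one inversion with the current letter:
For each letter read, increment that letter's occurrence count by 1;
if the letter is T, do nothing more;
if it is G, add the number of T's seen so far to the unsortedness;
if it is C, add the number of T's and G's seen so far to the unsortedness;
if it is A, add the number of T's, G's, and C's seen so far to the unsortedness.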
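The steps above can be sketched as a single pass over the string. This is a minimal Python sketch, not the author's code; the name `unsortedness` and the dictionary `seen` are chosen here for illustration. The checks use the strings from the sample input.

```python
def unsortedness(line):
    """Count inversions in a string over A < C < G < T in one pass, O(n)."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}  # how often each letter has appeared so far
    total = 0
    for ch in line:
        seen[ch] += 1
        if ch == "G":
            total += seen["T"]
        elif ch == "C":
            total += seen["T"] + seen["G"]
        elif ch == "A":
            total += seen["T"] + seen["G"] + seen["C"]
        # ch == "T": nothing is larger than T, so no inversion is added
    return total


print(unsortedness("CCCGGGGGGA"))  # 9
print(unsortedness("AACATGAAGG"))  # 10
print(unsortedness("TTTGGCCAAA"))  # 37
```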
The pseudocode for this problem is given below:
Procedure POJ1007 Begin
    Read the numbers n and m
    Dim array as Pair<key, value> array
    For i from 1 to m Begin
        line <- ReadLine
        unsortedness <- 0
        occurs[A] <- 0
        occurs[C] <- 0
        occurs[G] <- 0
        occurs[T] <- 0
        For ch in line Begin
            occurs[ch] <- occurs[ch] + 1
            Switch ch Begin
                case T:
                    break
                case G:
                    unsortedness <- unsortedness + occurs[T]
                    break
                case C:
                    unsortedness <- unsortedness + occurs[T] + occurs[G]
                    break
                case A:
                    unsortedness <- unsortedness + occurs[T] + occurs[G] + occurs[C]
                    break
            End Switch
        End For
        put <unsortedness, line> into array
    End For
    stable_sort array by key and output the lines in order
End Procedure
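For readers who want something runnable, here is one possible Python rendering of the whole procedure. It is a sketch under a few assumptions: the names `unsortedness`, `solve`, and `SAMPLE` are invented for this example, the input is passed in as a string rather than read from standard input, and Python's built-in `sorted` (which is guaranteed to be stable) stands in for `stable_sort`.

```python
def unsortedness(line):
    """Count inversions in a string over A < C < G < T in one pass."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}
    total = 0
    for ch in line:
        # Every strictly larger letter seen earlier forms one inversion with ch.
        total += sum(count for letter, count in seen.items() if letter > ch)
        seen[ch] += 1
    return total


def solve(text):
    """Return the input strings ordered from most sorted to least sorted."""
    tokens = text.split()
    m = int(tokens[1])  # tokens[0] is n, the common string length (not needed here)
    strings = tokens[2:2 + m]
    return sorted(strings, key=unsortedness)  # stable: ties keep input order


SAMPLE = """10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
"""

for s in solve(SAMPLE):
    print(s)
```

Run on the sample input, this prints the six strings in the order shown under Sample Output.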
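Summary
Make full use of the given conditions: special constraints can greatly improve an algorithm's efficiency.
Quicksort is not a stable sort, so it cannot be used directly here, because strings with equal unsortedness must keep their original order.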
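If only an unstable sort is available, one common workaround is to add the original position to the sort key, so that no two keys compare equal. The Python sketch below illustrates the idea; the helper names are hypothetical and the two-letter strings are made-up test data, not part of the problem. Python's own `list.sort` happens to be stable, so the index is redundant here and is included only to show the technique.

```python
def inversions(s):
    """Pairwise inversion count, kept short for this illustration."""
    return sum(s[i] > s[j] for i in range(len(s)) for j in range(i + 1, len(s)))


def order_with_index_tiebreak(strings):
    """Sort by (unsortedness, original index) so the result does not rely on stability."""
    keyed = [(inversions(s), i, s) for i, s in enumerate(strings)]
    keyed.sort()  # keys are unique, so any correct sort yields the same order
    return [s for _, _, s in keyed]


# "TA", "CA" and "GA" each have one inversion and keep their input order.
print(order_with_index_tiebreak(["TA", "AC", "CA", "GA"]))  # ['AC', 'TA', 'CA', 'GA']
```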