POJ 1007 Solution Analysis

Technorati tags: ACM, POJ

Problem Description

DNA Sorting

Time Limit: 1000MS
Memory Limit: 10000K

Total Submissions: 46191
Accepted: 18037

Description

One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).
You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.

Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.

Output

Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the original order.

Sample Input

10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT

Sample Output

CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA

Source

East Central North America 1998

Solution Analysis

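The conventional approach compares every pair of letters in each input line to obtain that line's "unsortedness", uses it as the line's key, and then stable-sorts all the lines by key. A line of length n has n(n-1)/2 pairs, so the overall time complexity is T(n,m) = O(n^2)O(m) + O(mlogm).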
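As a baseline, the pairwise approach can be sketched in a few lines of Python. The function name `inversions_bruteforce` is made up for this illustration; it simply checks all n(n-1)/2 pairs of positions.

```python
def inversions_bruteforce(s: str) -> int:
    """Count pairs (i, j) with i < j and s[i] > s[j] by checking every pair."""
    count = 0
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] > s[j]:  # the earlier letter is larger: one inversion
                count += 1
    return count


print(inversions_bruteforce("DAABEC"))  # 5, as in the problem statement
print(inversions_bruteforce("ZWQM"))    # 6, the fully reversed case
```

This works for any alphabet, which is exactly why it cannot exploit the four-letter restriction.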

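That complexity is unsatisfying. Note that the input contains only the four letters A, C, G, and T. Since n is also small, an array (a random-access data structure) is enough to hold one counter per letter. Every larger letter that has already appeared forms exactly one inversion with the current letter, so we can compute the "unsortedness" of a line in a single O(n) pass: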

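For each letter read, add 1 to that letter's occurrence count;
if the letter is T, do nothing more;
if it is G, add the number of T's seen so far to unsortedness;
if it is C, add the number of T's and G's seen so far to unsortedness;
if it is A, add the number of T's, G's, and C's seen so far to unsortedness.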
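The steps above can be sketched in Python as follows. The names `inversions_dna`, `seen`, and `larger` are illustrative choices, not part of the original post; the table `larger` just encodes which letters outrank each letter under A < C < G < T.

```python
def inversions_dna(s: str) -> int:
    """Count inversions of a string over A < C < G < T in one pass."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}  # occurrences so far
    larger = {"A": "CGT", "C": "GT", "G": "T", "T": ""}
    unsortedness = 0
    for ch in s:
        # Each larger letter already seen is one inversion with ch.
        unsortedness += sum(seen[b] for b in larger[ch])
        seen[ch] += 1
    return unsortedness


print(inversions_dna("AACATGAAGG"))  # 10
print(inversions_dna("CCCGGGGGGA"))  # 9
```

Whether the counter is bumped before or after the addition does not matter here, because a letter is never counted among the letters larger than itself.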

The pseudocode for this problem is given below:

Procedure POJ1007 Begin
	Read the numbers n and m
	Dim array as Pair<key, value> array
	For i from 0 to m - 1 Begin
		line <- ReadLine
		unsortedness <- 0
		occurs[A] <- 0
		occurs[C] <- 0
		occurs[G] <- 0
		occurs[T] <- 0
		For ch in line Begin
			occurs[ch] <- occurs[ch] + 1
			Switch ch Begin
				case T:
					break
				case G:
					unsortedness <- unsortedness + occurs[T]
					break
				case C:
					unsortedness <- unsortedness + occurs[T] + occurs[G]
					break
				case A:
					unsortedness <- unsortedness + occurs[T] + occurs[G] + occurs[C]
					break
			End Switch
		End For
		put <unsortedness, line> into array
	End For
	stable_sort array by key and output the values
End Procedure
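For reference, here is a minimal runnable sketch of the whole procedure in Python. The function names `unsortedness` and `solve` and the `SAMPLE` constant are assumptions made for this example; a real submission would read from standard input instead of a string. It relies on Python's built-in `sorted()`, which is documented to be stable.

```python
def unsortedness(s: str) -> int:
    """Inversion count of a string over the alphabet A < C < G < T."""
    order = "ACGT"
    seen = [0, 0, 0, 0]  # occurrences of A, C, G, T so far
    total = 0
    for ch in s:
        k = order.index(ch)
        total += sum(seen[k + 1:])  # earlier letters larger than ch
        seen[k] += 1
    return total


def solve(text: str) -> str:
    tokens = text.split()
    m = int(tokens[1])  # tokens[0] is n, the string length (not needed here)
    strings = tokens[2:2 + m]
    # sorted() is stable, so equally sorted strings keep their input order.
    return "\n".join(sorted(strings, key=unsortedness))


SAMPLE = """10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
"""

print(solve(SAMPLE))
```

On the sample input the keys come out as 10, 36, 37, 11, 9, and 17 in input order, so the sorted result matches the sample output.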

Summary

Make full use of the given conditions; a special constraint can greatly improve the efficiency of an algorithm.

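Quicksort is not a stable sort, so a plain quicksort will not necessarily keep equally sorted strings in their original order.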

posted @ 2010-07-12 15:28 HCOONa
