An Algorithm Problem
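A reader of my post "Using VBA to Filter Data in Excel" (http://www.cnblogs.com/maweifeng/archive/2005/06/13/71504.html) raised the following question. I gave it some thought and my answer is below; corrections are welcome.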
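Problem description:
There is a large data set of N rows (N is greater than 10000). Each row holds M values that are already sorted. If several of the N rows have at least h values in common (for example, h = 7), those rows are considered similar and are stored together in a designated location.
Take the following data as an example:
(Row no.) (Data)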
0001 2 4 6 10 12 14 16 18 20 22 24 26
0002 4 10 12 16 20 22 26
…….
0014 2 4 12 14 18 22 24
Rows 0001, 0002 and 0014 are considered similar and are stored in the designated location.
0003 1 5 7 12 14 22 28 39 50 55 60
0007 5 7 14 22 39 55 60
0013 1 7 12 14 28 39 55
…...
0100 5 7 12 22 28 39 50 60
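These rows are similar as well, so they are stored right after the previous group of similar rows. The remaining rows are handled the same way.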
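To make the similarity rule concrete, here is a minimal Python sketch that counts the shared values for the first example group. The helper name `shared_count` and the constant `H` are my own illustration, and it uses set intersection for brevity rather than any particular method from the post.

```python
# Example rows copied from the listing above, keyed by row number.
rows = {
    "0001": [2, 4, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26],
    "0002": [4, 10, 12, 16, 20, 22, 26],
    "0014": [2, 4, 12, 14, 18, 22, 24],
}

H = 7  # similarity threshold h from the problem statement (assumed to mean "at least 7")


def shared_count(a, b):
    """Number of values that appear in both rows (hypothetical helper)."""
    return len(set(a) & set(b))


for other in ("0002", "0014"):
    n = shared_count(rows["0001"], rows[other])
    print(other, n, n >= H)
# prints:
# 0002 7 True
# 0014 7 True
```

Rows 0002 and 0014 share only three values with each other (4, 12 and 22), so in this example the group seems to be defined relative to row 0001 rather than between every pair of its members.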
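Algorithm:
for i = 1 to N
    for j = i + 1 to N
        m = 0, n = 0
        count = 0
        // compare row i with row j (both sorted ascending)
        while m < length(A[i]) and n < length(A[j])
            if A[i][m] < A[j][n]
                m++
            else if A[i][m] == A[j][n]
                count++
                m++, n++
            else
                n++
            if count >= 7        // threshold h reached, stop early
                break
        if count >= 7
            mark rows i and j as similar and handle them accordingly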
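The pseudocode above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `count_common` and `similar_pairs` are hypothetical, each row is assumed to be strictly ascending (no repeated values within a row), and the threshold is read as "at least h".

```python
def count_common(a, b, h):
    """Two-pointer walk over two ascending rows; stops once h matches are found."""
    m = n = count = 0
    while m < len(a) and n < len(b) and count < h:
        if a[m] < b[n]:
            m += 1
        elif a[m] == b[n]:
            count += 1
            m += 1
            n += 1
        else:
            n += 1
    return count


def similar_pairs(rows, h=7):
    """Compare every pair (i, j) with i < j; return the pairs sharing at least h values."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if count_common(rows[i], rows[j], h) >= h:
                pairs.append((i, j))
    return pairs


# Second example group from the problem description.
rows = [
    [1, 5, 7, 12, 14, 22, 28, 39, 50, 55, 60],  # row 0003
    [5, 7, 14, 22, 39, 55, 60],                 # row 0007
    [1, 7, 12, 14, 28, 39, 55],                 # row 0013
    [5, 7, 12, 22, 28, 39, 50, 60],             # row 0100
]
print(similar_pairs(rows))  # [(0, 1), (0, 2), (0, 3)]
```

Only the pairs that involve row 0003 reach the threshold; rows 0007, 0013 and 0100 share at most five values with one another.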
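The algorithm is straightforward and needs little explanation. Its running time is O(N^2 * M), roughly 1/2 * N^2 * M comparisons. I cannot think of a better algorithm myself. One problem remains: suppose rows 1, 3, 4 and 5 are similar. When i reaches 3, rows 4 and 5 are picked up again as similar to it, which is no longer needed; yet row 3 may also be similar to another set of rows, say 7 and 8. Cases like this need to be marked.
If rows that are already marked no longer need to take part in the comparison, simply skip them in the loops; by my estimate the cost is then roughly N(N-2)*M/2, depending on how many rows get marked.
Because the rows have to be compared pairwise, comparing the rows takes at least N*(N-1)/2 operations. If the result of comparing two rows could be reused when comparing other rows, though, there should still be room for optimization.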
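The "skip marked rows" variant could look like the Python sketch below. The function name `group_rows` and the `marked` list are hypothetical, and the row comparison uses set intersection for brevity instead of the two-pointer walk.

```python
def group_rows(rows, h=7):
    """Greedy grouping: each unmarked row i collects every later unmarked row
    that shares at least h values with it; marked rows are skipped afterwards."""
    marked = [False] * len(rows)
    groups = []
    for i in range(len(rows)):
        if marked[i]:
            continue  # row already belongs to an earlier group
        base = set(rows[i])
        group = [i]
        for j in range(i + 1, len(rows)):
            if marked[j]:
                continue
            if len(base.intersection(rows[j])) >= h:
                marked[j] = True
                group.append(j)
        if len(group) > 1:
            marked[i] = True
            groups.append(group)
    return groups


# The seven example rows, in row-number order.
rows = [
    [2, 4, 6, 10, 12, 14, 16, 18, 20, 22, 24, 26],  # 0001
    [4, 10, 12, 16, 20, 22, 26],                    # 0002
    [1, 5, 7, 12, 14, 22, 28, 39, 50, 55, 60],      # 0003
    [5, 7, 14, 22, 39, 55, 60],                     # 0007
    [1, 7, 12, 14, 28, 39, 55],                     # 0013
    [2, 4, 12, 14, 18, 22, 24],                     # 0014
    [5, 7, 12, 22, 28, 39, 50, 60],                 # 0100
]
print(group_rows(rows))  # [[0, 1, 5], [2, 3, 4, 6]]
```

The trade-off is the one described above: once a row is marked it is never used as a starting row, so any further similarity it has to other rows (the "row 3 versus rows 7 and 8" case) goes unreported.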
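For a sense of scale, the short Python calculation below plugs numbers into the two expressions. N = 10000 is the lower bound given in the problem; M = 12 is only an assumed row length, taken from the longest example row.

```python
N = 10_000  # lower bound on the row count from the problem statement
M = 12      # assumed row length (the longest example row has 12 values)

pairs = N * (N - 1) // 2       # number of row pairs that must be compared
estimate = N * N * M // 2      # the 1/2 * N^2 * M estimate

print(pairs)     # 49995000
print(estimate)  # 600000000
```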
Accompanying VBA code: (listing not preserved)