Sorting by Counting

First, we define the terminology and notation used in our study of sorting.

A series of records: R1, R2, ..., RN

is to be sorted into nondecreasing order of the keys K1, K2, ..., KN, essentially by discovering a permutation p(1), p(2), ..., p(N) such that:

Kp(1) <= Kp(2) <= ... <= Kp(N)
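As a small illustration of what such a permutation looks like, here is a sketch in Python. The key values are made up for the example, and the built-in sorted() stands in for whatever sorting method actually finds p.

```python
# Hypothetical example keys; record numbers are 1-based to match K1..KN in the text.
keys = [503, 87, 512, 61, 908]

# p lists record numbers in sorted order: p[0] is p(1), p[1] is p(2), and so on.
# sorted() is stable, so equal keys would keep their original order.
p = sorted(range(1, len(keys) + 1), key=lambda j: keys[j - 1])

print(p)                          # [4, 2, 1, 3, 5]
print([keys[j - 1] for j in p])   # [61, 87, 503, 512, 908]
```

Reading the keys in the order given by p yields a nondecreasing sequence, which is exactly the condition above.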

In this article we discuss internal sorting, which means the number of records to be sorted is small enough that they can all be loaded into memory at the same time.

Sometimes a single record is far bigger than its key. In that case we can use an auxiliary table that specifies the permutation: we create a new table of link addresses and manipulate these addresses instead of moving the bulky records around. This method is called address table sorting.
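A minimal sketch of address table sorting, assuming that list indices play the role of addresses and that each record is a (key, payload) tuple. Both choices, and the sample data, are only for illustration.

```python
# Hypothetical bulky records: (key, payload) pairs that we prefer not to move.
records = [(503, "x" * 1000), (87, "y" * 1000), (512, "z" * 1000), (61, "w" * 1000)]

# The address table starts as 0, 1, ..., N-1 (0-based indices stand in for addresses).
table = list(range(len(records)))

# Sort the table entries by the key each one points to; records stays untouched.
table.sort(key=lambda addr: records[addr][0])

print(table)                                  # [3, 1, 0, 2]
print([records[addr][0] for addr in table])   # [61, 87, 503, 512]
```

Only the small integers in table are rearranged; the large payloads stay where they are.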

Instead of creating a new table, we can add an auxiliary link field to each record. The sort manipulates these fields so that, when it finishes, each record's link field points to the record that follows it in sorted order.
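The link-field variant might look roughly like this. The Node class, its field names, and the use of sorted() to decide the order are assumptions made for the sketch; a real list sort would set the links while it sorts.

```python
class Node:
    """Hypothetical record with a key and an auxiliary link field."""
    def __init__(self, key):
        self.key = key
        self.link = None   # index of the next record in sorted order, or None

nodes = [Node(k) for k in (503, 87, 512, 61)]

# Any sorting method could decide the order; sorted() is used here for brevity.
order = sorted(range(len(nodes)), key=lambda j: nodes[j].key)

head = order[0]                       # index of the record with the smallest key
for cur, nxt in zip(order, order[1:]):
    nodes[cur].link = nxt             # chain each record to its successor

# Walk the chain from head to read the keys in sorted order.
walked, j = [], head
while j is not None:
    walked.append(nodes[j].key)
    j = nodes[j].link
print(walked)   # [61, 87, 503, 512]
```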

 

Sorting by counting

This idea is based on the fact that the jth key in the final sorted sequence is greater than exactly j - 1 of the other keys. Putting this another way, if we know that a certain key exceeds exactly 27 others, and if no two keys are equal, the corresponding record should go into position 28 after sorting. So the idea is to compare every pair of keys, counting how many are less than each particular one:

  (compare K[j] with K[i], for 1 <= j <= N) for 1 <= i <= N

More than half of these comparisons are redundant: there is no need to compare a key with itself, and after comparing K[i] with K[j] there is no need to compare K[j] with K[i]. It is enough to compare K[j] with K[i] for 1 <= j < i and 1 < i <= N.
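Before the redundant comparisons are removed, the basic idea can be sketched directly. The keys below are made up and are assumed to be distinct, as in the "position 28" argument.

```python
# Hypothetical distinct keys.
keys = [503, 87, 512, 61, 908, 170]

# For each key, count how many keys are smaller (this compares every ordered pair).
smaller = [sum(1 for other in keys if other < k) for k in keys]

# With distinct keys, a key that exceeds c others belongs at position c + 1.
positions = [c + 1 for c in smaller]
print(positions)   # [4, 2, 5, 1, 6, 3]
```

For instance, 503 exceeds three other keys (87, 61 and 170), so it belongs in position 4.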

Now we give the algorithm (comparison counting):

This algorithm sorts R1, R2, ..., RN on the keys K1, K2, ..., KN

by maintaining an auxiliary table COUNT[1], ..., COUNT[N] to count the number of keys less than a given key. When the algorithm finishes, COUNT[j] + 1 specifies the final position of record Rj.

C1. [Clear COUNTs.] Set COUNT[1] through COUNT[N] to zero.

C2. [Loop on i.] Perform step C3 for i = N, N - 1, ..., 2; then terminate the algorithm.

C3. [Loop on j.] Perform step C4 for j = i - 1, i - 2, ..., 1.

C4. [Compare Ki : Kj.] If Ki < Kj, increase COUNT[j] by 1; otherwise increase COUNT[i] by 1.

This algorithm involves no movement of records.
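Steps C1 to C4 translate into Python roughly as follows. This is a sketch with 0-based indices, so count[j] itself is the 0-based final position; the function name and the sample keys are made up for the example.

```python
def comparison_counting(keys):
    """Return COUNT as a 0-based list; COUNT[j] + 1 is the 1-based final position."""
    n = len(keys)
    count = [0] * n                      # C1: clear the counts
    for i in range(n - 1, 0, -1):        # C2: i = N, N-1, ..., 2 (0-based: n-1 .. 1)
        for j in range(i - 1, -1, -1):   # C3: j = i-1, i-2, ..., 1 (0-based: i-1 .. 0)
            if keys[i] < keys[j]:        # C4: compare Ki with Kj
                count[j] += 1
            else:
                count[i] += 1
    return count

keys = [3, 1, 1, 0, 3, 7]                # hypothetical keys, with some repeats
count = comparison_counting(keys)
print(count)                             # [3, 1, 2, 0, 4, 5]

# The records themselves never moved; place them only now, using the counts.
output = [None] * len(keys)
for j, c in enumerate(count):
    output[c] = keys[j]
print(output)                            # [0, 1, 1, 3, 3, 7]
```

The counting phase only reads the keys; any rearrangement happens afterwards, in a separate pass over the finished table.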

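In the discussion above, we blithely assumed that no two keys are equal; a real program must also handle equal keys. Step C4 does so: when Ki = Kj with j < i, COUNT[i] is increased, so records with equal keys receive distinct positions and keep their original relative order.

The running time of this algorithm is proportional to N^2, since it makes N(N - 1)/2 comparisons.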

 

There is another way to sort by counting, which applies when all keys fall into a small range of values (from u to v). This assumption is quite restrictive, but the idea can be used in many places. For example, if we apply the method to the leading digits of the keys instead of to the whole keys, the records will be partially sorted.
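As a rough illustration of the leading-digit idea, the sketch below distributes two-digit keys by their tens digit only, using the counting steps listed under "Distribution counting" below. The function name, the restriction to keys 0..99, and the sample keys are assumptions for the example.

```python
def sort_by_leading_digit(keys):
    """Partially sort two-digit keys (0..99) by their tens digit only."""
    count = [0] * 10                       # one counter per leading digit 0..9
    for k in keys:
        count[k // 10] += 1                # count keys with each leading digit
    for d in range(1, 10):
        count[d] += count[d - 1]           # count[d] = number of keys with digit <= d
    output = [None] * len(keys)
    for k in reversed(keys):               # scan backward so equal digits keep their order
        count[k // 10] -= 1
        output[count[k // 10]] = k
    return output

keys = [53, 17, 58, 12, 90, 51, 15]        # hypothetical two-digit keys
print(sort_by_leading_digit(keys))         # [17, 12, 15, 53, 58, 51, 90]
```

The result is grouped by leading digit but not yet ordered inside each group, which is what "partially sorted" means here.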

Distribution counting:

Assuming that all keys are integers in the range u <= Kj <= v for 1 <= j <= N, this algorithm sorts the records R1, ..., RN by making use of an auxiliary table COUNT[u], ..., COUNT[v]. At the conclusion of the algorithm the records are moved to an output area S1, ..., SN.

D1. [Clear COUNTs.] Set COUNT[u] through COUNT[v] all to zero.

D2. [Loop on j.] Perform step D3 for 1 <= j <= N; then go to step D4.

D3. [Increase COUNT[Kj].] Increase the value of COUNT[Kj] by 1.

D4. [Accumulate.] Set COUNT[i] ← COUNT[i] + COUNT[i - 1], for i = u + 1, u + 2, ..., v. (Now COUNT[i] is the number of keys less than or equal to i.)

D5. [Loop on j.] Perform step D6 for j = N, N - 1, ..., 1; then terminate the algorithm.

D6. [Output Rj.] Set i ← COUNT[Kj], Si ← Rj, and COUNT[Kj] ← i - 1.

 

The Python code below implements the distribution counting algorithm:

#!/usr/bin/env python
class Record(object):
    """A record with a sort key (value) and some satellite data (info)."""
    def __init__(self, value):
        self.value = value
        self.info = "record %s" % self.value
    def __repr__(self):
        return '({}, {})'.format(self.value, self.info)

keys = [3, 1, 1, 0, 3, 7, 5, 5, 2, 4, 2, 1, 0, 2, 6, 4]
records = [Record(k) for k in keys]

# D1-D3: count how many times each key value occurs.
# Keys are assumed to be integers in the range 0..max(keys), i.e. u = 0.
count = [0] * (max(keys) + 1)
for k in keys:
    count[k] += 1
print(len(count), count)

# D4: accumulate, so that count[i] is the number of keys
# that are less than or equal to i.
for i in range(1, len(count)):
    count[i] = count[i] + count[i - 1]
print(len(count), count)

# D5-D6: move each record into the output area S (a list).
# Scanning j from N down to 1 keeps records with equal keys
# in their original relative order.
output = [None] * len(keys)
for j in range(len(keys) - 1, -1, -1):
    i = count[keys[j]]
    output[i - 1] = records[j]
    count[keys[j]] = i - 1
print(output)
posted @ 2016-01-31 15:56  快乐的小土狗