Median Maintenance: finding the median of a sequence of numbers

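Problem description:

Given a sequence of numbers i arriving in arbitrary order, be able to report its median, i.e. the value that sits in the middle when the numbers are ordered by size.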

 

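Algorithm description:

The usual approach is to insertion-sort the numbers and then read the median at the middle index. Insertion sort takes O(n^2) time on average, so this is not efficient.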
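
For comparison, here is a minimal sketch of that sorted-list baseline. The class name SortedListMedian is made up for illustration, and bisect.insort stands in for one step of insertion sort: it finds the slot by binary search but still has to shift elements, so each insert is O(n) and n inserts are O(n^2) in the worst case.

```python
import bisect


class SortedListMedian:
    """Hypothetical baseline: keep every item in one sorted list."""

    def __init__(self):
        self._items = []

    def insert(self, x):
        # binary search for the slot, then shift the tail: O(n) per insert
        bisect.insort(self._items, x)

    def median(self):
        n = len(self._items)
        if n == 0:
            raise IndexError("no items")
        mid = n // 2
        if n % 2 == 1:
            # odd count: the middle element is the median
            return self._items[mid]
        # even count: average the two middle elements
        return (self._items[mid - 1] + self._items[mid]) / 2.0


if __name__ == "__main__":
    baseline = SortedListMedian()
    for value in [5, 4, 3, 2, 1, 6]:
        baseline.insert(value)
    print(baseline.median())  # 3.5
```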

 

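The approach here uses a heap data structure instead: each insertion costs O(log n), and the median can then be read from the heap tops in O(1).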

 

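Two heaps, a max heap and a min heap, hold the two halves of the integer sequence i. They must satisfy the following conditions:

1. Size condition

    The max heap holds either the same number of elements as the min heap or exactly one more; otherwise, rebalance.

2. Order condition

    The max heap stores the smaller, lower half of the values.

    The min heap stores the larger, upper half of the values.

    The largest value in the max heap must be less than or equal to the smallest value in the min heap; otherwise, rebalance.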
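
The two conditions can be written down as a small check, which is handy when debugging. This is only a sketch: check_invariants is a hypothetical helper, and it takes the contents of the two heaps as plain lists rather than any particular heap class.

```python
def check_invariants(low, high):
    """Return True if the two halves satisfy both conditions.

    low  -- values held by the max heap (the smaller half)
    high -- values held by the min heap (the larger half)
    """
    # size condition: the max heap has the same count or exactly one more
    size_ok = len(low) - len(high) in (0, 1)
    # order condition: everything in low is <= everything in high
    order_ok = not low or not high or max(low) <= min(high)
    return size_ok and order_ok


if __name__ == "__main__":
    print(check_invariants([1, 2, 3], [4, 5]))  # True
    print(check_invariants([1, 5], [4, 6]))     # False: 5 > 4 breaks the order
    print(check_invariants([1], [2, 3]))        # False: min heap is larger
```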

 

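In other words, the median is either the top of the max heap (odd number of elements) or the average of the tops of the max heap and the min heap (even number of elements).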
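
As a compact cross-check of the idea, here is a minimal sketch built on Python's standard heapq module instead of a hand-written heap. heapq only provides a min heap, so this sketch assumes the lower half can be stored as negated values to simulate a max heap; the class name RunningMedian and its attributes are made up for illustration.

```python
import heapq


class RunningMedian:
    """Hypothetical two-heap median tracker using heapq."""

    def __init__(self):
        self._low = []   # lower half, stored negated so heapq acts as a max heap
        self._high = []  # upper half, ordinary min heap

    def insert(self, x):
        # pass the new item through the lower half so its largest value
        # moves to the upper half; this keeps the order condition
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # size condition: the lower half may never be the smaller one
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise IndexError("no items")
        if len(self._low) > len(self._high):
            # odd count: the top of the max heap
            return -self._low[0]
        # even count: average of the two heap tops
        return (-self._low[0] + self._high[0]) / 2.0


if __name__ == "__main__":
    tracker = RunningMedian()
    for value in [5, 4, 3, 2, 1, 6]:
        tracker.insert(value)
    print(tracker.median())  # 3.5
```

Each insert does a constant number of heap pushes and pops, so it stays at O(log n), and median only looks at the two roots.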

 

The code is as follows:

class MyHeap:
    # heap type 
    MAX_HEAP = 1
    MIN_HEAP = 0
        
    def __init__(self, type=MAX_HEAP, arr=None):
        self.type = type
        # if init directly by array
        if arr is not None:
            self.data = arr[:]
            length = len(arr)
            # index of the last non-leaf node
            # (integer division, so it also works on Python 3)
            begin = length // 2 - 1
            for i in range(begin, -1, -1):
                self.__heapify(i)
        else:
            self.data = []
        
    def __heapify(self, i):
        length = len(self.data)
        left = self.__leftChild(i)
        right = self.__rightChild(i)
        
        largest = i
        
        while left < length or right < length:
            if self.type == self.MAX_HEAP:
                if left < length and self.data[left] > self.data[largest]:
                    largest = left
                if right < length and self.data[right] > self.data[largest]:
                    largest = right
            elif self.type == self.MIN_HEAP:
                if left < length and self.data[left] < self.data[largest]:
                    largest = left
                if right < length and self.data[right] < self.data[largest]:
                    largest = right
                
            if i != largest:
                self.__swap(i, largest)
                
                i = largest
                left = self.__leftChild(i)
                right = self.__rightChild(i)
            else:
                break
    
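    def __siftUp(self, i):
        # move data[i] up while it violates the heap order with its parent
        while i > 0:
            parent = (i - 1) // 2
            if self.type == self.MAX_HEAP:
                ordered = self.data[parent] >= self.data[i]
            else:
                ordered = self.data[parent] <= self.data[i]
            if ordered:
                break
            self.__swap(i, parent)
            i = parent

    def inset(self, item):
        # append at the end and sift up; inserting at index 0 would shift
        # every element and break the parent/child relationships
        self.data.append(item)
        self.__siftUp(len(self.data) - 1)

    def delete(self, index):
        # move the last item into the hole, drop the tail, then restore
        # the heap order around that position in both directions
        last = len(self.data) - 1
        self.__swap(index, last)
        self.data.pop()
        if index < last:
            self.__siftUp(index)
            self.__heapify(index)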
    
    def pop(self):
        # pop the extreme value (the max or the min, depending on the heap type)
        self.__swap(0, len(self.data) - 1)
        extreme = self.data.pop()
        self.__heapify(0)
        return extreme
    
    # override the getitem method of MyHeap class,
    # so you can use [] to get a value by index
    def __getitem__(self, index):
        if len(self.data) == 0:
            raise IndexError("no items")
        return self.data[index]
    
    # override the len method of MyHeap class,
    # so you can call len(heap) to get the size of the heap
    def __len__(self):
        return len(self.data)
    
    def __swap(self, i, j):
        temp = self.data[i]
        self.data[i] = self.data[j]
        self.data[j] = temp
        
    # array indices start from zero
    def __leftChild(self, i):
        return 2 * i + 1

    def __rightChild(self, i):
        return 2 * i + 2
    
    # override the repr method of MyHeap class,
    # so printing the heap shows its contents
    def __repr__(self):
        return str(self.data)
    
class MedianMaintain:
    def __init__(self):
        self.maxHeap = MyHeap(MyHeap.MAX_HEAP)
        self.minHeap = MyHeap(MyHeap.MIN_HEAP)
        # the total number of items in both heaps
        self.N = 0
    
    def insert(self, item):
        # size rule: if the total count is even before the insertion,
        # the new item goes into the max heap, which is then checked
        # against the min heap
        if self.N % 2 == 0:
            self.maxHeap.inset(item)
            self.N += 1
            
            if len(self.minHeap) == 0:
                return 
            
            # order rule: the largest item in the max heap must be less than
            # or equal to the smallest item in the min heap; if it is not,
            # swap the two roots
            if self.maxHeap[0] > self.minHeap[0]:
                toMin = self.maxHeap.pop()
                toMax = self.minHeap.pop()
                self.maxHeap.inset(toMax)
                self.minHeap.inset(toMin)
        else:
            # size rule: if the total count is odd before the insertion, the max
            # heap already holds one extra item, so insert the new item there and
            # move the max heap's root over to the min heap
            self.maxHeap.inset(item)
            toMin = self.maxHeap.pop()
            self.minHeap.inset(toMin)
            self.N += 1
            
    def getMedian(self):
        # if the total size is even, the median is the average of the roots of the min and max heaps
        if self.N % 2 == 0:
            return (self.maxHeap[0] + self.minHeap[0]) / 2.0
        else:
            # if the total size is odd, the median is the root of the max heap
            return self.maxHeap[0]
        
    def __repr__(self):
        return "max heap: " + str(self.maxHeap) + '\n' + "min heap: " + str(self.minHeap)
    
if __name__ == "__main__":
    medianMaintain = MedianMaintain()
    medianMaintain.insert(5)
    medianMaintain.insert(4)
    medianMaintain.insert(3)
    medianMaintain.insert(2)
    medianMaintain.insert(1)
    medianMaintain.insert(6)
    
    print(medianMaintain)
    
    print(medianMaintain.getMedian())
posted @ 2012-05-06 10:31  btchenguang