*Top K Frequent Elements

Given a non-empty array of integers, return the k most frequent elements.

For example,
Given [1,1,1,2,2,3] and k = 2, return [1,2].

Note: 

    • You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
    • Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

 

 

 

Solution 1: HashMap + Heap

This solution is very straight forward, it stores all the elements in a HashMap that maps elements to their frequencies, then inserts all the map entry to a max heap so as to get the ones with the highest frequency. The time complexity is O(nlogn) and space complexity is O(n).

 

public List<Integer> topKFrequent(int[] nums, int k) {
    Map<Integer, Integer> map = new HashMap<>();
    for(int n : nums){
        map.put(n, map.getOrDefault(n,0) + 1);
    }

    PriorityQueue<Map.Entry<Integer, Integer>> maxHeap =
                     new PriorityQueue<>((a,b) -> (b.getValue() - a.getValue()));
    for(Map.Entry<Integer,Integer> entry : map.entrySet()){
        maxHeap.add(entry);
    }

    List<Integer> res = new ArrayList<>();
    while(res.size()<k){
        Map.Entry<Integer, Integer> entry = maxHeap.poll();
        res.add(entry.getKey());
    }
    return res;
}

Solution 2: Bucket Sort

Since sorting takes O(nlogn), can we accelerate the sorting process by using extra space? The answer is yes, bucket sort.

 

bucketSort(arr[], n)
1) Create n empty buckets (Or lists).
2) Do following for every array element arr[i].
.......a) Insert arr[i] into bucket[n*array[i]]
3) Sort individual buckets using insertion sort.
4) Concatenate all sorted buckets.

 

 

 

public List<Integer> topKFrequent(int[] nums, int k) {
    List<Integer>[] bucket = new List[nums.length + 1];
    Map<Integer, Integer> frequencyMap = new HashMap<>();
    for (int n : nums) {
        frequencyMap.put(n, frequencyMap.getOrDefault(n, 0) + 1);
    }

    for (int key : frequencyMap.keySet()) {
        int frequency = frequencyMap.get(key);
        if (bucket[frequency] == null) {
            bucket[frequency] = new ArrayList<>();
        }
        bucket[frequency].add(key);
    }

    List<Integer> res = new ArrayList<>();
    for (int pos = bucket.length - 1; pos >= 0 && res.size() < k; pos--) {
        if (bucket[pos] != null) {
            res.addAll(bucket[pos]);
        }
    }
    return res;
}

 

Top K Frequent Elements in a Data Stream (System Design)

If we replace the input array with a data stream, it becomes harder for a single machine to store everything in a single HashMap, it’s also impossible to perform sorting for such large data set. Let’s see what can we do if we can use multi machines to handle this work.

Solution 1: Multi HashMap + heap

We can split the data into n shardings and move a record to a instance respectively. Instance0 only handles the data with its hash code equals hash(item) % n == 0Instance1 only handles the data with hash(item) % n == 1, etc.

For each instance, we have a HashMap and Heap to store the items as in the first solution for the leetcode question.

In order to retrieve the global top k elements, we only need to gather the heaps from all the instances and run a merge sort, it’s very similar to the leetcode question: 23. Merge k Sorted Lists.

Even though we distribute the work load to multiple machines to reduce the data size to 1/n, it’s still not the optimal way to handle large data set in daily work.

See more details at: 

https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

 

reference: https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

posted @ 2016-08-11 06:38  Hygeia  阅读(287)  评论(0编辑  收藏  举报