LeetCode #274 H-Index

Question

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

Example:

Input: citations = [3,0,6,1,5]
Output: 3 
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had 
             received 3, 0, 6, 1, 5 citations respectively. 
             Since the researcher has 3 papers with at least 3 citations each and the remaining 
             two with no more than 3 citations each, her h-index is 3.
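As a baseline, the definition can be checked directly. The sketch below is not from the original post: the function name `h_index_by_definition` is hypothetical, and it assumes the usual convention that when several values of h qualify, the largest one is the h-index. It runs in O(n^2), so it is only useful for checking the faster approaches.

```python
from typing import List


def h_index_by_definition(citations: List[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations."""
    n = len(citations)
    # Try candidates from the largest possible value (n) downwards.
    for h in range(n, 0, -1):
        # Count the papers that have at least h citations.
        if sum(1 for c in citations if c >= h) >= h:
            return h
    # No positive h qualifies (empty input or all zeros).
    return 0
```

For the example above, `h_index_by_definition([3, 0, 6, 1, 5])` returns 3.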

Sort + scan: O(n log n)

The wiki (see the reference links) gives this way to compute it:

First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position).

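In other words, sort in descending order, then scan for the last position where citations[i] >= i; at that point h = i. (Positions here are 1-based. With 0-based array indices the test is citations[i] >= i + 1, and h = i + 1.)

Why does this work? Take an example: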

6 5 3 1 0

1 2 3 4 5

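The top row is the citations sorted in descending order, the bottom row the 1-based position i. Position 3 is the last one with citations[i] >= i. Because the list is descending, the first three values are all at least citations[3] >= 3. The last two values are at most the value at position 4, which satisfies citations[4] < 4, so they are all <= 3.

Of course, comparing directly against the definition in the problem statement also works and does not increase the time complexity, but it is harder to improve on afterwards.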
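The sort-and-scan idea above can be sketched as follows. This is a minimal illustration rather than the post's own code; the name `h_index_sorted` is made up here, and it uses 0-based indices, so the test is `c >= i + 1`.

```python
from typing import List


def h_index_sorted(citations: List[int]) -> int:
    """Sort descending, then find the last 1-based position p with value >= p."""
    ordered = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ordered):
        if c >= i + 1:
            # Position i + 1 (1-based) still qualifies.
            h = i + 1
        else:
            # Values only get smaller from here on, so stop.
            break
    return h
```

Sorting dominates the cost, which gives the O(n log n) bound; the scan itself is linear.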

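Bucket sort: O(n)

Bucket sort brings the time down to O(n). The feature of this problem worth noticing is that h always lies in [0, n], so any paper with more than n citations can be counted in bucket n, and n + 1 buckets are enough.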

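from typing import List


class Solution:
    def hIndex(self, citations: List[int]) -> int:
        length = len(citations)
        freq_list = [0] * (length + 1)

        # First pass: freq_list[i] = number of papers cited exactly i times;
        # citation counts above length are clamped into the last bucket.
        for c in citations:
            freq_list[min(c, length)] += 1

        # Second pass, from high to low: accumulate suffix sums so that
        # freq_list[i] = number of papers cited at least i times.
        last = 0
        for i in range(length, -1, -1):
            freq_list[i] += last
            last = freq_list[i]
            if freq_list[i] >= i:
                return i
        return 0  # not reached: the check always succeeds at i == 0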

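The key to bucket sort is to set up a mapping; for example, radix sort with base 10 uses the mapping f(x) = x mod 10. We first define the bucket:

freq_list[i]: the number of papers cited at least i times

Computing freq_list takes two passes: the first counts how many papers were cited exactly i times (with counts above n clamped to n), and the second turns that into how many papers were cited at least i times.

Note that if x papers are cited at least 3 times, then the number y of papers cited at least 2 times equals x plus the number of papers cited exactly 2 times, i.e. y = x + freq_list[2], where freq_list[2] still holds the first-pass count. So the second pass can be done in a single sweep from high to low.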
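To make the two passes concrete, here is a small sketch that keeps the two arrays separate instead of updating one in place. The helper name `bucket_counts` and the array names `exact` and `at_least` are assumptions introduced for illustration; they do not appear in the solution above.

```python
from typing import List, Tuple


def bucket_counts(citations: List[int]) -> Tuple[List[int], List[int]]:
    """Return (exact, at_least) bucket arrays of length n + 1."""
    n = len(citations)
    # First pass: exact[i] = papers cited exactly i times (clamped to n).
    exact = [0] * (n + 1)
    for c in citations:
        exact[min(c, n)] += 1
    # Second pass: at_least[i] = exact[i] + at_least[i + 1] (suffix sums).
    at_least = exact[:]
    for i in range(n - 1, -1, -1):
        at_least[i] += at_least[i + 1]
    return exact, at_least
```

For citations = [3, 0, 6, 1, 5] this gives exact = [1, 1, 0, 1, 0, 2] and at_least = [5, 4, 3, 3, 2, 2]. Scanning from i = 5 downwards, the first i with at_least[i] >= i is 3, which matches the expected h-index.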

 

 

References:

https://en.wikipedia.org/wiki/H-index 

https://www.cnblogs.com/zmyvszk/p/5619051.html

 
posted @ 2020-01-09 23:34  sbj123456789