LeetCode #274 H-Index
Question
Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.
According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."
Example:
Input: citations = [3,0,6,1,5]
Output: 3
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, her h-index is 3.
Sort + scan: O(n log n)
According to the calculation method given on Wikipedia (see the reference links):
First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position).
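In other words, sort in descending order and scan for the last position i where citations[i] >= i; then h = i. (Positions here are 1-based; with 0-based array indices the test is actually citations[i] >= i + 1, and h = i + 1.)
Why does this work? Take an example: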
citations: 6 5 3 1 0
position:  1 2 3 4 5
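This shows that the first three values satisfy citations[i] >= citations[3] >= 3 (with 1-based positions). The last two fail the test, citations[i] < i; the first failure is at i = 4, and because the list is in descending order every later value satisfies citations[i] <= citations[4] < 4, so citations[i] <= 3.
Of course, comparing directly against the definition in the problem statement also works and does not increase the time complexity, but it makes the later improvement harder to carry out.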
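The sort-and-scan idea above can be sketched as follows. This is a minimal illustration, not the original author's code; the function name h_index_sorted is an assumption made for this example.

```python
from typing import List


def h_index_sorted(citations: List[int]) -> int:
    """Sort in descending order, then scan; O(n log n). Hypothetical helper name."""
    ordered = sorted(citations, reverse=True)
    h = 0
    # pos is the 1-based position; the paper at pos needs at least pos citations.
    for pos, c in enumerate(ordered, start=1):
        if c >= pos:
            h = pos
        else:
            break  # values only decrease from here, so no later position qualifies
    return h


print(h_index_sorted([3, 0, 6, 1, 5]))  # 3
```

Using enumerate with start=1 keeps the code aligned with the 1-based positions in the Wikipedia description, which avoids the off-by-one between citations[i] >= i and citations[i] >= i + 1.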
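Bucket sort: O(n)
Bucket sort brings this down to O(n). The key property of this problem is that h always lies in [0, n], so any citation count greater than n can be treated as n, and bucket sort applies!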
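from typing import List


class Solution:
    def hIndex(self, citations: List[int]) -> int:
        length = len(citations)
        freq_list = [0 for i in range(length + 1)]

        # First pass: freq_list[i] = number of papers cited exactly i times
        # (counts above length are clamped into the last bucket).
        for i in range(length):
            if citations[i] > length:
                index = length
            else:
                index = citations[i]
            freq_list[index] += 1

        # Second pass, from high to low: turn freq_list[i] into the number of
        # papers cited at least i times, and return the first i that qualifies.
        last = 0
        for i in range(length, -1, -1):
            freq_list[i] += last
            last = freq_list[i]
            if freq_list[i] >= i:
                return i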
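The key to bucket sort is establishing a mapping; for example, radix sort with base 10 uses the mapping f(x) = x mod 10. We first define the bucket:
freq_list[i]: the number of papers that have been cited at least i times
Computing freq_list takes two passes: the first counts how many papers were cited exactly i times, and the second counts how many papers were cited at least i times.
Note that if x papers have at least 3 citations, then the number y of papers with at least 2 citations equals x plus the number of papers with exactly 2 citations, i.e. y = x + freq_list[i], where freq_list[i] still holds the exact count at that point. The second step can therefore be done in a single pass from high to low.
References: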
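To make the two passes concrete, here is a small sketch that keeps the "exactly i" and "at least i" counts in separate lists and prints them for the example input. The helper name citation_buckets and the separate lists are assumptions for illustration; the solution above reuses a single list.

```python
from typing import List, Tuple


def citation_buckets(citations: List[int]) -> Tuple[List[int], List[int]]:
    """Return (exact, at_least) bucket lists. Hypothetical helper for illustration."""
    n = len(citations)
    exact = [0] * (n + 1)
    for c in citations:
        exact[min(c, n)] += 1  # clamp: more than n citations counts as n

    # Suffix sum from high to low: at_least[i] = exact[i] + at_least[i + 1]
    at_least = exact[:]
    for i in range(n - 1, -1, -1):
        at_least[i] += at_least[i + 1]
    return exact, at_least


exact, at_least = citation_buckets([3, 0, 6, 1, 5])
print(exact)     # [1, 1, 0, 1, 0, 2]
print(at_least)  # [5, 4, 3, 3, 2, 2]
# h is the largest i with at_least[i] >= i
print(max(i for i, cnt in enumerate(at_least) if cnt >= i))  # 3
```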
https://en.wikipedia.org/wiki/H-index
https://www.cnblogs.com/zmyvszk/p/5619051.html