LeetCode #274 H-Index

Question

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

Example:

Input: citations = [3,0,6,1,5]
Output: 3 
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had 
             received 3, 0, 6, 1, 5 citations respectively. 
             Since the researcher has 3 papers with at least 3 citations each and the remaining 
             two with no more than 3 citations each, her h-index is 3.
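
As a baseline, the definition can be checked directly by brute force. The sketch below is only an illustration: the function name `h_index_bruteforce` is made up here, and it assumes the usual reading of the definition, namely that h is the largest value for which at least h papers have at least h citations. It runs in O(n^2).

```python
from typing import List


def h_index_bruteforce(citations: List[int]) -> int:
    """Largest h such that at least h papers have >= h citations."""
    n = len(citations)
    # Try candidate values from n down to 0; the first one that fits is the largest.
    for h in range(n, -1, -1):
        papers_with_h = sum(1 for c in citations if c >= h)
        if papers_with_h >= h:
            return h
    return 0  # not reached: h = 0 always fits


print(h_index_bruteforce([3, 0, 6, 1, 5]))  # 3
```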

Sort + scan: O(n log n)

According to the calculation method given on the wiki (see the reference links):

First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position).

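That is, sort in descending order, then scan for the last position i where citations[i] >= i; that position is h. Positions here are 1-based. With 0-based array indices, the test becomes citations[i] >= i+1 and h = i+1.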

Why does this work? Take an example:

citations: 6 5 3 1 0

position:  1 2 3 4 5

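The first three values satisfy citations[i] >= i, and because the list is sorted in descending order, each of them is at least citations[3] >= 3. The last two values satisfy citations[i] < i (the smallest such i is 4), so citations[i] <= 3. Hence exactly three papers have at least 3 citations and the rest have no more than 3.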
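
The post gives no code for this approach, so here is a minimal sketch of the sort-and-scan idea. The function name `h_index_sorted` is an assumption for illustration. It uses 0-based indices, so the test is `c >= i + 1`.

```python
from typing import List


def h_index_sorted(citations: List[int]) -> int:
    """Sort in descending order, then find the last position that still qualifies."""
    ordered = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ordered):
        if c >= i + 1:
            # The (i + 1)-th paper has at least i + 1 citations.
            h = i + 1
        else:
            # The list is descending, so no later position can qualify.
            break
    return h


print(h_index_sorted([3, 0, 6, 1, 5]))  # 3
```

Sorting dominates the cost, so the whole function is O(n log n).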

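Of course, comparing exactly as the problem statement defines it also works and does not increase the time complexity, but it makes the later improvement harder to carry out.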

Bucket sort: O(n)

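Bucket sort brings this down to O(n). The feature of this problem worth noticing is that h always lies in the range [0, n], so n+1 buckets are enough and bucket sort applies.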

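from typing import List


class Solution:
    def hIndex(self, citations: List[int]) -> int:
        length = len(citations)
        freq_list = [0] * (length + 1)

        # First pass: freq_list[i] = number of papers cited exactly i times.
        # Citation counts above length are capped into the last bucket.
        for c in citations:
            freq_list[min(c, length)] += 1

        # Second pass, from high to low: accumulate so that freq_list[i]
        # becomes the number of papers cited at least i times.
        last = 0
        for i in range(length, -1, -1):
            freq_list[i] += last
            last = freq_list[i]
            if freq_list[i] >= i:
                return i
        return 0  # not reached: i = 0 always passes the test above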

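The key to bucket sort is defining a mapping from values to buckets. For example, one pass of radix sort with base 10 uses the mapping f(x) = x mod 10. We first define the buckets: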

freq_list[i]: the number of papers cited at least i times

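Computing freq_list takes two passes: the first counts how many papers were cited exactly i times, and the second turns that into how many papers were cited at least i times.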

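Note that if x papers are cited at least 3 times, then the number y of papers cited at least 2 times equals x plus the number of papers cited exactly 2 times, i.e. y = x + freq_list[i] with i = 2 (where freq_list[i] still holds the exact count from the first pass). So this step can be done in a single pass from high to low.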
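
To make the two passes visible, the sketch below keeps the exact counts and the accumulated counts in separate lists instead of updating one list in place. The helper name `bucket_counts` and the names `exact` and `at_least` are assumptions for illustration; the solution above reuses `freq_list` for both.

```python
from typing import List, Tuple


def bucket_counts(citations: List[int]) -> Tuple[List[int], List[int]]:
    """Return (exact, at_least) bucket counts for indices 0..n."""
    n = len(citations)

    # Pass 1: exact[i] = papers cited exactly i times (counts above n go to bucket n).
    exact = [0] * (n + 1)
    for c in citations:
        exact[min(c, n)] += 1

    # Pass 2: at_least[i] = at_least[i + 1] + exact[i], filled from high to low.
    at_least = exact[:]
    for i in range(n - 1, -1, -1):
        at_least[i] += at_least[i + 1]
    return exact, at_least


exact, at_least = bucket_counts([3, 0, 6, 1, 5])
print(exact)     # [1, 1, 0, 1, 0, 2]
print(at_least)  # [5, 4, 3, 3, 2, 2]
```

Scanning `at_least` from the right, the first index i with at_least[i] >= i is 3, which matches the expected h-index.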

References:

https://en.wikipedia.org/wiki/H-index

https://www.cnblogs.com/zmyvszk/p/5619051.html

posted @ 2020-01-09 23:34 sbj123456789
