LeetCode #274. H-Index (Array)

Description


Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

Example:

Input: citations = [3,0,6,1,5]
Output: 3 
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had 
             received 3, 0, 6, 1, 5 citations respectively. 
             Since the researcher has 3 papers with at least 3 citations each and the remaining 
             two with no more than 3 citations each, her h-index is 3.

Note: If there are several possible values for h, the maximum one is taken as the h-index.
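The quoted definition can be checked directly for a single candidate h. The sketch below is only an illustration: `satisfiesDefinition` is a hypothetical helper name, not part of the problem statement, and it assumes the definition is read as "some h papers have at least h citations each, and every other paper has no more than h".

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (name is an assumption): returns true if h satisfies
// the quoted definition for the given citation counts.
bool satisfiesDefinition(const std::vector<int>& citations, int h) {
    const auto n = static_cast<long long>(citations.size());
    if (h < 0 || h > n) return false;

    // Papers that may belong to the "h papers with at least h citations" group.
    const auto at_least_h = std::count_if(
        citations.begin(), citations.end(), [h](int c) { return c >= h; });

    // Papers that cannot belong to the "other N - h papers" group,
    // because they have more than h citations.
    const auto more_than_h = std::count_if(
        citations.begin(), citations.end(), [h](int c) { return c > h; });

    // Enough papers reach h, and all papers above h fit into the group of h.
    return at_least_h >= h && more_than_h <= h;
}
```

For the example input [3,0,6,1,5], calling this helper with h = 3 returns true, while h = 2 and h = 4 return false.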



Approach


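Solution 1

Pure brute force. The definition implies h <= N, so use two nested loops: the outer loop tries every candidate value of h, and the inner loop scans citations to count the values ≥ h and the values ≤ h.

Two counters track the papers with at least h citations and the papers with no more than h citations, and an if + continue ensures that each value is counted at most once. Note that counter cnt1 stops counting once it has reached h, so that counter cnt2 can correctly count the remaining N - h papers whose citation counts do not exceed h.

Because cnt1 stops after h values, the array has to be sorted in descending order first to avoid miscounting. For example, the input [2, 1] should produce the same counts as the input [1, 2].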
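To see why the descending sort matters, here is a minimal sketch of the counting check for one candidate h, pulled out into a hypothetical helper `passesCountingCheck` (the name is an assumption made for illustration). It does not sort, so the caller controls the order of the input.

```cpp
#include <vector>

// Hypothetical helper (name is an assumption): runs the two-counter check
// for a single candidate h on the array in the order given.
bool passesCountingCheck(const std::vector<int>& citations, int h) {
    const int n = static_cast<int>(citations.size());
    int cnt1 = 0;  // papers taken into the "at least h citations" group
    int cnt2 = 0;  // papers taken into the "no more than h citations" group

    for (int cit : citations) {
        if (cnt1 != h && cit >= h) {
            ++cnt1;
            continue;  // this paper is already counted in the first group
        }
        if (cit <= h) {
            ++cnt2;
        }
    }
    return cnt1 == h && cnt2 == n - h;
}
```

With h = 1, the descending input [2, 1] passes: 2 fills cnt1 and 1 goes to cnt2. The ascending input [1, 2] fails: 1 fills cnt1 first, and 2 then fits neither group. No candidate h passes for the unsorted [1, 2], which is the miscount the sort prevents.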

Time complexity: O(n^2) = sort O(n log n) + nested loops O(n^2)
Space complexity: O(1)

Runtime 185 ms, memory 6.5 MB, ranking 5.56%

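#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

class Solution {
public:
    int hIndex(vector<int> &citations) {
        sort(citations.begin(), citations.end(), greater<int>());
        int paper_num = static_cast<int>(citations.size());
        int max_h_idx = 0;  // the h-index is 0 if no larger h passes the check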

        for (int h = 1; h <= paper_num; ++h) {
            int cnt1 = 0;
            int cnt2 = 0;

            for (int cit : citations) {
                if (cnt1 != h && cit >= h) {
                    ++cnt1;
                    continue;  // ensure each element is counted at most once
                }
                
                if (cit <= h) {
                    ++cnt2;
                }
            }

            if (cnt1 == h && cnt2 == (paper_num - h)) {
                max_h_idx = h;
            }
        }

        return max_h_idx;
    }
};



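Solution 2

Wikipedia gives an algorithm for computing the h-index:

  • Sort all of the researcher's published SCI papers by citation count, from highest to lowest;
  • Scan the sorted list from the front until a paper's rank exceeds its citation count. That rank minus one is the h-index.

My understanding of why this works: after sorting in descending order, the array index i plays the role of the candidate h, since exactly i papers come before index i. Each time the loop passes an element without the condition firing, that paper has more citations than its index, so one more paper is confirmed for the "at least h" group. At the first index where i >= citations[i], the i papers citations[0..i-1] each have at least i citations, and the remaining papers citations[i..n-1] each have no more than i citations, so i is exactly the answer to this problem. Wikipedia's ranks are 1-based, so "rank exceeds citation count" is the same test as i >= citations[i], and "rank minus one" is i.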
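Wikipedia states the procedure with 1-based ranks, while array code naturally uses 0-based indices. The following is a minimal sketch that follows the rank-based wording literally, assuming a hypothetical free function `hIndexByRank` that takes the citations by value so the caller's data is left unsorted.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical function (name is an assumption): follows the rank-based
// wording of the procedure, using 1-based ranks.
int hIndexByRank(std::vector<int> citations) {
    // Step 1: sort by citation count, highest first.
    std::sort(citations.begin(), citations.end(), std::greater<int>());

    const int n = static_cast<int>(citations.size());

    // Step 2: find the first paper whose rank exceeds its citation count.
    for (int rank = 1; rank <= n; ++rank) {
        if (rank > citations[rank - 1]) {
            return rank - 1;  // "that rank minus one"
        }
    }

    // No rank exceeded its citation count, so all n papers qualify.
    return n;
}
```

For the sorted example [6, 5, 3, 1, 0], ranks 1, 2 and 3 do not exceed their citation counts (6, 5, 3), but rank 4 exceeds 1, so the result is 4 - 1 = 3.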

Time complexity: O(n log n) = sort O(n log n) + one pass over citations O(n)
Space complexity: O(1)

Runtime 4 ms, memory 6.6 MB, ranking 70.58%

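#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

class Solution {
public:
    int hIndex(vector<int> &citations) {
        sort(citations.begin(), citations.end(), greater<int>());
        const int n = static_cast<int>(citations.size());

        // Exactly i papers precede index i; stop at the first paper
        // whose citation count is no more than i.
        for (int i = 0; i < n; ++i) {
            if (i >= citations[i]) return i;
        }

        return n;  // every paper has more citations than its index
    }
};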



posted @ 2020-04-17 14:45  bw98