[LeetCode] 1093. Statistics from a Large Sample 大样本统计


You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample.

Calculate the following statistics:

  • minimum: The minimum element in the sample.
  • maximum: The maximum element in the sample.
  • mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.
  • median:
    • If the sample has an odd number of elements, then the median is the middle element once the sample is sorted.
    • If the sample has an even number of elements, then the median is the average of the two middle elements once the sample is sorted.
  • mode: The number that appears the most in the sample. It is guaranteed to be unique.

Return the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within 10-5 of the actual answer will be accepted.

Example 1:

Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Output: [1.00000,3.00000,2.37500,2.50000,3.00000]
Explanation: The sample represented by count is [1,2,2,2,3,3,3,3].
The minimum and maximum are 1 and 3 respectively.
The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375.
Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5.
The mode is 3 as it appears the most in the sample.

Example 2:

Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Output: [1.00000,4.00000,2.18182,2.00000,1.00000]
Explanation: The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4].
The minimum and maximum are 1 and 4 respectively.
The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182).
Since the size of the sample is odd, the median is the middle element 2.
The mode is 1 as it appears the most in the sample.

Constraints:

  • count.length == 256
  • 0 <= count[i] <= 109
  • 1 <= sum(count) <= 109
  • The mode of the sample that count represents is unique.

这道题说是有很多在0到 255 中的整数,由于重复的数字太多了,所以这里采用的是统计每个数字出现的个数的方式,用数组 count 来表示,其中 count[i] 表示数字i出现的次数。现在让统计原始数组中的最大值,最小值,平均值,中位数,和众数。这里面的最大最小值很好求,最小值就是 count 数组中第一个不为0的位置,最大值就是 count 数组中最后一个不为0的位置。最小值 mn 初始化为 256,在遍历 count 数组的过程中,遇到不为0的数字时,若此时 mn 为 256,则更新为坐标i。最大值 mx 直接每次更新为值不为0的坐标i即可。平均值也好求,只要求出所有的数字之和,跟数字的个数相除就行了,注意由于数字之和可能很大,需要用 double 来表示。众数也不难求,只要找出 count 数组中的最大值,则其坐标就是众数。比较难就是中位数了,由于数组的个数可奇可偶,中位数的求法不同,这里为了统一,采用一个小 trick,比如数组 1,2,3 和 1,2,3,4,可以用坐标为 (n-1)/2 和 n/2 的两个数字求平均值得到,对于长度为奇数的数组,这两个坐标表示的是相同的数字。这里由于是统计数组,所以要找的两个位置是 (cnt+1)/2 和 (cnt+2)/2,其中 cnt 是所有数字的个数。再次遍历 count 数组,使用 cur 来累计当前经过的数字个数,若 cur 小于 first,且 cur 加上 count[i] 大于等于 first,说明当前数字i即为所求,加上其的一半到 median。同理,若 cur 小于 second,cur 加上 count[i] 大于等于 second,说明当前数字i即为所求,加上其的一半到 median 即可,参见代码如下:


class Solution {
public:
    vector<double> sampleStats(vector<int>& count) {
        double mn = 256, mx = 0, mean = 0, median = 0, sum = 0;
        int cnt = 0, mode = 0;
        for (int i = 0; i < count.size(); ++i) {
            if (count[i] == 0) continue;
            if (mn == 256) mn = i;
            mx = i;
            sum += (double)i * count[i];
            cnt += count[i];
            if (count[i] > count[mode]) mode = i;
        }
        mean = sum / cnt;
        int first = (cnt + 1) / 2, second = (cnt + 2) / 2, cur = 0;
        for (int i = 0; i < count.size(); ++i) {
            if (cur < first && cur + count[i] >= first) median += i / 2.0;
            if (cur < second && cur + count[i] >= second) median += i / 2.0;
            cur += count[i];
        }
        return {mn, mx, sum / cnt, median, (double)mode};
    }
};

Github 同步地址:

https://github.com/grandyang/leetcode/issues/1093


参考资料:

https://leetcode.com/problems/statistics-from-a-large-sample/

https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317626/Python-Solution

https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317857/Java-Simple-2-pass-code-w-comments-and-explanation.


LeetCode All in One 题目讲解汇总(持续更新中...)

posted @ 2021-04-01 15:33  Grandyang  阅读(514)  评论(0编辑  收藏  举报
Fork me on GitHub