Statistics
1 Law of Small Numbers
- If the sample is very small, extreme outcomes show up frequently,
- and these outcomes are mere coincidences that may lie far from the expected value.
2 Law of Large Numbers
- As the sample grows sufficiently large, the observed frequency of an event approaches its expected value ever more closely.
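A minimal simulation of both laws with a fair coin; the 0.5 expected proportion and the sample sizes are assumptions for illustration:

```python
import random

random.seed(0)

# Proportion of heads in n simulated fair-coin tosses.
def head_proportion(n):
    return sum(random.random() < 0.5 for _ in range(n)) / n

small = head_proportion(10)       # tiny sample: can land far from 0.5
large = head_proportion(100_000)  # large sample: hugs the expected 0.5
print(small, large)
```

The 10-toss proportion can be anything from 0 to 1, while the 100,000-toss proportion stays within a fraction of a percent of 0.5.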
3 Central Limit Theorem (CLT)
- Given a population with any distribution,
- each time I randomly draw a sample of size n from this population,
- and I repeat this sampling m times.
- Then, for each of these m samples, I calculate the sample mean.
- The distribution of these m sample means will approximately follow a normal distribution.
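The resampling procedure above can be sketched directly; the uniform population and the values of n and m are illustrative choices:

```python
import random
import statistics

random.seed(42)

# Draw m samples of size n from a decidedly non-normal population
# (uniform on [0, 1]) and collect the m sample means.
n, m = 30, 2000
sample_means = [
    statistics.mean(random.random() for _ in range(n))
    for _ in range(m)
]

# The means cluster around the population mean 0.5 ...
print(statistics.mean(sample_means))
# ... with spread roughly sigma / sqrt(n) = (1/sqrt(12)) / sqrt(30).
print(statistics.stdev(sample_means))
```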
4 Uniform Distribution
The probability of the sample X falling within any subinterval of [a,b] of equal length is the same. The probability density function of X is:
$$f(x)=\frac{1}{b-a},\quad a \le x \le b$$
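A quick simulation check of the equal-probability property; the interval [2, 5] and the two subintervals are arbitrary choices:

```python
import random

random.seed(1)

a, b = 2.0, 5.0
# Constant density: f(x) = 1 / (b - a) on [a, b].
f = 1 / (b - a)

# Equal-length subintervals should get equal probability.
draws = [random.uniform(a, b) for _ in range(100_000)]
p_low  = sum(a     <= x < a + 1 for x in draws) / len(draws)  # [2, 3)
p_high = sum(b - 1 <= x < b     for x in draws) / len(draws)  # [4, 5)
print(f, p_low, p_high)  # both proportions near 1/3
```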
5 Bernoulli Distribution
The outcome of a Bernoulli trial has only two possible results. For example, flipping a coin yields either 0 or 1.
6 Binomial Distribution
Perform n independent Bernoulli trials, each with only two possible outcomes, 0 or 1. With n = 1 this reduces to the Bernoulli distribution.
$$P(x=k)=C_{n}^{k}p^{k}(1-p)^{n-k}$$
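The formula translated directly into code, using `math.comb` for $C_{n}^{k}$; the 10-toss coin example is illustrative:

```python
from math import comb

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 10 fair-coin tosses.
print(binom_pmf(10, 3, 0.5))  # 0.1171875

# The pmf sums to 1 over k = 0 .. n.
print(sum(binom_pmf(10, k, 0.5) for k in range(11)))  # 1.0
```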
7 Poisson Distribution
Assume that an event occurs at an average rate λ. Then the probability distribution of the number of times the event occurs within a fixed interval of time (or space) is called the Poisson distribution. It is a discrete probability distribution.
$$P(x=k)=e^{-\lambda }\frac{\lambda ^{k}}{k!}$$
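The pmf in code; the rate λ = 4 (say, customers per hour) and k = 2 are illustrative numbers:

```python
from math import exp, factorial

# P(X = k) = e^(-lam) * lam^k / k!
def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# If a shop averages 4 customers per hour, the probability
# of seeing exactly 2 in the next hour:
print(poisson_pmf(2, 4))

# The pmf sums to (nearly) 1 over k = 0, 1, 2, ...
print(sum(poisson_pmf(k, 4) for k in range(50)))
```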
8 Exponential Distribution
If the expected number of occurrences of an event per unit time is λ, then the probability distribution of the waiting time t until the next event occurs is called the exponential distribution. Its cumulative distribution function is:
$$P(T \le t)=1-e^{-\lambda t}$$
- The Poisson distribution counts the number of occurrences.
- The exponential distribution measures the time until an occurrence.
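A sketch tying the two views together: the CDF above versus simulated waiting times. The rate λ = 2 and the inverse-transform sampler T = −ln(U)/λ are illustrative choices:

```python
from math import exp, log
import random

random.seed(7)

lam = 2.0  # expected events per unit time

# CDF: P(T <= t) = 1 - e^(-lam * t)
def exp_cdf(t, lam):
    return 1 - exp(-lam * t)

# Inverse-transform sampling: T = -ln(U) / lam, U ~ Uniform(0, 1).
waits = [-log(random.random()) / lam for _ in range(100_000)]

print(exp_cdf(1.0, lam))                          # ~0.86: next event within 1 unit
print(sum(w <= 1.0 for w in waits) / len(waits))  # simulated estimate, close to it
print(sum(waits) / len(waits))                    # mean wait, near 1 / lam = 0.5
```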
9 Confidence Interval (CI)
$$CI = \left(\hat{p} - z^{*} \cdot SE,\ \hat{p} + z^{*} \cdot SE\right)$$
Example: coin tosses
- CI: confidence interval
- $\hat{p}$: sample proportion of heads (number of heads / total tosses)
- $z^{*}$: critical z-score for the desired confidence level (e.g., 1.96 for 95% confidence)
- n: sample size (total number of coin tosses)
- SE: standard error, $SE=\sqrt{\hat{p}(1-\hat{p})/n}$
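Plugging illustrative coin-toss counts (55 heads in 100 tosses, an assumed example) into the interval:

```python
from math import sqrt

# 55 heads in 100 tosses (illustrative numbers).
heads, n = 55, 100
p_hat = heads / n

z = 1.96  # critical z-score for 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)

ci = (p_hat - z * se, p_hat + z * se)
print(ci)  # roughly (0.452, 0.648)
```

Since 0.5 lies inside the interval, these tosses are consistent with a fair coin at the 95% level.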
To combine summary statistics from k groups of sizes $n_i$: the global mean is the size-weighted average of the group means, and the pooled (within-group) standard deviation is:
$$\mu_{\text{global}} = \frac{\sum_{i=1}^k n_i \mu_i}{\sum_{i=1}^k n_i}$$
$$\sigma_{\text{global}} = \sqrt{\frac{\sum_{i=1}^k (n_i - 1) \sigma_i^2}{\sum_{i=1}^k (n_i - 1)}}$$
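Both formulas computed for two illustrative groups (all sizes, means, and standard deviations below are made-up numbers):

```python
from math import sqrt

# Summary statistics for k = 2 groups (illustrative numbers):
sizes = [10, 40]
means = [2.0, 5.0]
stds  = [1.0, 2.0]

# Size-weighted global mean.
mu = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Pooled (within-group) standard deviation.
num = sum((n - 1) * s**2 for n, s in zip(sizes, stds))
den = sum(n - 1 for n in sizes)
sigma = sqrt(num / den)

print(mu, sigma)  # 4.4 and about 1.85
```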
10 Simpson’s Paradox
Simpson's Paradox: the direction of a trend seen in each sub-group can reverse when the sub-groups are combined into a whole.
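A concrete instance using the classic kidney-stone treatment counts, where treatment A wins inside each subgroup yet B wins overall:

```python
# Success counts as (successes, trials), classic kidney-stone data.
a_small, a_large = (81, 87), (192, 263)
b_small, b_large = (234, 270), (55, 80)

def rate(sn):
    return sn[0] / sn[1]

print(rate(a_small) > rate(b_small))  # True: A better on small stones
print(rate(a_large) > rate(b_large))  # True: A better on large stones

a_all = (a_small[0] + a_large[0], a_small[1] + a_large[1])
b_all = (b_small[0] + b_large[0], b_small[1] + b_large[1])
print(rate(a_all) < rate(b_all))      # True: yet B better overall
```

The reversal happens because B was given mostly easy (small-stone) cases while A got mostly hard ones.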
11 Law of Total Probability
The Law of Total Probability is a fundamental rule in probability theory that expresses the total probability of an event as a weighted sum of its conditional probabilities over a partition of the sample space.
$$P(A) = \sum_{i=1}^{n} P(B_i) \cdot P(A \mid B_i)$$
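The weighted sum in code, with an illustrative two-factory partition (the percentages are made-up numbers):

```python
# Two factories supply parts: B1 makes 60% of them with a 1% defect
# rate, B2 makes 40% with a 3% defect rate (illustrative numbers).
p_b         = [0.6, 0.4]   # P(B_i): a partition of the sample space
p_a_given_b = [0.01, 0.03] # P(A | B_i), A = "part is defective"

# P(A) = sum_i P(B_i) * P(A | B_i)
p_a = sum(pb * pa for pb, pa in zip(p_b, p_a_given_b))
print(p_a)  # 0.6*0.01 + 0.4*0.03 = 0.018
```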
12 Arithmetic Series Sum Formula
$$S_n = \frac{n(a_1 + a_n)}{2}$$
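The closed form checked against a brute-force sum, using Gauss's classic 1 + 2 + … + 100 example:

```python
# S_n = n * (a_1 + a_n) / 2, with a_n = a_1 + (n - 1) * d.
def arith_sum(a1, d, n):
    an = a1 + (n - 1) * d
    return n * (a1 + an) / 2

print(arith_sum(1, 1, 100))           # 5050.0 (Gauss's classic)
print(sum(1 + k for k in range(100))) # 5050, same total by brute force
```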
13 Geometric Series Sum Formula
$$S_n = \frac{a_1 (r^n - 1)}{r - 1}, \quad r \neq 1$$
14 Covariance
$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y]$$
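The identity in code, using population means via `statistics.fmean` (the data are illustrative):

```python
import statistics

# Cov(X, Y) = E[XY] - E[X] E[Y], using population means.
def cov(xs, ys):
    e_xy = statistics.fmean(x * y for x, y in zip(xs, ys))
    return e_xy - statistics.fmean(xs) * statistics.fmean(ys)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]   # ys = 2 * xs, so covariance is positive
print(cov(xs, ys))  # 2.5, i.e. 2 * Var(xs) = 2 * 1.25
```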
