Statistics

 

1 Law of small numbers

  • If the statistical data is very limited, events tend to manifest as extreme cases,
  • and these cases are mere coincidences that say little about the expected value.

2 Law of Large Numbers

  • If the data is sufficiently large, the sample average of an event's outcomes converges to its expected value.

3 Central Limit Theorem (CLT)

  • Take a population with any distribution,
  • randomly draw a sample of size n from this population,
  • and repeat this sampling m times.
  • Then, for each of these m samples, calculate the sample mean.
  • The distribution of these m sample means will approximately follow a normal distribution.
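The steps above can be checked by simulation; the uniform population and the choices n = 50, m = 2000 below are arbitrary assumptions for illustration:

```python
import random
import statistics

# Draw m samples of size n from a non-normal population (uniform on [0, 1])
# and collect the m sample means; by the CLT these means are approximately
# normal, with mean near 0.5 and standard deviation near sigma / sqrt(n).
random.seed(0)
n, m = 50, 2000
sample_means = [statistics.mean(random.random() for _ in range(n))
                for _ in range(m)]

print(round(statistics.mean(sample_means), 2))  # close to 0.5
```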

4 Uniform Distribution

The probability of the sample X falling within the interval [a, b] is the same for any subinterval of equal length. The probability density function of X is:

$$f(x)=\frac{1}{b-a},\quad a\le x\le b$$
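A minimal numeric check of this density (the endpoints a = 2, b = 6 are assumed for illustration):

```python
# The U(a, b) density is constant, so equal-length subintervals carry
# equal probability (a = 2, b = 6 are assumed for illustration).
a, b = 2.0, 6.0
f = 1 / (b - a)            # density: 0.25 everywhere on [2, 6]

p_left = f * (4.0 - 2.0)   # P(2 <= X <= 4)
p_right = f * (6.0 - 4.0)  # P(4 <= X <= 6)
print(p_left, p_right)     # 0.5 0.5
```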

5 Bernoulli Distribution

The outcome of a Bernoulli trial has only two possible results. For example, flipping a coin yields either heads or tails, coded as 1 or 0.

6 Binomial Distribution

Perform n independent Bernoulli trials; each trial has only two outcomes, 0 or 1. If n = 1, this reduces to the Bernoulli distribution.

$$P(x=k)=C_{n}^{k}p^{k}(1-p)^{n-k}$$
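A direct translation of the formula, using Python's `math.comb` for $C_{n}^{k}$ (the coin-flip numbers are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(x = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 fair coin flips: C(5, 3) / 2**5 = 10 / 32.
print(binom_pmf(3, 5, 0.5))  # 0.3125
```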

7 Poisson Distribution

Assume that the average number of occurrences of an event within a fixed interval of time (or space) is given by λ. Then the probability distribution of the number of times the event occurs within that interval is called the Poisson distribution. It is a discrete probability distribution.

$$P(x=k)=e^{-\lambda }\frac{\lambda ^{k}}{k!}$$
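The probability mass function can be sketched directly (λ = 2 and k = 0 are illustrative choices):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(x = k) = lam**k * e**(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

# With an average of lam = 2 events per interval, P(no events) = e**-2.
print(round(poisson_pmf(0, 2.0), 4))  # 0.1353
```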

8 Exponential Distribution

If the expected number of occurrences of an event per unit time is given by λ, then the time T until the next event occurs follows the exponential distribution. Its cumulative distribution function is:

$$P(T\le t)=1-e^{-\lambda t}$$

  

  • The Poisson distribution counts the number of occurrences.
  • The exponential distribution measures the time until an occurrence. 
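A small sketch of the exponential CDF above (the rate λ = 2 per hour is an assumption for illustration):

```python
from math import exp

def exp_cdf(t, lam):
    """P(T <= t) = 1 - e**(-lam * t): the next event arrives by time t."""
    return 1 - exp(-lam * t)

# At a rate of lam = 2 events per hour, the probability that the next
# event arrives within half an hour is 1 - e**-1.
print(round(exp_cdf(0.5, 2.0), 4))  # 0.6321
```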

 

9 CI, confidence interval

$$CI = (\hat{p} - z^{*} \cdot SE,\ \hat{p} + z^{*} \cdot SE)$$

Example: coin tosses

  • CI: confidence interval
  • $\hat{p}$: sample proportion of heads (number of heads / total tosses)
  • $z^{*}$: critical z-score for the desired confidence level (e.g., 1.96 for 95% confidence)
  • n: sample size (total number of coin tosses)
  • SE: standard error, $\sqrt{\hat{p}(1-\hat{p})/n}$
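Putting the pieces together for a hypothetical run of 55 heads in 100 tosses:

```python
from math import sqrt

# 95% CI for the heads proportion, for a hypothetical 55 heads in 100 tosses.
heads, n = 55, 100
p_hat = heads / n
z = 1.96                            # critical z for 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)  # standard error
ci = (p_hat - z * se, p_hat + z * se)
print(round(ci[0], 3), round(ci[1], 3))  # 0.452 0.648
```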

 

For k groups with sizes $n_i$, means $\mu_i$, and standard deviations $\sigma_i$, the combined (global) mean and pooled standard deviation are:

$$\mu_{\text{global}} = \frac{\sum_{i=1}^k n_i \mu_i}{\sum_{i=1}^k n_i}$$

$$\sigma_{\text{global}} = \sqrt{\frac{\sum_{i=1}^k (n_i - 1) \sigma_i^2}{\sum_{i=1}^k (n_i - 1)}}$$
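A sketch of both formulas on assumed per-group statistics:

```python
from math import sqrt

# Assumed per-group statistics for k = 3 groups.
n = [10, 20, 30]         # group sizes n_i
mu = [1.0, 2.0, 3.0]     # group means mu_i
sigma = [0.5, 0.5, 0.5]  # group standard deviations sigma_i

mu_global = sum(ni * mi for ni, mi in zip(n, mu)) / sum(n)
sigma_global = sqrt(sum((ni - 1) * si**2 for ni, si in zip(n, sigma))
                    / sum(ni - 1 for ni in n))

print(round(mu_global, 3), sigma_global)  # 2.333 0.5
```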

10 Simpson’s Paradox 

Simpson's paradox occurs when a trend that appears in the whole data set reverses (or disappears) when the data is split into sub-groups.
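A numeric illustration (the counts below are assumed for this sketch): treatment A wins inside each sub-group yet loses once the sub-groups are pooled.

```python
def rate(successes, trials):
    return successes / trials

# Assumed counts: (successes, trials) per sub-group.
a = [(81, 87), (192, 263)]  # treatment A in groups 1 and 2
b = [(234, 270), (55, 80)]  # treatment B in groups 1 and 2

# A has the higher success rate inside each sub-group...
within = all(rate(*a[g]) > rate(*b[g]) for g in (0, 1))

# ...yet the lower rate once the sub-groups are pooled.
a_all = rate(sum(s for s, _ in a), sum(t for _, t in a))  # 273/350
b_all = rate(sum(s for s, _ in b), sum(t for _, t in b))  # 289/350
print(within, a_all < b_all)  # True True
```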

11 Law of Total Probability

The Law of Total Probability is a fundamental rule in probability theory that expresses the total probability of an event as a weighted sum of its conditional probabilities over a partition of the sample space.

$$P(A) = \sum_{i=1}^{n} P(B_i) \cdot P(A \mid B_i)$$
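A small worked example (the production shares and defect rates are assumptions): two factories make 60% and 40% of all units, with defect rates 1% and 5%, and we want the overall defect probability P(A):

```python
# P(A) = sum_i P(B_i) * P(A | B_i) over a partition {B_i}.
p_b = [0.6, 0.4]            # P(B_i): production share per factory
p_a_given_b = [0.01, 0.05]  # P(A | B_i): defect rate per factory

p_a = sum(pb * pa for pb, pa in zip(p_b, p_a_given_b))
print(round(p_a, 3))  # 0.026
```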

12 Arithmetic Series Sum Formula

$$S_n = \frac{n(a_1 + a_n)}{2}$$
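A quick check of the formula against a direct sum:

```python
# S_n = n * (a1 + an) / 2 for the series 1 + 2 + ... + 100.
n, a1, an = 100, 1, 100
s_n = n * (a1 + an) // 2
print(s_n, s_n == sum(range(1, 101)))  # 5050 True
```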

13 Geometric Series Sum Formula

$$S_n = \frac{a_1 (r^n - 1)}{r - 1}, \quad r \neq 1$$
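Checking the formula against a direct sum, for assumed values a₁ = 1, r = 2, n = 10:

```python
# S_n = a1 * (r**n - 1) / (r - 1) for a1 = 1, r = 2, n = 10:
# 1 + 2 + 4 + ... + 512 = 2**10 - 1.
a1, r, n = 1, 2, 10
s_n = a1 * (r**n - 1) // (r - 1)
print(s_n, s_n == sum(a1 * r**k for k in range(n)))  # 1023 True
```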

14 Covariance

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y]$$
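A numeric check of this identity on a small paired data set (the values are illustrative):

```python
def mean(values):
    return sum(values) / len(values)

# Paired data with y = 2x (illustrative), so the covariance is positive.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

e_xy = mean([a * b for a, b in zip(x, y)])  # E[XY]
cov = e_xy - mean(x) * mean(y)              # E[XY] - E[X]E[Y]
print(cov)  # 2.5
```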

posted @ 2019-11-17 19:37  ylxn