Statistics II Notes

I originally wanted to write up this semester's course content properly, but my notes are a bit messy, and since this course emphasizes mathematical derivation there are a lot of formulas. So I am only organizing the outlines from the recitation sessions, which basically cover the main content of the semester (mostly because I'm lazy 🐶).

Estimation: parameter \(\theta\), estimator \(\hat\theta\)

  • MLE
  • LSE

Unbiased: \(E[\hat\theta]=\theta\)

Hypothesis Testing

\(H_0: \theta\in\Theta_0\leftrightarrow H_1:\theta\in\Theta_0^C\)

e.g. normal dist. \(x_1,...,x_n\sim N(\theta,1)\), estimate the parameter \(\theta\)
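Worked out here, since the notes leave the example blank: the MLE (and LSE) is \(\hat\theta=\bar x\), which is unbiased since \(E\bar x=\theta\); under \(H_0:\theta=\theta_0\) we have \(\sqrt n(\bar x-\theta_0)\sim N(0,1)\), giving the usual z-test, and the \(1-\alpha\) CI is \(\bar x\pm z_{1-\alpha/2}/\sqrt n\).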

Confidence Interval

Regression

Model: \(Y=X\beta+\epsilon\)

  • Y: response vector
  • X: Data matrix, design matrix
  • \(\beta\): unknown parameter vector
  • \(\epsilon\): error\(\sim N(0,\sigma^2I_n)\)

Estimation

  1. LSE: \(\hat\beta=(X^TX)^{-1}X^TY\overset{\triangle}{=}b\)
  2. MLE
  3. BLUE: best linear unbiased estimator (Gauss-Markov Theorem)

Fitted Model of Y: \(\hat Y=Xb=X(X^TX)^{-1}X^TY\overset{\triangle}{=}HY\)

Residual: \(e=Y-\hat Y=(I-H)Y\)

Residual sum of squares: \(SSE=Y^T(I-H)Y\)

Estimator of \(\sigma^2\): \(\hat\sigma^2\overset{\triangle}{=}s^2={SSE\over n-k-1}\), unbiased
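A quick numerical sketch of these formulas in NumPy (the toy data, and names like `n`, `k`, `rng`, are assumptions for illustration, not from the course):

```python
import numpy as np

# Hypothetical toy data: n observations, k predictors, plus an intercept column
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.8, size=n)

# LSE: b = (X^T X)^{-1} X^T Y (solve the normal equations rather than inverting)
b = np.linalg.solve(X.T @ X, X.T @ Y)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T         # hat matrix, so that Y_hat = H Y
e = Y - X @ b                 # residuals, e = (I - H) Y
SSE = e @ e                   # residual sum of squares
s2 = SSE / (n - k - 1)        # unbiased estimator of sigma^2
```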

Distribution of \(b\) and \(s^2\):

\[b\sim N(\beta, \sigma^2(X^TX)^{-1})\\ {SSE\over \sigma^2}={(n-k-1)s^2\over\sigma^2}\sim\chi^2(n-k-1)\\ \]

Inference

| Sum of Squares | Degrees of Freedom |
| --- | --- |
| Total: \(SST=\sum(y_i-\bar y)^2=Y^T(I-{1\over n}11^T)Y\) | \(df_T=n-1\) |
| Regression: \(SSR=\sum(\hat y_i-\bar y)^2=Y^T(H-{1\over n}11^T)Y\) | \(df_R=k\) |
| Residual: \(SSE=\sum(y_i-\hat y_i)^2=Y^T(I-H)Y\) | \(df_E=n-1-k\) |
| \(SST=SSR+SSE\) | \(df_T=df_R+df_E\) |

  • F-test for regression relation: \(H_0: \beta_1=...=\beta_k=0\)

\[F={SSR/k\over SSE/(n-k-1)}={MSR\over MSE}\sim F_{k,n-k-1} \]

Reject \(H_0\), when \(F>F_{k,n-k-1}(1-\alpha)\)
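Continuing the sketch above (same hypothetical `X`, `Y`, `SSE`), the overall F-test:

```python
from scipy import stats

SST = ((Y - Y.mean()) ** 2).sum()        # Y^T (I - 11^T/n) Y
SSR = SST - SSE
F = (SSR / k) / (SSE / (n - k - 1))      # MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)    # reject H0 at level alpha if p_value < alpha
```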

  • Test for \(\beta_j\): \(H_0: \beta_j=0\)

\[{b_j-\beta_j\over\sqrt{\sigma^2(X^TX)^{-1}_{j+1,j+1}}}\sim N(0,1) \]

Under \(H_0\)

\[{b_j\over\sqrt{\sigma^2(X^TX)^{-1}_{j+1,j+1}}}\sim N(0,1) \]
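In practice \(\sigma^2\) is unknown; replacing \(\sigma\) by \(s\) turns the statistic into a \(t_{n-k-1}\) variable, exactly as in the CI example below. A sketch continuing the toy example:

```python
se_b = np.sqrt(s2 * np.diag(XtX_inv))                  # standard error of each b_j
t_stats = b / se_b                                     # t statistics for H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), n - k - 1)    # two-sided p-values
```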

Model Fitness

\[R^2={SSR\over SST}\in[0,1]\\ R^2_a=1-{SSE/(n-k-1)\over SST/(n-1)}\ \text{(adjusted)} \]

e.g.

given values \(x_0=(1,x_{01},...,x_{0k})^T\), find a CI for \(x_0^T\beta\)

\[{x_0^Tb-x_0^T\beta\over\sigma\sqrt{x_0^T(X^TX)^{-1}x_0}}\sim N(0,1)\\ {x_0^Tb-x_0^T\beta\over s \sqrt{x_0^T(X^TX)^{-1}x_0}}\sim t_{n-k-1}\\ \]

so the \(1-\alpha\) CI for \(x_0^T\beta\) is \([x_0^Tb\pm t_{n-k-1}(\alpha/2)\,s\sqrt{x_0^T(X^TX)^{-1}x_0}]\)
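A sketch of the computation (`x0` below is a hypothetical new covariate point):

```python
x0 = np.array([1.0, 0.5, -0.2, 1.0])    # hypothetical point; first entry 1 for the intercept
alpha = 0.05
center = x0 @ b
half = stats.t.ppf(1 - alpha / 2, n - k - 1) * np.sqrt(s2 * (x0 @ XtX_inv @ x0))
ci = (center - half, center + half)     # 1-alpha CI for x0^T beta
```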

e.g.

CI for a new observation
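The notes leave this example blank; the standard result is that for a new observation \(y_0=x_0^T\beta+\epsilon_0\), the independent noise term adds a 1 under the square root:

\[y_0-x_0^Tb\sim N\left(0,\sigma^2(1+x_0^T(X^TX)^{-1}x_0)\right) \]

so the \(1-\alpha\) prediction interval is \([x_0^Tb\pm t_{n-k-1}(\alpha/2)\,s\sqrt{1+x_0^T(X^TX)^{-1}x_0}]\).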

e.g.

\(H_0: C\beta=0\), where \(C\in R^{d\times (k+1)}\)

The test statistic is

\[F={(SSE_R-SSE_F)/d\over SSE_F/(n-k-1)}\sim F_{d,n-k-1} \]

if \(F>F_{d,n-k-1}(\alpha)\), reject \(H_0\)

  • \(SSE_F\): SSE under the full model

  • \(SSE_R\): SSE under the reduced model: \(Y=X\beta+\epsilon, C\beta=0\)

\[SSE_R-SSE_F=b^TC^T(C(X^TX)^{-1}C^T)^{-1}Cb \]
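Continuing the sketch, a hypothetical single constraint \(\beta_1=\beta_2\) (so \(d=1\)) can be tested directly via the identity above:

```python
C = np.array([[0.0, 1.0, -1.0, 0.0]])   # hypothetical constraint beta_1 = beta_2, d = 1
d = C.shape[0]
Cb = C @ b
SSE_diff = Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb)   # b^T C^T (C (X^TX)^{-1} C^T)^{-1} C b
F_lh = (SSE_diff / d) / (SSE / (n - k - 1))
p_lh = stats.f.sf(F_lh, d, n - k - 1)
```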

Model Selection

\[C_p={SSE(p)\over s^2}-(n-2p)\\ AIC=n\log(SSE(p))+2p \]

where \(p\) counts all parameters of the candidate model (including the intercept) and \(s^2\) comes from the full model.
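A sketch of an exhaustive subset search under these criteria, continuing the toy example (here `p` counts all parameters of the submodel, the convention assumed above):

```python
from itertools import combinations

def sse_of(cols):
    """SSE of the submodel built from the given design-matrix columns."""
    Xs = X[:, list(cols)]
    bs = np.linalg.solve(Xs.T @ Xs, Xs.T @ Y)
    res = Y - Xs @ bs
    return res @ res

s2_full = sse_of(range(k + 1)) / (n - k - 1)   # s^2 from the full model
for m in range(1, k + 1):
    for sub in combinations(range(1, k + 1), m):
        cols = (0,) + sub                      # always keep the intercept
        p = len(cols)
        Cp = sse_of(cols) / s2_full - (n - 2 * p)
        AIC = n * np.log(sse_of(cols)) + 2 * p
        print(cols, round(Cp, 2), round(AIC, 2))
```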

Logistic Regression

Model: \(y_1,...,y_n\) independent binary observations; \(y_i=1\) with prob. \(p_i\) and \(y_i=0\) with prob. \(1-p_i\)

\[logit(p_i)=\log{p_i\over 1-p_i}=\beta^Tx_i\\ p_i={\exp(\beta^T x_i)\over 1+\exp(\beta^T x_i)} \]

MLE for \(\beta\) (and hence for \(p_i\)):

\[l(\beta)=\sum_i \left(y_i\log p_i+(1-y_i)\log(1-p_i)\right)\\ \hat\beta=\arg\max_\beta l(\beta) \]
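\(l(\beta)\) has no closed-form maximizer, so \(\hat\beta\) is found numerically; Newton-Raphson (equivalently IRLS) is the usual route. A minimal sketch, assuming a design matrix laid out as before (no step-size control or convergence check):

```python
def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson for the logistic log-likelihood (minimal sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # p_i = exp(beta^T x_i)/(1+exp(beta^T x_i))
        grad = X.T @ (y - p)                       # score vector
        W = p * (1.0 - p)
        hess = X.T @ (X * W[:, None])              # Fisher information X^T W X
        beta = beta + np.linalg.solve(hess, grad)  # Newton update
    return beta
```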

Generalized linear model

GLM:

  • \(y_i\) : random component from a particular dist. in the exponential family

\[g(EY)=X\beta \]

  • \(X\beta\): linear predictor

  • \(g\): link func.

Exponential family

\[f_\lambda(x)=e^{\lambda x-r(\lambda)}f_0(x), \lambda\in\Lambda \]

Normal, Poisson, Binary, Gamma

e.g.: Gamma dist:

\[f(x)={\beta^\alpha\over \Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\ x>0\\ =\exp\left(-\beta x+\alpha\log\beta+(\alpha-1)\log x-\log\Gamma(\alpha)\right) \]

Mean and variance

Moment generating func.

\[\varphi(t)=Ee^{tX}\\ \varphi'(t)|_{t=0}=E[Xe^{tX}]|_{t=0}=EX\\ \varphi''(t)|_{t=0}=E[X^2e^{tX}]|_{t=0}=EX^2\\ \]

\[Var(X)=\varphi''(0)-\varphi'(0)^2 \]

For \(X\sim\) exponential family, \(EX=r'(\lambda), Var(X)=r''(\lambda)\)
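Quick check with the Poisson case (an example not worked in the notes): \(f(x)=e^{-\mu}\mu^x/x!=e^{\lambda x-e^\lambda}\cdot{1\over x!}\) with \(\lambda=\log\mu\), so \(r(\lambda)=e^\lambda\) and indeed \(EX=r'(\lambda)=e^\lambda=\mu\), \(Var(X)=r''(\lambda)=\mu\).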

  • identity link \(g(u)=u\) + normal response: GLM reduces to linear regression;
  • logit link \(g(u)=logit(u)\) + binary response: GLM reduces to logistic regression

A key idea in GLM is to represent \(\lambda_1,...,\lambda_n\) as a linear function of the covariates:

\[\lambda=X\beta\\ \lambda_i=x_i^T\beta \]

The joint density of \(y\) is

\[\prod_i f_{\lambda_i}(y_i)=\prod_i e^{\lambda_iy_i-r(\lambda_i)}f_0(y_i)\\ =\exp\left(\sum_i (\lambda_i y_i-r(\lambda_i))\right)\prod_i f_0(y_i) \]

\[\sum_i \lambda_i y_i=\lambda^Ty=\beta^TX^Ty\overset{\triangle}{=}\beta^Tz\\ \sum_i r(\lambda_i)=\sum_i r(x_i^T\beta) \]

\[\prod f_{\lambda_i}(y_i)=\exp(\beta^Tz-\sum r(x_i^T \beta))\prod f_0(y_i)\\ =\exp(\beta^Tz-\varphi(\beta))f_0(y) \]

where \(\varphi(\beta)=\sum r(x_i^T\beta), f_0(y)=\prod f_0(y_i)\)

\[\varphi'(\beta)=\sum_i r'(x_i^T\beta)x_i=\sum_i r'(\lambda_i)x_i=(x_1,...,x_n)(r'(\lambda_1),...,r'(\lambda_n))^T=X^T\mu(\beta) \]

Log-likelihood:

\[l_\beta(y)=\beta^Tz-\varphi(\beta)+\log f_0(y)=\beta^TX^Ty-\varphi(\beta)+c\\ l_\beta'(y)=X^Ty-X^T\mu(\beta) \]

Then the MLE of \(\beta\), denoted \(\hat\beta\), satisfies

\[X^T(y-\mu(\hat\beta))=0 \]
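This equation is nonlinear in \(\beta\) except in the linear-regression case, so it is solved iteratively. A sketch for a Poisson response with canonical link, where \(\mu_i=r'(x_i^T\beta)=e^{x_i^T\beta}\) (Newton's method, same caveats as the logistic sketch):

```python
def fit_poisson(X, y, n_iter=25):
    """Newton's method for the GLM score equation X^T (y - mu(beta)) = 0, Poisson/log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                      # mu_i = r'(x_i^T beta), with r(lambda) = e^lambda
        grad = X.T @ (y - mu)                      # the score X^T (y - mu(beta))
        hess = X.T @ (X * mu[:, None])             # X^T diag(r''(lambda_i)) X
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```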

ANOVA (one-way)

Model: \(y_{ij}=\mu_i+\epsilon_{ij}\), \(\epsilon_{ij}\overset{iid}{\sim}N(0,\sigma^2)\)

for \(i=1,...,r\), \(j=1,...,n_i\), with \(\sum_{i=1}^r n_i=n\)

Estimation

LSE, MLE for \(\mu_i\): \(\hat\mu_i=\bar y_{i.}\)

unbiased estimator for \(\sigma^2\): MSE

Inference

Hypothesis: \(H_0: \mu_1=...=\mu_r\)

| SS | Degrees of Freedom | Mean Square |
| --- | --- | --- |
| \(SST=\sum_i\sum_j(y_{ij}-\bar y_{..})^2\) | \(df=n-1\) | |
| \(S_e=\sum_i\sum_j(y_{ij}-\bar y_{i.})^2\) | \(df=n-r\) | \(MSE\) |
| \(S_A=\sum_i\sum_j(\bar y_{i.}-\bar y_{..})^2\) | \(df=r-1\) | \(MSA\) |
| \(SST=S_A+S_e\) | | |

Test statistic:

\[F={MSA\over MSE}\sim F_{r-1, n-r} \]

Reject \(H_0\), when \(F>F_{r-1,n-r}(\alpha)\)
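A sketch of the whole one-way procedure on hypothetical group data (group means and sizes made up):

```python
groups = [rng.normal(loc=m, size=ni) for m, ni in [(0.0, 10), (0.5, 12), (1.0, 9)]]
r_grp = len(groups)
n_tot = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()                      # grand mean y_bar..
S_A = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
S_e = sum(((g - g.mean()) ** 2).sum() for g in groups)
F = (S_A / (r_grp - 1)) / (S_e / (n_tot - r_grp))          # MSA / MSE
p = stats.f.sf(F, r_grp - 1, n_tot - r_grp)                # equivalently: stats.f_oneway(*groups)
```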

Two-factor ANOVA

Model (sample level)

\(y_{ijk}=\mu_{ij}+\epsilon_{ijk}\), \(\epsilon_{ijk}\overset{iid}{\sim}N(0,\sigma^2)\)

Factor effects: \(y_{ijk}=\mu+\alpha_i+\beta_j+r_{ij}+\epsilon_{ijk}\)

Estimation

Inference (with interaction)

\(r_{ij}\ne0\)

\[\sum_{ijk}(y_{ijk}-\bar y_{...})^2=\sum_{ijk}(y_{ijk}-\bar y_{ij.})^2+\sum_{ijk}(\bar y_{ij.}-\bar y_{...})^2\\ SST=S_e+S_{AB}\\ S_{AB}=SSA+SSB+SSAB\\ \]

\[SSA=\sum_{ijk}(\bar y_{i..}-\bar y_{...})^2\\ SSB=\sum_{ijk}(\bar y_{.j.}-\bar y_{...})^2\\ SSAB=\sum_{ijk}(\bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y_{...})^2\\ \]

Reduced model (no interaction): \(Y_{ijk}=\mu+\alpha_i+\beta_j+\epsilon_{ijk}\)

| SS | Full | Reduced |
| --- | --- | --- |
| \(S_{AB}\) | \(df=ab-1\) | \(df=a+b-2\) |
| \(SSA\) | \(df=a-1\) | \(df=a-1\) |
| \(SSB\) | \(df=b-1\) | \(df=b-1\) |
| \(SSAB\) | \(df=(a-1)(b-1)\) | |
| \(S_e\) | \(df=ab(n-1)\) | \(df=abn-a-b+1\) |
| \(SST\) | \(df=abn-1\) | \(df=abn-1\) |

Test for interaction:

\[F={MSAB\over MSE}\sim F_{(a-1)(b-1), ab(n-1)} \]
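For a balanced layout the sums of squares reduce to averages over the axes of a 3-d array; a sketch with hypothetical data (`a`, `b_lvl`, `n_rep` are made up; `b_lvl` is named to avoid clashing with the coefficient vector `b` above):

```python
a, b_lvl, n_rep = 3, 4, 5
y = rng.normal(size=(a, b_lvl, n_rep))         # y[i, j, k], hypothetical balanced cells
ybar = y.mean()                                # y_bar...
yi = y.mean(axis=(1, 2))                       # y_bar_i..
yj = y.mean(axis=(0, 2))                       # y_bar_.j.
yij = y.mean(axis=2)                           # y_bar_ij.
SSAB = n_rep * ((yij - yi[:, None] - yj[None, :] + ybar) ** 2).sum()
S_e = ((y - yij[:, :, None]) ** 2).sum()
df1, df2 = (a - 1) * (b_lvl - 1), a * b_lvl * (n_rep - 1)
F_AB = (SSAB / df1) / (S_e / df2)              # MSAB / MSE
p_AB = stats.f.sf(F_AB, df1, df2)
```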


One more remark on the final two-factor ANOVA: the tables above give the variance decompositions and their degrees of freedom with and without interaction. Hypothesis tests for the interaction or the main effects, and CIs for a mean or a linear combination of means, all follow the same recipe. Only the test statistic for the interaction is written out here; for every other quantity the method is identical: find the distribution of the quantity (e.g., the A/B main-effect sums of squares are chi-square with the corresponding degrees of freedom, and a mean is normal with a certain variance), note that this distribution involves the unknown parameter \(\sigma^2\), and eliminate it using \(s^2=MSE=S_e/df\) to obtain the corresponding t or F distribution; then solve whatever the problem asks for (a hypothesis test or a CI). The only thing to watch is whether the two factors interact, since that determines the degrees of freedom of \(MSE\).
