ANalysis Of VAriance (ANOVA) Appendix 1: Proofs
1 Noncentral Chi-square distribution
1 Noncentral Chi-square distribution with \(k\) degrees of freedom
If \(X_1, X_2, \cdots, X_k\) are independent with \(X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)\), \(i=1,2,\cdots,k\), then
\[Y = \sum_{i=1}^{k}\frac{X_i^2}{\sigma^2_i} \sim {\chi'}^2_k(\delta)
\]
where \(k\) is the number of degrees of freedom and \(\delta\) is the noncentrality parameter,
\[\delta = \sum_{i=1}^{k}\frac{\mu_i^2}{\sigma^2_i} = \mathbb{E}[Y] - k
\]
and \(\mathbb{E}[Y]\) is the expectation of the random variable \(Y\)
\[\mathbb{E}[Y] = k + \sum_{i=1}^{k}\frac{\mu_i^2}{\sigma^2_i}
\]
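The relation \(\mathbb{E}[Y] = k + \delta\) can be sanity-checked by simulation. The sketch below is a minimal Monte Carlo check using only the Python standard library; the means, standard deviations, and the helper name `sample_noncentral_chi2` are illustrative assumptions, not part of the notes.

```python
import random
import statistics

def sample_noncentral_chi2(mus, sigmas, rng):
    """Draw one Y = sum_i X_i^2 / sigma_i^2 with independent X_i ~ N(mu_i, sigma_i^2)."""
    # random.gauss takes the standard deviation, not the variance.
    return sum(rng.gauss(mu, s) ** 2 / s ** 2 for mu, s in zip(mus, sigmas))

# Illustrative parameters (assumed, not from the text): k = 3 components.
mus = [1.0, 2.0, 0.5]
sigmas = [1.0, 2.0, 0.5]
k = len(mus)
delta = sum(mu ** 2 / s ** 2 for mu, s in zip(mus, sigmas))

rng = random.Random(0)  # fixed seed for reproducibility
draws = [sample_noncentral_chi2(mus, sigmas, rng) for _ in range(100_000)]
sample_mean = statistics.fmean(draws)

print(f"k + delta = {k + delta:.3f}, simulated E[Y] = {sample_mean:.3f}")
```

The simulated mean should land close to \(k + \delta\); this is a numerical check of the formula, not a proof.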
2 Noncentral \(F\) distribution with \((k_1,k_2)\) degrees of freedom and noncentrality parameter \(\delta\)
If \(X_1\) follows a noncentral Chi-square distribution with \(k_1\) degrees of freedom and noncentrality parameter \(\delta\), \(X_2\) follows a (central) Chi-square distribution with \(k_2\) degrees of freedom, and \(X_1\) and \(X_2\) are independent, i.e., \(X_1 \sim {\chi'}^2_{k_1}(\delta)\) and \(X_2 \sim \chi^2_{k_2}\), then the following random variable \(F\) follows a noncentral \(F\) distribution:
\[F = \frac{X_1/k_1}{X_2/k_2} \sim{F'}_{k_1, k_2}(\delta)
\]
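A quick way to see the definition in action is to draw \(F = (X_1/k_1)/(X_2/k_2)\) directly and compare its sample mean with the known mean of the noncentral \(F\) distribution, \(\frac{k_2}{k_2-2}\cdot\frac{k_1+\delta}{k_1}\) for \(k_2 > 2\) (which follows from independence, \(\mathbb{E}[X_1] = k_1 + \delta\), and \(\mathbb{E}[1/X_2] = 1/(k_2-2)\)). This is a minimal standard-library sketch; the degrees of freedom, \(\delta\), and the helper name `sample_noncentral_f` are illustrative assumptions.

```python
import random
import statistics

def sample_noncentral_f(k1, k2, delta, rng):
    """Draw F = (X1/k1) / (X2/k2), X1 ~ chi'^2_{k1}(delta), X2 ~ chi^2_{k2}, independent."""
    # The noncentral chi-square depends on the means only through delta, so all of
    # the noncentrality can be placed on the first standard-normal coordinate.
    x1 = (rng.gauss(0.0, 1.0) + delta ** 0.5) ** 2
    x1 += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k1 - 1))
    # A central chi-square with k2 degrees of freedom is Gamma(shape=k2/2, scale=2).
    x2 = rng.gammavariate(k2 / 2, 2.0)
    return (x1 / k1) / (x2 / k2)

k1, k2, delta = 3, 20, 3.0  # illustrative values (assumed)
rng = random.Random(1)
draws = [sample_noncentral_f(k1, k2, delta, rng) for _ in range(100_000)]

# For k2 > 2: E[F'] = E[X1/k1] * E[k2/X2] = ((k1 + delta) / k1) * (k2 / (k2 - 2)).
theoretical_mean = (k1 + delta) / k1 * k2 / (k2 - 2)
print(f"theoretical = {theoretical_mean:.4f}, simulated = {statistics.fmean(draws):.4f}")
```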
3 Type II error in ANOVA
\[\beta = \mathrm{Pr} \left(F_0 \leq F_{\alpha, a-1, N-a} \mid H_0 \text{ is false } \right)
\]
where the test statistic \(F_0\) and its distribution when \(H_0\) is false are
\[F_0 = \frac{ MS_{\text {Treatments }}}{MS_E}
= \frac{ \left(SS_{\text {Treatments }} / \sigma^2 \right) / (a-1)}{ \left(SS_E / \sigma^2 \right) / (N-a)}
\quad \sim \quad
{F'}_{a-1,N-a} \left(\frac{n}{\sigma^2} \sum_{i=1}^a \tau_i^2 \right)
\]
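The definition of \(\beta\) can be estimated directly by simulating the one-way model \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\) and counting how often \(F_0\) fails to exceed the critical value. In this minimal sketch the critical value \(F_{\alpha, a-1, N-a}\) is itself estimated as the empirical upper-\(\alpha\) quantile of \(F_0\) simulated under \(H_0\), so that only the standard library is needed; the design (\(a = 3\), \(n = 5\), \(\sigma = 1\)), the effect sizes, and the function names `f_statistic`, `simulate_f`, and `type_ii_error` are illustrative assumptions.

```python
import random

def f_statistic(groups):
    """F0 = MS_Treatments / MS_E for a balanced one-way layout (list of equal-size groups)."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ss_tr = n * sum((m - grand) ** 2 for m in means)
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_tr / (a - 1)) / (ss_e / (a * n - a))

def simulate_f(taus, n, sigma, reps, rng, mu=0.0):
    """Draw `reps` values of F0 from y_ij = mu + tau_i + eps_ij, eps_ij ~ N(0, sigma^2)."""
    return [
        f_statistic([[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in taus])
        for _ in range(reps)
    ]

def type_ii_error(taus, n, sigma, alpha=0.05, reps=10_000, seed=2024):
    """Monte Carlo estimate of beta = Pr(F0 <= F_{alpha, a-1, N-a} | H0 is false)."""
    rng = random.Random(seed)
    a = len(taus)
    # Critical value: empirical upper-alpha quantile of F0 simulated under H0 (all tau_i = 0).
    null_f = sorted(simulate_f([0.0] * a, n, sigma, reps, rng))
    f_crit = null_f[int((1 - alpha) * reps)]
    alt_f = simulate_f(taus, n, sigma, reps, rng)
    beta = sum(f <= f_crit for f in alt_f) / reps
    return beta, f_crit

# Illustrative design (assumed): a = 3 treatments, n = 5 replicates, sigma = 1.
beta_small, f_crit = type_ii_error([-0.5, 0.0, 0.5], n=5, sigma=1.0)
beta_large, _ = type_ii_error([-2.0, 0.0, 2.0], n=5, sigma=1.0)
print(f"F_crit ~ {f_crit:.3f}, beta(small) ~ {beta_small:.3f}, beta(large) ~ {beta_large:.3f}")
```

Larger \(\sum \tau_i^2\) (that is, a larger noncentrality \(\frac{n}{\sigma^2}\sum \tau_i^2\)) should give a smaller \(\beta\). With SciPy available, the same quantity could instead be computed from the central and noncentral \(F\) distributions (`scipy.stats.f` and `scipy.stats.ncf`) rather than by simulation.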
4 Relation between \(\Phi^2\) and \(\delta\)
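With \(\Phi^2 = \frac{n}{a \sigma^2} \sum_{i=1}^a \tau_i^2\), the parameter used in the operating characteristic curves, the noncentrality parameter of \(F_0\) is
\[\delta = \frac{n}{\sigma^2} \sum_{i=1}^a \tau_i^2 = \Phi^2 \cdot a
\]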
2 Proofs
2.1 Fixed effect model
Proof of \(SS_T = SS_{\text{Treatments}} + SS_E\)
\[\begin{aligned}
SS_T &= \sum_{i=1}^a \sum_{j=1}^n ( y_{i j}-\bar{y}_{\cdot \cdot} )^2
\\
&= \sum_{i=1}^a \sum_{j=1}^n ( y_{i j}-\bar{y}_{i \cdot}+\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot} )^2
\\
&= \sum_{i=1}^a \sum_{j=1}^n \left[
( \bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot} )^2
+ 2 (\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot} ) (y_{i j}-\bar{y}_{i \cdot} )
+ (y_{i j}-\bar{y}_{i \cdot})^2
\right]
\\
&= \underbrace{\sum_{i=1}^a \sum_{j=1}^n ( \bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot} )^2 }_{SS_{\text{Treatments}}}
+ 2 \sum_{i=1}^a \sum_{j=1}^n \left[ (\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot}) (y_{i j} - \bar{y}_{i \cdot} ) \right]
+ \underbrace{ \sum_{i=1}^a \sum_{j=1}^n ( y_{i j} - \bar{y}_{i \cdot} )^2 }_{SS_E}
\\
&= SS_{\text {Treatments }} + SS_E
+ 2 \sum_{i=1}^a \left\{ \left(\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot}\right) \times \left[\sum_{j=1}^n\left(y_{i j}-\bar{y}_{i \cdot}\right)\right] \right\}
\\
&=S S_{\text {Treatments }}+ SS_E \qquad
\text{because }\sum_{j=1}^n\left(y_{i j}-\bar{y}_{i \cdot}\right)=0
\end{aligned}
\]
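The decomposition is an algebraic identity, so it holds exactly (up to floating-point rounding) for any balanced data set. A minimal standard-library sketch on arbitrary simulated data; the function name `sums_of_squares` and the data-generating values are illustrative assumptions.

```python
import random

def sums_of_squares(y):
    """Return (SS_T, SS_Treatments, SS_E, cross term) for a balanced layout y[i][j]."""
    a, n = len(y), len(y[0])
    row_means = [sum(row) / n for row in y]      # \bar{y}_{i.}
    grand_mean = sum(row_means) / a              # \bar{y}_{..} (equal n, so mean of means)
    ss_t = sum((v - grand_mean) ** 2 for row in y for v in row)
    ss_tr = n * sum((m - grand_mean) ** 2 for m in row_means)
    ss_e = sum((v - m) ** 2 for row, m in zip(y, row_means) for v in row)
    # Cross term from the proof: 2 * sum_i (ybar_i. - ybar_..) * sum_j (y_ij - ybar_i.).
    cross = 2 * sum((m - grand_mean) * sum(v - m for v in row) for row, m in zip(y, row_means))
    return ss_t, ss_tr, ss_e, cross

rng = random.Random(42)
# Arbitrary illustrative data: a = 4 treatments, n = 6 replicates each.
y = [[rng.gauss(10.0 + i, 2.0) for _ in range(6)] for i in range(4)]
ss_t, ss_tr, ss_e, cross = sums_of_squares(y)
print(ss_t, ss_tr + ss_e, cross)  # first two agree; cross term is ~0
```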
Proof of \(SS_{\text{Treatments}} / \sigma^2 \sim \chi^2_{a-1}\)
Proof: \(SS_{\text{Treatments}} / \sigma^2 \sim \chi^2_{a-1}\) if \(H_0\) is true (i.e., \(\tau_1 = \tau_2 = \cdots = \tau_a\) or, equivalently, \(\mu_1 = \mu_2 = \cdots = \mu_a\)).
Assuming \(H_0\) is true, \(y_{ij} \overset{i.i.d.}{\sim} \mathcal{N}(\mu, \sigma^2)\), so the treatment means satisfy
\[\bar{y}_{i \cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}
\quad \overset{i.i.d.}{\sim} \quad
\mathcal{N} \left(\mu, \frac{\sigma^2}{n} \right)
\]
Then let
\[z_i = \frac{\bar{y}_{i \cdot} - \mu}{\sqrt{\sigma^2 / n }} \quad \overset{i.i.d.}{\sim} \quad
\mathcal{N}(0, 1)
\]
Consider
\[\begin{aligned}
SS_{\text {Treatments }}
&= n \sum_{i=1}^a (\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot} )^2
\\
&= n \sum_{i=1}^a (\bar{y}_{i \cdot} - \mu + \mu - \bar{y}_{\cdot \cdot} )^2
\\
&= n \left[ \sum_{i=1}^a (\bar{y}_{i \cdot}-\mu)^2
- 2 \sum_{i=1}^a [(\bar{y}_{i \cdot}-\mu) (\bar{y}_{\cdot \cdot} - \mu ) ]
+ \sum_{i=1}^a (\bar{y}_{\cdot \cdot} - \mu )^2\right]
\\
&= n \left[ \sum_{i=1}^a (\bar{y}_{i \cdot} - \mu )^2-a (\bar{y}_{\cdot \cdot}-\mu )^2 \right] \qquad \text{because } \sum_{i=1}^a (\bar{y}_{i \cdot}-\mu) = a(\bar{y}_{\cdot \cdot}-\mu)
\\
&= n \sum_{i=1}^a\left(\bar{y}_{i \cdot}-\mu\right)^2-n a\left(\bar{y}_{\cdot \cdot}-\mu\right)^2
\end{aligned}
\]
Thus,
\[\begin{aligned}
& \frac{n }{\sigma^2} \sum_{i=1}^a (\bar{y}_{i \cdot}-\mu)^2
\\
=& \frac{n}{\sigma^2} \sum_{i=1}^a (\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot})^2
+ \frac{n a}{\sigma^2} \left(\bar{y}_{\cdot \cdot}-\mu \right)^2
\\
=& \underbrace{ \frac{n}{\sigma^2} \sum_{i=1}^a \left[
(\bar{y}_{i \cdot} - \mu)
- \frac{1}{a} \sum \limits_{k=1}^a (\bar{y}_{k \cdot} - \mu ) \right]^2}_{SS_{\text {Treatments }} / \sigma^2}
+ \frac{n}{a \sigma^2} \left[\sum_{i=1}^a (\bar{y}_{i \cdot}-\mu ) \right]^2
\\
=& \underbrace{ \sum_{i=1}^{a} (z_i-\bar{z})^2}_{SS_{\text {Treatments }} / \sigma^2}
+ \frac{1}{a} \left(\sum_{i=1}^a z_i \right)^2 = \sum_{i=1}^{a} z_i^2
\\
=& \left[z_1, z_2, \cdots, z_a\right]
\times \left[\mathbf{I}_{a \times a} - \frac{1}{a} \mathbf{1}_{a \times a} \right] \times \left[ \begin{array}{l}
z_1 \\ z_2 \\ \vdots \\ z_a \end{array} \right]
+ \left[z_1, z_2, \cdots, z_a\right] \times \left[\frac{1}{a} \mathbf{1}_{a \times a} \right] \times \left[\begin{array}{l} z_1 \\ z_2 \\ \vdots \\ z_a \end{array} \right]
\end{aligned}
\]
where \(\bar{z} = \frac{1}{a} \sum_{i=1}^{a} z_i\), \(\mathbf{I}_{a \times a}\) is the \(a \times a\) identity matrix, and \(\mathbf{1}_{a \times a}\) is the \(a \times a\) all-ones matrix.
Both matrices are symmetric and idempotent, so each rank equals its trace: \(\text{Rank}\left[\mathbf{I}_{a \times a} - \frac{1}{a} \mathbf{1}_{a \times a} \right] = a - 1\) and \(\text{Rank}\left[\frac{1}{a} \mathbf{1}_{a \times a} \right]=1\).
Since the ranks sum to \(a\) and \(z_1, \cdots, z_a \overset{i.i.d.}{\sim} \mathcal{N}(0,1)\), Cochran's theorem gives
\[\frac{SS_{\text{Treatments }}}{\sigma^2}
= \sum_{i=1}^a \left(z_i - \bar{z} \right)^2 \sim \chi_{a-1}^2
\]
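Both ingredients of the argument can be checked numerically: the rank claim, exactly, by verifying that \(\mathbf{I}_{a \times a} - \frac{1}{a}\mathbf{1}_{a \times a}\) is idempotent with trace \(a-1\), and the distributional claim, approximately, by checking that simulated values of \(SS_{\text{Treatments}}/\sigma^2\) under \(H_0\) have mean close to \(a-1\) and variance close to \(2(a-1)\), as a \(\chi^2_{a-1}\) variable should. This is a minimal standard-library sketch, not a proof; \(a\), \(n\), \(\mu\), \(\sigma\), and the helper name `ss_treatments_over_sigma2` are illustrative assumptions.

```python
import random
import statistics
from fractions import Fraction

a, n, mu, sigma = 4, 5, 3.0, 1.5  # illustrative values (assumed); H0 holds: all tau_i = 0

# Exact check of the rank argument: P = I - J/a is symmetric and idempotent,
# so rank(P) = trace(P) = a - 1.
P = [[Fraction(int(i == j)) - Fraction(1, a) for j in range(a)] for i in range(a)]
P_squared = [[sum(P[i][k] * P[k][j] for k in range(a)) for j in range(a)] for i in range(a)]
print("P idempotent:", P_squared == P, "trace(P) =", sum(P[i][i] for i in range(a)))

def ss_treatments_over_sigma2(rng):
    """One draw of SS_Treatments / sigma^2 with y_ij i.i.d. N(mu, sigma^2)."""
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(a)]
    grand = sum(means) / a
    return n * sum((m - grand) ** 2 for m in means) / sigma ** 2

rng = random.Random(7)
draws = [ss_treatments_over_sigma2(rng) for _ in range(40_000)]
# A chi-square with a - 1 degrees of freedom has mean a - 1 and variance 2(a - 1).
print("mean:", statistics.fmean(draws), "vs", a - 1)
print("variance:", statistics.variance(draws), "vs", 2 * (a - 1))
```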
Proof of \(SS_E/\sigma^2 \sim \chi^2_{N-a}\)
To be completed...
Proof of \(\mathbb{E}[MS_E]=\sigma^2\)
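Proof: \(MS_E\) is an unbiased estimator of \(\sigma^2\), i.e.,
\[\mathbb{E}[MS_E]=\mathbb{E} \left[\frac{SS_E}{N-a} \right]=\sigma^2
\]
Using the model \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\) with \(\varepsilon_{ij} \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)\), we have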
\[\begin{aligned}
\mathbb{E} \left[ MS_E \right] &= \mathbb{E} \left[ \frac{S S_E}{N-a} \right]
= \frac{1}{N-a} \cdot \mathbb{E} \left[\sum_{i=1}^a \sum_{j=1}^n \left(y_{i j} -\bar{y}_{i \cdot}\right)^2 \right] \\
& = \frac{1}{N-a} \cdot \mathbb{E} \left[
\sum_{i=1}^a \sum_{j=1}^n \left(y_{i j}^2-2 y_{i j} \bar{y}_{i \cdot} + \bar{y}_{i \cdot}^2 \right) \right] \\
& = \frac{1}{N-a} \cdot \mathbb{E} \left[
\sum_{i=1}^a \sum_{j=1}^n y_{i j}^2
- 2 n \sum_{i=1}^a \bar{y}_{i \cdot}^2
+ n \sum_{i=1}^a \bar{y}_{i \cdot}^2 \right] \\
& = \frac{1}{N-a} \cdot \mathbb{E} \left[
\sum_{i=1}^a \sum_{j=1}^n y_{i j}^2
- \frac{1}{n} \sum_{i=1}^a y_{i \cdot}^2 \right] \qquad \text{where } y_{i \cdot} = \sum_{j=1}^n y_{i j} = n \bar{y}_{i \cdot} \\
& =\frac{1}{N-a} \cdot \mathbb{E} \left[
\sum_{i=1}^a \sum_{j=1}^n \left(\mu+\tau_i+\varepsilon_{i j}\right)^2
- \frac{1}{n} \sum_{i=1}^a \left(\sum_{j=1}^n \left(\mu + \tau_i + \varepsilon_{i j} \right) \right)^2 \right] \\
& = \frac{1}{N-a} \left[n \sum_{i=1}^a \left(\mu + \tau_i \right)^2 + N \sigma^2 - n \sum_{i=1}^a \left(\mu + \tau_i \right)^2 - a \sigma^2 \right] \\
& = \sigma^2
\end{aligned}
\]
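A Monte Carlo check makes the point that \(MS_E\) stays unbiased for \(\sigma^2\) even when \(H_0\) is false. This minimal standard-library sketch simulates \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\) with nonzero effects and averages \(MS_E\); the values of \(\mu\), \(\sigma\), \(n\), \(\tau_i\), and the helper name `ms_error` are illustrative assumptions.

```python
import random
import statistics

def ms_error(y):
    """MS_E = SS_E / (N - a) for a balanced layout y[i][j]."""
    a, n = len(y), len(y[0])
    means = [sum(row) / n for row in y]
    ss_e = sum((v - m) ** 2 for row, m in zip(y, means) for v in row)
    return ss_e / (a * n - a)

# Illustrative model (assumed): y_ij = mu + tau_i + eps_ij with H0 false.
mu, sigma, n = 10.0, 2.0, 4
taus = [-1.5, 0.5, 1.0]
rng = random.Random(3)
ms_e_draws = [
    ms_error([[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in taus])
    for _ in range(30_000)
]
print("mean of MS_E:", statistics.fmean(ms_e_draws), "vs sigma^2 =", sigma ** 2)
```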
Proof of \(\mathbb{E}[MS_{\text{Treatments}}]=\sigma^2+\frac{n}{a-1}\sum_{i=1}^{a}\tau_i^2\)
Proof: Under the assumption \(\sum_{i=1}^{a}\tau_i=0\), we have
\[\mathbb{E}[MS_{\text{Treatments}}] = \sigma^2+\frac{n}{a-1}\sum_{i=1}^{a}\tau_i^2
\]
First, we have
\[\begin{split}
MS_{\text {Treatments }} &= \frac{n}{a-1} \sum_{i=1}^a\left(\bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot}\right)^2 \\
&= \frac{n}{a-1} \left[ \sum_{i=1}^a \bar{y}_{i \cdot}^2
- 2 \sum_{i=1}^a \bar{y}_{i \cdot} \, \bar{y}_{\cdot \cdot}
+ \sum_{i=1}^a \bar{y}_{\cdot \cdot}^2 \right] \\
&=\frac{n}{a-1} \left[ \sum_{i=1}^a \bar{y}_{i \cdot}^2- a \bar{y}_{\cdot \cdot}^{2} \right]
\end{split}
\]
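Then, writing \(\mathbb{D}[\cdot]\) for the variance and using \(\mathbb{E}[X^2] = (\mathbb{E}[X])^2 + \mathbb{D}[X]\) together with \(\sum_{i=1}^a \tau_i = 0\),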
\[\begin{split}
\mathbb{E} \left[ MS_{\text {Treatments }} \right]
&= \frac{n}{a-1} \ \mathbb{E} \left[ \sum_{i=1}^a \bar{y}_{i \cdot}^2-a \bar{y}_{\cdot \cdot}^2 \right] \\
&= \frac{n}{a-1} \left( \sum_{i=1}^a \mathbb{E} \left[ \bar{y}_{i \cdot}^2 \right]
- a \, \mathbb{E} \left[ \bar{y}_{\cdot \cdot}^2 \right] \right) \\
&= \frac{n}{a-1} \left[ \sum_{i=1}^a \left(
\left( \mathbb{E} \left[\bar{y}_{i \cdot}\right] \right)^2 + \mathbb{D} \left[ \bar{y}_{i \cdot} \right]
\right)
- a \left( \left( \mathbb{E} \left[ \bar{y}_{\cdot \cdot}\right] \right)^2 + \mathbb{D} \left[ \bar{y}_{\cdot \cdot} \right] \right)
\right] \\
& = \frac{n}{a-1} \left\{
\sum_{i=1}^a \left[ \left(\mu+\tau_i\right)^2 + \frac{\sigma^2}{n} \right]
- a \left[ \left( \mu + \frac{ \sum_{i=1}^a \tau_i}{a}\right)^2 + \frac{\sigma^2}{a \cdot n} \right]
\right\} \\
& = \sigma^2 + \frac{n}{a-1} \left[ \sum_{i=1}^a \left( \mu+\tau_i\right)^2-a \mu^2 \right] \\
& = \sigma^2 + \frac{n}{a-1} \left( \sum_{i=1}^a \tau_i^2+2 \mu \sum_{i=1}^a \tau_i \right) \\
& = \sigma^2 + \frac{n}{a-1} \sum \limits_{i=1}^a \tau_i^2
\end{split}
\]
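The result can be checked by simulation in the same way: with effects that satisfy \(\sum_{i=1}^a \tau_i = 0\), the average of \(MS_{\text{Treatments}}\) should approach \(\sigma^2 + \frac{n}{a-1}\sum_{i=1}^a \tau_i^2\). This is a minimal standard-library sketch; the parameter values and the helper name `ms_treatments` are illustrative assumptions.

```python
import random
import statistics

def ms_treatments(y):
    """MS_Treatments = n * sum_i (ybar_i. - ybar_..)^2 / (a - 1) for a balanced layout y[i][j]."""
    a, n = len(y), len(y[0])
    means = [sum(row) / n for row in y]
    grand = sum(means) / a
    return n * sum((m - grand) ** 2 for m in means) / (a - 1)

# Illustrative model (assumed): effects satisfy sum(tau_i) = 0, as the proof requires.
mu, sigma, n = 10.0, 2.0, 4
taus = [-1.5, 0.5, 1.0]
a = len(taus)
expected = sigma ** 2 + n / (a - 1) * sum(t * t for t in taus)  # 4 + 2 * 3.5 = 11

rng = random.Random(5)
draws = [
    ms_treatments([[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in taus])
    for _ in range(30_000)
]
print("mean of MS_Treatments:", statistics.fmean(draws), "vs", expected)
```

Changing `mu` should leave the simulated mean essentially unchanged, matching the cancellation of the \(\mu\) terms in the derivation.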
2.2 Random Effect Model