4. Centred Model & ANOVA Derivations

Expectation & Variance of Estimators

Model: $y_{i} = β_{0} + β_{1} x_{i} + ε_{i}$, with the $ε_{i}$ independent $N (0, σ^{2})$ and the $x_{i}$ treated as fixed

For Slope Parameter

We know:

\hat{β}_{1} = \frac{S_{xy}}{S_{xx}}

where $S_{xy} = \sum (x_{i} - \bar{x}) (y_{i} - \bar{y})$ and $S_{xx} = \sum (x_{i} - \bar{x})^{2}$.

WTS:

\hat{β}_{1} = \sum c_{i} y_{i}

where $c_{i} = \frac{x_{i} - \bar{x}}{S_{xx}}$, and that $\hat{β}_{1}$ is unbiased.

Lemma:

\begin{aligned}
\sum (x_{i} - \bar{x}) (y_{i} - \bar{y}) & = \sum (x_{i} - \bar{x}) y_{i} - \bar{y} \sum (x_{i} - \bar{x}) \\
& = \sum (x_{i} - \bar{x}) y_{i}
\end{aligned}

using

\sum (x_{i} - \bar{x}) = \sum x_{i} - n \bar{x} = 0 \qquad (1)

Hence $S_{xy} = \sum (x_{i} - \bar{x}) y_{i}$, which gives $\hat{β}_{1} = \sum c_{i} y_{i}$.

Since the $y_{i}$ are independent normal and $\hat{β}_{1}$ is a linear combination of them, $\hat{β}_{1}$ is also normal.

\begin{aligned}
E [\hat{β}_{1}] & = E \left[\sum c_{i} y_{i}\right] \\
& = \sum E \left[\frac{(x_{i} - \bar{x}) y_{i}}{S_{xx}}\right] \\
& = \sum \frac{x_{i} - \bar{x}}{S_{xx}} E [y_{i}] \quad (c_{i} \text{ constant}) \\
& = \sum \frac{x_{i} - \bar{x}}{S_{xx}} (β_{0} + β_{1} x_{i}) \\
& = \frac{β_{0}}{S_{xx}} \underbrace{\sum (x_{i} - \bar{x})}_{= 0 \text{ by } (1)} + \frac{β_{1}}{S_{xx}} \sum (x_{i} - \bar{x}) x_{i} \\
& = \frac{β_{1}}{S_{xx}} \left(\sum x_{i}^{2} - n \bar{x}^{2}\right) \\
& = \frac{β_{1}}{S_{xx}} S_{xx} = β_{1} \quad (\text{unbiased})
\end{aligned}

\begin{aligned}
\mathrm{Var} [\hat{β}_{1}] & = \mathrm{Var} \left[\sum c_{i} y_{i}\right] \\
& = \sum c_{i}^{2} \mathrm{Var} [y_{i}] \quad (y_{i} \text{ independent}) \\
& = σ^{2} \sum \left[\frac{x_{i} - \bar{x}}{S_{xx}}\right]^{2} \\
& = \frac{σ^{2}}{S_{xx}^{2}} \sum (x_{i} - \bar{x})^{2} \\
& = \frac{σ^{2}}{S_{xx}}
\end{aligned}
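The derivation rests on three identities for the weights: $\sum c_{i} = 0$, $\sum c_{i} x_{i} = 1$ and $\sum c_{i}^{2} = 1 / S_{xx}$. The following is a minimal numerical check, using the headphone data from the Example section of these notes; the variable names are illustrative only.

```python
# Headphone data from the Example section of these notes.
x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Weights c_i = (x_i - x_bar) / S_xx from the derivation.
c = [(xi - x_bar) / s_xx for xi in x]

sum_c = sum(c)                                  # 0 by identity (1)
sum_cx = sum(ci * xi for ci, xi in zip(c, x))   # 1, which gives unbiasedness
sum_c2 = sum(ci ** 2 for ci in c)               # 1 / S_xx, which gives the variance

beta1_ratio = s_xy / s_xx                             # S_xy / S_xx
beta1_linear = sum(ci * yi for ci, yi in zip(c, y))   # sum of c_i * y_i

print(sum_c, sum_cx, sum_c2 * s_xx)
print(beta1_ratio, beta1_linear)
```

Up to floating-point rounding, the first line should print 0, 1 and 1, and the two slope values on the second line should agree.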

\hat{β}_{1} = \sum \frac{x_{i} - \bar{x}}{S_{xx}} y_{i} = \sum c_{i} y_{i} \sim N \left(β_{1}, \frac{σ^{2}}{S_{xx}}\right)

Thus,

\begin{array}{r} \frac{{\hat{β}}_{1} - β_{1}}{\frac{σ}{\sqrt{S_{x x}}}} \sim N (0, 1) \\ \frac{S_{x x} ({\hat{β}}_{1} - β_{1})^{2}}{σ^{2}} \sim χ_{1}^{2} \end{array}
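A small simulation can make the sampling distribution concrete. This is only a sketch: the true values `beta0`, `beta1`, `sigma` and the seed are hypothetical choices made for illustration, and the design points are the prices from the Example section.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical true parameters, chosen only for this illustration.
beta0, beta1, sigma = 20.0, 0.3, 10.0
x = [180, 150, 95, 70, 70, 35]   # fixed design points

n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)


def slope_estimate(y):
    """Least-squares slope: sum (x_i - x_bar) * y_i / S_xx."""
    return sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx


estimates = []
for _ in range(20000):
    # Draw y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2).
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    estimates.append(slope_estimate(y))

mc_mean = statistics.fmean(estimates)      # compare with beta1
mc_var = statistics.pvariance(estimates)   # compare with sigma^2 / S_xx
theory_var = sigma ** 2 / s_xx

print(mc_mean, beta1)
print(mc_var, theory_var)
```

The simulated mean and variance should sit close to $β_{1}$ and $σ^{2} / S_{xx}$, with the gap shrinking as the number of replications grows.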

General Testing Parameters

Assumptions

F_{*} = \frac{⟨ y, u_{2} ⟩^{2}}{\left(⟨ y, u_{3} ⟩^{2} + \dots + ⟨ y, u_{n} ⟩^{2}\right) / (n - 2)} \sim F (1, n - 2) \quad \text{under } H_{0} : β_{1} = 0

by 16.5

and that $s^{2} = \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{n - 2} = \frac{SSE}{n - 2}$ is unbiased for $σ^{2}$

Centred Model

Traditionally, in vector form:

y = β_{0} \begin{pmatrix} 1 \\ 1 \\ ⋮ \\ 1 \end{pmatrix} + β_{1} \begin{pmatrix} x_{1} \\ x_{2} \\ ⋮ \\ x_{n} \end{pmatrix} + ε

Take

w_{1} = \begin{pmatrix} 1 \\ 1 \\ ⋮ \\ 1 \end{pmatrix}, \quad w_{2} = \begin{pmatrix} x_{1} - \bar{x} \\ x_{2} - \bar{x} \\ ⋮ \\ x_{n} - \bar{x} \end{pmatrix}

Centred model:

y = β_{0}^{*} w_{1} + β_{1}^{*} w_{2} + ε, \quad β_{0}^{*} = β_{0} + β_{1} \bar{x}, \quad β_{1}^{*} = β_{1}

note: $w_{1}$ and $w_{2}$ are orthogonal, since $⟨ w_{1}, w_{2} ⟩ = \sum (x_{i} - \bar{x}) = 0$

Estimates:

\begin{aligned}
\hat{y}_{i} & = \hat{β}_{0} + \hat{β}_{1} x_{i} = (\bar{y} - \hat{β}_{1} \bar{x}) + \hat{β}_{1} x_{i} \\
& = \bar{y} + \hat{β}_{1} (x_{i} - \bar{x}) \\
& = \hat{β}_{0}^{*} + \hat{β}_{1}^{*} (x_{i} - \bar{x}) = \hat{y}_{i}^{*}
\end{aligned}

So they are equivalent models
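As a quick check of this equivalence, the sketch below fits both parametrisations to the headphone data from the Example section and compares the fitted values. Variable names are illustrative only.

```python
x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Original parametrisation: y_hat = beta0_hat + beta1_hat * x
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Centred parametrisation: y_hat = beta0_star + beta1_star * (x - x_bar)
beta0_star = y_bar
beta1_star = beta1_hat

fitted_original = [beta0_hat + beta1_hat * xi for xi in x]
fitted_centred = [beta0_star + beta1_star * (xi - x_bar) for xi in x]

# Largest disagreement between the two sets of fitted values.
max_gap = max(abs(a - b) for a, b in zip(fitted_original, fitted_centred))
print(max_gap)
```

The gap should be zero up to rounding; only the intercept changes, with $\hat{β}_{0}^{*} = \hat{β}_{0} + \hat{β}_{1} \bar{x}$.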

Centred Model LSE

\begin{aligned}
\min_{β_{0}^{*}, β_{1}^{*}} Q & = ⟨ y - \hat{y}, y - \hat{y} ⟩ = \sum \left(y_{i} - β_{0}^{*} - β_{1}^{*} (x_{i} - \bar{x})\right)^{2} \\
\frac{\partial Q}{\partial β_{0}^{*}} & = - 2 \left(\sum y_{i} - n β_{0}^{*} - β_{1}^{*} \underbrace{\sum (x_{i} - \bar{x})}_{= 0}\right) = 0 \\
& ⟹ \sum y_{i} - n β_{0}^{*} = 0 \\
& ⟹ \hat{β}_{0}^{*} = \bar{y}
\end{aligned}

\begin{aligned}
\frac{\partial Q}{\partial β_{1}^{*}} & = - 2 \left(\sum (x_{i} - \bar{x}) y_{i} - β_{0}^{*} \underbrace{\sum (x_{i} - \bar{x})}_{= 0} - β_{1}^{*} \sum (x_{i} - \bar{x})^{2}\right) = 0 \\
⟹ \hat{β}_{1}^{*} & = \frac{\sum (x_{i} - \bar{x}) y_{i}}{\sum (x_{i} - \bar{x})^{2}} = \frac{S_{xy}}{S_{xx}} = \hat{β}_{1}
\end{aligned}
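The two normal equations can be verified numerically. The sketch below evaluates both partial derivatives of $Q$ at the closed-form estimates, using the headphone data from the Example section; the helper names `q` and `gradient` are hypothetical.

```python
x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n


def q(b0, b1):
    """Sum of squared errors for the centred model."""
    return sum((yi - b0 - b1 * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))


def gradient(b0, b1):
    """Partial derivatives of Q with respect to b0 and b1."""
    resid = [yi - b0 - b1 * (xi - x_bar) for xi, yi in zip(x, y)]
    d_b0 = -2 * sum(resid)
    d_b1 = -2 * sum((xi - x_bar) * r for xi, r in zip(x, resid))
    return d_b0, d_b1


# Closed-form solutions from the derivation.
b0_star = y_bar
b1_star = (sum((xi - x_bar) * yi for xi, yi in zip(x, y))
           / sum((xi - x_bar) ** 2 for xi in x))

g0, g1 = gradient(b0_star, b1_star)
q_min = q(b0_star, b1_star)
print(g0, g1, q_min)
```

Both derivatives should vanish up to rounding, and moving either coefficient away from the estimate should increase `q`.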

ANOVA Derivation

Write the centred model in vector form, then normalise the two regressors:

\begin{aligned}
\begin{pmatrix} y_{1} \\ y_{2} \\ ⋮ \\ y_{n} \end{pmatrix} & = β_{0}^{*} \begin{pmatrix} 1 \\ 1 \\ ⋮ \\ 1 \end{pmatrix} + β_{1} \begin{pmatrix} x_{1} - \bar{x} \\ x_{2} - \bar{x} \\ ⋮ \\ x_{n} - \bar{x} \end{pmatrix} + \begin{pmatrix} ε_{1} \\ ε_{2} \\ ⋮ \\ ε_{n} \end{pmatrix} \\
& = \sqrt{n} β_{0}^{*} \underbrace{\begin{pmatrix} \frac{1}{\sqrt{n}} \\ \frac{1}{\sqrt{n}} \\ ⋮ \\ \frac{1}{\sqrt{n}} \end{pmatrix}}_{u_{1}} + \sqrt{S_{xx}} β_{1} \underbrace{\begin{pmatrix} \frac{x_{1} - \bar{x}}{\sqrt{S_{xx}}} \\ \frac{x_{2} - \bar{x}}{\sqrt{S_{xx}}} \\ ⋮ \\ \frac{x_{n} - \bar{x}}{\sqrt{S_{xx}}} \end{pmatrix}}_{u_{2}} + \begin{pmatrix} ε_{1} \\ ε_{2} \\ ⋮ \\ ε_{n} \end{pmatrix}
\end{aligned}

Extend $u_{1}, u_{2}$ to an orthonormal basis $\{u_{1}, u_{2}, \dots, u_{n}\}$ of $\mathbb{R}^{n}$. Then

\begin{aligned}
y & = ⟨ y, u_{1} ⟩ u_{1} + ⟨ y, u_{2} ⟩ u_{2} + \underbrace{⟨ y, u_{3} ⟩ u_{3} + \dots + ⟨ y, u_{n} ⟩ u_{n}}_{\text{Residual}} \\
\| y \|^{2} & = \sum_{i = 1}^{n} ⟨ y, u_{i} ⟩^{2} \quad (\text{since the } u_{i} \text{ are orthonormal})
\end{aligned}

The fitted coefficient of $u_{2}$ is the projection coefficient $⟨ y, u_{2} ⟩$, so

\begin{aligned}
⟨ y, u_{2} ⟩ & = \sqrt{S_{xx}} \hat{β}_{1} \\
⟹ \frac{1}{\sqrt{S_{xx}}} \sum (x_{i} - \bar{x}) y_{i} & = \sqrt{S_{xx}} \hat{β}_{1} \\
⟹ \hat{β}_{1} & = \frac{\sum (x_{i} - \bar{x}) y_{i}}{S_{xx}}
\end{aligned}
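The projection identities can be checked directly. The sketch below builds $u_{1}$ and $u_{2}$ for the headphone data from the Example section, confirms they are orthonormal, and compares the inner products with $\sqrt{n} \bar{y}$ and $\sqrt{S_{xx}} \hat{β}_{1}$. The helper `dot` is a hypothetical name.

```python
import math

x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx


def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Normalised constant vector and normalised centred regressor.
u1 = [1 / math.sqrt(n)] * n
u2 = [(xi - x_bar) / math.sqrt(s_xx) for xi in x]

print(dot(u1, u1), dot(u2, u2), dot(u1, u2))    # expect 1, 1, 0 up to rounding

proj1 = dot(y, u1)   # equals sqrt(n) * y_bar
proj2 = dot(y, u2)   # equals sqrt(S_xx) * beta1_hat
ssr = proj2 ** 2     # regression sum of squares

print(proj1, math.sqrt(n) * y_bar)
print(proj2, math.sqrt(s_xx) * beta1_hat)
print(ssr)
```

For this data `ssr` comes out at about 1512.376, the SSR value used in the Example section.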

\begin{aligned}
\hat{β}_{1} & = \frac{1}{\sqrt{S_{xx}}} \cdot \frac{\sum (x_{i} - \bar{x}) y_{i}}{\sqrt{S_{xx}}} = \frac{1}{\sqrt{S_{xx}}} ⟨ y, u_{2} ⟩ \\
⟹ \hat{β}_{1}^{2} S_{xx} & = ⟨ y, u_{2} ⟩^{2} = SSR
\end{aligned}

Let $e = y - \hat{y}$ denote the residual vector. Then

\begin{aligned}
y & = \bar{y} \begin{pmatrix} 1 \\ 1 \\ ⋮ \\ 1 \end{pmatrix} + \hat{β}_{1} \begin{pmatrix} x_{1} - \bar{x} \\ ⋮ \\ x_{n} - \bar{x} \end{pmatrix} + e \\
⟹ \begin{pmatrix} y_{1} - \bar{y} \\ y_{2} - \bar{y} \\ ⋮ \\ y_{n} - \bar{y} \end{pmatrix} & = \hat{β}_{1} \begin{pmatrix} x_{1} - \bar{x} \\ ⋮ \\ x_{n} - \bar{x} \end{pmatrix} + e
\end{aligned}

\begin{aligned}
⟨ y - \bar{y}, y - \bar{y} ⟩ & = ⟨ \hat{β}_{1} (x - \bar{x}) + e, \hat{β}_{1} (x - \bar{x}) + e ⟩ \\
& = ⟨ \hat{β}_{1} (x - \bar{x}), \hat{β}_{1} (x - \bar{x}) ⟩ + 2 \underbrace{⟨ \hat{β}_{1} (x - \bar{x}), e ⟩}_{= 0, \text{ orthogonal}} + ⟨ e, e ⟩
\end{aligned}

\| y - \bar{y} \|^{2} = \hat{β}_{1}^{2} \| x - \bar{x} \|^{2} + \| \text{Residual vector} \|^{2}

\begin{aligned}
y & = ⟨ y, u_{1} ⟩ u_{1} + \dots + ⟨ y, u_{n} ⟩ u_{n} \\
\| y - \hat{y} \|^{2} & = ⟨ y, u_{3} ⟩^{2} + \dots + ⟨ y, u_{n} ⟩^{2} = \sum (y_{i} - \hat{y}_{i})^{2} = SSE
\end{aligned}

SST = SSR + SSE, where $SST = \sum (y_{i} - \bar{y})^{2} = S_{yy} = (n - 1) \cdot (\text{sample variance of } y)$

S_{xx} = \| x - \bar{x} \|^{2} = (n - 1) \cdot (\text{sample variance of } x)

⟹ SSE = SST - SSR = S_{yy} - \hat{β}_{1}^{2} S_{xx} = \| \text{Residual vector} \|^{2}
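The decomposition can be confirmed by computing SSE directly from the residuals rather than by subtraction. This is a minimal sketch on the headphone data from the Example section, with illustrative variable names.

```python
x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx

# Residuals e_i = y_i - y_hat_i, using the centred form of the fit.
residuals = [yi - (y_bar + beta1_hat * (xi - x_bar)) for xi, yi in zip(x, y)]

sst = sum((yi - y_bar) ** 2 for yi in y)    # S_yy
ssr = beta1_hat ** 2 * s_xx                 # regression sum of squares
sse_direct = sum(e ** 2 for e in residuals)

# The residual vector is orthogonal to the centred regressor.
cross_term = sum((xi - x_bar) * e for xi, e in zip(x, residuals))

print(sst, ssr + sse_direct)
print(cross_term)
```

The two totals should match and the cross term should vanish up to rounding, which is the orthogonality used to drop the middle term in the expansion.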

F-Test Restated

F_{*} = \frac{\hat{β}_{1}^{2} S_{xx}}{\| \text{Residual vector} \|^{2} / (n - 2)} = \frac{SSR}{SSE / (n - 2)}

ANOVA

\begin{array}{lcccc}
\text{Source} & df & SS & MS & F_{*} \\
\text{Regression} & 1 & SSR = \hat{β}_{1}^{2} S_{xx} & MSR = \frac{SSR}{1} & \frac{MSR}{MSE} \\
\text{Error} & n - 2 & SSE = SST - SSR & MSE = \frac{SSE}{n - 2} & \\
\text{Total} & n - 1 & SST = S_{yy} & &
\end{array}

P-value for the rejection region: $P [F (1, n - 2) > F_{*}]$
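The table translates almost line for line into code. The function below is a sketch, not a library routine: `anova_table` and its dictionary keys are hypothetical names, and it is applied to the headphone data from the Example section.

```python
def anova_table(x, y):
    """Return the simple-regression ANOVA quantities as a dict."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    beta1_hat = s_xy / s_xx
    ssr = beta1_hat ** 2 * s_xx   # regression row
    sst = s_yy                    # total row
    sse = sst - ssr               # error row
    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "df_reg": 1, "df_err": n - 2, "df_tot": n - 1,
        "SSR": ssr, "SSE": sse, "SST": sst,
        "MSR": msr, "MSE": mse, "F": msr / mse,
    }


x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score
table = anova_table(x, y)

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{table['df_reg']:>4}{table['SSR']:>12.3f}"
      f"{table['MSR']:>12.3f}{table['F']:>10.4f}")
print(f"{'Error':<12}{table['df_err']:>4}{table['SSE']:>12.3f}"
      f"{table['MSE']:>12.3f}")
print(f"{'Total':<12}{table['df_tot']:>4}{table['SST']:>12.3f}")
```

Computing SSE by subtraction mirrors the table; computing it from the residuals gives the same number.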

Example

The following data show the brand, price ($), and the overall score for six
stereo headphones that were tested by Consumer Reports. The overall
score is based on sound quality and effectiveness of ambient noise
reduction. Scores range from 0 (lowest) to 100 (highest).

\begin{array}{ccccccc} Brand & Bose & Skullcandy & Koss & Phillips & Denon & JVC \\ Price (x) & 180 & 150 & 95 & 70 & 70 & 35 \\ Score (y) & 76 & 71 & 61 & 56 & 40 & 26 \end{array}

Need to find $\bar{x}, \bar{y}, S_{xx}, S_{xy}, S_{yy}, \sum x_{i} y_{i}$

\bar{x} = 100, \quad \bar{y} = 55, \quad \sum x_{i} y_{i} = 37755, \quad S_{xx} = 14950, \quad S_{yy} = 1800, \quad S_{xy} = \sum x_{i} y_{i} - n \bar{x} \bar{y} = 4755

\hat{β}_{1} = \frac{S_{xy}}{S_{xx}} = \frac{4755}{14950} \approx 0.3181

SSR = \hat{β}_{1}^{2} S_{xx} = \frac{S_{xy}^{2}}{S_{xx}} = 1512.376

SST = S_{yy} = 1800

SSE = SST - SSR = 287.624

F_{*} = \frac{SSR}{SSE / 4} = 21.0327 \quad (n - 2 = 4)

P-value = $P [F (1, 4) > F_{*} = 21.0327]$
$⟹ 0.01 < \text{p-value} < 0.025$, since $F_{0.025} (1, 4) = 12.22 < 21.0327 < F_{0.01} (1, 4) = 21.20$

Rejection region: $RR = \{F : F > F_{0.05} (1, 4) = 7.71\}$,
and $F_{*} \in RR ⟹$ reject $H_{0} : β_{1} = 0$ at the 5% significance level.
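The table lookup only brackets the p-value. To get a number without a statistics library, one option is the identity $P [F (1, d) > f] = I_{x} (d / 2, 1 / 2)$ with $x = d / (d + f)$, where $I_{x}$ is the regularised incomplete beta function. The sketch below evaluates it with Simpson's rule; the function name `f_upper_tail` and the step count are assumptions made for illustration, and the routine is only intended for $d \geq 2$ and $f > 0$.

```python
import math


def f_upper_tail(f_value, df2, steps=2000):
    """Approximate P[F(1, df2) > f_value] for df2 >= 2 and f_value > 0.

    Integrates t^(a-1) * (1-t)^(b-1) over [0, x] with Simpson's rule,
    where a = df2 / 2, b = 1 / 2 and x = df2 / (df2 + f_value), then
    divides by the beta function B(a, b). `steps` must be even.
    """
    a, b = df2 / 2.0, 0.5
    upper = df2 / (df2 + f_value)
    h = upper / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * integrand(k * h)
    integral = total * h / 3.0

    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta_fn


p_value = f_upper_tail(21.0327, 4)    # F statistic from the example
tail_at_crit = f_upper_tail(7.71, 4)  # should be close to 0.05

print(p_value, tail_at_crit)
```

The computed p-value should fall inside the bracket $0.01 <$ p-value $< 0.025$ obtained from the tables, close to its lower end.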

Hypothesis Testing Summary

Let

(16.4,5) \quad W_{n - 2} = \frac{1}{σ^{2}} \sum_{i = 3}^{n} ⟨ y, u_{i} ⟩^{2} = \frac{1}{σ^{2}} \| y - \hat{y} \|^{2} \sim χ_{n - 2}^{2}

Since a $χ_{n - 2}^{2}$ variable has mean $n - 2$:

\begin{aligned}
& ⟹ \frac{1}{σ^{2}} E \left[\sum (y_{i} - \hat{y}_{i})^{2}\right] = n - 2 \\
& ⟹ E \left[\frac{\sum (y_{i} - \hat{y}_{i})^{2}}{n - 2}\right] = σ^{2}
\end{aligned}

Therefore $\frac{S S E}{n - 2}$ is unbiased for $σ^{2}$
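Unbiasedness can be illustrated by simulation. In this sketch the true parameters and the seed are hypothetical, the design points are the prices from the Example section, and the divisor $n - 2$ is compared with the naive divisor $n$.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable

# Hypothetical true parameters, chosen only for this illustration.
beta0, beta1, sigma = 20.0, 0.3, 10.0
x = [180, 150, 95, 70, 70, 35]

n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)


def sse_of(y):
    """Residual sum of squares of the least-squares line fitted to (x, y)."""
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx
    return sum((yi - y_bar - b1 * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))


sse_draws = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    sse_draws.append(sse_of(y))

mean_s2 = statistics.fmean(s / (n - 2) for s in sse_draws)    # near sigma^2
mean_naive = statistics.fmean(s / n for s in sse_draws)       # biased low
mean_w = statistics.fmean(s / sigma ** 2 for s in sse_draws)  # near n - 2

print(mean_s2, sigma ** 2)
print(mean_naive)
print(mean_w, n - 2)
```

Dividing by $n$ instead of $n - 2$ underestimates $σ^{2}$ on average, because two degrees of freedom are used up by $\hat{β}_{0}^{*}$ and $\hat{β}_{1}$.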

\begin{matrix} (1) & W_{n - 2} = \frac{n - 2}{σ^{2}} \frac{S S E}{n - 2} = \frac{n - 2}{σ^{2}} s^{2} \end{matrix}

By the expectation and variance of $\hat{β}_{1}$ derived above, together with its normality:

\frac{\sqrt{S_{x x}} ({\hat{β}}_{1} - β_{1})}{σ} \sim N (0, 1)

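Since $\hat{β}_{1}$ depends on $y$ only through $⟨ y, u_{2} ⟩$, it is independent of $W_{n - 2}$, so

⟹ \frac{\sqrt{S_{xx}} (\hat{β}_{1} - β_{1}) / σ}{\sqrt{W_{n - 2} / (n - 2)}} = \frac{\sqrt{S_{xx}} (\hat{β}_{1} - β_{1}) / σ}{\sqrt{s^{2} / σ^{2}}} = \frac{\sqrt{S_{xx}} (\hat{β}_{1} - β_{1})}{s} \sim t_{n - 2}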

note $(t_{d f})^{2} \sim F (1, d f)$
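The relation between the $t$ and $F$ distributions follows from their definitions. As a short worked step, with $Z$ and $W$ introduced here only as generic symbols for an independent standard normal and a $χ_{d f}^{2}$ variable:

```latex
% Z ~ N(0, 1) and W ~ chi^2_{df}, independent
t = \frac{Z}{\sqrt{W / df}} \sim t_{df}
\quad \Longrightarrow \quad
t^{2} = \frac{Z^{2} / 1}{W / df} \sim F (1, df),
\qquad \text{since } Z^{2} \sim χ_{1}^{2}.
```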

⟹ t_{*} = \frac{\hat{β}_{1}}{s / \sqrt{S_{xx}}} \quad \text{under } H_{0} : β_{1} = 0

Types of Tests

For two-sided alternatives $H_{0} : β_{1} = 0, H_{a} : β_{1} \neq 0$

F_{*} = \frac{{\hat{β}}_{1}^{2} S_{x x}}{\frac{S S E}{n - 2}} = t_{*}^{2}

For $H_{0} : β_{1} = β_{10}$ against $H_{a} : β_{1} \neq β_{10}$, $β_{1} > β_{10}$, or $β_{1} < β_{10}$, compare the following with $t_{n - 2}$:

t_{*} = \frac{\sqrt{S_{x x}} ({\hat{β}}_{1} - β_{10})}{s}
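Both test statistics can be computed from the same few sums. The sketch below uses the headphone data from the Example section; the function name `t_statistic` and the null value 0.2 are hypothetical, chosen only to show a test with $β_{10} \neq 0$.

```python
import math

x = [180, 150, 95, 70, 70, 35]   # price
y = [76, 71, 61, 56, 40, 26]     # score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

beta1_hat = s_xy / s_xx
sse = s_yy - beta1_hat ** 2 * s_xx
s = math.sqrt(sse / (n - 2))     # estimate of sigma


def t_statistic(beta10=0.0):
    """t statistic for H0: beta1 = beta10, on n - 2 degrees of freedom."""
    return math.sqrt(s_xx) * (beta1_hat - beta10) / s


f_star = beta1_hat ** 2 * s_xx / (sse / (n - 2))
t_zero = t_statistic(0.0)    # test of beta1 = 0
t_shift = t_statistic(0.2)   # hypothetical null value beta10 = 0.2

print(t_zero, t_zero ** 2, f_star)
print(t_shift)
```

With $β_{10} = 0$ the squared $t$ statistic reproduces $F_{*}$, so the two-sided $t$ test and the $F$ test give the same decision; one-sided alternatives need the $t$ form because the sign is lost on squaring.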