3. Linear Models

"All models are wrong but some are useful" - George Box

Notation
$$x_i = (1, x_{i,1}, \ldots, x_{i,d})$$
$$b = (b_1, b_2, \ldots, b_{d+1})$$

$$f(x_i) = x_i b$$
$$\mathrm{RSS}(b) = \frac{1}{n}\sum_{i=1}^n (y_i - x_i b)^2 = \frac{1}{n}\,\lVert Y - Xb \rVert_2^2$$
$$\hat b = \operatorname*{arg\,min}_{b \in \mathbb{R}^{d+1}} \mathrm{RSS}(b) = (X^\top X)^{-1} X^\top Y$$

Plugging in $\hat f(x_i) = x_i \hat b$, the quantity $\mathrm{RSS}(\hat b)$ is analogous to our training error.

If $X^\top X$ is not invertible, there is no unique minimizer of the RSS.
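As a minimal sketch (with made-up data), the closed-form solution above can be computed via the normal equations; `np.linalg.pinv` additionally covers the case where $X^\top X$ is not invertible by picking the minimum-norm solution among all minimizers:

```python
import numpy as np

# Hypothetical toy data: n = 5 observations, d = 1 predictor,
# with an exact linear relation y = 2 + 0.5 x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 0.5 * x

# Design matrix X: leading column of ones for the intercept b_1.
X = np.column_stack([np.ones_like(x), x])

# b_hat = (X^T X)^{-1} X^T y, solved via the normal equations.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The pseudoinverse also works when X^T X is singular: it returns the
# minimum-norm b among all minimizers of the RSS.
b_pinv = np.linalg.pinv(X) @ y
```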


Compare to kNN (see bias-variance and Validation).


$$\mathrm{TSS} := \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2 \quad \text{("clueless" training error)}$$
Basically the training error if we used the constant predictor $\hat f(x) = \bar y$.

$$R^2 = 1 - \frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}}$$ (review of $R^2$)

A small $R^2$ does not necessarily mean that there is no linear relationship, or that there is no relationship at all; a large $R^2$ does not mean the relationship is linear.

$$0 \le R^2 \le 1$$

Proof

Since $\mathrm{TSS} = \mathrm{RSS}((\bar y, 0, 0, \ldots, 0)) \ge \min_{b \in \mathbb{R}^{d+1}} \mathrm{RSS}(b) = \mathrm{RSS}(\hat b)$, we have $\frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}} \le 1$, hence $R^2 = 1 - \frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}} \ge 0$. And since $\mathrm{RSS}(\hat b) \ge 0$, also $R^2 \le 1$.
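A minimal numpy sketch (with made-up, nearly linear data) of computing $R^2$ from RSS and TSS as defined above:

```python
import numpy as np

# Hypothetical data: y is close to a line in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

X = np.column_stack([np.ones_like(x), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# RSS of the fit vs. TSS of the "clueless" constant predictor y_bar.
rss = np.sum((y - X @ b_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss  # always between 0 and 1
```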
```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
radio        0.188530   0.008611  21.893   <2e-16 ***
newspaper   -0.001037   0.005871  -0.177     0.86
```

Coefficients: information about $\hat b$ in the linear regression model.
(Intercept): information about $\hat b_1$, the intercept.
TV: information about the coefficient of predictor TV.
Estimate: the values $\hat b_1, \hat b_2$, etc.

t value: value of the t-test statistic. Pr(>|t|): p-value of the t-test. This is a test of
$$H_0 : b_j = 0,$$
i.e. predictor $x_j$ is not informative in the presence of the other predictors.
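The Estimate, Std. Error, and t value columns can be reproduced by hand; a sketch with simulated data (names and the true coefficients here are made up), where only the first predictor is actually informative:

```python
import numpy as np

# Hypothetical data: n = 50, two predictors; x2 has no effect on y.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
d = X.shape[1] - 1  # number of predictors

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat

# Noise variance estimate (residual standard error squared).
sigma2 = resid @ resid / (n - d - 1)

# Std. Error column: sqrt of the diagonal of sigma2 * (X^T X)^{-1}.
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# t value column: estimate divided by its standard error; the p-value
# Pr(>|t|) would come from a t distribution with n - d - 1 df.
t_values = b_hat / se
```

A large |t value| (as for x1 here) is evidence against $H_0: b_j = 0$; a small one (as for x2) is consistent with the predictor being uninformative given the others.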

Residual Output Summary

Residuals:

```
    Min      1Q  Median      3Q     Max
-8.8277 -0.8908  0.2418  1.1893  2.8292
```

Residual standard error:

$$\sqrt{\frac{1}{n-d-1}\sum_{i=1}^n \hat\varepsilon_i^2}$$

on 196 degrees of freedom
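A minimal sketch (with made-up data) of the residual standard error formula; note the divisor is the degrees of freedom $n - d - 1$, not $n$:

```python
import numpy as np

# Hypothetical data: y roughly linear in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8, 6.1])

X = np.column_stack([np.ones_like(x), x])
n, cols = X.shape
d = cols - 1  # number of predictors

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat  # the hat-epsilon_i

# Residual standard error, reported "on n - d - 1 degrees of freedom".
rse = np.sqrt(resid @ resid / (n - d - 1))
```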

Multiple R-squared:

$$R^2 = 0.8972 \quad (\text{Adjusted } R^2 = 0.8956)$$

F-statistic:

$F = 570.3$ on 3 and 196 DF, p-value $< 2.2\times 10^{-16}$

Key Definitions

$$\hat\varepsilon_i := y_i - x_i \hat b$$

F-Test

Tests the null hypothesis:

$$H_0 : b_2 = \cdots = b_{d+1} = 0$$

Interpretation: none of the predictors have a linear effect on the outcome.
Further information in Regression Analysis.
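One common form of the F-statistic compares the intercept-only model (TSS) to the full fit (RSS); a sketch with simulated data (coefficients made up), where the null is clearly false:

```python
import numpy as np

# Hypothetical data: n = 100, d = 3 informative predictors.
rng = np.random.default_rng(2)
n, d = 100, 3
X_pred = rng.normal(size=(n, d))
y = 1.0 + X_pred @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.4, size=n)

X = np.column_stack([np.ones(n), X_pred])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

rss = np.sum((y - X @ b_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# F-statistic for H0: b_2 = ... = b_{d+1} = 0, on d and n - d - 1 DF.
F = ((tss - rss) / d) / (rss / (n - d - 1))
```

Since the predictors genuinely drive $y$ here, F comes out far above 1, which is the pattern behind the large F = 570.3 in the summary above.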

Interactions

$$f(x_i) = b_1 + b_2\, x_{i,1} + b_3\, x_{i,2} + b_4\, x_{i,1} x_{i,2}$$

No longer linear in the predictors $x_{i,1}, x_{i,2}$, but still linear in the coefficients $b$, so least squares applies unchanged.
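Because the model stays linear in $b$, fitting an interaction just means adding the product $x_{i,1} x_{i,2}$ as an extra column of the design matrix; a sketch with simulated data (true coefficients made up):

```python
import numpy as np

# Hypothetical data generated from the interaction model
# y = 1 + 2*x1 + 3*x2 + 4*x1*x2 + noise.
rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# The product column makes f nonlinear in (x1, x2) but leaves the
# problem linear in b, so ordinary least squares still works.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```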