3. Linear Models

"All models are wrong but some are useful" - George Box

Notation
$$x_i = (1, x_{i,1}, \ldots, x_{i,d})$$
$$b = (b_1, b_2, \ldots, b_{d+1})$$

$$f(x_i) = x_i b$$
$$\mathrm{RSS}(b) = \frac{1}{n}\sum_{i=1}^n (y_i - x_i b)^2 = \frac{1}{n}\,\lVert Y - Xb \rVert_2^2$$
$$\hat b = \operatorname*{arg\,min}_{b \in \mathbb{R}^{d+1}} \mathrm{RSS}(b) = (X^\top X)^{-1} X^\top Y$$

Plugging in $\hat f(x_i) = x_i \hat b$, the quantity $\mathrm{RSS}(\hat b)$ is analogous to our training error.

If $X^\top X$ is not invertible, there is no unique minimizer of the RSS.
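As a minimal sketch (with made-up data), the closed-form solution above can be computed via the normal equations; `np.linalg.pinv` additionally covers the case where $X^\top X$ is not invertible by picking the minimum-norm solution among all minimizers:

```python
import numpy as np

# Hypothetical toy data: n = 5 observations, d = 1 predictor,
# with an exact linear relation y = 2 + 0.5 x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 0.5 * x

# Design matrix X: leading column of ones for the intercept b_1.
X = np.column_stack([np.ones_like(x), x])

# b_hat = (X^T X)^{-1} X^T y, solved via the normal equations.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The pseudoinverse also works when X^T X is singular: it returns the
# minimum-norm b among all minimizers of the RSS.
b_pinv = np.linalg.pinv(X) @ y
```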


Compare to kNN (see bias-variance and Validation).


$$\mathrm{TSS} := \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2 \quad \text{("clueless" training error)}$$
Basically the training error if we used the constant predictor $\hat f(x) = \bar y$.

$$R^2 = 1 - \frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}}$$ (review of $R^2$)

A small $R^2$ does not necessarily mean that there is no linear relationship, or that there is no relationship at all; a large $R^2$ does not mean the relationship is linear.

$$0 \le R^2 \le 1$$

Proof

Since $\mathrm{TSS} = \mathrm{RSS}((\bar y, 0, 0, \ldots, 0)) \ge \min_{b \in \mathbb{R}^{d+1}} \mathrm{RSS}(b) = \mathrm{RSS}(\hat b)$, we have $\frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}} \le 1$, hence $R^2 = 1 - \frac{\mathrm{RSS}(\hat b)}{\mathrm{TSS}} \ge 0$. And since $\mathrm{RSS}(\hat b) \ge 0$, also $R^2 \le 1$.
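A minimal numpy sketch (with made-up, nearly linear data) of computing $R^2$ from RSS and TSS as defined above:

```python
import numpy as np

# Hypothetical data: y is close to a line in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

X = np.column_stack([np.ones_like(x), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# RSS of the fit vs. TSS of the "clueless" constant predictor y_bar.
rss = np.sum((y - X @ b_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss  # always between 0 and 1
```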
```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
radio        0.188530   0.008611  21.893   <2e-16 ***
newspaper   -0.001037   0.005871  -0.177     0.86
```

Coefficients: information about $\hat b$ in the linear regression model.
(Intercept): information about $\hat b_1$, the intercept.
TV: information about the coefficient of predictor TV.
Estimate: the values $\hat b_1, \hat b_2$, etc.

t value: value of the t-test statistic. Pr(>|t|): p-value of the t-test. This is a test of
$$H_0 : b_j = 0,$$
i.e. predictor $x_j$ is not informative in the presence of the other predictors.
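The Estimate, Std. Error, and t value columns can be reproduced by hand; a sketch with simulated data (names and the true coefficients here are made up), where only the first predictor is actually informative:

```python
import numpy as np

# Hypothetical data: n = 50, two predictors; x2 has no effect on y.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
d = X.shape[1] - 1  # number of predictors

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat

# Noise variance estimate (residual standard error squared).
sigma2 = resid @ resid / (n - d - 1)

# Std. Error column: sqrt of the diagonal of sigma2 * (X^T X)^{-1}.
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# t value column: estimate divided by its standard error; the p-value
# Pr(>|t|) would come from a t distribution with n - d - 1 df.
t_values = b_hat / se
```

A large |t value| (as for x1 here) is evidence against $H_0: b_j = 0$; a small one (as for x2) is consistent with the predictor being uninformative given the others.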

Residual Output Summary

Residuals:

```
    Min      1Q  Median      3Q     Max
-8.8277 -0.8908  0.2418  1.1893  2.8292
```

Residual standard error:

$$\sqrt{\frac{1}{n-d-1}\sum_{i=1}^n \hat\varepsilon_i^2}$$

on 196 degrees of freedom
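A minimal sketch (with made-up data) of the residual standard error formula; note the divisor is the degrees of freedom $n - d - 1$, not $n$:

```python
import numpy as np

# Hypothetical data: y roughly linear in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8, 6.1])

X = np.column_stack([np.ones_like(x), x])
n, cols = X.shape
d = cols - 1  # number of predictors

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat  # the hat-epsilon_i

# Residual standard error, reported "on n - d - 1 degrees of freedom".
rse = np.sqrt(resid @ resid / (n - d - 1))
```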

Multiple R-squared:

$$R^2 = 0.8972 \quad (\text{Adjusted } R^2 = 0.8956)$$

F-statistic:

$F = 570.3$ on 3 and 196 DF, p-value $< 2.2\times 10^{-16}$

Key Definitions

$$\hat\varepsilon_i := y_i - x_i \hat b$$

F-Test

Tests the null hypothesis:

$$H_0 : b_2 = \cdots = b_{d+1} = 0$$

Interpretation: none of the predictors have a linear effect on the outcome.
Further information in Regression Analysis.
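One common form of the F-statistic compares the intercept-only model (TSS) to the full fit (RSS); a sketch with simulated data (coefficients made up), where the null is clearly false:

```python
import numpy as np

# Hypothetical data: n = 100, d = 3 informative predictors.
rng = np.random.default_rng(2)
n, d = 100, 3
X_pred = rng.normal(size=(n, d))
y = 1.0 + X_pred @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.4, size=n)

X = np.column_stack([np.ones(n), X_pred])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

rss = np.sum((y - X @ b_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# F-statistic for H0: b_2 = ... = b_{d+1} = 0, on d and n - d - 1 DF.
F = ((tss - rss) / d) / (rss / (n - d - 1))
```

Since the predictors genuinely drive $y$ here, F comes out far above 1, which is the pattern behind the large F = 570.3 in the summary above.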

Interactions

$$f(x_i) = b_1 + b_2\, x_{i,1} + b_3\, x_{i,2} + b_4\, x_{i,1} x_{i,2}$$

No longer linear in the predictors $x_{i,1}, x_{i,2}$, but still linear in the coefficients $b$, so least squares applies unchanged.
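Because the model stays linear in $b$, fitting an interaction just means adding the product $x_{i,1} x_{i,2}$ as an extra column of the design matrix; a sketch with simulated data (true coefficients made up):

```python
import numpy as np

# Hypothetical data generated from the interaction model
# y = 1 + 2*x1 + 3*x2 + 4*x1*x2 + noise.
rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# The product column makes f nonlinear in (x1, x2) but leaves the
# problem linear in b, so ordinary least squares still works.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```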