8. Intro to Multiple Linear Regression

Model

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1j} \\ 1 & x_{21} & x_{22} & \dots & x_{2j} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{nj} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_j \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
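A minimal numerical sketch of this setup (all values below are illustrative assumptions, not from the notes): build the $n \times (p+1)$ design matrix with an intercept column and simulate responses from the matrix form of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
predictors = rng.normal(size=(n, p))           # n observations of p predictors
X = np.column_stack([np.ones(n), predictors])  # n x (p+1): prepend the column of 1s
beta = np.array([2.0, 1.0, -0.5, 0.3])         # (p+1) true coefficients (assumed)
eps = rng.normal(scale=1.0, size=n)            # iid N(0, sigma^2) errors
y = X @ beta + eps                             # matrix form: y = X beta + eps
```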

MLE Criterion

$$Y - X\beta = \varepsilon$$
$$\min_{\beta}\;(Y - X\beta)^T(Y - X\beta) = \min_{\beta}\;\varepsilon^T\varepsilon = \min_{\beta}\;(Y^T - \beta^TX^T)(Y - X\beta)$$
$$= \min_{\beta}\;\underbrace{\left(Y^TY - Y^TX\beta - \beta^TX^TY + \beta^TX^TX\beta\right)}_{Q}$$

By the derivative theorems $\frac{\partial\, a^Tb}{\partial b} = a$ and $\frac{\partial\, b^TAb}{\partial b} = 2Ab$ (for symmetric $A$):

$$\frac{\partial Q}{\partial \beta} = -2X^TY + 2X^TX\beta = 0 \implies \hat\beta = (X^TX)^{-1}X^TY$$
$$\hat Y = X\hat\beta = X(X^TX)^{-1}X^TY = HY$$
$$\hat\varepsilon = Y - \hat Y = Y - HY = (I - H)Y$$
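Continuing the sketch above, the closed-form estimates follow directly from these formulas (using `np.linalg.solve` rather than an explicit inverse is a numerical convenience, not part of the derivation):

```python
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # beta_hat = (X^T X)^{-1} X^T y
H = X @ np.linalg.inv(XtX) @ X.T           # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y                              # fitted values  Y_hat = H Y = X beta_hat
resid = y - y_hat                          # residuals  (I - H) Y
```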

Key Properties of the Hat Matrix

$$H^T = \left[X(X^TX)^{-1}X^T\right]^T = X\left[(X^TX)^T\right]^{-1}X^T = H \quad \text{(symmetric)}$$
$$HH = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = H \quad \text{(idempotent)}$$

Can now apply the $\chi^2$ theorems.
$(I-H)$ is also symmetric and idempotent.
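A quick numeric check of these hat-matrix properties, assuming the `X`, `H`, and `y` from the sketch above:

```python
I = np.eye(len(y))
M = I - H
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # True True: H symmetric, idempotent
print(np.allclose(M, M.T), np.allclose(M @ M, M))  # True True: so is I - H
print(np.isclose(np.trace(H), X.shape[1]))         # True: tr(H) = p + 1
```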

Expected Value of a Random Vector

$$\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \qquad E[\varepsilon] := \begin{bmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Covariance Matrix of a Random Vector

Let $Y_1, Y_2$ be two random variables with $\mu_1 = E[Y_1]$, $\mu_2 = E[Y_2]$, $\sigma_1^2 = V(Y_1)$, $\sigma_2^2 = V(Y_2)$, and $\sigma_{12} = cov(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)]$.

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \quad \text{and} \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$$
$$(Y - \mu)(Y - \mu)^T = \begin{bmatrix} Y_1 - \mu_1 \\ Y_2 - \mu_2 \end{bmatrix}\begin{bmatrix} Y_1 - \mu_1, & Y_2 - \mu_2 \end{bmatrix} = \begin{bmatrix} (Y_1 - \mu_1)^2 & (Y_1 - \mu_1)(Y_2 - \mu_2) \\ (Y_2 - \mu_2)(Y_1 - \mu_1) & (Y_2 - \mu_2)^2 \end{bmatrix}$$
$$E[(Y - \mu)(Y - \mu)^T] = \begin{bmatrix} E[(Y_1 - \mu_1)^2] & E[(Y_1 - \mu_1)(Y_2 - \mu_2)] \\ E[(Y_2 - \mu_2)(Y_1 - \mu_1)] & E[(Y_2 - \mu_2)^2] \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$
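As a sanity check, one can simulate a bivariate random vector and confirm that the average of $(Y-\mu)(Y-\mu)^T$ recovers the covariance matrix; $\mu$ and $\Sigma$ below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])    # [[sigma1^2, sigma12], [sigma12, sigma2^2]]
Y = rng.multivariate_normal(mu, Sigma, size=100_000)
dev = Y - mu                      # rows are (Y - mu)^T
outer_avg = (dev[:, :, None] * dev[:, None, :]).mean(axis=0)
print(outer_avg)                  # approximately Sigma
```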

Useful Properties

Let $A$ be a matrix of constants, $c$ a vector of constants, and $y$ a random vector.

  1. $E[Ay+c] = AE[y] + c$; expectation is a linear transformation, so this property follows easily.
  2. $cov[Ay+c] = A\,cov(y)\,A^T$ (verified numerically in the sketch after this list):
    $$= E\left[(Ay+c-E(Ay+c))(Ay+c-E(Ay+c))^T\right] = E\left[(Ay+c-(AE(y)+c))(Ay+c-(AE(y)+c))^T\right]$$
    $$= E\left[(Ay-AE(y))(Ay-AE(y))^T\right] = E\left[A(y-E(y))(y-E(y))^TA^T\right] = A\,cov(y)\,A^T$$

So $$cov(Y)=cov(\varepsilon+X\beta)=cov[\varepsilon]=\begin{pmatrix}
\sigma^{2} & 0 & \dots & 0 \\
0 & \sigma^{2} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \sigma^{2}
\end{pmatrix}$$
  3. For a random matrix $W$: $E[tr(W)] = tr(E[W])$
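A simulation sketch of property 2, $cov(Ay+c) = A\,cov(y)\,A^T$ (the matrix $A$, vector $c$, and covariance $\Sigma$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])  # 3x2 constant matrix
c = np.array([1.0, 1.0, 1.0])                        # constant vector
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])           # cov(y)
y = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
z = y @ A.T + c                                      # each row is A y_i + c
print(np.cov(z, rowvar=False))                       # approximately A Sigma A^T
print(A @ Sigma @ A.T)
```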

Model Moments

Parameters

$$E[\hat\beta] = E[(X^TX)^{-1}X^TY] = (X^TX)^{-1}X^TX\beta = I\beta = \beta$$
$$Cov(\hat\beta) = Cov\left((X^TX)^{-1}X^TY\right) = \left((X^TX)^{-1}X^T\right)Cov(Y)\left((X^TX)^{-1}X^T\right)^T = \sigma^2\left[X^TX\right]^{-1}$$
For SLR (with the predictor centered):
$$X = \begin{bmatrix} 1 & x_1 - \bar x \\ 1 & x_2 - \bar x \\ \vdots & \vdots \\ 1 & x_n - \bar x \end{bmatrix}, \qquad X^TX = \begin{bmatrix} n & 0 \\ 0 & S_{xx} \end{bmatrix}, \qquad [X^TX]^{-1} = \begin{bmatrix} \tfrac{1}{n} & 0 \\ 0 & \tfrac{1}{S_{xx}} \end{bmatrix}, \qquad \sigma^2[X^TX]^{-1} = \begin{bmatrix} \tfrac{\sigma^2}{n} & 0 \\ 0 & \tfrac{\sigma^2}{S_{xx}} \end{bmatrix}$$
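A Monte Carlo sketch of these parameter moments, holding the design $X$ fixed and redrawing the errors; the sampling covariance of $\hat\beta$ should approach $\sigma^2(X^TX)^{-1}$ (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 40, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design
beta = np.array([1.0, 0.5, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)
draws = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)          # new errors each rep
    draws.append(XtX_inv @ X.T @ y)                         # beta_hat for this rep
print(np.cov(np.array(draws), rowvar=False))                # approx sigma^2 (X^T X)^{-1}
print(sigma**2 * XtX_inv)
```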

Error

$$E[\hat\varepsilon] = E[Y - \hat Y] = E[Y] - E[\hat Y] = X\beta - E[X\hat\beta] = X\beta - XE[\hat\beta] = X\beta - X\beta = 0$$

Can also compute this as $E[Y - HY] = E[(I-H)Y]$.

$$Cov(\hat\varepsilon) = (I-H)\,Cov(Y)\,(I-H)^T = (I-H)\,\sigma^2\,(I-H) = \sigma^2(I-H)$$
since $(I-H)$ is symmetric and idempotent.

$$\hat\varepsilon = (I-H)Y = (I-H)(X\beta + \varepsilon) = X\beta + \varepsilon - HX\beta - H\varepsilon = X\beta + \varepsilon - X\beta - H\varepsilon = (I-H)\varepsilon$$
$$\begin{aligned}
E[\hat\varepsilon^T\hat\varepsilon] &= E[\varepsilon^T(I-H)^T(I-H)\varepsilon] = E[\varepsilon^T(I-H)\varepsilon] \qquad \text{(since } I-H \text{ is symmetric and idempotent)} \\
&= tr\left(E[(I-H)\varepsilon\varepsilon^T]\right) = tr\Big((I-H)\underbrace{E[\varepsilon\varepsilon^T]}_{cov(\varepsilon)}\Big) = tr\left((I-H)\sigma^2 I\right) = \sigma^2\,tr(I-H) = \sigma^2\big(n - (p+1)\big)
\end{aligned}$$
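A simulation sketch of the last identity, $E[\hat\varepsilon^T\hat\varepsilon] = \sigma^2(n-(p+1))$, which is why $s^2 = \hat\varepsilon^T\hat\varepsilon/(n-(p+1))$ is unbiased for $\sigma^2$ (the design and parameters below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 30, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -0.5])
H = X @ np.linalg.inv(X.T @ X) @ X.T
rss = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    e = (np.eye(n) - H) @ y              # residual vector (I - H) Y
    rss.append(e @ e)                    # eps_hat^T eps_hat
print(np.mean(rss), sigma**2 * (n - (p + 1)))  # the two should be close
```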

Say $B = AE[\varepsilon\varepsilon^T]$; then
$$B_{ii} = \sum_j A_{ij}E[\varepsilon_j\varepsilon_i] = \sum_j A_{ij}E[\varepsilon_i\varepsilon_j]$$

So:
$$tr(AE[\varepsilon\varepsilon^T]) = \sum_i B_{ii} = \sum_i\sum_j A_{ij}E[\varepsilon_i\varepsilon_j] = \sum_{i,j} A_{ij}E[\varepsilon_i\varepsilon_j]$$

which is exactly the same as
$$E[\varepsilon^TA\varepsilon] = \sum_{i,j} A_{ij}E[\varepsilon_i\varepsilon_j]$$
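The same identity can be checked numerically for an arbitrary fixed matrix $A$ and iid errors (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 5, 1.3
A = rng.normal(size=(n, n))                        # arbitrary fixed matrix
E = rng.normal(scale=sigma, size=(200_000, n))     # rows are iid error vectors
vals = np.einsum("ki,ij,kj->k", E, A, E)           # eps^T A eps for each draw
print(vals.mean(), sigma**2 * np.trace(A))         # E[eps^T A eps] = tr(A sigma^2 I)
```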

Confidence Interval

For $E(y_0)$ at $x_0$.
Model: $Y_{n\times 1} = X_{n\times(p+1)}\,\beta_{(p+1)\times 1} + \varepsilon_{n\times 1}$

$$y_0 = \beta_0 + \beta_1 x_{0,1} + \dots + \beta_p x_{0,p} = \begin{bmatrix} 1, & x_{0,1}, & x_{0,2}, & \dots, & x_{0,p} \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}$$
$$\hat\beta = \underbrace{[X^TX]^{-1}X^T}_{A}Y = AY = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_p \end{bmatrix}$$
$$Y_0 = X_0^T\beta + \varepsilon_0, \qquad \varepsilon_0 \sim N(0, \sigma^2)$$
Target: $E(y_0) = E[X_0^T\beta + \varepsilon_0] = X_0^T\beta$
Estimator: $\widehat{E(y_0)} = X_0^T\hat\beta$, which is a linear combination of the normal $\hat\beta$'s, so
$$\begin{cases} E[X_0^T\hat\beta] = X_0^TE[\hat\beta] = X_0^T\beta \\[4pt] cov[X_0^T\hat\beta] = X_0^T\,cov[\hat\beta]\,X_0 = \underbrace{\sigma^2 X_0^T(X^TX)^{-1}X_0}_{1\times 1} = var(X_0^T\hat\beta) \end{cases}$$
$$\frac{X_0^T\hat\beta - X_0^T\beta}{\sigma\sqrt{X_0^T(X^TX)^{-1}X_0}} \sim N(0, 1)$$

It can be shown that $\dfrac{[n-(p+1)]s^2}{\sigma^2} \sim \chi^2\left(df = n-(p+1)\right)$ using Theorem 16.4.

$$\frac{X_0^T\hat\beta - X_0^T\beta}{s\sqrt{X_0^T(X^TX)^{-1}X_0}} \sim t_{n-(p+1)}$$

Confidence Interval

$$X_0^T\hat\beta \pm t_{\frac{\alpha}{2},\,n-(p+1)}\; s\sqrt{X_0^T(X^TX)^{-1}X_0}$$
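A sketch of computing this interval with plain `numpy`/`scipy` on simulated data (the data, the point $x_0$, and the 95% level are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - (p + 1))                 # s^2, unbiased for sigma^2

x0 = np.array([1.0, 0.2, -0.3])                    # [1, x_{0,1}, ..., x_{0,p}] (assumed point)
se = np.sqrt(s2 * x0 @ XtX_inv @ x0)               # s * sqrt(x0^T (X^T X)^{-1} x0)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - (p + 1))
print(x0 @ beta_hat - t_crit * se, x0 @ beta_hat + t_crit * se)
```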

Prediction Interval

Initial dataset $(Y_1, Y_2, \dots, Y_n)$
Target: $Y^*$ = a new observation
$$Y^* = \beta_0 + \beta_1 x_1^* + \dots + \beta_p x_p^* + \varepsilon^* = X_*^T\beta + \varepsilon^*, \qquad X_* = \begin{bmatrix} 1, & x_1^*, & \dots, & x_p^* \end{bmatrix}^T$$
Prediction: $\hat Y^* = X_*^T\hat\beta$

$$Y^* = \hat Y^* + \underbrace{(Y^* - \hat Y^*)}_{\text{prediction error}}$$

$E(Y^*) = X_*^T\beta$ and $var(Y^*) = \sigma^2$, since there is only one error term.
$$E(\hat Y^*) = X_*^TE(\hat\beta) = X_*^T\beta$$
$$cov(\hat Y^*) = X_*^T\,cov(\hat\beta)\,(X_*^T)^T = X_*^T\sigma^2(X^TX)^{-1}X_* = \underbrace{\sigma^2 X_*^T(X^TX)^{-1}X_*}_{1\times 1} = var(\hat Y^*)$$
$$E[Y^* - \hat Y^*] = X_*^T\beta - X_*^T\beta = 0$$
$$var[Y^* - \hat Y^*] \overset{\text{ind.}}{=} \sigma^2 + \sigma^2 X_*^T(X^TX)^{-1}X_*$$

$$\frac{Y^* - \hat Y^*}{\sigma\sqrt{1 + X_*^T(X^TX)^{-1}X_*}} \sim N(0,1) \qquad\Longrightarrow\qquad \frac{Y^* - \hat Y^*}{s\sqrt{1 + X_*^T(X^TX)^{-1}X_*}} \sim t\left(df = n - (p+1)\right)$$
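Reusing the quantities from the confidence-interval sketch above, a prediction interval at the same point only adds the extra $1$ inside the square root to account for the new observation's error:

```python
# assumes s2, x0, XtX_inv, beta_hat, t_crit from the confidence-interval sketch
se_pred = np.sqrt(s2 * (1 + x0 @ XtX_inv @ x0))    # extra 1 for the new error term
print(x0 @ beta_hat - t_crit * se_pred, x0 @ beta_hat + t_crit * se_pred)
```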