9. MLR Hypothesis Testing

SLR Review

Recall our earlier ANOVA F-test and the t-tests for the correlation and for our parameters in simple linear regression, where

$$F = \frac{MSR}{MSE} = \frac{(n-2)\,r^2}{1-r^2}$$
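As a quick numerical check of this identity, the sketch below fits a simple linear regression and rebuilds F from the sample correlation. It uses R's built-in `cars` data purely for illustration; that data set is an assumption and not part of these notes.

```r
# Illustrative data only: built-in `cars` (speed vs. stopping distance).
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)
r   <- cor(cars$speed, cars$dist)

# F statistic as reported by R for the fitted regression
F_lm <- unname(summary(fit)$fstatistic["value"])

# F statistic rebuilt from the sample correlation
F_r <- (n - 2) * r^2 / (1 - r^2)

# Slope t statistic; in SLR its square equals the F statistic
t_slope <- coef(summary(fit))["speed", "t value"]

c(F_lm = F_lm, F_r = F_r, t_squared = t_slope^2)
```

All three printed values should agree up to floating-point error.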

MLR

$$\varepsilon_{n\times 1} \sim N_n\left(0_{n\times 1},\ \sigma^2 I_{n\times n}\right)$$

Overall Test

$$y_i = X_i^T\beta + \varepsilon_i = \beta_0 + x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \varepsilon_i$$

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$, $H_a$: at least one $\beta_i \neq 0$

$$SST = SSR + SSE$$

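$$R^2 = \frac{SSR}{SST}$$

$$F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-(p+1))} = \frac{n-(p+1)}{p}\cdot\frac{SSR}{SSE}$$

or

$$F = \frac{n-(p+1)}{p}\cdot\frac{SSR/SST}{SSE/SST} = \frac{n-(p+1)}{p}\cdot\frac{R^2}{1-R^2}$$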

where df numerator $= p$, df denominator $= n-(p+1)$

(remember $SSR = SSY - SSE$, where $SSY$ is the total sum of squares $SST$)

RR $= \{F \text{ such that } F > F_{\alpha,\,p,\,n-(p+1)}\}$
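The two forms of the overall F statistic can be checked numerically. The sketch below uses the brand-liking data (moisture `x1`, sweetness `x2`, liking `y`) that appears later in this section; the level `alpha = 0.05` is an assumption chosen only for illustration.

```r
# Brand-liking data from the "Example Individual Variables" table
x1 <- rep(c(4, 6, 8, 10), each = 4)
x2 <- rep(c(2, 4), times = 8)
y  <- c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100)

fit <- lm(y ~ x1 + x2)
n <- length(y)
p <- 2                      # number of predictors (excluding the intercept)

SSE <- sum(resid(fit)^2)
SST <- sum((y - mean(y))^2)
SSR <- SST - SSE
R2  <- SSR / SST

# Overall F from sums of squares and from R^2
F_ss <- (SSR / p) / (SSE / (n - (p + 1)))
F_R2 <- (n - (p + 1)) / p * R2 / (1 - R2)

# Rejection region cutoff (alpha = 0.05 is an assumed level)
alpha  <- 0.05
F_crit <- qf(1 - alpha, df1 = p, df2 = n - (p + 1))

c(F_ss = F_ss, F_R2 = F_R2, F_crit = F_crit)
```

Both versions should also match the F statistic printed at the bottom of `summary(fit)`.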

Addition of a Group of Variables

$$F = \frac{\left(SSE(R) - SSE(C)\right) / \left(\text{complete parameters} - \text{reduced parameters}\right)}{SSE(C) / \left(n - \text{complete parameters}\right)}$$

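R = reduced, C = complete
$H_0$: all of the new parameters are equal to 0
$H_a$: at least one new parameter contributes information
RR $= \{F > F_{\alpha,\,C-R,\,n-C}\}$, where $C$ and $R$ in the subscripts are the numbers of parameters in the complete and reduced models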
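A minimal sketch of the reduced-versus-complete comparison, again assuming the brand-liking data from later in this section. Here the "group" of added variables is just `x2`, so this is the smallest possible instance of the test. `deviance()` on an `lm` fit returns its residual sum of squares.

```r
# Brand-liking data from the "Example Individual Variables" table
x1 <- rep(c(4, 6, 8, 10), each = 4)
x2 <- rep(c(2, 4), times = 8)
y  <- c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100)
n  <- length(y)

reduced  <- lm(y ~ x1)        # R: model without the new variable(s)
complete <- lm(y ~ x1 + x2)   # C: model with the new variable(s)

SSE_R <- deviance(reduced)
SSE_C <- deviance(complete)
k_R   <- length(coef(reduced))    # number of reduced parameters
k_C   <- length(coef(complete))   # number of complete parameters

# Partial F statistic from the formula above
F_stat <- ((SSE_R - SSE_C) / (k_C - k_R)) / (SSE_C / (n - k_C))

# R's built-in comparison of nested models gives the same F
cmp <- anova(reduced, complete)
c(F_manual = F_stat, F_anova = cmp$F[2])
```

With one added variable, this F is the same value as the `x2` row of the sequential ANOVA table shown in that example.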

Example Comparing Means

Bilirubin is formed in the liver, where haemoglobin and other haemoproteins are decomposed into bile pigments. Bilirubin is partly reabsorbed by the intestine, and returns to the liver. If the liver has suffered degeneration, if the decomposition of haemoglobin is elevated, or if the gall bladder has been destroyed, bilirubin can accumulate to high levels in the blood, leading to jaundice. Blood samples were taken from three young men at one-week intervals, and the concentration of bilirubin in the serum was measured. The measured concentrations are shown in the following table.

Individual A  Individual B  Individual C  
14            20            32  
20            27            41  
23            32            41  
27            34            55  
27            34            55  
A=c(14,20,23,27,27);  
B=c(20,27,32,34,34);  
C=c(32,41,41,55,55);  
cbind(mean(A), mean(B),mean(C));  
##      [,1] [,2] [,3]  
## [1,] 22.2 29.4 44.8  
cbind(var(A), var(B), var(C));  
##      [,1] [,2] [,3]  
## [1,] 29.7 35.8 100.2

Is there sufficient evidence to indicate a difference in mean bilirubin concentration for the
three young men? Fit appropriate linear model(s) to the data and test at the 0.05
level of significance. Provide a conclusion in the context of this problem.

Complete model:

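$$y_i = \mu_1 x_{i1} + \mu_2 x_{i2} + \mu_3 x_{i3} + \varepsilon_i$$

where

$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{th measurement belongs to the } j\text{th person} \\ 0 & \text{otherwise} \end{cases}$$

$$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$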

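$$\begin{bmatrix} y_1 \\ \vdots \\ y_5 \\ y_6 \\ \vdots \\ y_{10} \\ y_{11} \\ \vdots \\ y_{15} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} + \varepsilon$$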

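$$\hat{\beta}_c = [X_c^T X_c]^{-1} X_c^T Y$$

$$X_c^T X_c = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix} = \begin{pmatrix} n & 0 & 0 \\ 0 & n & 0 \\ 0 & 0 & n \end{pmatrix}$$

where $n = n_1 = n_2 = n_3 = 5$ is the number of measurements per person.

$$X_c^T Y = \begin{bmatrix} \sum_{i=1}^{5} y_i \\ \sum_{i=6}^{10} y_i \\ \sum_{i=11}^{15} y_i \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n_1} y_i \\ \sum_{i=n_1+1}^{n_1+n_2} y_i \\ \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} y_i \end{bmatrix}$$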

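$$[X_c^T X_c]^{-1} X_c^T Y = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{bmatrix}$$

$$\hat{Y}_c = X_c \hat{\beta}_c = \begin{bmatrix} \bar{Y}_1 \\ \vdots \\ \bar{Y}_1 \\ \bar{Y}_2 \\ \vdots \\ \bar{Y}_2 \\ \bar{Y}_3 \\ \vdots \\ \bar{Y}_3 \end{bmatrix}$$

where $\bar{Y}_1$ is repeated $n_1$ (5) times, $\bar{Y}_2$ is repeated $n_2$ (5) times, and $\bar{Y}_3$ is repeated $n_3$ (5) times.

$$SSE(c) = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 + \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} (Y_i - \bar{Y}_3)^2$$

$$= (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$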
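The matrix steps for the complete model can be reproduced directly in R. This is a minimal sketch using the bilirubin vectors `A`, `B`, `C` from above; the names `Xc`, `beta_c`, and `SSE_C` are illustrative choices.

```r
A <- c(14, 20, 23, 27, 27)
B <- c(20, 27, 32, 34, 34)
C <- c(32, 41, 41, 55, 55)
Y <- c(A, B, C)

# Indicator design matrix: column j is 1 for measurements on person j
Xc <- cbind(A = rep(c(1, 0, 0), each = 5),
            B = rep(c(0, 1, 0), each = 5),
            C = rep(c(0, 0, 1), each = 5))

XtX    <- t(Xc) %*% Xc                 # diagonal matrix with 5s
beta_c <- solve(XtX, t(Xc) %*% Y)      # least-squares estimate = group means
Y_hat  <- Xc %*% beta_c                # each mean repeated 5 times

# SSE two ways: from residuals, and from the pooled-variance identity
SSE_C      <- sum((Y - Y_hat)^2)
SSE_pooled <- 4 * (var(A) + var(B) + var(C))

beta_c
c(SSE_C = SSE_C, SSE_pooled = SSE_pooled)
```

The estimate reproduces the three sample means (22.2, 29.4, 44.8), and both SSE computations agree.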

Reduced Model:

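$$y_i = \mu + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$

$$\begin{bmatrix} y_1 \\ \vdots \\ y_{15} \end{bmatrix} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \mu + \varepsilon$$

$$X_R^T X_R = 15, \qquad [X_R^T X_R]^{-1} = \frac{1}{15} = \frac{1}{N}$$

$$X_R^T Y = \sum_{i=1}^{15} y_i, \qquad \hat{\beta}_R = \frac{1}{N}\sum_{i=1}^{N} y_i = \bar{y}, \qquad \hat{Y}_R = X_R \hat{\beta}_R = \begin{bmatrix} \bar{y} \\ \vdots \\ \bar{y} \end{bmatrix}$$

$$SSE(R) = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = (N-1)\, s^2_{\text{DATASET}}$$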

Hypothesis Test

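$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu$$

$$F = \frac{\left(SSE(R) - SSE(C)\right) / \left(\#\text{ of C betas} - \#\text{ of R betas}\right)}{SSE(C) / \left(N - \#\text{ of C betas}\right)}$$

$$SSE(C) = 4\,[29.7 + 35.8 + 100.2] = 662.8$$

$$SSE(R) = (N-1)\, s^2_{\text{DATASET}} = 1995.733$$

$$F = \frac{(1995.733 - 662.8)/(3-1)}{662.8/(15-3)} \approx 12.0664$$

$$RR = \{F > F_{2,\,12,\,\alpha=0.05} = 3.89\}$$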

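Our $F \in RR$, so we reject $H_0$: at the 0.05 level of significance there is sufficient evidence of a difference in mean bilirubin concentration among the three young men.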
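The same test can be run by fitting the complete and reduced models with `lm()`. This is a sketch under the setup above; the factor name `person` and the other variable names are illustrative.

```r
y <- c(14, 20, 23, 27, 27,    # individual A
       20, 27, 32, 34, 34,    # individual B
       32, 41, 41, 55, 55)    # individual C
person <- factor(rep(c("A", "B", "C"), each = 5))
N <- length(y)

complete <- lm(y ~ 0 + person)  # one mean per person, no intercept
reduced  <- lm(y ~ 1)           # a single common mean

SSE_C <- deviance(complete)     # residual sum of squares, complete model
SSE_R <- deviance(reduced)      # residual sum of squares, reduced model

F_stat  <- ((SSE_R - SSE_C) / (3 - 1)) / (SSE_C / (N - 3))
F_crit  <- qf(1 - 0.05, df1 = 2, df2 = N - 3)
p_value <- pf(F_stat, df1 = 2, df2 = N - 3, lower.tail = FALSE)

c(SSE_C = SSE_C, SSE_R = SSE_R, F = F_stat, F_crit = F_crit, p = p_value)

# Built-in comparison of the two nested fits
anova(reduced, complete)
```

The manual F statistic should match the hand computation (about 12.0664) and the F reported by `anova()`.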

Example 2 Testing Means

Three brands of batteries are under study. It is suspected that the average life (in
weeks) of the three brands is different. Five batteries of each brand are tested with the
following results.

Brand 1 (One)  Brand 2 (Two)  Brand 3 (Three)  
100            76             108  
96             80             100  
92             75             96  
96             84             98  
92             82             100  
                 One Two Three  
Sample means     95.2 79.4 100.4  
                 One Two Three  
Sample variances 11.2 14.8 20.8

Answer each of the following questions by fitting an appropriate multiple linear regression
model and assuming that the errors are independent and Normally distributed with mean
0 and variance $\sigma^2$. You have to show all your work to get full credit. Simplify final answers and round to 4 decimal places where appropriate.

i) Find $\hat{\beta}$, the vector of least-squares estimators for your model. Provide its expression
and numerical value.

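$$\hat{\beta} = (X^T X)^{-1} X^T Y = \begin{pmatrix} \tfrac{1}{5} & 0 & 0 \\ 0 & \tfrac{1}{5} & 0 \\ 0 & 0 & \tfrac{1}{5} \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{5} y_i \\ \sum_{i=6}^{10} y_i \\ \sum_{i=11}^{15} y_i \end{pmatrix} = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{pmatrix} = \begin{pmatrix} 95.2 \\ 79.4 \\ 100.4 \end{pmatrix}$$

$$s^2 = \frac{SSE}{N - \#\text{betas}} = \frac{SSE}{15 - 3}$$

$$SSE = \sum_{i=1}^{15} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{5} [y_i - \bar{y}_1]^2 + \sum_{i=6}^{10} [y_i - \bar{y}_2]^2 + \sum_{i=11}^{15} [y_i - \bar{y}_3]^2$$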
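As a check on these expressions, the sketch below fits the cell-means model to the battery data with `lm()`. The variable names (`life`, `brand`, `fit`) are illustrative. The factor levels are set explicitly so the coefficients come out in the order One, Two, Three.

```r
one   <- c(100, 96, 92, 96, 92)
two   <- c(76, 80, 75, 84, 82)
three <- c(108, 100, 96, 98, 100)

life  <- c(one, two, three)
brand <- factor(rep(c("One", "Two", "Three"), each = 5),
                levels = c("One", "Two", "Three"))

# Cell-means model: one indicator column per brand, no intercept
fit <- lm(life ~ 0 + brand)

beta_hat <- coef(fit)                         # the three brand means
s2       <- deviance(fit) / df.residual(fit)  # SSE / (N - 3)

beta_hat
s2
```

Because the three groups have equal size, `s2` is also the plain average of the three sample variances listed above.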

Target: $\mu_2 = [0, 1, 0]\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} = a^T\beta$
For these types of questions, the vector $a$ is the central component.

$$E[a^T\hat{\beta}] = a^T E[\hat{\beta}] = a^T\beta$$

$$\text{cov}[a^T\hat{\beta}] = a^T \text{cov}(\hat{\beta})\, a = \sigma^2 a^T (X^T X)^{-1} a$$

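$$a^T (X^T X)^{-1} a = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} \tfrac{1}{5} & 0 & 0 \\ 0 & \tfrac{1}{5} & 0 \\ 0 & 0 & \tfrac{1}{5} \end{pmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \frac{1}{5}$$

$$\frac{a^T\hat{\beta} - a^T\beta}{\sigma\sqrt{a^T (X^T X)^{-1} a}} \sim N(0, 1), \qquad \frac{[N-3]\, s^2}{\sigma^2} \sim \chi^2_{N-3}$$

$$\frac{a^T\hat{\beta} - a^T\beta}{s\sqrt{a^T (X^T X)^{-1} a}} \sim t(N-3)$$

$$\text{CI} = a^T\hat{\beta} \pm t_{\alpha/2,\,N-3}\; s\sqrt{a^T (X^T X)^{-1} a}$$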

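For $\mu_3 - \mu_2 = [0, -1, 1]\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}$
the steps are the same but $a^T = [0, -1, 1]$.
Estimator: $\bar{y}_3 - \bar{y}_2 = a^T\hat{\beta}$

$$a^T (X^T X)^{-1} a = \frac{1}{n_2} + \frac{1}{n_3}$$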
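A minimal sketch of these interval calculations on the battery data. The helper `contrast_ci()` is a hypothetical function written for this illustration, and the 95% level is an assumed default. It relies on `vcov(fit)`, which for an `lm` fit equals $s^2 (X^T X)^{-1}$.

```r
one   <- c(100, 96, 92, 96, 92)
two   <- c(76, 80, 75, 84, 82)
three <- c(108, 100, 96, 98, 100)
life  <- c(one, two, three)
brand <- factor(rep(c("One", "Two", "Three"), each = 5),
                levels = c("One", "Two", "Three"))
fit <- lm(life ~ 0 + brand)

# Hypothetical helper: t-based CI for the linear combination a^T beta
contrast_ci <- function(fit, a, level = 0.95) {
  est <- sum(a * coef(fit))                      # a^T beta-hat
  se  <- sqrt(drop(t(a) %*% vcov(fit) %*% a))    # s * sqrt(a^T (X^T X)^-1 a)
  tq  <- qt(1 - (1 - level) / 2, df.residual(fit))
  c(estimate = est, lower = est - tq * se, upper = est + tq * se)
}

ci_mu2  <- contrast_ci(fit, c(0, 1, 0))    # mu_2
ci_diff <- contrast_ci(fit, c(0, -1, 1))   # mu_3 - mu_2

ci_mu2
ci_diff
```

Changing `a` is the only step that differs between the two targets, which mirrors the derivation above.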

General Testing of Parameters

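$$t = \frac{a^T\hat{\beta} - (a^T\beta)_0}{s\sqrt{a^T (X^T X)^{-1} a}}$$

where $(a^T\beta)_0$ is the value of $a^T\beta$ under $H_0$ and

$$\sqrt{s^2\, a^T (X^T X)^{-1} a} = SE(a^T\hat{\beta})$$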
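The general statistic can be wrapped in a small function. The sketch below applies it to the battery data to test $H_0: \mu_3 - \mu_2 = 0$. The helper `contrast_test()` and this choice of null hypothesis are illustrative assumptions, not part of the original exercise.

```r
one   <- c(100, 96, 92, 96, 92)
two   <- c(76, 80, 75, 84, 82)
three <- c(108, 100, 96, 98, 100)
life  <- c(one, two, three)
brand <- factor(rep(c("One", "Two", "Three"), each = 5),
                levels = c("One", "Two", "Three"))
fit <- lm(life ~ 0 + brand)

# Hypothetical helper: two-sided t test of H0: a^T beta = null_value
contrast_test <- function(fit, a, null_value = 0) {
  est    <- sum(a * coef(fit))                     # a^T beta-hat
  se     <- sqrt(drop(t(a) %*% vcov(fit) %*% a))   # SE(a^T beta-hat)
  t_stat <- (est - null_value) / se
  df     <- df.residual(fit)
  p      <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  c(t = t_stat, df = df, p_value = p)
}

res <- contrast_test(fit, a = c(0, -1, 1))
res
```

Setting `a` to a unit vector such as `c(0, 1, 0)` reduces this to the usual single-coefficient t test.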

Example Individual Variables

In a small-scale study of the relation between degree of brand liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results were obtained (data are coded):

Obs X1 X2 Y  
1 4 2 64  
2 4 4 73  
3 4 2 61  
4 4 4 76  
5 6 2 72  
6 6 4 80  
7 6 2 71  
8 6 4 83  
9 8 2 83  
10 8 4 89  
11 8 2 86  
12 8 4 93  
13 10 2 88  
14 10 4 95  
15 10 2 94  
16 10 4 100  

A first-order model for mean brand liking, E(Y), as a function of moisture content X1 and sweetness X2 was fit to the data using R, i.e., Y=β0+β1X1+β2X2+ε.

modA=lm(y~x1 + x2);  
summary(modA);
             Estimate Std. Error t value  
(Intercept) 37.650 2.9961032 12.566323  
x1          4.425 0.3011197 14.695153  
x2          4.375 0.6733241 6.497614
modA=lm(y~x1 + x2);  
anova(modA);  
          Df Sum Sq Mean Sq F value  
x1        1 1566.45 1566.45 215.947  
x2        1 306.25 306.25 42.219  
Residuals 13 94.30 7.25

i) Is moisture content X1 a statistically useful predictor of brand liking? Test using α = 0.01. Please provide: hypotheses, test statistic, P-value or rejection region, and conclusion. Justify your answers.

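$H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$
Test statistic: $t \approx 14.6951$

RR $= \{t \text{ such that } |t| > t_{\alpha/2,\,13}\}$, with $t_{0.005,\,13} \approx 3.012$

$t \in RR$, so we reject $H_0$: at α = 0.01 there is sufficient evidence that moisture content is a useful predictor of brand liking in a model that also contains sweetness.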
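The rejection region and a P-value for this test can be computed in R. This sketch re-enters the data from the table above; the names `t_x1`, `t_crit`, and `p_value` are illustrative.

```r
# Brand-liking data from the table above
x1 <- rep(c(4, 6, 8, 10), each = 4)
x2 <- rep(c(2, 4), times = 8)
y  <- c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100)

modA <- lm(y ~ x1 + x2)

alpha  <- 0.01
df_err <- df.residual(modA)                      # n - (p + 1) = 13

t_x1    <- coef(summary(modA))["x1", "t value"]  # test statistic for beta_1
t_crit  <- qt(1 - alpha / 2, df = df_err)        # two-sided critical value
p_value <- 2 * pt(abs(t_x1), df = df_err, lower.tail = FALSE)

c(t = t_x1, t_crit = t_crit, p_value = p_value)
abs(t_x1) > t_crit   # TRUE means t falls in the rejection region
```

The P-value route and the rejection-region route lead to the same decision; either one is enough to justify the conclusion.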