3 Dataset: $X$, the predictor variable (assumed to be fixed, i.e. non-random)
(a single predictor in simple linear regression; a vector of predictors in multiple linear regression)
$Y$, the response variable (assumed to be random/stochastic)
Assumption: the response is normally distributed conditional on the predictor, with the same variance at every value of the predictor.
The data values $Y_i$ come from a different conditional distribution for each $x_i$: $Y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$
Goal: Find the line that best fits our data
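One common way to make "best fits" concrete is least squares: choose the intercept and slope that minimize the sum of squared vertical distances between the data and the line. The sketch below assumes that criterion (the notes do not name one), and the function name `fit_line` and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squares of x around its mean, and cross-products of x and y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


# Illustrative (made-up) data, not taken from the notes.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]
b0, b1 = fit_line(xs, ys)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```

The fitted line always passes through the point of means (x_bar, y_bar), which is why the intercept is computed from the slope.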
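Carry out a hypothesis test: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$, where $\beta_1$ is the slope of the line.
If we reject $H_0$, construct a C.I. for $E[Y]$ at $X = x^*$; we also want to predict a future observation ($Y^*$) when $X = x^*$.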
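The test, the confidence interval, and the prediction interval can be sketched as follows. This assumes the test is the usual two-sided t-test of a zero slope in simple linear regression, and that the critical value `t_crit` is looked up in a t table with n - 2 degrees of freedom (3.182 is the 97.5th percentile for 3 degrees of freedom). The function name, the data, and the point `x_new` are hypothetical.

```python
import math


def slope_inference(xs, ys, x_new, t_crit):
    """t-test for zero slope, plus intervals at x_new, for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Residual standard error on n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Test statistic for H0: slope = 0.
    t_stat = b1 / (s / math.sqrt(s_xx))

    # Standard errors for the mean response and for a new observation at x_new.
    y_hat = b0 + b1 * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
    se_mean = s * math.sqrt(leverage)
    se_pred = s * math.sqrt(1 + leverage)

    return {
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,
        "y_hat": y_hat,
        "ci_mean": (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean),
        "pi_new": (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred),
    }


# Illustrative (made-up) data; n = 5, so the t table is read at 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]
result = slope_inference(xs, ys, x_new=3.5, t_crit=3.182)
print(result)
```

The prediction interval is always wider than the confidence interval at the same point, because a future observation carries its own random error on top of the uncertainty in the estimated mean.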
Main Assumptions
Randomness (with respect to Y)
Correct Functional Specification (linear, polynomial, etc.)
Uncorrelated Errors
Constant variance (Y|X)
Normality
For a linear model these are summarized by the mnemonic LINE:
L: Expected value of Y is given by a linear function of the predictor
I: Independent errors
N: Normally distributed errors
E: Equal variance
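To see what these assumptions mean for the data, one can simulate responses that satisfy all of them: a linear mean, independent normal errors, and the same variance at every predictor value. This is a minimal sketch; the function name `simulate_responses` and the parameter values (`beta0`, `beta1`, `sigma`) are illustrative assumptions, not values from the notes.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from N(beta0 + beta1 * x, sigma^2), independently."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Linear mean plus an independent normal error with constant spread sigma.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# The predictor values are treated as fixed; only the responses are random.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_responses(xs, beta0=1.0, beta1=2.0, sigma=0.5)
for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  mean = {1.0 + 2.0 * x:.1f}  simulated y = {y:.3f}")
```

Each response is centered on its own mean, but the spread around that mean is the same at every x.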