regression grind

a dump of my plan + notes for studying for the finals of a class that i should be doing well in but am not, because i'm apparently just not good at math and stats. might be the most i've studied for a class ever in my life

go through

notes
hw
quizzes
pq1, q1
pq2, q2
pfinals

prediction

distribution of SSE (sigma_hat)
e(sse)
show y_hat independent of residual
distribution of beta_hat
log reg why use logit? issues with linear model
explain what hii is?
why 0 < hii < 1
what is stud(ei)?
press
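
quick numeric check for the hii and y_hat questions above. this is only a sketch in plain python, assuming simple linear regression (one predictor plus intercept, so p = 2) and made-up data. it checks that every hii sits between 0 and 1, that they sum to p, and that fitted values are orthogonal to the residuals (with normal errors, zero covariance is what gives independence).

```python
# made-up data; simple linear regression y = b0 + b1*x, so p = 2
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 9.5]
n, p = len(x), 2

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, y_hat)]

# diagonal of the hat matrix; closed form that only holds for one predictor
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

print("hii:", [round(hi, 3) for hi in h])
print("sum of hii:", round(sum(h), 6))
print("sum of y_hat * e:", round(sum(f * e for f, e in zip(y_hat, resid)), 6))
```

the point far out in x (x = 10) gets the largest hii, which is the "outlier in x" idea.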

Outliers

outlier in x: leverage (hii) > 3p/n
outlier in y: discrepancy (studentized e) > t(n-1-p), 1-a/2
outlier in both: influence (cook's distance) > 4/n
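
the three checks above can be sketched in plain python. assumptions: simple linear regression (p = 2 counting the intercept), made-up data, externally studentized residuals for the t comparison, and a t critical value read off a table instead of computed. treat it as a study aid, not the course's reference implementation.

```python
import math

# made-up data: index 3 is off in y, index 7 is far out in x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.1, 10.0, 4.8, 6.1, 7.2, 19.9]
n, p = len(x), 2  # p counts the intercept and the slope

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)

# leverage (closed form for one predictor)
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
# internally studentized residuals
r = [e / math.sqrt(mse * (1 - hi)) for e, hi in zip(resid, h)]
# externally studentized residuals, t-distributed with n - p - 1 df
t = [ri * math.sqrt((n - p - 1) / (n - p - ri ** 2)) for ri in r]
# cook's distance
cooks = [ri ** 2 / p * hi / (1 - hi) for ri, hi in zip(r, h)]

t_crit = 2.571  # t table value for df = 5, two-sided alpha = 0.05
high_leverage = [i for i in range(n) if h[i] > 3 * p / n]
y_outliers = [i for i in range(n) if abs(t[i]) > t_crit]
influential = [i for i in range(n) if cooks[i] > 4 / n]

print("high leverage:", high_leverage)
print("outliers in y:", y_outliers)
print("influential  :", influential)
```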

multicollinearity

problems: inflated SE

checks:

swing/change sign coefficients in f-test
correlation matrix
VIF
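
a tiny VIF sketch in plain python. assumption: only two predictors, so the R^2 from regressing one on the other is just their squared correlation (with more predictors you would regress each one on all the others). the data and the helper name pearson_r are made up for illustration. a common rule of thumb, not from these notes, treats VIF above 5 or 10 as a warning sign.

```python
import math

def pearson_r(a, b):
    """sample correlation between two equal-length lists"""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# made-up predictors: x2 is roughly 2 * x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r12 = pearson_r(x1, x2)
# with two predictors, VIF = 1 / (1 - R^2) and R^2 = r12^2
vif = 1 / (1 - r12 ** 2)
print("r12 =", round(r12, 4), "VIF =", round(vif, 1))
```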

solution

drop
feature engineer
regularized regression
dimensionality reduction
partial least squares

heteroskedasticity

unbiased

detection

residual plot

problem: no longer BLUE -> wrong SE(beta) and CI/PI widths

solution

log / square
boxcox
robust SE
WLS
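
a sketch of the WLS idea in plain python, for one predictor. assumptions: the error sd is proportional to x, so the weights are 1 / x^2 (that variance model is hypothetical, in practice it has to be justified from the residual plot), and the data is made up. wls_fit is an illustrative helper, not a library function.

```python
def wls_fit(x, y, w):
    """weighted least squares for y = b0 + b1*x; returns (b0, b1)"""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# made-up data where the spread of y grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 7.8, 11.5, 13.0, 18.5, 17.0]

# assumption: sd of the error is proportional to x, so weight by 1 / x^2
w = [1 / xi ** 2 for xi in x]

b0_ols, b1_ols = wls_fit(x, y, [1.0] * len(x))  # equal weights = OLS
b0_wls, b1_wls = wls_fit(x, y, w)
print("OLS:", round(b0_ols, 3), round(b1_ols, 3))
print("WLS:", round(b0_wls, 3), round(b1_wls, 3))
```

noisy points at large x get down-weighted, so they pull the line less than in OLS.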

if ei is non-linear, use nonparametric regression (knn, moving average)

non-normal

still BLUE
no inference

detection

histogram
qq plot
test for normality: shapiro

for normal: skewness = 0 (third moment), kurtosis = 3 (fourth moment)

omnibus k2 test (want high p-value, i.e. fail to reject normality)
JB test
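
the skewness / kurtosis numbers and the JB statistic can be computed by hand. a minimal sketch in plain python, assuming the usual form JB = n/6 * (S^2 + (K - 3)^2 / 4) and the asymptotic chi-square(2) reference distribution (which is rough for small n). the residuals and the helper name normality_stats are made up.

```python
import math

def normality_stats(values):
    """return (skewness, kurtosis, JB statistic, approximate p-value)"""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # JB is asymptotically chi-square with 2 df, whose tail is exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return skew, kurt, jb, p_value

# made-up residuals
resid = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt, jb, p_value = normality_stats(resid)
print(skew, kurt, round(jb, 4), round(p_value, 4))
```

a high p-value here means no evidence against normality, not proof of it.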

problems: unreliable t.test, wrong CI/PI

false assumption of linearity

transform y -> may introduce hetero if homo
transform x -> nice when only prob is non-linearity
transform both

Model selection

under: biased coefs + predictions (under/overestimate), overestimate sigma2

extra vars: unbiased, MSE has fewer degrees of freedom, wider CI and lower power

over(multicol): inflated SE for coefs, rank deficient

adjusted R^2 = 1 - MSE / MST = 1 - (SSE / (n-p)) / (SST / (n-1))

takes into account the "cost" of losing DF
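
the same formula as a few lines of python, with made-up numbers. assumption: p counts every coefficient including the intercept. adjusted_r2 is just an illustrative helper.

```python
def adjusted_r2(sse, sst, n, p):
    """adjusted R^2; p counts all coefficients including the intercept"""
    mse = sse / (n - p)  # residual mean square
    mst = sst / (n - 1)  # total mean square
    return 1 - mse / mst

# made-up numbers
sse, sst, n, p = 20.0, 100.0, 25, 5
print("R^2     :", 1 - sse / sst)
print("adj R^2 :", adjusted_r2(sse, sst, n, p))
```

adjusted R^2 comes out below plain R^2 because each extra coefficient costs a degree of freedom.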

Mallows Cp

identify subset where Cp is near k+1 where k is no of preds
this means bias is small
if none are near, a predictor is missing
if several are near, choose the model with the smallest Cp
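
a sketch of the Cp comparison in python, assuming the usual definition Cp = SSE_subset / MSE_full - (n - 2(k+1)). the candidate subsets and their SSE values are invented for illustration.

```python
def mallows_cp(sse_sub, mse_full, n, k):
    """Cp for a subset model with k predictors plus an intercept"""
    return sse_sub / mse_full - (n - 2 * (k + 1))

# made-up numbers: n = 30, full model has 4 predictors and SSE = 50
n = 30
mse_full = 50.0 / (n - 5)

# hypothetical candidate subsets: name -> (k, SSE)
candidates = {"x1": (1, 90.0), "x1+x2": (2, 56.0), "full": (4, 50.0)}
for name, (k, sse) in candidates.items():
    cp = mallows_cp(sse, mse_full, n, k)
    print(name, "k+1 =", k + 1, "Cp =", round(cp, 2))
```

the full model always lands exactly on k+1 by construction, so the interesting comparison is among the smaller subsets.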

AIC, BIC

estimates information lost in a model
trade-off goodness in fit vs simplicity, penalized by no. of model params (p)
larger penalty term in BIC than AIC : ln(n)p vs 2p
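
a sketch of both criteria for a gaussian linear model, assuming the common SSE-based form n * ln(SSE / n) + penalty with additive constants dropped (fine for comparing models fit on the same data). what exactly p counts varies by textbook; here it is the number of regression coefficients. numbers are made up.

```python
import math

def aic_bic(sse, n, p):
    """AIC and BIC for a gaussian linear model, additive constants dropped"""
    fit = n * math.log(sse / n)  # -2 * log-likelihood, up to a constant
    return fit + 2 * p, fit + math.log(n) * p

# made-up numbers: same data (n = 30), two candidate models
n = 30
for p, sse in [(3, 56.0), (5, 50.0)]:
    aic, bic = aic_bic(sse, n, p)
    print("p =", p, "AIC =", round(aic, 2), "BIC =", round(bic, 2))
```

lower is better for both. the BIC penalty ln(n)p only exceeds 2p once ln(n) > 2, i.e. n >= 8.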

PRESS

modified SSE, uses predicted value for ith obs from model fit on data excluding that point
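
a sketch showing PRESS two ways in plain python: refitting without each point, and the shortcut e_i / (1 - hii) from a single fit. assumptions: simple linear regression and made-up data. fit_line is an illustrative helper.

```python
def fit_line(x, y):
    """OLS for y = b0 + b1*x; returns (b0, b1)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.3, 1.8, 3.4, 3.9, 5.2, 5.8]
n = len(x)

# brute force: refit without point i, then predict point i
press_loo = 0.0
for i in range(n):
    b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    press_loo += (y[i] - (b0 + b1 * x[i])) ** 2

# shortcut: one fit on all data, deleted residual = e_i / (1 - hii)
b0, b1 = fit_line(x, y)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
press = sum((e / (1 - hi)) ** 2 for e, hi in zip(resid, h))
sse = sum(e ** 2 for e in resid)

print("PRESS (refit):", round(press_loo, 4))
print("PRESS (hii)  :", round(press, 4))
print("SSE          :", round(sse, 4))
```

the two PRESS values agree, and PRESS is never smaller than SSE since each residual is divided by 1 - hii < 1.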

BENEDICT NEO 梁耀恩
