a dump of my plan + notes for studying for finals for a class i should be doing well in but am not, because apparently i'm just not good at math and stats. might be the most i've studied for a class ever in my life
go through
- notes
- hw
- quizzes
- pq1, q1
- pq2, q2
- pfinals
prediction
- distribution of SSE (and sigma_hat^2)
- E(SSE)
- show y_hat is independent of the residuals
- distribution of beta_hat
- log reg: why use the logit? issues with fitting a plain linear model to a binary response
- explain what h_ii is
- why 0 < h_ii < 1
- what is stud(e_i)?
- PRESS
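a quick-reference block i'm keeping with this list (standard results for the normal-error linear model; here p counts all coefficients incl. the intercept, so double-check against how the course defines p):

```latex
\begin{gather*}
\hat\beta \sim N\!\big(\beta,\ \sigma^2 (X^\top X)^{-1}\big), \qquad
\hat y = Hy,\quad H = X(X^\top X)^{-1}X^\top,\quad h_{ii} = [H]_{ii} \\
\mathrm{SSE}/\sigma^2 \sim \chi^2_{\,n-p}, \qquad
E(\mathrm{SSE}) = (n-p)\,\sigma^2, \qquad
\hat\sigma^2 = \mathrm{MSE} = \mathrm{SSE}/(n-p) \\
\mathrm{Cov}(\hat y,\, e) = 0 \ \Rightarrow\ \hat y \perp e \ \text{under normality} \\
\mathrm{stud}(e_i) = \frac{e_i}{\sqrt{\mathrm{MSE}\,(1-h_{ii})}} \\
H = H^2 = H^\top \ \Rightarrow\ h_{ii} = \textstyle\sum_j h_{ij}^2 = h_{ii}^2 + \sum_{j\ne i} h_{ij}^2
\ \Rightarrow\ 0 \le h_{ii} \le 1
\end{gather*}
```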
Outliers
- outlier in x: leverage (h_ii) > 3p/n
- outlier in y: discrepancy (studentized residual) > t_{n-1-p, 1-a/2}
- both: influence (Cook's distance) > 4/n
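a minimal sketch of the three checks above using statsmodels; X and y are simulated stand-ins for the class data, and the cutoffs are the rules of thumb from the notes (3p/n, t-critical, 4/n):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 2)))      # placeholder design matrix
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(size=50)

fit = sm.OLS(y, X).fit()
infl = fit.get_influence()
n, p = int(fit.nobs), int(fit.df_model) + 1        # p includes the intercept

leverage = infl.hat_matrix_diag                    # h_ii
stud_res = infl.resid_studentized_external         # externally studentized e_i
cooks_d = infl.cooks_distance[0]                   # D_i

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1 - p)
print("x-outliers (h_ii > 3p/n):   ", np.where(leverage > 3 * p / n)[0])
print("y-outliers (|stud e_i| > t):", np.where(np.abs(stud_res) > t_crit)[0])
print("influential (D_i > 4/n):    ", np.where(cooks_d > 4 / n)[0])
```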
multicollinearity
problems: inflated SE
checks:
- coefficients swing / change sign even though the overall F-test is significant
- correlation matrix
- VIF (sketch below)
solution
- drop
- feature engineer
- regularized regression
- dimensionality reduction
- partial least squares
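a minimal sketch of the correlation-matrix and VIF checks; the DataFrame and column names are made up (x2 is built to be nearly collinear with x1):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
preds = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=100),   # almost a copy of x1
    "x3": rng.normal(size=100),
})

print(preds.corr().round(2))                      # pairwise correlations

X = sm.add_constant(preds)
for i, col in enumerate(X.columns):
    if col != "const":
        # rule of thumb: VIF > 10 (some texts use > 5) flags multicollinearity
        print(col, round(variance_inflation_factor(X.values, i), 1))
```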
heteroskedasticity
- beta_hat still unbiased
detection
- residual plot
problem: no longer BLUE -> wrong SE(beta) and CI/PI widths
solutions
- log / square
- boxcox
- robust SE
- WLS (sketch below)
if the pattern in e_i is nonlinear, use nonparametric regression (kNN, moving average)
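a minimal sketch comparing plain OLS, HC-robust standard errors, and WLS on simulated data whose error variance grows with x (a stand-in for the real data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 2 + 3 * x + rng.normal(scale=x)                # error sd proportional to x
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
robust = sm.OLS(y, X).fit(cov_type="HC3")          # same coefs, corrected SEs
wls = sm.WLS(y, X, weights=1 / x**2).fit()         # weights ~ 1/variance

# detection: plot ols.resid against ols.fittedvalues and look for a fan shape
print("OLS SE(beta):", ols.bse.round(3))
print("HC3 SE(beta):", robust.bse.round(3))
print("WLS SE(beta):", wls.bse.round(3))
```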
non-normal
- still BLUE
- but no valid inference (t/F tests rely on normality)
detection
- histogram
- qq plot
- tests for normality: Shapiro-Wilk
for a normal distribution: skewness = 0 (third moment), kurtosis = 3 (fourth moment)
- omnibus K^2 test (low p-value rejects normality)
- Jarque-Bera (JB) test
problems: unreliable t-tests, wrong CI/PI widths
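a minimal sketch of the normality checks applied to OLS residuals; the heavy-tailed errors are simulated so the tests have something to flag:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=120)
y = 1 + 2 * x + rng.standard_t(df=3, size=120)     # non-normal (heavy-tailed) errors
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

print("skewness:", round(stats.skew(resid), 3))                     # ~0 for a normal
print("kurtosis:", round(stats.kurtosis(resid, fisher=False), 3))   # ~3 for a normal
print("Shapiro-Wilk:", stats.shapiro(resid))
print("omnibus K^2: ", stats.normaltest(resid))     # D'Agostino's K^2
print("Jarque-Bera: ", stats.jarque_bera(resid))
# small p-values reject normality; visual check: sm.qqplot(resid, line="45")
```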
false assumption of linearity
- transform y -> may introduce heteroskedasticity if the errors were homoskedastic
- transform x -> nice when the only problem is nonlinearity
- transform both
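a minimal sketch of trying those transformations; the curved relationship and variable names are made up, and Box-Cox is only used to suggest a lambda for y:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=150)
y = np.exp(0.5 + 0.3 * x) * rng.lognormal(sigma=0.2, size=150)  # y > 0, curved in x

raw   = sm.OLS(y, sm.add_constant(x)).fit()
log_x = sm.OLS(y, sm.add_constant(np.log(x))).fit()        # transform x only
log_y = sm.OLS(np.log(y), sm.add_constant(x)).fit()        # transform y

print("R^2, raw vs log-x:", round(raw.rsquared, 3), round(log_x.rsquared, 3))
print("log-y coefs:", log_y.params.round(3))   # compare residual plots, not R^2, across y-scales
print("Box-Cox lambda for y:", round(stats.boxcox(y)[1], 2))  # lambda near 0 suggests log(y)
```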
Model selection
underfitting (missing vars): biased coefs + predictions (under/overestimate), overestimated sigma^2
extra vars: still unbiased, but MSE has fewer degrees of freedom -> wider CIs and lower power
overfitting (multicollinearity): inflated SEs for coefs, X can become rank deficient
adjusted R^2 = 1 - MSE/MST = 1 - (SSE/(n-p)) / (SST/(n-1)); takes into account the "cost" of losing DF
Mallows Cp
- identify subsets where Cp is near k+1, where k is the no. of predictors
- this means the bias is small
- if none are near, a predictor is probably missing
- if several are near, choose the model with the smallest Cp
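the version of the formula i'm memorizing (check the course notes, since some books write it in terms of p = k + 1 parameters rather than k predictors):

```latex
\[
C_p = \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\text{full}}} - (n - 2p),
\qquad p = k + 1 \ \text{(k predictors + intercept)},
\qquad E(C_p) \approx p \ \text{if the subset model is unbiased}
\]
```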
AIC, BIC
- estimate the information lost by a model
- trade off goodness of fit vs simplicity, penalizing by the no. of model params (p)
- larger penalty term in BIC than AIC: ln(n)·p vs 2p
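least-squares forms i'm using (additive constants dropped, p = number of params; smaller is better):

```latex
\[
\mathrm{AIC} = n\ln\!\big(\mathrm{SSE}/n\big) + 2p,
\qquad
\mathrm{BIC} = n\ln\!\big(\mathrm{SSE}/n\big) + p\ln(n)
\]
```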
PRESS
- modified SSE: uses the predicted value for the ith obs from a model fit on the data excluding that point
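in formula form (the second equality is the standard leverage shortcut, so you never actually refit n models):

```latex
\[
\mathrm{PRESS} = \sum_{i=1}^{n}\big(y_i - \hat y_{i(i)}\big)^2
             = \sum_{i=1}^{n}\Big(\frac{e_i}{1-h_{ii}}\Big)^2
\]
```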