Qualitative predictor variables not appearing in regression summary output in R
I have a big dataset that I use to run linear regression models with qualitative predictor variables. Call the dataset wn; the qualitative variables are ostate and dstate (states in the US). Here you can see there are 62 unique values of ostate and dstate within wn:
> unique(wn$ostate)
[1] ny ma pa de dc va md wv nc ri sc nh ga fl al tn ms me ky oh in mi vt ia wi mn sd nd mt ct il mo ks ne nj la ar ok tx co wy id ut az nm nv ca or wa
62 Levels: aa ae ak al ap ar az ca co ct dc de fl fm ga gu hi ia id il in ks ky la ma md me mh mi mn mo mp ms mt nc nd ne nh nj nm nv ny oh ok or pa pr pw ri sc sd tn tx ut va vi vt wa ... wy
> unique(wn$dstate)
[1] ma ri nh me vt ct ny nj pa de dc va md wv nc sc ga fl al tn ms ky oh in mi ia wi mn sd nd mt il mo ks ne la ar ok tx co wy id ut az nm nv ca or wa
62 Levels: aa ae ak al ap ar az ca co ct dc de fl fm ga gu hi ia id il in ks ky la ma md me mh mi mn mo mp ms mt nc nd ne nh nj nm nv ny oh ok or pa pr pw ri sc sd tn tx ut va vi vt wa ... wy
I am now running a regression model to predict rate from distance, ostate, and dstate as follows:
> wn.lr = lm(wn$rate~wn$distance+wn$ostate+wn$dstate)
When I check the regression summary, I see that only 48 of the ostate and dstate predictors are populated, and the remaining 14 are missing. A small part of the summary output is given below. For example, you can see that ostateal is missing from the output:
> summary(wn.lr)

Call:
lm(formula = wn$rate ~ wn$distance + wn$ostate + wn$dstate)

Residuals:
    Min      1Q  Median      3Q     Max
-2370.3  -218.4   -18.9   170.8  9105.7

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.208e+03  6.632e+00 182.171  < 2e-16 ***
wn$distance  1.626e+00  3.111e-03 522.722  < 2e-16 ***
wn$ostatear  2.000e+02  7.294e+00  27.419  < 2e-16 ***
wn$ostateaz  1.981e+02  8.372e+00  23.667  < 2e-16 ***
wn$ostateca  1.056e+02  7.919e+00  13.340  < 2e-16 ***
wn$ostateco  1.323e+02  7.332e+00  18.043  < 2e-16 ***
wn$ostatect -2.019e+02  1.827e+01 -11.048  < 2e-16 ***
wn$ostatedc  5.711e+02  2.178e+01  26.223  < 2e-16 ***
On the other hand, when I check the entries with ostate = "al", I see there are over 6000 rows:
> wnnew<-subset(wn,ostate=="al")
> nrow(wnnew)
[1] 6213
Is there any explanation for this?
This is because of aliasing (i.e. the model is overidentified). For example, Massachusetts is a level in both the dstate and ostate variables, and I think its effect in the two treatments can't be separated.
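One way to check what is actually going on is to compare the declared factor levels with the levels that occur in the data, and then look at the coefficient names. The sketch below uses a small made-up data frame (toy, with hypothetical values; it is not the wn dataset from the question). Two things are worth ruling out alongside aliasing: with R's default treatment contrasts, lm() drops factor levels that never occur and folds the first remaining level into the intercept, so that level gets no row of its own, whereas a truly aliased term would normally show up with an NA estimate. Note that the unique() output in the question prints 62 levels but far fewer distinct observed values.

```r
# Toy data standing in for wn (hypothetical values, not the asker's dataset).
set.seed(1)
states <- c("aa", "ak", "al", "ar", "az")   # "aa" and "ak" are declared but never observed
toy <- data.frame(
  distance = runif(60, 100, 2000),
  ostate   = factor(rep(c("al", "ar", "az"), each = 20), levels = states)
)
toy$rate <- 1200 + 1.6 * toy$distance + rnorm(60, sd = 50)

fit <- lm(rate ~ distance + ostate, data = toy)

nlevels(toy$ostate)               # number of declared levels, used or not
nlevels(droplevels(toy$ostate))   # number of levels that actually occur
names(coef(fit))                  # coefficient names in the fitted model
anyNA(coef(fit))                  # TRUE would point to aliased (inestimable) terms
alias(fit)                        # reports linearly dependent terms, if any
```

In this toy fit, "al" is the first level that occurs, so it serves as the baseline and has no coefficient, even though it has plenty of rows. If the same pattern holds for wn, relevel() can be used to choose a different baseline.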