I'm new to R, and we're trying to use it at work for regression analysis on some of our data sets. To start, we took a very simple data set and tried to fit a linear model to it.
The problem I'm running into is that once I import the data file and run lm(), one of my factor levels is missing from the output. I don't understand where it has gone, or whether I'm just interpreting the output wrong.
In the lm() output below there should be another level, "AgeBucket16"; it's the first level of the AgeBucket factor.
I get the same problem when I run an ANOVA with aov() using the same factors: I lose the first level of both my "AgeBucket" factor and my "Hospital" factor.
Can anyone shed some light on what I'm missing?
Code:
> fit <- lm(TotalPercPaid120 ~ AgeBucket)
> summary(fit)

Call:
lm(formula = TotalPercPaid120 ~ AgeBucket)

Residuals:
    Min      1Q  Median      3Q     Max
-90.496  -0.495   0.264   0.317  45.452

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.597990   0.009114  65.611  < 2e-16 ***
AgeBucket25      -0.087959   0.015536  -5.662 1.51e-08 ***
AgeBucket30      -0.104718   0.016645  -6.291 3.17e-10 ***
AgeBucket35      -0.102780   0.016092  -6.387 1.70e-10 ***
AgeBucket40      -0.072274   0.015402  -4.693 2.70e-06 ***
AgeBucket45      -0.039197   0.014904  -2.630  0.00854 **
AgeBucket50       0.033393   0.013828   2.415  0.01574 *
AgeBucket55       0.085377   0.012923   6.607 3.96e-11 ***
AgeBucket60       0.116011   0.012731   9.113  < 2e-16 ***
AgeBucket65       0.162438   0.012688  12.802  < 2e-16 ***
AgeBucket70       0.109460   0.013485   8.117 4.85e-16 ***
AgeBucket75       0.086453   0.014602   5.921 3.22e-09 ***
AgeBucket80       0.121772   0.015791   7.711 1.26e-14 ***
AgeBucket85       0.137719   0.017063   8.071 7.08e-16 ***
AgeBucket90       0.154927   0.021803   7.106 1.21e-12 ***
AgeBucket95       0.163869   0.038299   4.279 1.88e-05 ***
AgeBucketPlus100  0.052145   0.068789   0.758  0.44843
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.728 on 64449 degrees of freedom
Multiple R-squared: 0.01448, Adjusted R-squared: 0.01423
F-statistic: 59.18 on 16 and 64449 DF, p-value: < 2.2e-16
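One possible reading of the output above, sketched on made-up data rather than the real data set: with R's default treatment contrasts for unordered factors, the first level of a factor is used as the reference level. It gets no coefficient of its own; its mean is the intercept, and every other coefficient is a difference from it. The data frame `d` and its columns `age` and `y` are hypothetical names chosen for illustration.

```r
# Toy data, invented for illustration (not the data from the post)
set.seed(1)
d <- data.frame(
  age = factor(rep(c("16", "25", "30"), each = 20)),
  y   = rnorm(60)
)

levels(d$age)      # "16" "25" "30": "16" is the first (reference) level
contrasts(d$age)   # treatment coding: the row for "16" is all zeros

fit <- lm(y ~ age, data = d)
names(coef(fit))   # "(Intercept)" "age25" "age30": no "age16" term

# The intercept equals the mean of the reference level, and each other
# coefficient equals that level's mean minus the reference mean.
group_means <- tapply(d$y, d$age, mean)
all.equal(coef(fit)[["(Intercept)"]], group_means[["16"]])
all.equal(coef(fit)[["age25"]], group_means[["25"]] - group_means[["16"]])
```

If the same thing is happening in the real model, the (Intercept) row would be the estimate for AgeBucket16, and the 16 degrees of freedom in the F-statistic would correspond to 17 age buckets.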
Code:
> fit2 <- aov(TotalPercPaid120 ~ AgeBucket + Hospital)
> summary(fit2)
               Df Sum Sq Mean Sq F value Pr(>F)
AgeBucket      16    502   31.36   59.24 <2e-16 ***
Hospital        1     39   38.72   73.15 <2e-16 ***
Residuals   64448  34118    0.53
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> coef(fit2)
     (Intercept)      AgeBucket25      AgeBucket30      AgeBucket35      AgeBucket40      AgeBucket45      AgeBucket50      AgeBucket55      AgeBucket60      AgeBucket65
      0.63715260      -0.10455534      -0.11868351      -0.11665114      -0.08450580      -0.05072533       0.02153494       0.07419295       0.10470710       0.15132806
     AgeBucket70      AgeBucket75      AgeBucket80      AgeBucket85      AgeBucket90      AgeBucket95 AgeBucketPlus100  HospitalSt Mary
      0.09947468       0.07662257       0.11098498       0.12526939       0.14027786       0.15178454       0.03671460      -0.05010209
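With two factors the same pattern would apply to each of them: the intercept stands for the combination of both first levels, and Hospital's single degree of freedom is consistent with two hospitals, one of which is the reference. A minimal sketch, again on invented data with hypothetical names (`d`, `age`, `hospital`, `y`), of how the reference level can be changed with relevel() and how estimates for every level can be recovered with predict():

```r
# Made-up data with two factors (names are illustrative only)
set.seed(2)
d <- expand.grid(
  age      = factor(c("16", "25", "30")),
  hospital = factor(c("General", "St Mary")),
  rep      = 1:10
)
d$y <- rnorm(nrow(d))

fit <- aov(y ~ age + hospital, data = d)
names(coef(fit))        # "(Intercept)" "age25" "age30" "hospitalSt Mary"

# Picking another reference level only re-expresses the same model.
d2 <- d
d2$age <- relevel(d2$age, ref = "30")
fit_ref30 <- aov(y ~ age + hospital, data = d2)
names(coef(fit_ref30))  # "(Intercept)" "age16" "age25" "hospitalSt Mary"
all.equal(fitted(fit), fitted(fit_ref30))   # identical fitted values

# Estimated mean for every combination of levels, reference levels included.
grid <- expand.grid(age = levels(d$age), hospital = levels(d$hospital))
grid$estimate <- predict(fit, newdata = grid)
grid
```

The row of `grid` for age "16" at hospital "General" reproduces the intercept, which is where the "missing" levels end up under this coding.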