confidence interval with R

I am trying to predict the amount that a male with average status, income and verbal score would spend along with an appropriate 95% CI.

I used my linear model with all my variables; sex is coded as male=0 and female=1 in the data set.
I think I did something wrong, because I get predictions for all 47 observations (there are 19 females and 28 males) instead of a single prediction for a male.
I set up the prediction as:
> predict(mdl, sex=0, interval='confidence', level=0.90)
          fit         lwr        upr
1 -10.6507430 -21.4372267  0.1357407
2  -9.3711318 -21.9428731  3.2006095
3  -5.4630298 -15.0782882  4.1522286
4  24.7957487  12.5630143 37.0284831...
please help...


Ambassador to the humans
It's not clear what you're trying to do. Here is an example of building a model and using predict:

n <- 30
x1 <- rnorm(n)
x2 <- runif(n)
y <- 2 + x1 - .5*x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

o <- lm(y ~ x1 + x2, data = dat)

# Gives predictions for every value in the data set
predict(o, interval = "confidence", level = .90)

# If we want new predictions we need to use the newdata
# argument

newdat <- data.frame(x1 = c(1, 2), x2 = c(.5, .7))
predict(o, newdata = newdat, interval = "confidence", level = .9)
I have a data set with 47 observations and 5 variables (sex, income, status, verbal score and spending), where sex is coded as 0 = male and 1 = female.
The question was to predict the amount that a male with average income, status and verbal score would spend, with a 95% CI.
I know I have to use my linear model to predict the average, but my prediction setup is wrong. How do I predict for a male when it's coded as such in my data? That is where I get confused...


Ambassador to the humans
You would use the exact same setup as I gave in my previous post but you would plug in your variables and the values you want them to take into the newdata parameter. In your case sex=0 gives male so you would make sure that the sex portion of newdata had a 0...
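To make that concrete, here is a minimal sketch on simulated data. The real data set is not shown in this thread, so the data frame `dat`, the response name `spend` and the numbers used to simulate it are all assumptions for illustration; only the predictor names and the 0/1 coding come from the posts. It also shows why a call like predict(mdl, sex=0, ...) returns every observation: `sex` is not an argument of predict(), so it is absorbed by `...` and ignored, and without newdata the fitted values for the whole data set come back.

```r
set.seed(1)
n <- 47

# Simulated stand-in for the real data; all values are hypothetical
dat <- data.frame(
  sex    = rep(c(0, 1), times = c(28, 19)),   # 0 = male, 1 = female
  status = runif(n, 20, 75),
  income = runif(n, 1, 15),
  verbal = sample(1:10, n, replace = TRUE)
)
dat$spend <- 20 - 20 * dat$sex + 5 * dat$income - 2 * dat$verbal +
  rnorm(n, sd = 20)

mdl <- lm(spend ~ sex + status + income + verbal, data = dat)

# sex = 0 is swallowed by `...`, so this is the same as
# predict(mdl, interval = "confidence", level = 0.90)
all_rows <- predict(mdl, sex = 0, interval = "confidence", level = 0.90)
nrow(all_rows)   # one row per observation in dat

# A one-row data frame passed as newdata gives a single prediction
one_male <- data.frame(sex = 0, status = 45, income = 5, verbal = 7)
one_row  <- predict(mdl, newdata = one_male,
                    interval = "confidence", level = 0.90)
nrow(one_row)
```

The column names in newdata have to match the predictor names used in the model formula; the values in `one_male` here are arbitrary placeholders.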
Sorry Dason, but like I said, I am new to R and I still don't understand. I am trying to clarify the prediction and CI for the problem because I then have to repeat the prediction with maximal values and compare the CIs to determine which is wider. Thanks


Ambassador to the humans
I guess it's still not clear to me what is giving you trouble. Have you worked through that example I gave you?
I need the prediction for a male, but I get all 47 observations instead of the prediction for an average male?

Then, when using newdata as you suggested, I used the max values found in my summary for each variable. Was that correct?
g2<-data.frame(status=75, income=15, verbal=10, sex=0)
> predict(g,g2, level=.90)
> predict(g,g2, interval='confidence', level=.90)
       fit      lwr      upr
1 71.30794 47.07516 95.54072


Ambassador to the humans
So to get the predictions for the males where everything else is average put in the average values for the other predictors.
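A sketch of that step, again on simulated stand-in data (the data frame `dat` and the response name `spend` are assumptions; only the predictor names and the 0/1 coding come from this thread). The averages are taken over the whole sample with mean() rather than typed in by hand, which is one reading of "average". The sketch uses level = 0.95 because the question asks for a 95% interval; the calls posted in this thread use level = .90, which gives a 90% interval.

```r
set.seed(2)
n <- 47

# Simulated stand-in for the real data; all values are hypothetical
dat <- data.frame(
  sex    = rep(c(0, 1), times = c(28, 19)),   # 0 = male, 1 = female
  status = runif(n, 20, 75),
  income = runif(n, 1, 15),
  verbal = sample(1:10, n, replace = TRUE)
)
dat$spend <- 20 - 20 * dat$sex + 5 * dat$income - 2 * dat$verbal +
  rnorm(n, sd = 20)

g <- lm(spend ~ sex + status + income + verbal, data = dat)

# Male (sex = 0) with every other predictor at its sample mean
avg_male <- data.frame(
  sex    = 0,
  status = mean(dat$status),
  income = mean(dat$income),
  verbal = mean(dat$verbal)
)

# 95% confidence interval for the mean spending of such a person
ci_avg <- predict(g, newdata = avg_male,
                  interval = "confidence", level = 0.95)
ci_avg
```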
But I also need the "prediction" band for the max values as well, correct?
> predict(g,g2, interval='prediction', level=.90)
       fit      lwr      upr
1 71.30794 26.10037 116.5155


Ambassador to the humans
Ok... I don't know what to do with this thread anymore. You aren't actually asking a question there. It's not clear what is giving you problems.


Less is more. Stay pure. Stay poor.
If it is still unclear, I believe Snowy88 wants a 95% confidence interval around the sex estimate (beta coefficient).


Ambassador to the humans
will do that. But I don't think that's what they want. They talk about prediction and this doesn't get you a prediction.
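For reference, one R function that returns confidence intervals for model coefficients is confint(). A minimal sketch on simulated stand-in data (the data frame `dat` and the response name `spend` are hypothetical) illustrates the distinction being made here: the interval is for the male/female difference with the other predictors held fixed, not for the spending of a person with given covariate values.

```r
set.seed(3)
n <- 47

# Simulated stand-in for the real data; all values are hypothetical
dat <- data.frame(
  sex    = rep(c(0, 1), times = c(28, 19)),   # 0 = male, 1 = female
  status = runif(n, 20, 75),
  income = runif(n, 1, 15),
  verbal = sample(1:10, n, replace = TRUE)
)
dat$spend <- 20 - 20 * dat$sex + 5 * dat$income - 2 * dat$verbal +
  rnorm(n, sd = 20)

g <- lm(spend ~ sex + status + income + verbal, data = dat)

# 95% CI for the sex coefficient only (a 1 x 2 matrix: lower, upper)
ci_sex <- confint(g, parm = "sex", level = 0.95)
ci_sex
```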


Less is more. Stay pure. Stay poor.
I was assuming this from their posts yesterday. Another way not using R would be:

If the sample size is > 30, the normal approximation (estimate plus or minus 1.96 standard errors) is typically used:
-22.1133 - 1.96(8.21111) and -22.1133 + 1.96(8.21111)
Or (95% CI: -38.20708, -6.01952)
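The arithmetic above can be checked in R. The estimate and standard error are the values quoted in this post; the 42 residual degrees of freedom are an assumption, based on 47 observations and a model with an intercept plus four predictors. With that few degrees of freedom the t quantile is somewhat larger than 1.96, so a t-based interval comes out slightly wider than the normal approximation.

```r
est <- -22.1133   # sex coefficient quoted above
se  <- 8.21111    # its standard error quoted above

# Normal approximation, as in the hand calculation
ci_z <- est + c(-1, 1) * 1.96 * se
ci_z

# t-based interval; df = 47 - 5 = 42 is an assumption about the model
ci_t <- est + c(-1, 1) * qt(0.975, df = 42) * se
ci_t
```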
I need the prediction for an average male with a 95% CI, then I need to repeat it with the maximal values of the variables status, income and verbal. After that I need to determine which CI is wider and explain why. Assuming I did the predictions correctly, first with the max values (shown above) and now with the mean values, doesn't that give 2 prediction intervals and 2 confidence intervals, one of each for the average and the max values? I.e., for the average:
> g1<-data.frame(status=43, income=4.64, verbal=6.66, sex=0)
> predict(g,g1, interval='confidence', level=.90)
       fit     lwr      upr
1 28.11506 19.7603 36.46983
> predict(g,g1, interval='prediction', level=.90)
       fit       lwr      upr
1 28.11506 -10.95281 67.18293

How would I create a plot to show which CI is wider?


Ambassador to the humans
Well, your question seems pretty straightforward. It sounds like you only need a confidence interval (not necessarily a prediction interval) and you're just supposed to compare the width when you use the average values for everything else with the width when you use the max values for everything else. So instead of changing the interval type from confidence to prediction, you should change the other covariates from their average values to the maximal values...

It sounds like the idea is to get you to explore the fact that predictions near the mean values for the covariates have smaller confidence intervals than predictions far away from the mean.
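A minimal sketch of that comparison on simulated stand-in data (the data frame `dat` and the response name `spend` are assumptions, not the real data). Both rows use interval = "confidence"; only the covariate values change between them.

```r
set.seed(4)
n <- 47

# Simulated stand-in for the real data; all values are hypothetical
dat <- data.frame(
  sex    = rep(c(0, 1), times = c(28, 19)),   # 0 = male, 1 = female
  status = runif(n, 20, 75),
  income = runif(n, 1, 15),
  verbal = sample(1:10, n, replace = TRUE)
)
dat$spend <- 20 - 20 * dat$sex + 5 * dat$income - 2 * dat$verbal +
  rnorm(n, sd = 20)

g <- lm(spend ~ sex + status + income + verbal, data = dat)

# Male with the other predictors at their means, and at their maxima
at_mean <- data.frame(sex = 0, status = mean(dat$status),
                      income = mean(dat$income), verbal = mean(dat$verbal))
at_max  <- data.frame(sex = 0, status = max(dat$status),
                      income = max(dat$income), verbal = max(dat$verbal))

ci <- predict(g, newdata = rbind(at_mean, at_max),
              interval = "confidence", level = 0.95)
rownames(ci) <- c("mean", "max")

# Interval width = upper limit minus lower limit
width <- ci[, "upr"] - ci[, "lwr"]
width
```

The same subtraction applied to the 90% confidence intervals posted earlier in this thread gives 36.46983 - 19.7603 = 16.70953 at the average values and 95.54072 - 47.07516 = 48.46556 at the maximal values, so the interval at the maximal values is the wider one.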