I have a linear regression equation (y=6.0707x-1200.2) that was used to develop 50 point estimates. The sum of these estimates is the point estimate I need. How do I put a confidence interval around the summed estimate? The standard error for the regression model is 218.5. Is the confidence interval just the sum of 218.5*1.96 for the 50 estimates? I don't think it's that simple.
Thanks for the help.
You start with 50 values for x. Then you calculate 50 y values and add them up. Are these 50 x values random draws from some known distribution? Are they numbers chosen by you for their importance? Are they the numbers from 1 to 50? Or something else?
These are known fish lengths (x) plugged into the equation to estimate eggs produced (y). The values for y are summed together to produce the estimate I need to put bounds around.
Do you have the covariance matrix for the parameter estimates? That's what you'll need since you're basically looking at (sum of xs)*(beta_1 hat) + 50*(beta_0 hat) and to get the standard error of that you need the covariance matrix of the beta-hats.
I'm too lazy to write it in the fancy math symbols at the moment.
I don't have emotions and sometimes that makes me very sad.
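For readers who want the symbols, here is a sketch of the calculation described in the post above. The notation is an assumption added for illustration: $x_1,\dots,x_{50}$ are the 50 known lengths, $S_x$ is their sum, and $\hat T$ is the summed estimate. The 1.96 multiplier is the normal approximation used earlier in the thread; with 126 residual degrees of freedom a t quantile would be slightly larger.

```latex
\begin{align*}
S_x &= \sum_{i=1}^{50} x_i \\
\hat{T} &= \sum_{i=1}^{50} \bigl(\hat\beta_0 + \hat\beta_1 x_i\bigr)
         = 50\,\hat\beta_0 + S_x\,\hat\beta_1 \\
\operatorname{Var}(\hat{T}) &= 50^2 \operatorname{Var}(\hat\beta_0)
         + S_x^2 \operatorname{Var}(\hat\beta_1)
         + 2 \cdot 50 \cdot S_x \operatorname{Cov}(\hat\beta_0, \hat\beta_1) \\
\text{95\% CI} &\approx \hat{T} \pm 1.96 \sqrt{\operatorname{Var}(\hat{T})}
\end{align*}
```

In a simple linear regression the covariance term is $\operatorname{Cov}(\hat\beta_0,\hat\beta_1) = -\bar{x}\,\operatorname{Var}(\hat\beta_1)$, where $\bar{x}$ is the mean length of the fish used to fit the model, so it is generally not zero and cannot be dropped. This interval is for the expected total; it does not include the egg-to-egg scatter of individual fish around the line.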
I need a covariance matrix for one variable?
Yes. You have two estimated regression coefficients.
The intercept and the x variable?
Or, alternatively, if you have the original data that gave you the regression equation, you can do a straightforward bootstrap analysis for the confidence interval of the sum you need.
I can do that, but I'm not sure that is correct. Maybe I'm not explaining this very well, and for that I apologize. Let me start from the beginning. I have fish hatchery data that contains fish length and number of eggs per female (fecundity). I plotted the data, ran a regression analysis, and found a relationship between length and fecundity. The results are shown below.
Regression Statistics
Multiple R          0.673777428
R Square            0.453976023
Adjusted R Square   0.449642499
Standard Error      218.4572976
Observations        128

ANOVA
             df    SS            MS            F             Significance F
Regression   1     4999480.301   4999480.301   104.7590973   2.90218E-18
Residual     126   6013172.452   47723.59089
Total        127   11012652.75

               Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -1200.199297    277.5496378      -4.324269009   3.08112E-05   -1749.461871   -650.9367233
X Variable 1   6.070724526     0.593122846      10.23518917    2.90218E-18   4.89695185     7.244497201
I then took known lengths from 50 fish and plugged them into the equation (y=6.0707x-1200.2; example of one estimate is 6.0707(673)-1200.2=2885 eggs) to produce 50 estimates of fish fecundity. Then I summed those up to estimate how many eggs went into the stream for a given year. Because I have 50 estimates, I'm a little confused how doing a bootstrap analysis off of the original data will provide me with a measure of variance around the summed point estimate. I must be confusing myself?
Thanks for your help
What software are you using?
That is all good. But the problem is that the slope and intercept are uncertain and their estimates are also correlated.
To get a bootstrap confidence interval, resample the original data (128 fish) with replacement and get an estimate of the slope and intercept for the resampled data. Use these new values of the slope and intercept with your 50 new fish to get the 50 egg estimates and add them up. This is bootstrap total number 1.
Repeat this a few thousand times and list all the bootstrap totals. Find the 2.5th and 97.5th percentiles of those totals. That is your 95% CI.
This way you are getting the whole range of possible slopes and intercepts and keeping the correlation between them.
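The resampling steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function name `bootstrap_total_ci` and its parameters are hypothetical, and the demo data are simulated from the fitted line in the thread because the real hatchery data are not shown here.

```python
import numpy as np

def bootstrap_total_ci(x, y, new_x, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the sum of predictions at new_x.

    x, y   : original data used to fit the line (e.g. the 128 fish)
    new_x  : lengths of the fish whose eggs are being totalled
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    new_x = np.asarray(new_x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    totals = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole fish (x, y pairs) with replacement.
        idx = rng.integers(0, n, size=n)
        # Refit the line; polyfit returns [slope, intercept] for degree 1.
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        # One bootstrap total: sum of predictions for the new fish.
        totals[b] = np.sum(intercept + slope * new_x)
    alpha = 1.0 - level
    lo, hi = np.percentile(totals, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Simulated stand-in data (hypothetical, NOT the real hatchery data).
demo_rng = np.random.default_rng(1)
demo_x = demo_rng.uniform(350, 600, size=128)
demo_y = -1200.2 + 6.0707 * demo_x + demo_rng.normal(0, 218.5, size=128)
demo_new_x = demo_rng.uniform(350, 600, size=50)

demo_lo, demo_hi = bootstrap_total_ci(demo_x, demo_y, demo_new_x, n_boot=2000)
print(f"95% bootstrap CI for the total: {demo_lo:.0f} to {demo_hi:.0f}")
```

Because each resample refits both coefficients together, the correlation between slope and intercept is carried into every bootstrap total automatically.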
If you don't have resampling software, write back and possibly I can help, or you can calculate the variance/covariance matrix from the data and someone else may give you a formula.
cheers, kat
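For the variance/covariance route mentioned above, a minimal sketch follows. It assumes the original fitting data are available and uses ordinary least squares with the usual covariance estimate s^2 (X'X)^-1. The function name `analytic_total_ci` is hypothetical, and z=1.96 is the normal approximation from earlier in the thread. The interval is for the expected total; a prediction interval for the actual total would add 50*s^2 to the variance if individual errors are independent.

```python
import numpy as np

def analytic_total_ci(x, y, new_x, z=1.96):
    """CI for the summed regression predictions using the coefficient covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    new_x = np.asarray(new_x, dtype=float)
    n = len(x)
    # Design matrix with an intercept column and the length column.
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # [intercept, slope]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # 2x2 covariance of the estimates
    # The total is a' beta with a = (number of new fish, sum of their lengths).
    a = np.array([len(new_x), new_x.sum()])
    total = a @ beta
    se = np.sqrt(a @ cov @ a)
    return total, total - z * se, total + z * se
```

Called as `analytic_total_ci(lengths_128, eggs_128, lengths_50)` (hypothetical array names), it returns the point estimate with its lower and upper bounds.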
I used xlstat...
Thanks Dason and Katxt. I thought about the bootstrap analysis last night after I left, and then I realized what you were referring to. I can try to find code for doing the bootstrap analysis in R. I'll let you know if I have any luck.
Hope it goes well. If you can't find some suitable R code, I have a basic bootstrap Excel spreadsheet which wouldn't be too hard to set up for your problem. kat