1. Combining Regressions

Hi,

I have several different regressions for various sites within an area estimating Y from X.

I would like to combine all these sites into one region with a single regression equation.

Is there a way to show statistically that this is justified, to a certain level of confidence?

I know this will increase my error from site to site, but I would like to somehow quantify that increase in error.

Thanks for your help, it's been years since I've taken a stats course.

P.S. I am using Excel and SigmaStat software, if you know of a quick function to use.

Thanks
Danny Klimetz

Are you saying that you have several different samples of x's and y's, each with its own individual relationship, and that you want to combine them into one large group and ignore the stratified nature of your data?

I think what you need is a regression model with several indicator variables to account for the difference between each group.

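For instance, you have your y and x. Then you add I1, with 1's for the group 1 data and 0's elsewhere; then I2, with 1's for the group 2 data and 0's elsewhere; and so on. If the model has an intercept, leave one group out as the baseline (k groups need only k-1 indicators), otherwise the indicator columns are redundant with the intercept.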
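Here is a minimal sketch of building those indicator columns from a list of site labels. The function name `indicator_columns`, the site labels, and the choice of the first sorted label as the baseline are all assumptions made for illustration; in Excel you would just type the 0/1 columns by hand or fill them with an IF formula.

```python
def indicator_columns(groups, baseline=None):
    """Return {group_label: [0/1, ...]} for every group except the baseline."""
    labels = sorted(set(groups))
    if baseline is None:
        # Assumption: use the first label (in sorted order) as the reference group.
        baseline = labels[0]
    return {
        g: [1 if s == g else 0 for s in groups]
        for g in labels
        if g != baseline
    }


# Hypothetical site labels, one per observation.
sites = ["A", "A", "B", "C", "B", "C"]
cols = indicator_columns(sites)
for name, col in cols.items():
    print(name, col)
```

Each returned column goes next to x as an extra predictor; the coefficient on a site's indicator is then that site's intercept shift relative to the baseline site.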

Combining your data to make inferences as a whole may actually lower your error rather than increase it, if you use indicator variables. In Excel, if you search for "regression" in the help, you should be able to figure it out.

3. Thanks for the response. I will look into it.

What I have is 6-9 stations on various streams within a river basin. We are trying to develop a relation between turbidity and sediment concentration. Basically, at each site we have an instrument taking continuous turbidity measurements, and we take periodic suspended-sediment samples to correlate the two. Each station will therefore have its own turbidity-concentration equation.

What I am hoping to do is combine all the data points into regions, be it state, watershed, sub-basin, etc., and develop a "regional relation."

So I was hoping to use statistics to help me justify this combining of data points.

One thing I thought of was comparing the slopes and the intercepts from each site to see whether they differ significantly from one another.
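One common way to test all the slopes and intercepts at once is an F-test comparing a single pooled line against separate per-site lines. The sketch below is only an illustration of that idea, not something taken from this thread: the function names and the data are made up, and the test assumes independent, roughly normal residuals with similar variance at every site.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def pooling_f_statistic(data):
    """data: {site: (x_list, y_list)}. Returns (F, df1, df2)."""
    k = len(data)                                  # number of sites
    n = sum(len(x) for x, _ in data.values())      # total observations
    # Full model: every site gets its own slope and intercept.
    sse_separate = sum(fit_line(x, y)[2] for x, y in data.values())
    # Reduced model: one slope and intercept for all sites.
    all_x = [xi for x, _ in data.values() for xi in x]
    all_y = [yi for _, y in data.values() for yi in y]
    sse_pooled = fit_line(all_x, all_y)[2]
    df1 = 2 * (k - 1)      # extra parameters in the full model
    df2 = n - 2 * k        # residual degrees of freedom of the full model
    f = ((sse_pooled - sse_separate) / df1) / (sse_separate / df2)
    return f, df1, df2


# Made-up (turbidity, concentration) data for two hypothetical sites.
data = {
    "A": ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    "B": ([1, 2, 3, 4], [5.0, 7.1, 8.9, 11.2]),
}
f, df1, df2 = pooling_f_statistic(data)
print(f"F = {f:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large F (small p-value) says the sites' lines differ by more than the scatter can explain, so a single regional equation loses real information. The p-value can be looked up from an F table, or in Excel with `F.DIST.RT(F, df1, df2)` (`FDIST` in older versions).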

Right now I am just showing the Mean Absolute Error between the equations.
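To quantify how much each site loses under a regional equation, the Mean Absolute Error can be computed twice per site: once with the site's own line and once with the pooled line. The sketch below assumes simple straight-line fits and uses made-up data and hypothetical function names.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mean_absolute_error(x, y, a, b):
    """Average of |observed - predicted| for the line y = a + b*x."""
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y)) / len(x)


# Made-up (turbidity, concentration) data for two hypothetical sites.
data = {
    "A": ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    "B": ([1, 2, 3, 4], [5.0, 7.1, 8.9, 11.2]),
}

# Regional equation: one line fitted to all points together.
all_x = [xi for x, _ in data.values() for xi in x]
all_y = [yi for _, y in data.values() for yi in y]
reg_a, reg_b = fit_line(all_x, all_y)

for site, (x, y) in data.items():
    own_a, own_b = fit_line(x, y)
    mae_own = mean_absolute_error(x, y, own_a, own_b)
    mae_regional = mean_absolute_error(x, y, reg_a, reg_b)
    print(f"{site}: site MAE = {mae_own:.3f}, regional MAE = {mae_regional:.3f}")
```

The per-site difference between the two MAE values is the cost of pooling for that site, in the same units as the sediment concentration.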

Any help with the best way to demonstrate the error or the benefits between the relations would be greatly appreciated.

Thanks
Danny

4. It sounds like this is an excellent case for the use of indicator variables. They are discussed a bit on this website:

http://www.people.vcu.edu/~nhenry/Dummies.htm

What you want to do is perform multiple regression in Excel. You need the Analysis ToolPak add-in loaded (Data Analysis in the Tools menu), then use the "Regression" option. I figured it out quite quickly. If you have one column of data which is your response, say a2:a101, put that in the Y input range; if your predictor variables (x plus the indicator columns) sit next to it, say in b2:j101, simply put that in the X input range and you can do your multiple regression.

You want to plot the residuals and the normal probability plot. The residuals plot should show no trend, and the normal probability plot should be roughly linear. Ideally you want a high R^2 as well. As a rule of thumb, if the p-value for a variable is greater than 0.1 you should probably chuck the variable or transform it. For an indicator variable, a large p-value means that site's intercept is not detectably different from the baseline site's, which is at least consistent with pooling those sites.

Look up multiple regression in the SigmaStat documentation if you need to. I use the open source software R (which is quite good, if a bit hard to use at first). If you get the Rcmdr package, it is a good GUI for some more advanced regression models.
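For anyone curious what the Regression tool is doing with those columns, here is a rough sketch of a multiple regression of y on x plus one site indicator, returning the coefficients, residuals, and R^2. It solves the normal equations directly, which is fine for a small illustration but is not necessarily how Excel or SigmaStat compute it; the function names and data are made up.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(columns, y):
    """Fit y = b0 + b1*col1 + b2*col2 + ...; returns (coefficients, residuals, r_squared)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]   # leading 1.0 is the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    my = sum(y) / len(y)
    r2 = 1 - sum(e * e for e in resid) / sum((yi - my) ** 2 for yi in y)
    return coef, resid, r2


# Made-up data: turbidity x, an indicator for hypothetical site B, and concentration y.
x = [1, 2, 3, 4, 1, 2, 3, 4]
site_b = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.1, 3.9, 6.2, 7.8, 5.0, 7.1, 8.9, 11.2]

coef, resid, r2 = ols([x, site_b], y)
print("intercept, slope, site-B shift:", [round(c, 3) for c in coef])
print("R^2:", round(r2, 4))
print("residuals:", [round(e, 3) for e in resid])
```

The residuals printed at the end are what you would plot against x (looking for no trend) and on a normal probability plot. Note this model gives each site its own intercept but a shared slope; letting slopes differ too would need extra columns of x multiplied by each indicator.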
